[Biopython-dev] Bio.SeqIO

Michiel de Hoon mdehoon at c2b2.columbia.edu
Tue Mar 6 22:04:52 EST 2007


Peter wrote:
> I was thinking tonight, after updating CVS, that perhaps we should try 
> and find some shorter (lower case) names for "SequencesToDict" and 
> "SequencesToAlignment"... something like "toDict" and "toAlignment", or 
> "as_dict" and "as_alignment" might look nicer.  e.g.
> 
> from Bio import SeqIO
> my_dict = SeqIO.toDict(SeqIO.parse(handle, format))
...
There may be a simple solution to this.

Note that a dictionary can be created by specifying a list of [key, 
value] pairs:

 >>> dict([['a','A'],['b','B'],['c','C']])
{'a': 'A', 'c': 'C', 'b': 'B'}

This also works with an iterator:

 >>> def f(text):
 ...     for character in text:
 ...         yield [character, character.upper()]
 ...
 >>> dict(f("abcd"))
{'a': 'A', 'c': 'C', 'b': 'B', 'd': 'D'}

Now, if we let SeqRecord inherit from list, we can make it behave as a 
[record.id, record] list. In normal use this would be invisible: a user 
who doesn't know that SeqRecord inherits from list wouldn't notice that 
it does.

The upshot is that we can now create a dictionary like this:
 >>> d = dict(SeqIO.parse(handle, format))
without any changes to Bio.SeqIO.
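
A minimal sketch of the idea, not Biopython code: PairRecord is a 
hypothetical stand-in for SeqRecord, and fake_parse is a hypothetical 
stand-in for SeqIO.parse(handle, format). Each record initialises its 
list part as [id, self], so dict() sees a two-item pair.

```python
class PairRecord(list):
    """Hypothetical stand-in for SeqRecord that doubles as an [id, record] pair."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq
        # The list part holds exactly two items: the key and the record itself.
        list.__init__(self, [id, self])


def fake_parse():
    """Hypothetical stand-in for SeqIO.parse(handle, format)."""
    yield PairRecord("alpha", "ACGT")
    yield PairRecord("beta", "GGCC")


# dict() consumes the iterator and unpacks each record as (key, value).
d = dict(fake_parse())
print(sorted(d))
print(d["alpha"].seq)
```

In this sketch the record contains itself, so len(record) is 2 and 
record[1] is the record; that is the only place the list behaviour 
would show through.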

Two things get lost here:
1) We can no longer pass a key_function to control how the key is chosen.
2) We're no longer checking whether all keys are unique; dict() silently 
keeps the last value for a repeated key. This can be fixed by saving the 
keys in the parser function and raising an exception if two identical 
keys are found. This implies, though, that the same exception is raised 
in all use cases of SeqIO.parse, which may not be what we want.
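
One possible way to keep both features without touching the parser, 
sketched under assumptions: checked_pairs is a hypothetical helper (not 
part of Bio.SeqIO) that wraps any iterator of records, applies an 
optional key_function, and raises ValueError on a repeated key. Record 
is a stand-in for SeqRecord.

```python
from collections import namedtuple

# Stand-in for SeqRecord; only the .id attribute matters here.
Record = namedtuple("Record", ["id", "seq"])


def checked_pairs(records, key_function=None):
    """Yield (key, record) pairs, raising ValueError on a duplicate key.

    Hypothetical helper, not part of Bio.SeqIO.
    """
    if key_function is None:
        key_function = lambda record: record.id
    seen = set()
    for record in records:
        key = key_function(record)
        if key in seen:
            raise ValueError("Duplicate key %r" % (key,))
        seen.add(key)
        yield key, record


records = [Record("alpha", "ACGT"), Record("beta", "GGCC")]
d = dict(checked_pairs(records, key_function=lambda r: r.id.upper()))
print(sorted(d))
```

Because the check lives in the wrapper, code that calls the parser 
directly would never see the exception.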

--Michiel

