[Biopython-dev] Bio.SeqIO
Michiel de Hoon
mdehoon at c2b2.columbia.edu
Tue Mar 6 22:04:52 EST 2007
Peter wrote:
> I was thinking tonight, after updating CVS, that perhaps we should try
> and find some shorter (lower case) names for "SequencesToDict" and
> "SequencesToAlignment"... something like "toDict" and "toAlignment", or
> "as_dict" and "as_alignment" might look nicer. e.g.
>
> from Bio import SeqIO
> my_dict = SeqIO.toDict(SeqIO.parse(handle, format))
...
There may be a simple solution to this.
Note that a dictionary can be created by specifying a list of [key,
value] pairs:
>>> dict([['a','A'],['b','B'],['c','C']])
{'a': 'A', 'c': 'C', 'b': 'B'}
This also works with an iterator:
>>> def f(text):
...     for character in text:
...         yield [character, character.upper()]
...
>>> dict(f("abcd"))
{'a': 'A', 'c': 'C', 'b': 'B', 'd': 'D'}
Now, if we let SeqRecord inherit from list, we can make it behave as a
[record.id, record] list. Normally this would be invisible to the user:
someone who doesn't know that SeqRecord inherits from list wouldn't
notice that it does.
The upshot is that we can now create a dictionary like this:
>>> d = dict(SeqIO.parse(handle, format))
without any changes to Bio.SeqIO.
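To make the idea concrete, here is a minimal sketch. PairRecord and parse
are hypothetical stand-ins for SeqRecord and SeqIO.parse, not the actual
Biopython code; the only assumption is that a record carries an id and a
sequence.

```python
class PairRecord(list):
    """Hypothetical stand-in for SeqRecord (not Biopython's class)."""

    def __init__(self, id, seq):
        list.__init__(self)
        self.id = id
        self.seq = seq
        # Make the record look like a two-item [key, value] list,
        # where the value is the record itself.
        self.append(id)
        self.append(self)


def parse(lines):
    """Hypothetical parser: yields one record per 'id sequence' line."""
    for line in lines:
        id, seq = line.split()
        yield PairRecord(id, seq)


# dict() consumes each record as an [id, record] pair.
d = dict(parse(["alpha ACGT", "beta TTGA"]))
print(d["alpha"].seq)
```

One thing to keep in mind with this approach is that the record holds a
reference to itself, so it also inherits list behaviour for things like
repr() and ==, which may or may not be what we want.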
Two things get lost here:
1) We can't pass a key_function to change how the key is chosen.
2) We're no longer checking that all keys are different. This can be fixed
by saving the keys in the parser function and raising an exception if
two identical keys are found. This implies, though, that the same
exception is raised in all use cases of SeqIO.parse, which may not be
what we want.
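For comparison, both features could also be kept outside the parser. The
sketch below is only an illustration: Record, unique_ids and key_function
are hypothetical names, and the "gi|1|alpha" style ids are made up. The
duplicate check lives in a separate wrapper, so only callers who ask for it
get the exception.

```python
from collections import namedtuple

# Hypothetical stand-in for SeqRecord, for illustration only.
Record = namedtuple("Record", ["id", "seq"])


def unique_ids(records):
    """Pass records through, raising ValueError on a repeated id."""
    seen = set()
    for record in records:
        if record.id in seen:
            raise ValueError("Duplicate key '%s'" % record.id)
        seen.add(record.id)
        yield record


def key_function(record):
    """Example key choice: the last '|'-separated field of the id."""
    return record.id.split("|")[-1]


records = [Record("gi|1|alpha", "ACGT"), Record("gi|2|beta", "TTGA")]

# Build [key, value] pairs explicitly, so the key can be chosen freely.
by_name = dict((key_function(r), r) for r in unique_ids(records))
print(sorted(by_name))
```

Note that the check here is on record.id rather than on the chosen key; a
variant could apply key_function first and check the result instead.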
--Michiel