[Biopython-dev] Bio.SeqIO
Peter
biopython-dev at maubp.freeserve.co.uk
Wed Mar 7 05:43:36 EST 2007
Michiel de Hoon wrote:
> Note that a dictionary can be created by specifying a list of [key,
> value] pairs:
>
> >>> dict([['a','A'],['b','B'],['c','C']])
> {'a': 'A', 'c': 'C', 'b': 'B'}
>
> This also works with an iterator:
> >>> def f(text):
> ...     for character in text:
> ...         yield [character, character.upper()]
> ...
> >>> dict(f("abcd"))
> {'a': 'A', 'c': 'C', 'b': 'B', 'd': 'D'}
>
> Now, if we let SeqRecord inherit from list, we can make it behave as a
> [record.id, record] list. Normally, this would not be visible to the
> user, in the sense that a user who doesn't know that SeqRecord inherits
> from list wouldn't notice that it does.
>
> The upshot is that we can now create a dictionary like this:
> >>> d = dict(SeqIO.parse(handle, format))
> without any changes to Bio.SeqIO.
That is clever...
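A minimal sketch of the trick, assuming a hypothetical `PairRecord` class (not Biopython's actual `SeqRecord`) whose list contents are `[id, self]`, and a hypothetical `parse_records` generator standing in for `SeqIO.parse`:

```python
class PairRecord(list):
    """Hypothetical stand-in for SeqRecord that also acts as an [id, record] pair."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id
        # Store [id, self] as the list contents, so dict() can unpack
        # each record as a (key, value) pair.
        super().__init__([id, self])


def parse_records(pairs):
    """Hypothetical parser: yields one PairRecord per (id, sequence) tuple."""
    for record_id, seq in pairs:
        yield PairRecord(seq, record_id)


d = dict(parse_records([("alpha", "ACGT"), ("beta", "GGCC")]))
print(sorted(d))      # ['alpha', 'beta']
print(d["beta"].seq)  # GGCC
```

Note that each record now contains itself, so `repr()` of a record shows `['alpha', [...]]`.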
> Two things get lost here:
> 1) We can't have a key_function to change how to choose the key.
> 2) We're no longer checking if all keys are different. This can be fixed
> by saving the keys in the parser function and raising an exception if
> two identical keys are found. This implies though that the same
> exception is raised in all use cases of SeqIO.parse, which may not be
> what we want.
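Both losses can be illustrated with a hypothetical helper, `records_to_dict`, that takes an optional key function and raises `ValueError` on a repeated key. A plain `dict()` call does neither and silently keeps the last record for a repeated key. Records here are assumed to be any objects with an `id` attribute; `SimpleNamespace` stands in for real records.

```python
from types import SimpleNamespace


def records_to_dict(records, key_function=None):
    """Hypothetical helper: build {key: record}, refusing duplicate keys."""
    if key_function is None:
        key_function = lambda record: record.id  # default: use the record id
    result = {}
    for record in records:
        key = key_function(record)
        if key in result:
            raise ValueError("Duplicate key %r" % key)
        result[key] = record
    return result


records = [SimpleNamespace(id="x1.1", seq="ACGT"),
           SimpleNamespace(id="x2.1", seq="TTGA")]
duplicate = SimpleNamespace(id="x1.1", seq="GGGG")

# Strip a version suffix from each id to use as the key.
by_accession = records_to_dict(records, key_function=lambda r: r.id.split(".")[0])
print(sorted(by_accession))  # ['x1', 'x2']

try:
    records_to_dict(records + [duplicate])
except ValueError as err:
    print(err)  # Duplicate key 'x1.1'

# By contrast, dict() keeps only the last record for a repeated key.
plain = dict((r.id, r) for r in records + [duplicate])
print(plain["x1.1"].seq)  # GGGG
```

Doing the check in a separate helper rather than in the parser means only the dictionary use case pays for it, which avoids raising the same exception in every use of the parser.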
Sadly not ideal. Also, wouldn't this prevent us from making SeqRecord
inherit from Seq (another interesting idea you proposed in the past)?
And Seq objects themselves could behave a little more like a string, or
a list of letters.
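A minimal sketch of what "behaving like a string" might mean, using a hypothetical `SimpleSeq` class (not Biopython's `Seq`) that supports `len()`, indexing, slicing (returning a new `SimpleSeq`), iteration over letters and `str()`:

```python
class SimpleSeq:
    """Hypothetical sequence class that behaves like a read-only string."""

    def __init__(self, data):
        self._data = str(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing gives a new sequence object, not a plain string.
            return SimpleSeq(self._data[index])
        # A single index gives one letter.
        return self._data[index]

    def __iter__(self):
        return iter(self._data)

    def __str__(self):
        return self._data

    def __repr__(self):
        return "SimpleSeq(%r)" % self._data


s = SimpleSeq("ACGTTGCA")
print(len(s), s[0], s[2:5], list(s[:3]))  # 8 A GTT ['A', 'C', 'G']
```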
It might be nice to be able to slice a SeqRecord and get a new
SeqRecord with the appropriate sub-sequence... I have been thinking
about a "RichSeqRecord" subclass of SeqRecord which would support
sequence-level annotation (e.g. secondary structure). In this situation,
when requesting a sub-record, the appropriate subset of the secondary
structure information should also be extracted.
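The slicing behaviour described above could look something like this sketch. The name `RichSeqRecord` comes from the text, but the design is an assumption: the record stores its sequence plus a dict of per-letter annotations (here a hypothetical `letter_annotations` attribute), each required to match the sequence length, and slicing cuts all of them together.

```python
class RichSeqRecord:
    """Hypothetical record with per-letter annotations kept in step with seq."""

    def __init__(self, seq, id, letter_annotations=None):
        self.seq = seq
        self.id = id
        self.letter_annotations = dict(letter_annotations or {})
        for name, values in self.letter_annotations.items():
            # Each annotation must have exactly one entry per sequence letter.
            if len(values) != len(seq):
                raise ValueError("Annotation %r has wrong length" % name)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("only slices are supported in this sketch")
        # Apply the same slice to the sequence and to every annotation.
        sub_annotations = {name: values[index]
                           for name, values in self.letter_annotations.items()}
        return RichSeqRecord(self.seq[index], self.id, sub_annotations)


rec = RichSeqRecord("MKTAYIAK", "demo", {"secondary_structure": "CCHHHHCC"})
sub = rec[2:6]
print(sub.seq, sub.letter_annotations["secondary_structure"])  # TAYI HHHH
```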
For example, the PFAM/Stockholm alignment format can hold strings of the
same length as the sequences, containing "per sequence per character"
information such as secondary structure.
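In Stockholm files this per-residue information is written on `#=GR <seqname> <feature> <annotation>` lines. The sketch below is a deliberately simplified, hypothetical reader: it assumes a single block with one line per sequence, and ignores `#=GF`, `#=GS` and `#=GC` markup. It collects each sequence together with its per-column annotations, checking that the lengths match, which is the data a "RichSeqRecord" would need.

```python
def read_simple_stockholm(lines):
    """Simplified Stockholm reader: one block, one line per sequence."""
    sequences = {}
    annotations = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("# STOCKHOLM") or line == "//":
            continue
        if line.startswith("#=GR "):
            # '#=GR <seqname> <feature> <one character per column>'
            _, name, feature, values = line.split(None, 3)
            annotations.setdefault(name, {})[feature] = values.strip()
        elif line.startswith("#"):
            continue  # other markup (#=GF, #=GS, #=GC) is ignored here
        else:
            name, seq = line.split()
            sequences[name] = seq
    for name, per_seq in annotations.items():
        for feature, values in per_seq.items():
            if len(values) != len(sequences[name]):
                raise ValueError("%s %s annotation length mismatch" % (name, feature))
    return sequences, annotations


example = """# STOCKHOLM 1.0
seq1            MKTAYIAK
#=GR seq1 SS    CCHHHHCC
seq2            MKT-YIAR
//""".splitlines()

seqs, ann = read_simple_stockholm(example)
print(seqs["seq1"], ann["seq1"]["SS"])  # MKTAYIAK CCHHHHCC
```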
We could also load a PDB file in this way, and provide a list of residue
objects (including the atom coordinates) in parallel with the sequence.
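One way to keep residue objects in parallel with the sequence is to build the sequence from the residues themselves. This sketch does not parse a real PDB file: it uses a hypothetical `Residue` dataclass with made-up coordinates and a partial three-letter to one-letter table covering only the residues in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Residue:
    """Hypothetical residue: three-letter name plus atom coordinates."""
    name: str
    atoms: dict = field(default_factory=dict)  # atom name -> (x, y, z)


# Partial three-letter to one-letter table, enough for this example only.
THREE_TO_ONE = {"MET": "M", "LYS": "K", "THR": "T", "ALA": "A"}


def sequence_and_residues(residues):
    """Return (sequence string, residue list) with one residue per letter."""
    seq = "".join(THREE_TO_ONE[res.name] for res in residues)
    return seq, list(residues)


# Illustrative coordinates only, not taken from any real structure.
residues = [Residue("MET", {"CA": (0.0, 0.0, 0.0)}),
            Residue("LYS", {"CA": (3.8, 0.0, 0.0)}),
            Residue("THR", {"CA": (7.6, 0.0, 0.0)}),
            Residue("ALA", {"CA": (11.4, 0.0, 0.0)})]

seq, per_letter = sequence_and_residues(residues)
# Slicing both with the same slice keeps letters and residues aligned.
window = slice(1, 3)
print(seq[window], [r.name for r in per_letter[window]])  # KT ['LYS', 'THR']
```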
Peter