[BioPython] I don't understand why SeqRecord.features is a list

Thu Jun 28 13:45:28 UTC 2007

Hi!
In principle, when I can't decide which keys to use for a dictionary,
I just use plain integers as keys, and it works quite well.
It makes testing, debugging and organization much simpler, and I can
decide the meaning of each key later (so it suits dictionaries that
have to hold very heterogeneous data).

I'm not sure I have understood the example you gave me on
http://www.warwick.ac.uk/go/peter_cock/python/genbank/#indexing_features
, but it seems to work much like what I described above:
it saves all the features in a list (or is it a dictionary?) and
accesses them later by their positions.
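
If I read it correctly, the idea could be sketched roughly like this.
The Feature class below is only a made-up stand-in for SeqFeature (so
the snippet runs without Biopython), the index_by_qualifier helper is
my own name, and the locus_tag values are invented; the real objects
may well behave differently:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical stand-in for a Biopython SeqFeature (not the real class)."""
    type: str
    start: int
    end: int
    # Qualifiers map a key to a list of values (a dictionary of lists).
    qualifiers: dict = field(default_factory=dict)


# The features stay in a plain list, in the order they were read.
features = [
    Feature("gene", 0, 300, {"locus_tag": ["ABC_0001"]}),
    Feature("CDS", 0, 300, {"locus_tag": ["ABC_0001"],
                            "db_xref": ["GI:1", "GeneID:2"]}),
    Feature("gene", 500, 900, {"locus_tag": ["ABC_0002"]}),
]


def index_by_qualifier(features, feature_type, key):
    """Map each qualifier value to the list positions of matching features."""
    index = defaultdict(list)
    for position, feature in enumerate(features):
        if feature.type != feature_type:
            continue  # only index the requested feature type
        for value in feature.qualifiers.get(key, []):
            index[value].append(position)
    return dict(index)


cds_index = index_by_qualifier(features, "CDS", "locus_tag")
gene_index = index_by_qualifier(features, "gene", "locus_tag")
```

Here cds_index["ABC_0001"] is [1], so features[1] gets that CDS back
from the list. Storing a list of positions per key means two features
that share the same value do not overwrite each other.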

Not to be silly but... how do you represent a gene with its
transcript/exon/intron structure in Biopython? With SeqRecord and
SeqFeature objects?
I still don't get it :(

Cheers!

2007/6/12, Peter <biopython at maubp.freeserve.co.uk>:
> Marc Colosimo wrote:
> > Additionally, for many formats you can have multiple features with
> > the same name; e.g., CDS, gene, etc... in GenBank Records.
>
> Indeed - and as the SeqRecord/SeqFeature is most heavily used by the
> GenBank parser, that does explain things well.
>
> The problem with using a dictionary is what to index on - you can't
> simply use the location string for example, as there are usually
> entries for genes and CDS features with the same location.
>
> You can't depend on any other information like an identifier or name to
> be present in a GenBank file for all feature types.
>
> In general, the choice of index will depend on what you want to use it
> for - so the flippant answer is just index it yourself, for example like
> this:
>
> http://www.warwick.ac.uk/go/peter_cock/python/genbank/#indexing_features
>
> > The same rationale doesn't fully apply to why the feature qualifiers
> > are dictionaries of lists.
>
> No it doesn't. The rationale seems to have been that feature qualifiers
> in GenBank files can occur with no values (e.g. /pseudo and others), a
> single value (e.g. translation) or multiple values (by repeated keys,
> e.g. database cross references).  So using a list is a simple solution
> to cover all these cases - even if most qualifiers only have a single
> value.  (There are some old posts on the mailing list archive discussing
> this.)
>
> Peter

-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com