[BioPython] I don't understand why SeqRecord.feature is a list

Peter biopython at maubp.freeserve.co.uk
Thu Jun 28 15:11:28 UTC 2007


Giovanni Marco Dall'Olio wrote:
> Hi!
> In principle, when I can't decide which keys to use for a dictionary,
> I just take simple numerical integers as keys, and it works quite
> well.
> It simplifies testing/debugging/organization a lot and I can decide
> the meaning of every key later (so it's better for dictionaries which
> have to contain very heterogeneous data).

It sounds like you don't need or want a dictionary at all.  If you are 
just assigning increasing integers as "keys", then why not use the list 
of features directly?

e.g. assuming record is a SeqRecord object:

first_feature = record.features[0]
second_feature = record.features[1]
third_feature = record.features[2]
etc
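
To illustrate: a list already maps 0, 1, 2, ... to its items, so a
dictionary with increasing integer keys gives you nothing extra.  A
minimal sketch, using plain strings as stand-ins for the SeqFeature
objects (the values are made up for illustration):

```python
# Stand-ins for the SeqFeature objects in record.features (hypothetical values).
features = ["source", "gene", "CDS"]

# A dictionary keyed by increasing integers...
features_by_number = dict(enumerate(features))

# ...gives exactly the same lookups as the list itself.
for i in range(len(features)):
    assert features_by_number[i] == features[i]
```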

> I'm not sure I have understood the example you gave me on
> http://www.warwick.ac.uk/go/peter_cock/python/genbank/#indexing_features
> , but it seems to work in a way similar to what I was saying before:
> it saves all the features in a list (or is it a dictionary?) and
> access them later by their positions.

That example stored integers (indices into the features list) in a 
dictionary keyed by either the locus tag, GI number or GeneID (e.g. keys 
like "NEQ010", "GI:41614806" or "GeneID:2654552").

The point is that if you know in advance you want to find individual 
features on the basis of their locus tag (for example), rather than 
their order in the file, then I would map the locus tag strings to 
positions in the list.

e.g.

locus_tag_cds_index = index_genbank_features(gb_record, "CDS", "locus_tag")
my_feature = gb_record.features[locus_tag_cds_index["NEQ010"]]
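
A helper along those lines could be written roughly as follows.  This is
only a sketch of the idea and not necessarily the code on the web page.
The name index_features is hypothetical, and the SimpleNamespace objects
are stand-ins for SeqFeature that expose just .type and .qualifiers (a
dictionary mapping qualifier names to lists of strings), so the example
runs without Biopython:

```python
from types import SimpleNamespace


def index_features(features, feature_type, qualifier):
    """Map each value of `qualifier` to the list position of its feature."""
    index = {}
    for position, feature in enumerate(features):
        if feature.type != feature_type:
            continue
        # Qualifier values are lists, e.g. {"locus_tag": ["NEQ010"]}.
        for value in feature.qualifiers.get(qualifier, []):
            # If a value turns up twice, keep the first position seen.
            index.setdefault(value, position)
    return index


# Stand-in features; only the NEQ010 tag is taken from the example above.
features = [
    SimpleNamespace(type="source", qualifiers={}),
    SimpleNamespace(type="gene", qualifiers={"locus_tag": ["NEQ010"]}),
    SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["NEQ010"]}),
]

locus_tag_cds_index = index_features(features, "CDS", "locus_tag")
my_feature = features[locus_tag_cds_index["NEQ010"]]
```

With a real SeqRecord you would pass gb_record.features as the first
argument.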

You could also build a dictionary which maps from the locus tag directly 
to the associated SeqFeature objects themselves.
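
A minimal sketch of that alternative, again using SimpleNamespace
stand-ins for SeqFeature objects so it runs without Biopython (the
"NEQ011" tag and the name cds_by_locus_tag are made up for
illustration):

```python
from types import SimpleNamespace

# Stand-in features exposing only .type and .qualifiers.
features = [
    SimpleNamespace(type="gene", qualifiers={"locus_tag": ["NEQ010"]}),
    SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["NEQ010"]}),
    SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["NEQ011"]}),
]

# Map each locus tag straight to its CDS feature object.
cds_by_locus_tag = {
    tag: feature
    for feature in features
    if feature.type == "CDS"
    for tag in feature.qualifiers.get("locus_tag", [])
}

my_feature = cds_by_locus_tag["NEQ010"]
```

The trade-off is that you lose the position in the list, which you might
want if you also care about neighbouring features.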

> Not to be silly but... how do you represent a gene with its
> transcripts/exons/introns structure with biopython? With SeqRecord and
> SeqFeature objects?

If you loaded a GenBank or EMBL file using SeqIO you get one SeqRecord 
object (assuming there is only one LOCUS line in the file) which 
contains a list of SeqFeature objects which in turn may contain 
sub-features.

I work with bacteria, so I don't have much experience dealing with 
sub-features in a SeqFeature object.

Peter



