[BioPython] Bug in GenBank module - record.feature method ?

Mon Dec 26 12:05:20 EST 2005

Srinivas Iyyer wrote:
> Hi group, I have been working to parse out the GO annotations from
> FEATURE section of GenBank record.
...
> feature = record.features[2]
> golist = feature.qualifiers[1].value """extracts '/Note' part
 > go_list =  golist.split(';') #split by ';' to get GO secs."
...
> # feature = record.features[2] ### this gives the CDS part.
> record.features[2]. I see that this order is not always true. For
> many sequences in FEATURES section,  'gene' is always followed by 
> 'CDS'. However in some new RefSeq sequences, 'variation' sub-section
> is incorporated now.  this is the trouble, I guess.
...
> 1. Is there any more technical way to parse '/note' sub-section in
> CDS section of FEATURES.  Do you think what I am doing
> (record.features[2]) is more novice and not technical/correct. Please
> let me know what is the best process.

You are doing this:

feature = record.features[2]

It depends on the feature you want being the third one in the record
(zero-based counting: 0, 1, 2).  You might be better off doing something like:

for feature in record.features:
    if feature.type == "CDS":
        pass  # Do stuff with the CDS feature...
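
To illustrate why selecting by type is safer than selecting by position,
here is a minimal sketch.  The Feature class is a hypothetical stand-in
for a parsed GenBank feature (it is not the Biopython class), and the
feature order is made up to mimic a record that contains a 'variation'
entry ahead of the CDS:

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical stand-in for a parsed feature: a type plus qualifiers."""
    type: str
    qualifiers: dict = field(default_factory=dict)


def features_of_type(features, wanted):
    """Return every feature whose type matches, in the original order."""
    return [f for f in features if f.type == wanted]


# Made-up feature order: 'variation' sits where the CDS used to be.
features = [
    Feature("source"),
    Feature("gene"),
    Feature("variation"),
    Feature("CDS", {"note": ["example note"]}),
]

by_index = features[2]                           # picks up 'variation'
cds_features = features_of_type(features, "CDS")  # finds the CDS anyway
```

Here features[2] is the 'variation' entry, while the type filter still
returns the single CDS feature.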

Also, once you have found the feature(s) you are interested in, the
qualifiers property is a Python dictionary.  You should be able to
access the /note entry from the GenBank feature record with:

notes = feature.qualifiers['note']

This will be a list - for some things (like db_xref) there can be
several different entries for a single feature.  For others, like the
translation, there should be only one.  I'm not sure what happens with
notes.
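
One thing to watch for: indexing a dictionary with a key it does not
contain raises KeyError, so a feature without a /note would break the
lookup.  A small sketch, assuming qualifiers maps names to lists of
strings as described above (the get_notes helper and the sample
dictionaries are made up for illustration):

```python
def get_notes(qualifiers):
    """Return the list stored under 'note', or an empty list if absent."""
    return qualifiers.get("note", [])


# Made-up qualifier dictionaries, one with a note and one without.
with_note = {"note": ["first note", "second note"], "db_xref": ["GeneID:1"]}
without_note = {"db_xref": ["GeneID:2"]}

notes_a = get_notes(with_note)      # the list of notes
notes_b = get_notes(without_note)   # an empty list, no KeyError
```

Returning an empty list means later code can loop over the result
without a special case for features that have no notes.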

You could try something like:

go_list = []
for note in feature.qualifiers['note']:
    go_list.extend(note.split(';'))
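
A plain split(';') keeps the spaces after each semicolon and yields an
empty string if a note ends with ';'.  Here is a sketch that tidies both
up.  The split_notes helper and the sample note strings are invented for
illustration and are not taken from a real record:

```python
def split_notes(notes, sep=";"):
    """Split each note on sep, strip whitespace, and drop empty pieces."""
    pieces = []
    for note in notes:
        for piece in note.split(sep):
            piece = piece.strip()
            if piece:
                pieces.append(piece)
    return pieces


# Invented sample notes; real /note text may be laid out differently.
notes = [
    "go_component: nucleus; go_function: DNA binding;",
    "go_process: transcription",
]

go_list = split_notes(notes)
```

With the sample above, go_list holds three cleaned entries, one per GO
section.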

Peter