[BioPython] Bug in GenBank module - record.feature method ?
Peter
biopython at maubp.freeserve.co.uk
Mon Dec 26 12:05:20 EST 2005
Srinivas Iyyer wrote:
> Hi group, I have been working to parse out the GO annotations from
> FEATURE section of GenBank record.
...
> feature = record.features[2]
> golist = feature.qualifiers[1].value """extracts '/Note' part
> go_list = golist.split(';') #split by ';' to get GO secs."
...
> # feature = record.features[2] ### this gives the CDS part.
> record.features[2]. I see that this order is not always true. For
> many sequences in FEATURES section, 'gene' is always followed by
> 'CDS'. However in some new RefSeq sequences, 'variation' sub-section
> is incorporated now. this is the trouble, I guess.
...
> 1. Is there any more technical way to parse '/note' sub-section in
> CDS section of FEATURES. Do you think what I am doing
> (record.features[2]) is more novice and not technical/correct. Please
> let me know what is the best process.
You are doing this:
    feature = record.features[2]
That relies on the feature you want being the third one in the record
(zero-based counting: 0, 1, 2). You might be better off doing something like:
    for feature in record.features:
        if feature.type == "CDS":
            #Do stuff...
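As a rough sketch of why selecting by type is safer than selecting by
position: the example below uses small stand-in objects rather than a real
parsed GenBank record, so it runs without Biopython or a data file. The
helper name features_of_type and all qualifier values are made up for
illustration; the only assumption carried over is that each feature has a
.type string and a .qualifiers dictionary, as described in this message.

```python
from types import SimpleNamespace

# Stand-ins for parsed features (hypothetical placeholder values).
# A 'variation' entry sits between 'gene' and 'CDS', as in the newer
# RefSeq records mentioned in the question.
record = SimpleNamespace(features=[
    SimpleNamespace(type="source", qualifiers={}),
    SimpleNamespace(type="gene", qualifiers={"gene": ["geneX"]}),
    SimpleNamespace(type="variation", qualifiers={}),
    SimpleNamespace(type="CDS", qualifiers={"note": ["term A; term B"]}),
])


def features_of_type(record, feature_type):
    """Return every feature of the given type, wherever it sits in the list."""
    return [f for f in record.features if f.type == feature_type]


# Positional access picks up the wrong feature for this record...
by_position = record.features[2]

# ...whereas filtering on the type does not care about the ordering.
cds_features = features_of_type(record, "CDS")
```

With a real record, the same function should work unchanged as long as the
features expose .type in the same way.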
Also, once you have found the feature(s) you are interested in, the
qualifiers property is a Python dictionary. You should be able to
access the /note entry from the GenBank feature record by:
    notes = feature.qualifiers['note']
This will be a list - for some things (like db_xref) there can be
several different entries for a single feature. For others, like the
translation, there should be only one. I'm not sure what happens with
notes.
You could try something like:
    go_list = []
    for note in feature.qualifiers['note']:
        go_list.extend(note.split(';'))
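A slightly more defensive variant of the same idea, as a sketch only: it
assumes that a feature may have no 'note' key at all, and that the pieces
may carry stray whitespace around the ';' separators. The function name
split_notes and the sample note strings are hypothetical, and the features
are stand-in objects rather than real parsed ones.

```python
from types import SimpleNamespace


def split_notes(feature, sep=";"):
    """Collect the separated pieces of every 'note' entry on a feature.

    Returns an empty list if the feature has no 'note' qualifier.
    """
    pieces = []
    # .get() avoids a KeyError when the qualifier is absent.
    for note in feature.qualifiers.get("note", []):
        for part in note.split(sep):
            part = part.strip()
            if part:  # skip empty pieces, e.g. after a trailing ';'
                pieces.append(part)
    return pieces


# Stand-in features with made-up note text.
with_notes = SimpleNamespace(
    type="CDS", qualifiers={"note": ["term A; term B;", "term C"]}
)
without_notes = SimpleNamespace(type="CDS", qualifiers={})

go_list = split_notes(with_notes)
empty_list = split_notes(without_notes)
```

Whether the stripping is wanted depends on how the note text is laid out in
your records, so treat it as optional.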
Peter