[Biopython] processing genbank file
Peter Cock
p.j.a.cock at googlemail.com
Thu Jun 16 13:24:02 UTC 2011
On Thu, Jun 16, 2011 at 1:28 PM, Sheila the angel
<from.d.putto at gmail.com> wrote:
> So if I have only one file which contains only 1 record (say
> 'NP_954888.1.gb' ) and I want to extract information like
> 'gene' name I can't do it in one line e.g.
> #------------------------------------------------------------------------------
> gene=gb_record.features['CDS'].qualifiers['gene'][0] #or
> something similar to this will not work
Supposing there was a neat built in way to filter the features by type,
in general there would still be multiple CDS features - often 1000s,
so you'd need to choose from them.
> #-----------------------------------------------------------------------------
> But I have to use loop as
> #-----------------------------------------------------------------------------
> gb_record = SeqIO.read('NP_954888.1.gb', 'gb')
> for gb_feature in gb_record.features:
> if gb_feature.type == 'CDS':
> gene=gb_feature.qualifiers['gene'][0]
> print gene
> #-----------------------------------------------------------------------------
> ?????
I've checked your example NP_954888 and it is actually a GenPept
file (a protein GenBank file), and it does have just one CDS feature.
Do you prefer this syntax?
gb_record = SeqIO.read('NP_954888.1.gb', 'gb')
cds_features = [f for f in gb_record.features if f.type=="CDS"]
assert len(cds_features)==1
print cds_features[0].qualifiers['gene'][0]
Peter
More information about the Biopython
mailing list