[Biopython] processing genbank file
Peter Cock
p.j.a.cock at googlemail.com
Thu Jun 16 11:52:02 UTC 2011
On Thu, Jun 16, 2011 at 12:43 PM, Sheila the angel
<from.d.putto at gmail.com> wrote:
> Hi to all,
> From a genbank file I want to extract certain information. Here is my code
>
> #---------------------------------------------------------------------------------------------------------
> from Bio import SeqIO
> handle = open('NP_954888.1.gb', "rU")
> for gb_record in SeqIO.parse(handle, 'gb'):
If you've only got one record in the file, you can get rid of one loop:
gb_record = SeqIO.read('NP_954888.1.gb', 'gb')
Since there will in general be many features in a GenBank file,
you do need this loop to look at each potential gene:
>     for gb_feature in gb_record.features:
>         if gb_feature.type == 'CDS':
>             gene=gb_feature.qualifiers['gene'][0]
>             db_xref=gb_feature.qualifiers['db_xref']
Note that not all CDS features will have a gene or db_xref
qualifier, so the lookups above may raise a KeyError exception with some files.
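One way to guard against that is to use the dictionary's get method with a default. This is only a sketch in Python 3 syntax: describe_cds is a hypothetical helper name, the placeholder string is arbitrary, and the plain dicts stand in for gb_feature.qualifiers.

```python
def describe_cds(qualifiers):
    """Return (gene, db_xrefs) from a qualifiers mapping.

    Missing keys fall back to defaults instead of raising KeyError.
    """
    # Qualifier values are lists, so the default is a one-item list too.
    gene = qualifiers.get("gene", ["<no gene>"])[0]
    db_xrefs = qualifiers.get("db_xref", [])
    return gene, db_xrefs


# Plain dicts standing in for gb_feature.qualifiers (made-up gene name).
complete = {"gene": ["exampleGene"], "db_xref": ["GeneID:309165"]}
incomplete = {}

print(describe_cds(complete))
print(describe_cds(incomplete))
```

In your loop you would call it as describe_cds(gb_feature.qualifiers).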
>             print gene, db_xref
>
>     print gb_record.annotations['organism']
>
> #====================================================
>
> Is there any simple way to print information like gene name, GeneID etc. or
> I have to use this loop method :( for an example to print organism name I
> need to do only gb_record.annotations['organism'] while to print 'gene' id I
> need the for loop !!!!
You will need some loops in general: One single GenBank file can hold
multiple records, each of which can hold multiple features, each of which
can have multiple names and database cross-references.
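If the nesting bothers you, you could hide the loops in a small generator. The following is a sketch only (Python 3 syntax): iter_cds_qualifiers is a hypothetical name, and the SimpleNamespace objects are stand-ins for real records so the example runs without a GenBank file.

```python
from types import SimpleNamespace


def iter_cds_qualifiers(records):
    """Yield (record, qualifiers) for every CDS feature in every record."""
    for record in records:
        for feature in record.features:
            if feature.type == "CDS":
                yield record, feature.qualifiers


# Stand-in objects mimicking the attributes used above (made-up values).
fake_record = SimpleNamespace(
    id="example_record",
    features=[
        SimpleNamespace(type="source", qualifiers={}),
        SimpleNamespace(type="CDS", qualifiers={"db_xref": ["GeneID:309165"]}),
    ],
)

for record, qualifiers in iter_cds_qualifiers([fake_record]):
    print(record.id, qualifiers.get("db_xref", []))
```

With a real file you would pass SeqIO.parse('NP_954888.1.gb', 'gb') as the records argument.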
> Another problem is the db_xref=gb_feature.qualifiers['db_xref'] gives me
> all /db_xref entries in CDS field while I want only /db_xref="GeneID:309165"
> (or only the GeneID)...how to do that
>
> Thanks in Advance
Since you can get multiple /db_xref (or other qualifiers), when the parser
was designed a list was used for the values. You could filter the list on what
each entry starts with, e.g. test every entry with entry.startswith("GeneID:")
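As a rough sketch of that filtering (Python 3 syntax), assuming db_xrefs is the list taken from gb_feature.qualifiers['db_xref']; gene_ids is a hypothetical helper name and the second entry in the example list is made up:

```python
def gene_ids(db_xrefs):
    """Return the ID part of every 'GeneID:...' entry in a db_xref list."""
    prefix = "GeneID:"
    # Keep only the GeneID entries and strip the prefix from each.
    return [entry[len(prefix):] for entry in db_xrefs if entry.startswith(prefix)]


print(gene_ids(["GeneID:309165", "OtherDB:12345"]))
```

This returns a list because a feature may carry no GeneID cross-reference at all, or more than one.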
Peter