[Biopython] Lineage from GenBank files Question

Peter biopython at maubp.freeserve.co.uk
Sat Oct 23 14:43:31 UTC 2010


On Fri, Oct 22, 2010 at 7:22 PM, Ara Kooser <akooser at unm.edu> wrote:
> Hello all,
>
>  I've been working on a code to parse information from BLAST .xml files and
> GenBank files. I am interested in adding the taxonomy lineage information to
> the code.
>

There are two approaches here. The first is to use the (limited) lineage
in the GenBank flat files themselves. The second is to query the NCBI
Entrez API online with the taxon ID or accession to get the full lineage.
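
For the online approach, here is a minimal sketch, assuming you already
know the NCBI taxon ID (3702 for Arabidopsis thaliana). The function names
fetch_lineage and lineage_from_taxonomy_records are made up for this
example; Entrez.efetch and Entrez.read are the usual Bio.Entrez calls. It
assumes the parsed taxonomy record has a "Lineage" entry holding a
semicolon-separated string, which is what NCBI returned when I last looked,
so do check the output yourself.

```python
def lineage_from_taxonomy_records(records):
    """Split the 'Lineage' string of the first parsed taxonomy record.

    Assumes `records` is the list returned by Entrez.read() for a
    taxonomy efetch, where each item behaves like a dictionary.
    """
    lineage = records[0]["Lineage"]
    return [name.strip() for name in lineage.split(";") if name.strip()]


def fetch_lineage(taxon_id, email):
    """Fetch the full lineage for an NCBI taxon ID (needs network access).

    Hypothetical helper for illustration, not part of Biopython.
    """
    # Imported here so the parsing helper above can be used offline.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="taxonomy", id=str(taxon_id), retmode="xml")
    try:
        records = Entrez.read(handle)
    finally:
        handle.close()
    return lineage_from_taxonomy_records(records)
```

You would call it as fetch_lineage(3702, "you@example.org"), with your own
email address in place of the placeholder.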

Taking an example,

LOCUS       NC_000932             154478 bp    DNA     circular PLN 15-APR-2009
DEFINITION  Arabidopsis thaliana chloroplast, complete genome.
ACCESSION   NC_000932
VERSION     NC_000932.1  GI:7525012
DBLINK      Project:116
KEYWORDS    .
SOURCE      chloroplast Arabidopsis thaliana (thale cress)
  ORGANISM  Arabidopsis thaliana
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
            rosids; eurosids II; Brassicales; Brassicaceae; Arabidopsis.
REFERENCE   1  (bases 1 to 154478)
...

The lineage is in the header, on the lines immediately following the
SOURCE and ORGANISM lines. This all gets recorded in the SeqRecord
annotations dictionary:

>>> from Bio import SeqIO
>>> record = SeqIO.read("", "genbank")
>>> record.annotations["source"]
'chloroplast Arabidopsis thaliana (thale cress)'
>>> record.annotations["organism"]
'Arabidopsis thaliana'
>>> record.annotations["taxonomy"]
['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta',
'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'eudicotyledons',
'core eudicotyledons', 'rosids', 'eurosids II', 'Brassicales',
'Brassicaceae', 'Arabidopsis']

There is also some relevant information in the source feature (usually
there is exactly one, and it is the first feature), such as the taxon ID,
which is stored as a db_xref qualifier of the form taxon:3702.
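
As a rough sketch of pulling out the taxon ID: the helper below is a
made-up name (taxon_id_from_qualifiers), and it assumes the qualifiers
behave like a dictionary mapping each key to a list of strings, with the
taxon ID appearing as a db_xref value starting with "taxon:".

```python
def taxon_id_from_qualifiers(qualifiers):
    """Return the NCBI taxon ID as an int, or None if there is none.

    `qualifiers` is expected to look like a feature's qualifiers
    dictionary, e.g. {"db_xref": ["taxon:3702"], ...}.
    """
    for xref in qualifiers.get("db_xref", []):
        if xref.startswith("taxon:"):
            # Keep only the part after the "taxon:" prefix.
            return int(xref.split(":", 1)[1])
    return None
```

With a record loaded as above, you could try
taxon_id_from_qualifiers(record.features[0].qualifiers), after checking
that record.features[0].type is "source".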

>
> I do have a second question. Once I have a chunk of code running
> and made pretty what is the best way to submit it so it can be posted
> up in the Cookbook section.
>

It is a wiki, so you can add the page yourself. Just make sure you include
[[Category:Cookbook]] and it will appear here:
http://biopython.org/wiki/Category:Cookbook

Peter



