[BioPython] taxonomic tree

Peter biopython at maubp.freeserve.co.uk
Wed Oct 8 16:38:31 UTC 2008


On Wed, Oct 8, 2008 at 5:23 PM, dr goettel <biopythonlist at gmail.com> wrote:
> Hello, I'm new in this list and in BioPython.

Hello :)

> I would like to create an NCBI-like taxonomic tree and then fill it with the
> organisms that I have in a file. Is there an easy way to do this? I started
> using biopython's function at 7.11.4 (finding the lineage of an organism) in
> the tutorial, ...

For anyone reading this later on, note that the tutorial section
numbers tend to change with each release of Biopython.  The section
in question (finding the lineage of an organism) just uses Bio.Entrez
to fetch taxonomy information for a particular NCBI taxon id.

> but I need to do this tens of thousands of times, so it spends too
> much time querying the NCBI database.

Also, calling Bio.Entrez 10000 times might annoy the NCBI ;)
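
If some online lookups are still needed, one way to cut the number of
requests is to ask for many taxon ids per call.  The sketch below is
only an illustration: the helper names (batched, fetch_lineages,
entrez_fetch_batch), the batch size of 200 and the placeholder email
address are all assumptions, and the Entrez part has not been run
against the live server.  Note also that the NCBI may return a
different TaxId from the one requested if a taxon has been merged.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def batched(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for tax_id in ids:
        batch.append(tax_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def fetch_lineages(
    ids: Iterable[str],
    fetch_batch: Callable[[List[str]], Dict[str, str]],
    size: int = 200,  # assumed batch size, not an NCBI-documented limit
) -> Dict[str, str]:
    """Map taxon id -> lineage, calling fetch_batch once per batch."""
    lineages: Dict[str, str] = {}
    for batch in batched(ids, size):
        lineages.update(fetch_batch(batch))
    return lineages


def entrez_fetch_batch(batch: List[str], email: str = "you@example.com") -> Dict[str, str]:
    """Hypothetical fetcher: one Bio.Entrez.efetch call for a whole batch."""
    from Bio import Entrez  # imported lazily; needs Biopython installed

    Entrez.email = email  # placeholder; the NCBI asks for a real contact address
    handle = Entrez.efetch(db="taxonomy", id=",".join(batch), retmode="xml")
    records = Entrez.read(handle)
    handle.close()
    return {rec["TaxId"]: rec["Lineage"] for rec in records}
```

With Biopython installed, something like
fetch_lineages(my_ids, entrez_fetch_batch) would then make one request
per 200 ids rather than one per id.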

> Therefore I built a taxonomic database
> locally and implemented something similar to 7.11.4 tutorial's function so I
> get, for every sequence, the lineage in the same way:
>
> 'cellular organisms; Eukaryota; Viridiplantae; Streptophyta; Streptophytina;
>  Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta;
>  Liliopsida; Asparagales; Orchidaceae'

I assume you used the NCBI-provided taxdump files to populate the
database?  See ftp://ftp.ncbi.nih.gov/pub/taxonomy/
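
For reference, here is a rough sketch of rebuilding that kind of
lineage string straight from the taxdump files.  It assumes the usual
layout of nodes.dmp (tax id, then parent tax id, as the first two
fields) and names.dmp (tax id, name, unique name, name class), with
fields separated by tab, '|', tab.  The function names are made up for
this example, and like the lineage quoted above the result leaves out
both the root and the taxon itself.

```python
def split_dmp_line(line):
    """Split one taxdump row into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):  # each row ends with a trailing tab + '|'
        line = line[:-2]
    return line.split("\t|\t")


def load_parents(nodes_lines):
    """Map each tax id to its parent tax id (first two fields of nodes.dmp)."""
    parents = {}
    for line in nodes_lines:
        fields = split_dmp_line(line)
        parents[fields[0]] = fields[1]
    return parents


def load_scientific_names(names_lines):
    """Map each tax id to its scientific name, skipping synonyms etc."""
    names = {}
    for line in names_lines:
        tax_id, name, _unique_name, name_class = split_dmp_line(line)[:4]
        if name_class == "scientific name":
            names[tax_id] = name
    return names


def lineage(tax_id, parents, names):
    """Return the '; '-joined ancestors of tax_id, excluding the root."""
    path = []
    node = parents[tax_id]
    while parents[node] != node:  # the root is recorded as its own parent
        path.append(names[node])
        node = parents[node]
    return "; ".join(reversed(path))
```

With the real files you would pass open file handles, for example
load_parents(open("nodes.dmp")), and keep both dictionaries in memory
for the tens of thousands of lookups.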

Personally, rather than designing my own database just for this (and
writing a parser for the taxonomy files), I would have suggested
installing BioSQL, and using the BioSQL script load_ncbi_taxonomy.pl
to download and import the data for you.  This is a simple Perl script
- you don't need BioPerl.  See http://www.biopython.org/wiki/BioSQL
for details.

> Now I need to create a tree, or fill an already created one. And then search
> it by some criteria.

What kind of tree do you mean?  Are you talking about creating a
Newick tree, or an in-memory structure?  Perhaps the Bio.Nexus
module's tree functionality would help.
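
To make the two options concrete, here is a small sketch that builds an
in-memory tree (plain nested dictionaries) from lineage strings like
the one quoted above, and then writes it out as Newick.  This is just
one possible representation, not anything Biopython provides; the
function names are invented for the example.  Every label is
single-quoted because names such as 'cellular organisms' contain
spaces; whether a given Newick parser accepts quoted internal node
labels is worth checking before relying on this.

```python
def add_lineage(tree, lineage, leaf=None):
    """Insert one ';'-separated lineage (plus an optional leaf) into tree."""
    names = [name.strip() for name in lineage.split(";") if name.strip()]
    if leaf is not None:
        names.append(leaf)
    node = tree
    for name in names:
        node = node.setdefault(name, {})  # reuse existing branch if present
    return tree


def quote_label(name):
    """Single-quote a Newick label; a literal quote is written doubled."""
    return "'" + name.replace("'", "''") + "'"


def to_newick(tree):
    """Serialise the nested dict; siblings are sorted by name."""

    def render(name, children):
        label = quote_label(name)
        if not children:
            return label
        inner = ",".join(render(k, v) for k, v in sorted(children.items()))
        return "(" + inner + ")" + label

    tops = [render(k, v) for k, v in sorted(tree.items())]
    if not tops:
        return ";"
    # Several top-level names are wrapped in one unnamed root.
    body = tops[0] if len(tops) == 1 else "(" + ",".join(tops) + ")"
    return body + ";"
```

Searching the nested dictionaries is then an ordinary recursive walk,
and the same structure can be filled one sequence at a time as the
lineages come out of the local database.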

If you are interested, the BioSQL tables record the taxonomy tree
using two methods.  Each node has a parent node, allowing you to walk
up the lineage.  There are also left/right values, allowing efficient
selection of all child nodes via an SQL select statement.
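
Both methods can be tried out on a toy table.  The sketch below uses
an in-memory SQLite database with a cut-down table modelled on the
BioSQL taxon table (parent_taxon_id, left_value, right_value).  The
name column, the NULL parent for the top node, the sample rows and the
helper functions are simplifications made up for the example; in
BioSQL itself the names live in a separate taxon_name table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon ("
    " taxon_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " parent_taxon_id INTEGER,"
    " left_value INTEGER,"
    " right_value INTEGER)"
)
# Left/right values number the nodes in a depth-first walk, so every
# node's interval strictly contains the intervals of all nodes below it.
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Eukaryota", None, 1, 10),
        (2, "Viridiplantae", 1, 2, 7),
        (3, "Streptophyta", 2, 3, 6),
        (4, "Orchidaceae", 3, 4, 5),
        (5, "Metazoa", 1, 8, 9),
    ],
)


def lineage_names(conn, taxon_id):
    """Method 1: follow parent pointers up to the top, one query per step."""
    path = []
    while taxon_id is not None:
        name, taxon_id = conn.execute(
            "SELECT name, parent_taxon_id FROM taxon WHERE taxon_id = ?",
            (taxon_id,),
        ).fetchone()
        path.append(name)
    return list(reversed(path))


def nodes_below(conn, taxon_id):
    """Method 2: select everything under a node with a single query."""
    sql = (
        "SELECT node.name FROM taxon AS node, taxon AS anc "
        "WHERE anc.taxon_id = ? "
        "AND node.left_value > anc.left_value "
        "AND node.right_value < anc.right_value "
        "ORDER BY node.left_value"
    )
    return [row[0] for row in conn.execute(sql, (taxon_id,))]
```

Here lineage_names(conn, 4) walks up from Orchidaceae to Eukaryota,
while nodes_below(conn, 2) returns everything under Viridiplantae
without any recursion.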

Peter


