[BioPython] taxonomic tree

dr goettel biopythonlist at gmail.com
Thu Oct 9 08:52:42 UTC 2008


On Wed, Oct 8, 2008 at 6:38 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Wed, Oct 8, 2008 at 5:23 PM, dr goettel <biopythonlist at gmail.com>
> wrote:
> > Hello, I'm new in this list and in BioPython.
>
> Hello :)
>
> > I would like to create a NCBI-like taxonomic tree and then fill it with the
> > organisms that I have in a file. Is there an easy way to do this? I started
> > using biopython's function at 7.11.4 (finding the lineage of an organism) in
> > the tutorial, ...
>
> For anyone reading this later on, note that the tutorial section
> numbers tend to change with each release of Biopython.  This section
> just uses Bio.Entrez to fetch taxonomy information for a particular
> NCBI taxon id.
>
> > but I need to do this tens of thousands times so it spends too
> > much time querying NCBI database.
>
> Also calling Bio.Entrez 10000 times might annoy the NCBI ;)
>
> > Therefore I built a taxonomic database
> > locally and implemented something similar to 7.11.4 tutorial's function so I
> > get, for every sequence, the lineage in the same way:
> >
> > 'cellular organisms; Eukaryota; Viridiplantae; Streptophyta; Streptophytina;
> >  Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta;
> >  Liliopsida; Asparagales; Orchidaceae'
>
> I assume you used the NCBI provided taxdump files to populate the
> database?  See ftp://ftp.ncbi.nih.gov/pub/taxonomy/
>

Yes I did.


>
> Personally rather than designing my own database just for this (and
> writing a parser for the taxonomy files), I would have suggested
> installing BioSQL, and using the BioSQL script load_ncbi_taxonomy.pl
> to download and import the data for you.  This is a simple perl script
> - you don't need BioPerl.  See http://www.biopython.org/wiki/BioSQL
> for details.
>

I also used the load_ncbi_taxonomy.pl script. It worked great!


>
> > Now I need to create a tree, or fill an already created one. And then search
> > it by some criteria.
>
> What kind of tree do you mean?  Are you talking about creating a
> Newick tree, or an in memory structure?  Perhaps the Bio.Nexus
> module's tree functionality would help.
>

Thank you very much. I still don't know whether I want a Newick tree or an
in-memory structure. I'll take a look at the Bio.Nexus module.
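
For anyone reading this later on, the two options are not exclusive. Below is
a minimal sketch, assuming the lineages are available as semicolon-separated
strings like the one quoted above: it builds an in-memory tree of nested
dicts and can also write it out as a Newick string. The function names
add_lineage and to_newick are made up for this example and are not part of
Biopython.

```python
def add_lineage(tree, lineage):
    """Insert one 'A; B; C' lineage string into a nested-dict tree."""
    node = tree
    for name in (part.strip() for part in lineage.split(";")):
        if name:
            # Reuse the existing child if present, otherwise create it.
            node = node.setdefault(name, {})
    return tree


def to_newick(tree):
    """Render the nested-dict tree as a Newick string with internal labels.

    Assumption: names contain no parentheses, commas, colons or quotes.
    Spaces are written as underscores, as unquoted Newick labels allow.
    """
    def render(children):
        parts = []
        for name in sorted(children):
            label = name.replace(" ", "_")
            sub = render(children[name])
            parts.append(f"({sub}){label}" if sub else label)
        return ",".join(parts)

    return f"({render(tree)});"


tree = {}
add_lineage(tree, "Eukaryota; Viridiplantae; Orchidaceae")
add_lineage(tree, "Eukaryota; Metazoa")
print(to_newick(tree))
```

The nested dicts can be searched directly in Python, while the Newick string
is something a tree parser or viewer should be able to read.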


>
> If you are interested, the BioSQL tables record the taxonomy tree
> using two methods, each node has a parent node allowing you to walk up
> the lineage.  There are also left/right values allowing selection of
> all child nodes efficiently via an SQL select statement.
>
> Peter
>
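
For anyone reading this later on, here is a minimal sketch of the left/right
selection Peter describes. It runs against a toy in-memory SQLite database
rather than a real BioSQL install. The table and column names (taxon,
taxon_name, node_rank, left_value, right_value, name_class) are my
assumption of what the BioSQL schema uses, so check them against your own
database; the helper names and the toy data are invented for the example.

```python
import sqlite3


def build_toy_db():
    """Create a tiny stand-in for the BioSQL taxon tables (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE taxon (
            taxon_id INTEGER PRIMARY KEY,
            node_rank TEXT,
            left_value INTEGER,
            right_value INTEGER
        );
        CREATE TABLE taxon_name (taxon_id INTEGER, name TEXT, name_class TEXT);
    """)
    # (taxon_id, rank, left, right, name); left/right follow a depth-first walk.
    rows = [
        (1, "no rank", 1, 14, "root"),
        (2, "family", 2, 13, "Orchidaceae"),
        (3, "genus", 3, 8, "Orchis"),
        (4, "species", 4, 5, "Orchis italica"),
        (5, "species", 6, 7, "Orchis militaris"),
        (6, "genus", 9, 12, "Vanilla"),
        (7, "species", 10, 11, "Vanilla planifolia"),
    ]
    for taxon_id, rank, left, right, name in rows:
        conn.execute("INSERT INTO taxon VALUES (?, ?, ?, ?)",
                     (taxon_id, rank, left, right))
        conn.execute("INSERT INTO taxon_name VALUES (?, ?, 'scientific name')",
                     (taxon_id, name))
    return conn


def descendants(conn, name, rank=None):
    """Names of all nodes below `name`, optionally limited to one rank."""
    sql = """
        SELECT child_name.name
        FROM taxon_name AS parent_name
        JOIN taxon AS parent ON parent.taxon_id = parent_name.taxon_id
        JOIN taxon AS child
          ON child.left_value > parent.left_value
         AND child.right_value < parent.right_value
        JOIN taxon_name AS child_name ON child_name.taxon_id = child.taxon_id
        WHERE parent_name.name = ?
          AND parent_name.name_class = 'scientific name'
          AND child_name.name_class = 'scientific name'
    """
    params = [name]
    if rank is not None:
        sql += " AND child.node_rank = ?"
        params.append(rank)
    sql += " ORDER BY child_name.name"
    return [row[0] for row in conn.execute(sql, params)]


conn = build_toy_db()
print(descendants(conn, "Orchidaceae", rank="species"))
```

A node is a descendant exactly when its left/right interval lies strictly
inside the interval of the ancestor, so a single select is enough and no
walk over parent links is needed.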

This is what I was trying to do: starting from the name of the organism (a
leaf of the tree), I get every node using the parent_node field of the taxon
table until I reach the root node. Once I have all the steps to the root
node, I have to create or fill the tree with my data in order to examine the
number of organisms belonging to a certain class/order/family/genus, etc.
Any ideas will be very much appreciated.
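
To make the approach concrete, here is a rough sketch of the walk to the root
and of the counting step, run on a toy in-memory SQLite database instead of
the real one. I am assuming the parent link is a column called
parent_taxon_id in the taxon table (what I called the parent_node field
above), that ranks are in node_rank, and that names are in taxon_name with
name_class 'scientific name'; adjust these to your own schema. The function
names and the toy data are invented for the example.

```python
import sqlite3
from collections import Counter


def build_toy_db():
    """Create a tiny stand-in for the BioSQL taxon tables (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE taxon (
            taxon_id INTEGER PRIMARY KEY,
            parent_taxon_id INTEGER,
            node_rank TEXT
        );
        CREATE TABLE taxon_name (taxon_id INTEGER, name TEXT, name_class TEXT);
    """)
    # (taxon_id, parent_taxon_id, rank, name); the root is its own parent here.
    rows = [
        (1, 1, "no rank", "root"),
        (2, 1, "family", "Orchidaceae"),
        (3, 2, "genus", "Orchis"),
        (4, 3, "species", "Orchis italica"),
        (5, 3, "species", "Orchis militaris"),
        (6, 2, "genus", "Vanilla"),
        (7, 6, "species", "Vanilla planifolia"),
    ]
    for taxon_id, parent, rank, name in rows:
        conn.execute("INSERT INTO taxon VALUES (?, ?, ?)",
                     (taxon_id, parent, rank))
        conn.execute("INSERT INTO taxon_name VALUES (?, ?, 'scientific name')",
                     (taxon_id, name))
    return conn


def lineage(conn, organism):
    """Return [(rank, name), ...] from the root down to `organism`."""
    row = conn.execute(
        "SELECT taxon_id FROM taxon_name "
        "WHERE name = ? AND name_class = 'scientific name'",
        (organism,),
    ).fetchone()
    if row is None:
        raise KeyError(organism)
    taxon_id, steps, seen = row[0], [], set()
    # Stop at a NULL parent or at a node already visited (self-parent root).
    while taxon_id is not None and taxon_id not in seen:
        seen.add(taxon_id)
        rank, name, parent = conn.execute(
            "SELECT t.node_rank, n.name, t.parent_taxon_id "
            "FROM taxon t JOIN taxon_name n ON n.taxon_id = t.taxon_id "
            "WHERE t.taxon_id = ? AND n.name_class = 'scientific name'",
            (taxon_id,),
        ).fetchone()
        steps.append((rank, name))
        taxon_id = parent
    steps.reverse()
    return steps


def count_by_node(conn, organisms):
    """Count how many of `organisms` fall under each (rank, name) node."""
    counts = Counter()
    for organism in organisms:
        counts.update(lineage(conn, organism))
    return counts


conn = build_toy_db()
counts = count_by_node(
    conn, ["Orchis italica", "Orchis militaris", "Vanilla planifolia"]
)
print(counts[("genus", "Orchis")], counts[("family", "Orchidaceae")])
```

With counts keyed by (rank, name), the number of organisms in a given family
or genus is a single dictionary lookup, so an explicit tree object may not be
needed for this particular question.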

Thank you very much for your answer; I'll take a look at the Bio.Nexus module.

drG


