[Bioperl-l] taxonomy ID

Sendu Bala bix at sendu.me.uk
Wed Apr 1 12:00:59 UTC 2009


Smithies, Russell wrote:
> The taxonomy information isn't in the blast output unless you created
> custom fasta headers for your blast database. The easiest way to get
> the tax_id for your accessions would be to download the gi->tax_id
> list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz. 
> If you load that file into a hash, parse the accessions out of the
> blast hits then lookup the tax_id from that hash, I think it should
> be fairly fast.
> 
> Checking which are prokaryotes and which are eukaryotes based on
> tax_id is a separate problem  :-) If you grab the taxdump.tar.gz file
> from the same site, the nodes.dmp file contained within lists what
> division each tax_id belongs to (Bacteria, Invertebrates, Mammals,
> Phages, Plants, etc) so you can probably work it out from that.
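
For illustration, the hash lookup described above could look roughly like
this in Python (a sketch, not BioPerl). It assumes each line of the
unpacked gi_taxid_nucl.dmp holds a GI and a tax_id separated by
whitespace, and that hit names follow the NCBI "gi|12345|gb|ABC123.1|"
style. The function names are made up for this example.

```python
import gzip


def load_gi_taxid(lines, wanted_gis=None):
    """Build a {gi: tax_id} dict from lines of 'gi<whitespace>tax_id'.

    If wanted_gis is given, only those GIs are kept, which avoids
    holding the whole (very large) dump in memory.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        gi, tax_id = fields[0], fields[1]
        if wanted_gis is None or gi in wanted_gis:
            mapping[gi] = tax_id
    return mapping


def gi_from_hit_name(hit_name):
    """Pull the GI out of a name like 'gi|12345|gb|ABC123.1|' (assumed format)."""
    parts = hit_name.split("|")
    if len(parts) >= 2 and parts[0] == "gi":
        return parts[1]
    return None


def open_dump(path):
    """Open the gzipped dump as text; pass the result to load_gi_taxid."""
    return gzip.open(path, "rt")
```

Collecting the GIs from the blast hits first and passing them as
wanted_gis means only the entries you need end up in the hash.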

Check out the synopsis for Bio::Taxon
http://doc.bioperl.org/bioperl-live/Bio/Taxon.html

If the division() function doesn't tell you what you need, you could use
get_lineage_nodes() and check the oldest ancestors to see whether it's a
prokaryote or a eukaryote.
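
The same "check the oldest ancestors" idea can be sketched without
BioPerl by walking parent pointers in nodes.dmp. This is a rough Python
illustration, not the Bio::Taxon API. It assumes the first two
"|"-separated fields of each nodes.dmp line are tax_id and parent
tax_id, and it uses the NCBI tax_ids 2 (Bacteria), 2157 (Archaea) and
2759 (Eukaryota). load_parents and classify are hypothetical names.

```python
# NCBI tax_ids of the top-level groups of interest
BACTERIA, ARCHAEA, EUKARYOTA = "2", "2157", "2759"


def load_parents(lines):
    """Build a {tax_id: parent_tax_id} dict from nodes.dmp lines."""
    parents = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        parents[fields[0]] = fields[1]
    return parents


def classify(tax_id, parents):
    """Walk up the lineage until a known top-level ancestor is found."""
    seen = set()
    current = tax_id
    # The root node is its own parent, so 'seen' stops the loop there.
    while current in parents and current not in seen:
        if current in (BACTERIA, ARCHAEA):
            return "prokaryote"
        if current == EUKARYOTA:
            return "eukaryote"
        seen.add(current)
        current = parents[current]
    return "unknown"  # e.g. viruses, unclassified, or an unlisted tax_id
```

Anything that reaches the root without passing through one of those
three nodes (viruses, for instance) comes back as "unknown".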


