[Bioperl-l] taxonomy ID

Wed Apr 1 19:33:35 UTC 2009

There's always more than one way to do it.
I have no trouble loading it into a hash but you could just grep the file:

# $accession must hold the GI number, which is the first column of gi_taxid_prot.dmp
my (undef, $tax_id) = split /\s+/, `grep -w -P "^$accession" gi_taxid_prot.dmp`;
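
If you have many hits, running grep once per ID rescans the whole file every time. A rough sketch of an alternative is to read the dump once and keep only the GIs you care about, which also avoids holding the whole file in a hash. The sub name lookup_tax_ids and the demo rows are made up for illustration, and I'm assuming the GI numbers have already been parsed out of the blast hits:

```perl
use strict;
use warnings;

# Scan a gi_taxid dump once and return tax_ids only for the wanted GIs.
# $fh     : filehandle open on the dump (two whitespace-separated columns: GI, tax_id)
# $wanted : hash ref whose keys are the GI numbers of interest
# (lookup_tax_ids is a hypothetical helper name, not part of BioPerl)
sub lookup_tax_ids {
    my ($fh, $wanted) = @_;
    my %tax_id_for;
    while (my $line = <$fh>) {
        my ($gi, $tax_id) = split ' ', $line;
        next unless defined $tax_id && exists $wanted->{$gi};
        $tax_id_for{$gi} = $tax_id;
    }
    return \%tax_id_for;
}

# Demo on made-up rows held in memory; for real data, open gi_taxid_prot.dmp instead.
my $demo = "1\t100\n2\t200\n3\t300\n";
open my $fh, '<', \$demo or die "Cannot open in-memory file: $!";
my $hits = lookup_tax_ids($fh, { 2 => 1, 3 => 1 });
print "$_\t$hits->{$_}\n" for sort { $a <=> $b } keys %$hits;
```

Memory use then scales with the number of GIs you ask for rather than with the size of the dump.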

--Russell

> -----Original Message-----
> From: Florent Angly [mailto:florent.angly at gmail.com]
> Sent: Thursday, 2 April 2009 6:03 a.m.
> To: Smithies, Russell
> Cc: 'shalabh sharma'; 'bioperl-l'
> Subject: Re: [Bioperl-l] taxonomy ID
> 
> FYI, the gi_taxid_nucl.dmp.gz file is very large, so you probably won't
> be able to load all of its information into a hash (unless you have a lot
> of memory).
> Florent
> 
> Smithies, Russell wrote:
> > The taxonomy information isn't in the blast output unless you created custom
> > fasta headers for your blast database.
> > The easiest way to get the tax_id for your accessions would be to download
> > the gi->tax_id list from
> > ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz.
> > If you load that file into a hash, parse the accessions out of the blast
> > hits, then look up the tax_id from that hash, I think it should be fairly fast.
> >
> > Checking which are prokaryotes and which are eukaryotes based on tax_id is a
> > separate problem  :-)
> > If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file
> > contained within lists what division each tax_id belongs to (Bacteria,
> > Invertebrates, Mammals, Phages, Plants, etc) so you can probably work it out
> > from that.
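
A minimal sketch of that division lookup, under some assumptions: fields in the taxdump files are delimited by tab-pipe-tab with each line ending in tab-pipe, the division id is the fifth column of nodes.dmp, and division.dmp (assumed to ship in the same tarball) maps division ids to names in its third column. Please check the readme in the tarball before relying on this. The sub names are made up for illustration:

```perl
use strict;
use warnings;

# Split one taxdump line into fields. Assumes fields are separated by
# "\t|\t" and that each line ends with "\t|".
sub split_dmp_line {
    my ($line) = @_;
    chomp $line;
    $line =~ s/\t\|$//;
    return split /\t\|\t/, $line, -1;
}

# division.dmp (assumed columns): division id, division code, division name, comments.
# Returns a hash ref mapping division id to division name.
sub read_divisions {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        my ($id, undef, $name) = split_dmp_line($line);
        $name_for{$id} = $name;
    }
    return \%name_for;
}

# nodes.dmp (assumed columns): tax_id, parent tax_id, rank, EMBL code, division id, ...
# Returns a hash ref mapping tax_id to division id.
sub read_node_divisions {
    my ($fh) = @_;
    my %division_id_for;
    while (my $line = <$fh>) {
        my ($tax_id, undef, undef, undef, $div_id) = split_dmp_line($line);
        $division_id_for{$tax_id} = $div_id;
    }
    return \%division_id_for;
}
```

With both files read, $names->{ $divs->{$tax_id} } gives the division name for a tax_id, where $names and $divs are the two returned hash refs. Deciding which divisions count as prokaryote or eukaryote is still up to you.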
> >
> > It's not a very BioPerly solution but sometimes just looking up the answer
> > from a file/table/hash is the simplest way.
> >
> > Hope this helps,
> >
> > Russell Smithies
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of shalabh sharma
> >> Sent: Wednesday, 1 April 2009 7:43 a.m.
> >> To: bioperl-l
> >> Subject: [Bioperl-l] taxonomy ID
> >>
> >> Hi All,
> >> I am writing a script, and for one part of it I have to parse a blast
> >> report (refseq blast) and check how many organisms are eukaryotes and how
> >> many of them are prokaryotes.
> >> I am using the Bio::DB::Taxonomy module:
> >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> >>
> >> But for this I need a taxonomy ID (like '33090'), as given in the example.
> >> So is it possible to get a taxonomy ID from a refseq blast report?
> >> If not, how can I deal with this problem?
> >>
> >> I would really appreciate it if anyone could help me out.
> >>
> >> Thanks
> >> Shalabh
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>