[Bioperl-l] taxonomy ID

Florent Angly florent.angly at gmail.com
Wed Apr 1 13:03:28 EDT 2009


FYI, gi_taxid_nucl.dmp.gz is very large, so you probably won't be able 
to load all of it into a hash (unless you have a lot of memory).
Florent
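
One way around the memory problem is to scan the file once and keep only the entries for the GIs that actually occur in the blast hits. The following is a minimal sketch in Python (the same logic ports directly to Perl), assuming the dump holds one "gi<TAB>tax_id" pair per line. The function names lookup_taxids and lookup_taxids_in_file are hypothetical, chosen for illustration only.

```python
import gzip
from typing import Dict, Iterable, Set


def lookup_taxids(lines: Iterable[str], wanted_gis: Set[int]) -> Dict[int, int]:
    """Scan 'gi<TAB>tax_id' lines once, keeping only the requested GIs."""
    found: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gi = int(fields[0])
        if gi in wanted_gis:
            found[gi] = int(fields[1])
            if len(found) == len(wanted_gis):
                break  # every requested GI has been resolved
    return found


def lookup_taxids_in_file(path: str, wanted_gis: Set[int]) -> Dict[int, int]:
    """Same lookup, reading the gzip-compressed dump without unpacking it."""
    with gzip.open(path, "rt") as handle:
        return lookup_taxids(handle, wanted_gis)
```

Memory use then grows with the number of GIs in the blast report rather than with the size of the dump, at the cost of one sequential pass over the file per run.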

Smithies, Russell wrote:
> The taxonomy information isn't in the blast output unless you created custom fasta headers for your blast database.
> The easiest way to get the tax_id for your accessions would be to download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz.
> If you load that file into a hash, parse the GI numbers out of the blast hits, then look up the tax_id in that hash, I think it should be fairly fast. 
>
> Checking which are prokaryotes and which are eukaryotes based on tax_id is a separate problem  :-)
> If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file contained within lists the division id for each tax_id (division.dmp in the same archive names the divisions: Bacteria, Invertebrates, Mammals, Phages, Plants, etc.), so you can probably work it out from that.
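
The division lookup described here can be sketched roughly as follows, in Python for brevity. This assumes nodes.dmp uses "|"-separated fields with the tax_id in the first column and the division id in the fifth, and that division 0 is Bacteria while divisions 1, 2, 4, 5, 6 and 10 are eukaryotic groups. Those ids are an assumption to be checked against the division.dmp in your own copy of the taxdump. The names read_divisions and classify are hypothetical.

```python
from typing import Dict, Iterable

# Division ids assumed from division.dmp; verify against your own copy.
PROKARYOTE_DIVISIONS = {0}                   # Bacteria
EUKARYOTE_DIVISIONS = {1, 2, 4, 5, 6, 10}    # Invertebrates, Mammals, Plants,
                                             # Primates, Rodents, Vertebrates


def read_divisions(lines: Iterable[str]) -> Dict[int, int]:
    """Map tax_id -> division id from the lines of nodes.dmp."""
    divisions: Dict[int, int] = {}
    for line in lines:
        fields = [field.strip() for field in line.rstrip("\n").split("|")]
        if len(fields) < 5 or not fields[0]:
            continue  # skip blank or truncated lines
        divisions[int(fields[0])] = int(fields[4])
    return divisions


def classify(tax_id: int, divisions: Dict[int, int]) -> str:
    """Return 'prokaryote', 'eukaryote', 'other' or 'unknown' for a tax_id."""
    division = divisions.get(tax_id)
    if division is None:
        return "unknown"  # tax_id not present in nodes.dmp
    if division in PROKARYOTE_DIVISIONS:
        return "prokaryote"
    if division in EUKARYOTE_DIVISIONS:
        return "eukaryote"
    return "other"  # phages, viruses, synthetic, unassigned, environmental
```

Entries that land in "other" or "unknown" are best counted separately rather than forced into either group.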
>
> It's not a very BioPerly solution but sometimes just looking up the answer from a file/table/hash is the simplest way. 
>
> Hope this helps,
>
> Russell Smithies 
>
> Bioinformatics Applications Developer 
> T +64 3 489 9085 
> E  russell.smithies at agresearch.co.nz 
>
> Invermay  Research Centre 
> Puddle Alley, 
> Mosgiel, 
> New Zealand 
> T  +64 3 489 3809   
> F  +64 3 489 9174  
> www.agresearch.co.nz 
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma
>> Sent: Wednesday, 1 April 2009 7:43 a.m.
>> To: bioperl-l
>> Subject: [Bioperl-l] taxonomy ID
>>
>> Hi All,
>> I am writing a script, and for one part of it I have to parse a blast
>> report (refseq blast) and check how many organisms are eukaryotes and how
>> many of them are prokaryotes.
>> I am using the Bio::DB::Taxonomy module:
>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>>
>> But for this I need a taxonomy ID (like the '33090' given in the example).
>> So is it possible to get a taxonomy ID from a refseq blast report?
>> If not, then how can I deal with this problem?
>>
>> I would really appreciate it if anyone can help me out.
>>
>> Thanks
>> Shalabh
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     