[Bioperl-l] taxonomy ID
Chris Fields
cjfields at illinois.edu
Fri Apr 10 13:32:00 UTC 2009
I don't know if this has been pointed out, but Bio::DB::Taxonomy is
also capable of indexing and using the NCBI tax flat files.
use Bio::DB::Taxonomy;
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => $nodesfile,
                                -namesfile => $namefile);
# use other Bio::DB::Taxonomy methods
chris
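
As a minimal sketch of what "other Bio::DB::Taxonomy methods" could look
like for the question in this thread, the helper below climbs from a
taxonomy ID to the ancestor with a given rank. It assumes
get_taxon(-taxonid => ...) and ancestor() on the database handle and the
id, rank and scientific_name accessors on the returned Bio::Taxon. The
helper name name_at_rank and the command-line handling are illustrative
only and are not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): climb from $taxid towards the
# root and return the scientific name of the first node whose rank is $rank.
sub name_at_rank {
    my ($db, $taxid, $rank) = @_;
    my $taxon = $db->get_taxon(-taxonid => $taxid) or return;
    my %seen;    # guards against a root node that lists itself as parent
    while ($taxon && !$seen{ $taxon->id }++) {
        return $taxon->scientific_name
            if defined $taxon->rank && $taxon->rank eq $rank;
        $taxon = $db->ancestor($taxon);
    }
    return;
}

# Runs only when nodes.dmp and names.dmp are given on the command line.
if (@ARGV == 2) {
    require Bio::DB::Taxonomy;
    my ($nodesfile, $namefile) = @ARGV;
    my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                    -nodesfile => $nodesfile,
                                    -namesfile => $namefile);
    # 33090 is the example taxonomy ID mentioned in the original question
    my $name = name_at_rank($db, 33090, 'superkingdom');
    print defined $name ? "$name\n" : "no superkingdom found\n";
}
```

The rank string is passed in rather than hard-coded, so the same helper
can be reused for "family", "genus" and so on.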
On Apr 1, 2009, at 4:56 PM, Miguel Pignatelli wrote:
> You may find the attached Perl module useful. It solves the
> difficult parts of getting the taxonomy given a GI identifier or a
> taxID. It is designed to be able to process a high number of GIs
> very fast and with low memory usage.
>
> An example of usage would be:
>
> use taxbuild;
> # Build the taxonomyDB
> my $taxDB = taxbuild->new(
>     nodes    => $nodes_file_from_taxonomyDB,
>     names    => $names_file_from_taxonomyDB,
>     dict     => $dictFile,
>     save_mem => 1
> );
>
> # Get the taxonomy given a GI identifier
> my @tax = $taxDB->get_taxonomy_from_gi("35961124");
>
> # Get the taxonomy term of a GI identifier at a given level
> my $term_at_level = $taxDB->get_term_at_level_from_gi("35961124","family");
>
> # Get the taxid of a GI identifier
> my $taxid = $taxDB->get_taxid("35961124");
>
> # Get the taxonomy given a taxid
> my @tax = $taxDB->get_taxonomy($taxid);
>
> # Get the taxonomy at a given level given a taxid
> my $taxid_at_level = $taxDB->get_term_at_level($taxid,"genus");
>
> # Get the level of a given taxonomical name
> my $level = $taxDB->get_level_from_name("Proteobacteria");
>
> The "dict file" is a processed version of the gi_taxid file from
> taxonomyDB. You can get this file by running the tax2bin2.pl script
> also attached:
>
> $ perl tax2bin2.pl gi_taxid_prot.dmp > gi_taxid_prot.bin
> or, if you are working with genes instead of proteins:
> $ perl tax2bin2.pl gi_taxid_nucl.dmp > gi_taxid_nucl.bin
>
> A possible solution to the original post using this module would be
> something like:
>
> # Initialize the taxonomyDB once.
> my $taxDB = taxbuild->new(
>     nodes    => $nodes_file_from_taxonomyDB,
>     names    => $names_file_from_taxonomyDB,
>     dict     => $dictFile,
>     save_mem => 1
> );
>
> # For each blast result
> # Extract the GI
> my $superkingdom = $taxDB->get_term_at_level_from_gi($gi,"superkingdom");
> if ($superkingdom eq "Bacteria") {
>     # Do whatever you want
> } elsif ($superkingdom eq "Eukaryota") {
>     # Do whatever you want
> }
>
>
> The module has been tested mainly in Linux systems, but should run
> without problems in Windows and Mac too. If you encounter any
> problem with it don't hesitate to contact me.
>
> Hope this helps,
>
> M;
>
> <tax2bin2.pl><taxbuild.pm>
>
>
>
> On 01/04/2009, at 19:03, Florent Angly wrote:
>
>> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that
>> you won't be able to put its information in a hash (unless you have
>> a lot of memory).
>> Florent
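
One way around the memory concern, sketched here under assumptions: read
the dump one line at a time and keep only the GIs that actually occur in
the blast hits, so the hash stays as small as the hit list. The sketch
assumes the dump has two tab-separated columns (GI, then tax_id). The
function name taxids_for_gis and the sample GI/tax_id pairs are made up
for illustration.

```perl
use strict;
use warnings;

# Return a hash ref (gi => tax_id) holding only the GIs listed in @$wanted.
# The dump is read line by line, so it is never held in memory as a whole.
sub taxids_for_gis {
    my ($fh, $wanted) = @_;
    my %want = map { $_ => 1 } @$wanted;
    my %taxid_of;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && $want{$gi};
        $taxid_of{$gi} = $taxid;
        last if keys(%taxid_of) == keys(%want);    # everything found, stop early
    }
    return \%taxid_of;
}

# Demo on a tiny in-memory sample (made-up GI / tax_id pairs).
my $sample = "101\t9606\n102\t562\n103\t4932\n";
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $hits = taxids_for_gis($fh, [102, 103]);
close $fh;
print "$_ => $hits->{$_}\n" for sort { $a <=> $b } keys %$hits;
```

For the real file, the handle could come from a pipe such as
open(my $fh, '-|', 'gzip', '-dc', $file), which avoids unpacking the
archive to disk first.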
>>
>> Smithies, Russell wrote:
>>> The taxonomy information isn't in the blast output unless you
>>> created custom fasta headers for your blast database.
>>> The easiest way to get the tax_id for your accessions would be to
>>> download the gi->tax_id list from
>>> ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz.
>>> If you load that file into a hash, parse the accessions out of the
>>> blast hits then lookup the tax_id from that hash, I think it
>>> should be fairly fast.
>>> Checking which are prokaryotes and which are eukaryotes based on
>>> tax_id is a separate problem :-)
>>> If you grab the taxdump.tar.gz file from the same site, the
>>> nodes.dmp file contained within lists what division each tax_id
>>> belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc)
>>> so you can probably work it out from that.
>>>
>>> It's not a very BioPerly solution but sometimes just looking up
>>> the answer from a file/table/hash is the simplest way.
>>> Hope this helps,
>>>
>>> Russell Smithies
>>> Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz
>>> Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T
>>> +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz
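
The nodes.dmp lookup described above could be sketched roughly as
follows. This assumes the usual taxdump layout, with fields separated by
tab-pipe-tab and the first five columns being tax_id, parent tax_id,
rank, EMBL code and division id. The function names read_nodes and
ancestor_at_rank are made up, and the sample rows are hand-written and
heavily shortened (parent links skip intermediate ranks, values are
illustrative only).

```perl
use strict;
use warnings;

# Parse nodes.dmp-style lines into tax_id => { parent, rank, division }.
sub read_nodes {
    my ($fh) = @_;
    my %node;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;    # drop the trailing field terminator
        my ($id, $parent, $rank, undef, $div) = split /\t\|\t/, $line;
        next unless defined $div;
        $node{$id} = { parent => $parent, rank => $rank, division => $div };
    }
    return \%node;
}

# Climb parent links until a node with the requested rank is reached.
# Returns nothing if the root is hit first.
sub ancestor_at_rank {
    my ($node, $id, $rank) = @_;
    my %seen;    # the root is its own parent, so stop on a repeat
    while (defined $id && $node->{$id} && !$seen{$id}++) {
        return $id if $node->{$id}{rank} eq $rank;
        $id = $node->{$id}{parent};
    }
    return;
}

# Hand-written, shortened sample rows (illustrative only).
my @rows = (
    [1,    1,    'no rank',      '',   8],
    [2,    1,    'superkingdom', '',   0],
    [562,  2,    'species',      'EC', 0],
    [2759, 1,    'superkingdom', '',   8],
    [9606, 2759, 'species',      'HS', 5],
);
my $sample = join '', map { join("\t|\t", @$_) . "\t|\n" } @rows;
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $nodes = read_nodes($fh);
close $fh;

for my $taxid (562, 9606) {
    my $top = ancestor_at_rank($nodes, $taxid, 'superkingdom');
    print "$taxid -> ", (defined $top ? $top : 'none'), "\n";
}
```

The division id stored for each node can be checked directly as well,
which is the shortcut suggested above. Walking to the superkingdom is
shown because it answers the prokaryote/eukaryote question without
needing a second lookup table.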
>>>
>>>
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>> bounces at lists.open-bio.org] On Behalf Of shalabh sharma
>>>> Sent: Wednesday, 1 April 2009 7:43 a.m.
>>>> To: bioperl-l
>>>> Subject: [Bioperl-l] taxonomy ID
>>>>
>>>> Hi All,
>>>> I am writing a script; for one part of it I have to parse a blast
>>>> report (refseq blast) and check how many organisms are eukaryotes
>>>> and how many of them are prokaryotes.
>>>> I am using the Bio::DB::Taxonomy module:
>>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>>>>
>>>> But for this I need a taxonomy ID (like '33090'), as given in the
>>>> example.
>>>> So is it possible to get a taxonomy ID from a refseq blast report?
>>>> If not, then how can I deal with this problem?
>>>>
>>>> I would really appreciate it if anyone can help me out.
>>>>
>>>> Thanks
>>>> Shalabh
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>