[Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz

Jason Stajich jason at bioperl.org
Wed Aug 25 00:33:45 EDT 2010


Hi - please keep questions on the list.


I think one of your problems is that your first use of $gi2taxidfile is wrong.
When you call tie you want to specify a DB file to store the index in, not
the dump file you are reading from.
So call it "/tmp/gi2taxid.idx" or something like that.

In my code here
http://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS
you will see that on line 97 we construct the name of the index file from
the folder, plus 'idx', plus the name gi2taxid.
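A minimal sketch of that advice, using DB_File as in the thread: the hash is tied to a separate index file, and the dump is only ever opened for reading. The helper names (gi2taxid_index_path, open_gi2taxid_index) are hypothetical, and reading "folder plus 'idx' plus gi2taxid" as an idx/ subfolder is an assumption; the script itself may build the path differently.

```perl
use strict;
use warnings;
use DB_File;                       # exports $DB_HASH
use Fcntl qw(O_RDWR O_CREAT);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: derive the index location from the taxonomy folder
# (folder + 'idx' + 'gi2taxid'), creating the idx/ subfolder if it is missing.
sub gi2taxid_index_path {
    my ($taxdir) = @_;
    my $idxdir = File::Spec->catdir($taxdir, 'idx');
    make_path($idxdir) unless -d $idxdir;
    return File::Spec->catfile($idxdir, 'gi2taxid');
}

# Hypothetical helper: tie the caller's hash to the index file (created if
# needed). This is the file tie() should get, never gi_taxid_nucl.dmp itself.
sub open_gi2taxid_index {
    my ($idxfile, $hashref) = @_;
    my $db = tie(%$hashref, 'DB_File', $idxfile, O_RDWR | O_CREAT, 0644, $DB_HASH)
        or die "cannot tie $idxfile: $!";
    return $db;
}

# A throwaway directory stands in for the real taxonomy folder here.
my $taxdir  = tempdir(CLEANUP => 1);
my $idxfile = gi2taxid_index_path($taxdir);
my %taxid4gi;
my $dbh = open_gi2taxid_index($idxfile, \%taxid4gi);
# ... read gi_taxid_nucl.dmp (a different file) into %taxid4gi, then $dbh->sync;
```

Keeping the index under the taxonomy folder means it can be reused on later runs instead of being rebuilt from the dump each time.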

Also, it would be safer to split on a whitespace regex and take just the
first two columns from the file. Doing this also eliminates the need for
the chomp on the line above.

  my ($gi, $taxid) = split(/\s+/, $_);

instead of

  chomp;
  my ($gi, $taxid) = split(" ", $_,2);
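To see why the regex split makes the chomp unnecessary, here is a small standalone check on one made-up dump line (GI, tab, taxon ID, newline). The numbers are illustrative only.

```perl
use strict;
use warnings;

# One line shaped like gi_taxid_nucl.dmp: GI, a tab, the taxon ID, a newline.
my $line = "12345\t67890\n";

# Whitespace regex, no chomp: the trailing "\n" only yields an empty
# trailing field, which split discards, so $taxid comes back clean.
my ($gi, $taxid) = split(/\s+/, $line);

# The original form with the chomp left out: the limit of 2 stops after the
# first separator, so the newline stays attached to the taxon ID.
my ($gi2, $taxid2) = split(" ", $line, 2);

print "clean:   [$taxid]\n";     # prints: clean:   [67890]
print "limit 2: [$taxid2]\n";    # the closing ] lands on the next line
```

The dump lines start directly with the GI, so the one behavioural difference between /\s+/ and " " (the latter also skips leading whitespace) does not come into play here.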

There may be other problems, but these should be fixed first. And
please send queries to the mailing list rather than to me directly so
that others can answer them too.

-jason
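Putting both fixes together, the load-and-lookup step might be sketched as below. One more thing worth checking in the quoted loop: `$taxid4gi{$gi} = $taxid if exists $taxid4gi{$gi};` only assigns to keys that are already present, so unless the elided part of the script pre-fills %taxid4gi with the GIs of interest, every line is skipped and the index stays empty. This sketch stores every GI unconditionally; the subroutine names and the tiny dump contents are hypothetical, not the code from classify_hits_kingdom.PLS.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);
use File::Spec;
use File::Temp qw(tempdir);

# Load the unzipped dump ($dumpfile) into a DB_File hash stored in a
# separate index file ($idxfile). Returns the number of entries stored.
sub build_gi2taxid_index {
    my ($dumpfile, $idxfile) = @_;
    my %taxid4gi;
    my $db = tie(%taxid4gi, 'DB_File', $idxfile, O_RDWR | O_CREAT, 0644, $DB_HASH)
        or die "cannot tie $idxfile: $!";
    open(my $fh, '<', $dumpfile) or die "cannot open $dumpfile: $!";
    my $n = 0;
    while (my $line = <$fh>) {
        my ($gi, $taxid) = split(/\s+/, $line);
        next unless defined $taxid;      # skip blank or malformed lines
        $taxid4gi{$gi} = $taxid;         # no "if exists" guard
        $n++;
    }
    close($fh);
    $db->sync;
    undef $db;                           # drop the extra reference before untie
    untie %taxid4gi;
    return $n;
}

# Look up one GI in an existing index; returns undef if it is absent.
sub taxid_for_gi {
    my ($idxfile, $gi) = @_;
    my %taxid4gi;
    tie(%taxid4gi, 'DB_File', $idxfile, O_RDONLY, 0644, $DB_HASH)
        or die "cannot tie $idxfile: $!";
    my $taxid = $taxid4gi{$gi};
    untie %taxid4gi;
    return $taxid;
}

# Example with a tiny made-up dump standing in for gi_taxid_nucl.dmp.
my $dir  = tempdir(CLEANUP => 1);
my $dump = File::Spec->catfile($dir, 'gi_taxid_nucl.dmp');
my $idx  = File::Spec->catfile($dir, 'gi2taxid.idx');
open(my $out, '>', $dump) or die "cannot write $dump: $!";
print {$out} "111\t222\n333\t444\n";
close($out);

my $loaded = build_gi2taxid_index($dump, $idx);
print "loaded $loaded entries; GI 333 -> ", taxid_for_gi($idx, '333'), "\n";
# prints: loaded 2 entries; GI 333 -> 444
```

Storing the whole of gi_taxid_nucl.dmp gives a large index; if only the GIs from your BLAST hits are needed, pre-seeding the hash with them and keeping the exists guard is one way to keep it small.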
Amali Thrimawithana wrote, On 8/24/10 8:13 PM:
> Dear Jason
>
> Thank you very much for the information. I managed to get the information at
> different taxonomic levels with the help of one of your example scripts,
> "local_taxonomydb_query". However, I am having trouble creating a local
> index file of gi_taxid_nucl.dmp so that I can get the taxonomic
> ID for a given NCBI GI number. At the moment I am using the tie() function
> with DB_File and then storing the details in a hash. However, when I try to
> retrieve a taxonomic ID for a given GI number, it does not return anything
> but an error. Below is part of the code (borrowed from the example script
> classify_hits_kingdom). Can you please let me know where I am going wrong?
> ...
> my $dbh2 = tie(%taxid4gi, 'DB_File', $gi2taxidfile);
>
> if( ! $done ) {
>     my $fh;
>     open(GI2TAXID, "$gi2taxidfile") or die $!; # here passing the unzipped gi_taxid_nucl.dmp
>     my $i = 0;
>     while (<GI2TAXID>) {
>         chomp;
>         my ($gi, $taxid) = split(" ", $_, 2);
>         $taxid4gi{$gi} = $taxid
>             if exists $taxid4gi{$gi};
>         $i++;
>         unless( $DEBUG && $i % 100000 ) {
>             warn "$i\n";
>         }
>     }
>     $dbh2->sync;
> }
> my $gi2='183397240';
> my $taxd2=$taxid4gi{$gi2};
>   print $taxd2, " \n";
>
> Any help would be much appreciated.
>
> Thanking you
> Amali
>
> On 23 August 2010 06:29, Jason Stajich <jason at bioperl.org> wrote:
>
>    
>> Hi Amali -
>>
>> This is how I'd print out the full classification using the Tree methods
>> (you will probably need to initialize the $db object differently, pointing
>> it at your flatfile location).
>>
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::Taxonomy;
>> use Bio::Tree::Tree;
>>
>> my $db= Bio::DB::Taxonomy->new(-source =>  'flatfile',
>>                    -nodesfile =>  'taxonomy/nodes.dmp',
>>                    -namesfile =>  'taxonomy/names.dmp');
>>
>> my $taxonid = $db->get_taxonid('Homo sapiens');
>> my $taxon = $db->get_taxon(-taxonid =>  $taxonid);
>> my $tree = Bio::Tree::Tree->new(-node =>  $taxon);
>> my @taxa = $tree->get_nodes;
>> print join(",", map { $_->scientific_name } @taxa), "\n";
>>
>> -jason
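In the BioPerl snippet above, each node in @taxa is a Bio::Taxon, so printing `$_->rank` next to `$_->scientific_name` labels each level of the classification. For readers who want to see what that walk involves without BioPerl, the following is a hand-rolled sketch over the NCBI dump files themselves; it is not BioPerl's API, and the helper names are hypothetical. It assumes the standard dump layout (fields separated by "\t|\t", each line ending in "\t|") and holds both files in memory, which may be heavy for the full NCBI dump.

```perl
use strict;
use warnings;

# Hypothetical helper: read nodes.dmp, whose lines look like
# "tax_id\t|\tparent tax_id\t|\trank\t|\t...\t|\n".
# Returns two hashrefs: taxid => parent taxid, and taxid => rank.
sub read_nodes {
    my ($file) = @_;
    my (%parent, %rank);
    open(my $fh, '<', $file) or die "cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                      # drop the trailing "\t|"
        my ($id, $pid, $rank) = split(/\t\|\t/, $line);
        next unless defined $rank;
        $parent{$id} = $pid;
        $rank{$id}   = $rank;
    }
    close($fh);
    return (\%parent, \%rank);
}

# Hypothetical helper: read names.dmp and keep only "scientific name" rows.
sub read_names {
    my ($file) = @_;
    my %name;
    open(my $fh, '<', $file) or die "cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;
        my ($id, $txt, undef, $class) = split(/\t\|\t/, $line, -1);
        $name{$id} = $txt if defined $class && $class eq 'scientific name';
    }
    close($fh);
    return \%name;
}

# Walk parent links from $taxid up to the root (tax_id 1 is its own parent)
# and collect rank => scientific name for the requested ranks only.
sub ranked_lineage {
    my ($taxid, $parent, $rank, $name, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    my %found;
    while (defined $taxid && exists $parent->{$taxid}) {
        my $r = $rank->{$taxid};
        $found{$r} = $name->{$taxid} if $want{$r};
        last if $parent->{$taxid} eq $taxid;     # reached the root
        $taxid = $parent->{$taxid};
    }
    return \%found;
}

# Usage (paths as in the BioPerl example above; $taxid from your own lookup):
#   my ($parent, $rank) = read_nodes('taxonomy/nodes.dmp');
#   my $name = read_names('taxonomy/names.dmp');
#   my $lin  = ranked_lineage($taxid, $parent, $rank, $name,
#                             qw(phylum class order family genus species));
#   print "$_: $lin->{$_}\n" for sort keys %$lin;
```

Asking for the ranks by name, rather than splitting the species' scientific name, also picks up the levels above genus that the original question was after.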
>>
>> Amali Thrimawithana wrote, On 8/18/10 3:56 PM:
>>
>>> Dear Dr Stajich,
>>>
>>> I am a Masters student at the University of Auckland, and my research is on
>>> identifying the yeast species present in wine using 454 sequencing. To
>>> carry out this research, a pipeline is being built in which, at the final
>>> step, each representative OTU needs to be classified at different
>>> taxonomic levels (i.e. phylum, class, family, genus and species) using
>>> the results from BLAST. To identify the sequences at each taxonomic level,
>>> I have been trying out the Bio::DB::Taxonomy module in BioPerl. Using this
>>> module, I am able to get the genus and species by splitting the
>>> scientific name returned by the Bio::Taxon object, but unfortunately I am
>>> unsure how to get the information for the other ranks. I have tried
>>> several commands, including "my @class = $node->classification;",
>>> but it does not work. Could you please let me know how I might get
>>> the higher levels of the taxonomy, such as class and phylum, using
>>> BioPerl?
>>>
>>> I look forward to hearing from you soon.
>>>
>>> Thanking you,
>>>
>>> Amali

