[Bioperl-l] Bio::*Taxonomy* changes
    Sendu Bala 
    bix at sendu.me.uk
       
    Mon Jul 17 12:31:37 EDT 2006
    
    
  
I see strange node names via Bio::DB::Taxonomy::flatfile:
use Bio::DB::Taxonomy;

my $db = Bio::DB::Taxonomy->new(
    -source    => 'flatfile',
    -directory => $taxonomy_dir,
    -nodesfile => $taxonomy_dir.'nodes.dmp',
    -namesfile => $taxonomy_dir.'names.dmp',
);
my $tax_id = 89593;
my $node = $db->get_Taxonomy_Node($tax_id);
print "node $tax_id has name '", @{$node->name('common')},
      "' and rank '", $node->rank, "'\n";

Results in:
node 89593 has name 'Craniata <chordata>' and rank 'subphylum'
Other examples:
node 2 has name 'Bacteria <bacteria>' and rank 'superkingdom'
node 1386 has name 'Bacillus <bacterium>' and rank 'genus'
node 7776 has name 'Gnathostomata <vertebrate>' and rank 'superclass'
etc.
For me the bits in <> are inappropriate and shouldn't be there. The NCBI 
website agrees, and you won't see these things if you use -source => 
'entrez'. Should they be removed by the flatfile parser as a matter of 
course, with no warnings or option? Or do people want them? Typically 
they are just the name of the parent node, so I don't see why anyone 
would /need/ them, and I argue it's invalid for parent node information 
to be duplicated here.
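
To make the proposed stripping concrete, here is a minimal sketch in plain Perl. The helper name strip_qualifier is hypothetical and is not part of BioPerl; it only assumes that the unwanted part is a single trailing "<...>" group, as in the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): drop one trailing
# " <...>" qualifier from a node name and leave other names untouched.
sub strip_qualifier {
    my ($name) = @_;
    (my $clean = $name) =~ s/\s*<[^<>]*>\s*$//;
    return $clean;
}

# Names taken from the examples above, plus one without a qualifier.
for my $name ('Craniata <chordata>', 'Bacillus <bacterium>', 'Homo sapiens') {
    print "'$name' becomes '", strip_qualifier($name), "'\n";
}
```

Anchoring the pattern at the end of the string means a name that merely contains angle brackets somewhere in the middle is left alone. Whether the real parser should be that conservative is an open question.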
If there are no objections I'll strip the <> bits. I also plan to make 
$node->name('scientific', 'sapiens') both set and get the node name, and 
to have flatfile and entrez store all common names via 
$obj->name('common', 'human', 'man'). Since these changes make the 
implementation match the docs, I don't see any problems, except that 
flatfile users will now find the node name in a different place 
(@{$node->name('scientific')} instead of @{$node->name('common')}).
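
For flatfile users who need code that works on either side of this change, something like the following sketch might help. Both node_display_name and the MockNode package are hypothetical stand-ins, used so the example runs without BioPerl or a taxonomy dump. The only assumption about real node objects is that name($class) returns an array reference, as in the expressions above.

```perl
use strict;
use warnings;

# Hypothetical helper: prefer the scientific name, and fall back to the
# first common name if no scientific name is stored.
sub node_display_name {
    my ($node) = @_;
    for my $class ('scientific', 'common') {
        my $names = $node->name($class);
        return $names->[0] if ref $names eq 'ARRAY' && @$names;
    }
    return;
}

# Stand-in for a taxonomy node, NOT the real BioPerl class. It only
# mimics name($class) returning an array reference.
package MockNode;
sub new  { my ($class, %names) = @_; return bless { names => {%names} }, $class }
sub name { my ($self, $class) = @_; return $self->{names}{$class} || [] }

package main;

my $before = MockNode->new(common => ['Craniata']);
my $after  = MockNode->new(scientific => ['Craniata'], common => ['craniates']);
print node_display_name($_), "\n" for $before, $after;
```

The ref check is there because I am not sure what an unset name class returns at present. Dereferencing undef under strict would die, so the helper treats anything that is not a non-empty array reference as "no name".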
I'll also fix the problem with node names for ranks species and lower, 
as discussed in thread 'Bio::DB::Taxonomy:: mishandles species, 
subspecies/variant names', in the way I suggested there.
If anyone can see a problem with any of these changes, let me know asap.
    
    