[Bioperl-l] Bio::*Taxonomy* changes

Hilmar Lapp hlapp at gmx.net
Mon Jul 17 17:53:17 UTC 2006


Sounds good to me.

BTW NCBI guarantees (well, promises) that there will only be one node  
name of class 'scientific'.
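
A minimal sketch of how one could check that promise against a local dump,
assuming the usual names.dmp layout (tax_id, name_txt, unique name, name
class, separated by tab-pipe-tab). The helper name count_scientific_names is
hypothetical and the sample lines are illustrative, not copied from a real
dump:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): given lines in the assumed
# names.dmp layout, return a hash ref of tax_id => number of names whose
# class is 'scientific name'.
sub count_scientific_names {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        chomp $line;
        $line =~ s/\t\|$//;    # drop the trailing field terminator
        my ($tax_id, $name, $unique, $class) = split /\t\|\t/, $line;
        next unless defined $class;
        $count{$tax_id} //= 0;
        $count{$tax_id}++ if $class eq 'scientific name';
    }
    return \%count;
}

# Illustrative sample lines only.
my @sample = (
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n",
    "9606\t|\tman\t|\t\t|\tcommon name\t|\n",
);

my $count = count_scientific_names(@sample);
for my $tax_id (sort { $a <=> $b } keys %$count) {
    my $n = $count->{$tax_id};
    print "tax_id $tax_id has $n scientific name(s)\n";
    warn "tax_id $tax_id breaks the one-scientific-name promise\n" if $n != 1;
}
```

Any tax_id reported with a count other than 1 would be worth a closer look
before relying on a single scientific name per node.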

	-hilmar

On Jul 17, 2006, at 12:31 PM, Sendu Bala wrote:

> I see strange node names via Bio::DB::Taxonomy::flatfile:
>
> use Bio::DB::Taxonomy;
>
> my $db = Bio::DB::Taxonomy->new(
>     -source    => 'flatfile',
>     -directory => $taxonomy_dir,
>     -nodesfile => $taxonomy_dir . 'nodes.dmp',
>     -namesfile => $taxonomy_dir . 'names.dmp',
> );
>
> my $tax_id = 89593;
> my $node = $db->get_Taxonomy_Node($tax_id);
>
> print "node $tax_id has name '", @{$node->name('common')},
>       "' and rank '", $node->rank, "'\n";
>
> Results in:
> node 89593 has name 'Craniata <chordata>' and rank 'subphylum'
>
> Other examples:
> node 2 has name 'Bacteria <bacteria>' and rank 'superkingdom'
> node 1386 has name 'Bacillus <bacterium>' and rank 'genus'
> node 7776 has name 'Gnathostomata <vertebrate>' and rank 'superclass'
> etc.
>
> For me the bits in <> are inappropriate and shouldn't be there. The NCBI
> website agrees, and you won't see these things if you use -source =>
> 'entrez'. Should they be removed by the flatfile parser as a matter of
> course, with no warnings or option? Or do people want them? Typically
> they are just the name of the parent node, so I don't see why anyone
> would /need/ them, and I argue it's invalid for parent node information
> to be duplicated here.
>
> If there are no objections I'll strip the <> bits. I also plan to make
> $node->name('scientific', 'sapiens'); set and get the node name, and
> have flatfile and entrez store all common names with
> $obj->name('common', 'human', 'man');. As these changes will make the
> implementation match the docs I don't see any problems, except that
> flatfile users will now find the node name in a different place
> (@{$node->name('scientific')} instead of @{$node->name('common')}).
>
> I'll also fix the problem with node names for ranks species and lower,
> as discussed in thread 'Bio::DB::Taxonomy:: mishandles species,
> subspecies/variant names', in the way I suggested there.
>
> If anyone can see a problem with any of these changes, let me know asap.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
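
The stripping proposed in the quoted message could be sketched roughly as
follows. This is only an illustration: strip_qualifier is a hypothetical
helper, not an existing Bio::DB::Taxonomy method, and it assumes the
qualifier always appears as a single trailing <...> group:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::DB::Taxonomy: drop a trailing
# " <...>" qualifier from a node name and leave other names untouched.
sub strip_qualifier {
    my ($name) = @_;
    (my $clean = $name) =~ s/\s*<[^<>]*>\s*$//;
    return $clean;
}

# Names taken from the examples in the quoted message, plus one without
# a qualifier to show that it passes through unchanged.
for my $name ('Craniata <chordata>', 'Bacteria <bacteria>', 'Homo sapiens') {
    printf "%s => %s\n", $name, strip_qualifier($name);
}
```

Anchoring the pattern at the end of the string keeps it from touching angle
brackets that might appear earlier in a name.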

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







