[Bioperl-l] Bio::Taxonomy changes

Mon Jul 17 14:31:08 EDT 2006

I agree.  Would be nice to get this to play well with weird bacterial names!

I plan on doing some behind-the-scenes work on Bio::DB::Taxonomy::entrez at
some point soon to test out Bio::DB::EUtilities as the user agent; I think it
currently uses Bio::Root::HTTPget.  The reason I'm doing this is to quickly
get taxonomy info from any primary ID, mainly so related taxonomy information
can be grabbed from a sequence GI without parsing the sequence for the TaxID;
this uses NCBI's ELink, which I've now implemented.
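For anyone who hasn't used ELink, here is a minimal sketch of the kind of request involved. It only builds the URL that links nucleotide GIs to Taxonomy IDs using the standard ELink parameters dbfrom, db and id. The helper name elink_taxonomy_url and the example GIs are hypothetical. This is not the Bio::DB::EUtilities interface, and fetching and parsing the XML reply are left out.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build an NCBI ELink request
# URL that links sequence GIs in the 'nuccore' database to Taxonomy IDs.
# Only the URL is built here; fetching/parsing the XML reply is omitted.
sub elink_taxonomy_url {
    my @gis = @_;
    die "need at least one GI\n" unless @gis;
    for my $gi (@gis) {
        die "GI '$gi' is not numeric\n" unless $gi =~ /^\d+$/;
    }
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi';
    # Comma-separated IDs are linked as one combined set; repeating the
    # id= parameter instead asks ELink for one link set per input ID.
    return $base . '?dbfrom=nuccore&db=taxonomy&id=' . join(',', @gis);
}

# Hypothetical GIs, for illustration only.
print elink_taxonomy_url(12345, 67890), "\n";
```

Whatever user agent ends up being used, only this request construction is shared. The reply still has to be parsed for the linked taxonomy IDs.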

I'll make sure everything passes tests before I commit.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, July 17, 2006 12:53 PM
> To: Sendu Bala
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::*Taxonomy* changes
> 
> Sounds good to me.
> 
> BTW NCBI guarantees (well, promises) that there will only be one node
> name of class 'scientific'.
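Because NCBI promises a single 'scientific name' per node, a flatfile reader can key on that name class directly. Below is a minimal sketch. It assumes the standard taxdump names.dmp layout, where tax_id, name_txt, unique name and name class are separated by tab-pipe-tab. It also assumes, based on the taxdump files I have seen, that bracketed forms like 'Bacillus <bacterium>' live in the third (unique name) column. parse_names is a hypothetical helper, not the Bio::DB::Taxonomy::flatfile code. It reads the plain name_txt column and warns if the one-scientific-name promise is ever broken.

```perl
use strict;
use warnings;

# Hypothetical helper: collect names per tax_id from names.dmp-style lines,
# assuming "tax_id\t|\tname_txt\t|\tunique name\t|\tname class\t|".
# Uses name_txt, so any <...> form in the unique-name column is ignored.
sub parse_names {
    my ($fh) = @_;
    my (%scientific, %common);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                  # drop the trailing field terminator
        my ($id, $name, $unique, $class) = split /\t\|\t/, $line, -1;
        next unless defined $class;          # skip malformed lines
        if ($class eq 'scientific name') {
            # NCBI promises one scientific name per node; flag violations.
            warn "tax_id $id has more than one scientific name\n"
                if exists $scientific{$id};
            $scientific{$id} = $name;
        }
        elsif ($class eq 'common name' || $class eq 'genbank common name') {
            push @{ $common{$id} }, $name;   # a node may have many common names
        }
    }
    return (\%scientific, \%common);
}
```

Call it with a filehandle opened on names.dmp. It returns one hash of scientific names and one hash of common-name lists, both keyed by tax_id.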
> 
> 	-hilmar
> 
> On Jul 17, 2006, at 12:31 PM, Sendu Bala wrote:
> 
> > I see strange node names via Bio::DB::Taxonomy::flatfile:
> >
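> > use strict;
> > use warnings;
> > use Bio::DB::Taxonomy;
> >
> > # directory holding NCBI's nodes.dmp and names.dmp (placeholder path;
> > # keep the trailing slash, since the file names are appended directly)
> > my $taxonomy_dir = '/path/to/taxonomy/';
> >
> > my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
> >                                 -directory => $taxonomy_dir,
> >                                 -nodesfile => $taxonomy_dir . 'nodes.dmp',
> >                                 -namesfile => $taxonomy_dir . 'names.dmp');
> >
> > my $tax_id = 89593;
> > my $node = $db->get_Taxonomy_Node($tax_id);
> >
> > print "node $tax_id has name '", @{$node->name('common')}, "' and rank '",
> >       $node->rank, "'\n";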
> >
> > Results in:
> > node 89593 has name 'Craniata <chordata>' and rank 'subphylum'
> >
> > Other examples:
> > node 2 has name 'Bacteria <bacteria>' and rank 'superkingdom'
> > node 1386 has name 'Bacillus <bacterium>' and rank 'genus'
> > node 7776 has name 'Gnathostomata <vertebrate>' and rank 'superclass'
> > etc.
> >
> > For me the bits in <> are inappropriate and shouldn't be there. The NCBI
> > website agrees, and you won't see these things if you use -source =>
> > 'entrez'. Should they be removed by the flatfile parser as a matter of
> > course, with no warnings or option? Or do people want them? Typically
> > they are just the name of the parent node, so I don't see why anyone
> > would /need/ them, and I argue it's invalid for parent node information
> > to be duplicated here.
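Stripping the bracketed qualifier only takes one substitution. Here is a minimal sketch, assuming the qualifier is always a single trailing <...> group as in the examples above. strip_bracket_suffix is a hypothetical name, not part of the flatfile parser.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing " <...>" qualifier such as the
# one in 'Craniata <chordata>'; names without one are returned unchanged.
sub strip_bracket_suffix {
    my ($name) = @_;
    (my $clean = $name) =~ s/\s*<[^<>]*>\s*\z//;
    return $clean;
}

for my $name ('Craniata <chordata>', 'Bacillus <bacterium>', 'Homo sapiens') {
    print "'$name' -> '", strip_bracket_suffix($name), "'\n";
}
```

Anchoring the pattern at the end of the string means a '<' appearing elsewhere in a name is left alone.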
> >
> > If there are no objections I'll strip the <> bits. I also plan to make
> > $node->name('scientific', 'sapiens'); set and get the node name, and
> > have flatfile and entrez store all common names with
> > $obj->name('common', 'human', 'man');. As these changes will make the
> > implementation match the docs I don't see any problems, except that
> > flatfile users will now find the node name in a different place
> > (@{$node->name('scientific')} instead of @{$node->name('common')}).
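The proposed get/set behaviour of name() can be illustrated with a small stand-alone class. My::TaxonNode is hypothetical. It only mimics the calling convention described above, replacing the stored names on set and returning an array reference on get. It is not BioPerl's Bio::Taxonomy::Node implementation.

```perl
use strict;
use warnings;

package My::TaxonNode;   # hypothetical stand-in, not Bio::Taxonomy::Node

sub new { my ($class) = @_; return bless { names => {} }, $class }

# name($name_class)          -> array ref of names stored under $name_class
# name($name_class, @names)  -> replace the names for $name_class, return them
sub name {
    my ($self, $name_class, @names) = @_;
    $self->{names}{$name_class} = [@names] if @names;
    return $self->{names}{$name_class} || [];
}

package main;

my $node = My::TaxonNode->new;
$node->name('scientific', 'sapiens');
$node->name('common', 'human', 'man');
print "scientific: @{ $node->name('scientific') }\n";
print "common: @{ $node->name('common') }\n";
```

Returning an array reference keeps the existing @{$node->name(...)} idiom working for callers. Returning an empty one for unknown classes avoids dereferencing undef.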
> >
> > I'll also fix the problem with node names for species and lower ranks,
> > as discussed in the thread 'Bio::DB::Taxonomy:: mishandles species,
> > subspecies/variant names', in the way I suggested there.
> >
> > If anyone can see a problem with any of these changes, let me know asap.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 