[Bioperl-l] Bio::*Taxonomy* changes

Sendu Bala bix at sendu.me.uk
Mon Jul 24 15:45:09 EDT 2006


Chris Fields wrote:
>> Hilmar Lapp wrote:
>>> Sounds good to me, except there is no Bio::TaxonomyI yet,
>> Indeed, I propose making one.
> 
> So, Node would implement this, correct?  Naming it Bio::TaxonomyI makes me
> think that Bio::Taxonomy implements TaxonomyI, not that Bio::Taxonomy::Node
> implements it.  

No no, I guess the whole rest of your reply was confused by this one
point. Bio::TaxonomyI would be the interface for Bio::Taxonomy.
Definitely not a Node.


>> Yes, which is why Bio::Taxonomy is appropriate here. Assuming that
>> Bio::Species isa Bio::TaxonomyI:
>>
>> ...
>> SOURCE      Saccharomyces cerevisiae (baker's yeast)
>>     ORGANISM  Saccharomyces cerevisiae
>>               Eukaryota; Fungi; Ascomycota; Saccharomycotina;
>>               Saccharomycetes;
>>               Saccharomycetales; Saccharomycetaceae; Saccharomyces.
>>
>> ...
>>
>> ## the fully-manual way
>> my $species = new Bio::Species;
>> my $node = new Bio::Taxonomy::Node(-name => 'Saccharomyces cerevisiae',
>>                                     -rank => 'species', -object_id => 1,
>>                                     -parent_id => 2);
>> my $n2 = new Bio::Taxonomy::Node(-name => 'Saccharomyces',
>>                                   -object_id => 2, -parent_id => 3);
>> # (no assumption that 'Saccharomyces' is the genus, so rank() undefined)
>> my $n3 = [etc]
>> $species->add_node($node);
>> $species->add_node($n2);
>> [etc]
> 
> 
> Hrmm... why would you add multiple nodes to a species object?  A Species
> is-a Node, not a full Bio::Taxonomy.

In my proposal, a Bio::Species certainly is a full Bio::Taxonomy.


>> Bio::Species differs from Bio::Taxonomy only so it contains all the
>> legacy methods names that Bio::Species currently has, for backward
>> compatibility. Setting $species->classification() would delete all nodes
>> of self, use a GenbankFactory to make a new Bio::Species, then pull out
>> all its Nodes and add them to self.
> 
> The idea is to replace Bio::Species with something that works well, so
> having it implement a Node-like interface works since it is-a Node.  Having
> it implement a Taxonomy-like interface, though, doesn't make a lot of sense
> as a species is-not-a Taxonomy.

Right. So this is why we've been 'butting heads'. Up till now I had no 
idea why you were so adamant about keeping things the old 
Bio::Taxonomy::Node way.

Bio::Species very definitely has never been, nor do we want it to 
become, a single node of a taxonomy. It has always been a complete 
taxonomy. You can tell that by the fact it has a classification, and you 
could ask what its genus is.

This is why I'm proposing that Bio::Species become a Bio::Taxonomy. 
Because that's the correct object model for the kinds of things 
Bio::Species wants to do.
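
A minimal sketch of that object model, under stated assumptions: the Sketch::* packages and node_with_rank() below are made-up names for illustration, not existing BioPerl classes or methods, and the 'genus' and 'family' ranks are set by hand here only so that genus() has something to find.

```perl
use strict;
use warnings;

# Hypothetical node: a name, an optional rank, an id and a parent id.
package Sketch::Node;
sub new {
    my ($class, %args) = @_;
    my $self = {%args};
    return bless $self, $class;
}
sub name      { $_[0]{name} }
sub rank      { $_[0]{rank} }
sub id        { $_[0]{id} }
sub parent_id { $_[0]{parent_id} }

# Hypothetical taxonomy: a container of nodes keyed by id.
package Sketch::Taxonomy;
sub new {
    my ($class) = @_;
    my $self = { nodes => {} };
    return bless $self, $class;
}
sub add_node {
    my ($self, $node) = @_;
    $self->{nodes}{ $node->id } = $node;
    return $self;
}
# Return a node carrying the given rank, or nothing if none does.
sub node_with_rank {
    my ($self, $rank) = @_;
    for my $node (values %{ $self->{nodes} }) {
        my $r = $node->rank;
        return $node if defined $r && $r eq $rank;
    }
    return;
}
# Follow parent ids upward from the species node; names come back leaf-first
# (an arbitrary choice for this sketch).
sub classification {
    my ($self) = @_;
    my $node = $self->node_with_rank('species') or return;
    my @names;
    while ($node) {
        push @names, $node->name;
        my $pid = $node->parent_id;
        $node = defined $pid ? $self->{nodes}{$pid} : undef;
    }
    return @names;
}

# Hypothetical species: a full taxonomy plus legacy-style convenience methods.
package Sketch::Species;
our @ISA = ('Sketch::Taxonomy');
sub genus {
    my ($self) = @_;
    my $node = $self->node_with_rank('genus') or return;
    return $node->name;
}

package main;

my $species = Sketch::Species->new;
$species->add_node(Sketch::Node->new(
    name => 'Saccharomyces cerevisiae', rank => 'species', id => 1, parent_id => 2));
$species->add_node(Sketch::Node->new(
    name => 'Saccharomyces', rank => 'genus', id => 2, parent_id => 3));
$species->add_node(Sketch::Node->new(
    name => 'Saccharomycetaceae', rank => 'family', id => 3));

my @lineage = $species->classification;
print join('; ', @lineage), "\n";
print $species->genus, "\n";
```

Both classification() and genus() are answered by looking through the contained nodes, which is only possible if the object holds the whole lineage rather than being a single node of it.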


> Using a factory in Bio::DB::Taxonomy should solve any issues about what
> object type is returned, since that could simply be made based on the rank
> itself (species rank or below == Bio::Taxonomy::Species, genus and above ==
> Bio::Taxonomy::Node).

Frankly, that idea makes me ill. A Node, at the fundamental level, is
just a very simple object that needs to associate a taxonomic rank with
a scientific name. If you start making different objects for different
ranks, you've departed from any semblance of meaning in the object model.


> Nope.  Don't agree.  Sorry.  I can't see why you would force a Species to be
> a Taxonomy when it isn't.  The object hierarchy doesn't make sense to me.

Does it make sense now?


> I'll repeat:  a Node and a Species is-not-a Taxonomy.

I'll repeat: A Node is a Node and a Bio::Species is a Taxonomy ;)


> A Taxonomy object has-a Node or Species or combinations thereof ;

No, a Taxonomy contains Nodes. One of those Nodes might have a rank() of
'species'.
A Bio::Species contains Nodes. One of those Nodes definitely has a 
rank() of 'species'. It /must/ have other nodes, because the job of
Bio::Species has always been, and will continue to be, to store all the
other taxonomic levels in a GenBank file. For the same reason
Bio::Species can't be a Node itself, because you can't store other Nodes 
inside a Node.
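
To illustrate why more than one node is needed, here is a rough sketch that turns the yeast lineage from the GenBank record quoted above into a chain of nodes. build_nodes() is a hypothetical helper, not a BioPerl function, and the nodes are plain hashes rather than Bio::Taxonomy::Node objects. Only the ORGANISM entry is given a rank; the other levels are left undefined, since the file does not say which rank each name holds.

```perl
use strict;
use warnings;

# Hypothetical helper: build a leaf-first chain of plain-hash nodes from an
# organism name plus a root-first lineage list.
sub build_nodes {
    my ($organism, @lineage) = @_;
    my @names = ($organism, reverse @lineage);
    my @nodes;
    for my $i (0 .. $#names) {
        push @nodes, {
            name      => $names[$i],
            id        => $i + 1,
            # the last (root) node has no parent
            parent_id => ($i < $#names ? $i + 2 : undef),
            # only the organism is known to be a species; other ranks stay undefined
            rank      => ($i == 0 ? 'species' : undef),
        };
    }
    return @nodes;
}

my @nodes = build_nodes(
    'Saccharomyces cerevisiae',
    qw(Eukaryota Fungi Ascomycota Saccharomycotina Saccharomycetes
       Saccharomycetales Saccharomycetaceae Saccharomyces),
);

printf "%d nodes; leaf is '%s' with rank '%s'\n",
    scalar @nodes, $nodes[0]{name}, $nodes[0]{rank};
```

Every one of these nodes has to live somewhere, and a container object holding all of them is the natural place.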


