[Bioperl-l] Bio::*Taxonomy* changes

Chris Fields cjfields at uiuc.edu
Mon Jul 24 19:24:23 UTC 2006

> Hilmar Lapp wrote:
> > Sounds good to me, except there is no Bio::TaxonomyI yet,
> Indeed, I propose making one.

So, Node would implement this, correct?  Naming it Bio::TaxonomyI makes me
think that Bio::Taxonomy implements TaxonomyI, not that Bio::Taxonomy::Node
implements it.  


> Yes, which is why Bio::Taxonomy is appropriate here. Assuming that
> Bio::Species isa Bio::TaxonomyI:
> ...
> SOURCE      Saccharomyces cerevisiae (baker's yeast)
>     ORGANISM  Saccharomyces cerevisiae
>               Eukaryota; Fungi; Ascomycota; Saccharomycotina;
>               Saccharomycetes;
>               Saccharomycetales; Saccharomycetaceae; Saccharomyces.
> ...
> ## the fully-manual way
> my $species = new Bio::Species;
> my $node = new Bio::Taxonomy::Node(-name => 'Saccharomyces cerevisiae',
>                                     -rank => 'species', -object_id => 1,
>                                     -parent_id => 2);
> my $n2 = new Bio::Taxonomy::Node(-name => 'Saccharomyces',
>                                   -object_id => 2, -parent_id => 3);
> # (no assumption that 'Saccharomyces' is the genus, so rank() undefined)
> my $n3 = [etc]
> $species->add_node($node);
> $species->add_node($n2);
> [etc]

Hrmm... why would you add multiple nodes to a species object?  A Species
is-a Node, not a full Bio::Taxonomy.  Taxonomy has-a Node (hence the
add_node() method).  So, you should be able to add a NodeI-implementing
object to a Taxonomy object (either a Node or a Species).   

Not sure I agree with what you propose here; doesn't seem right...


> We also solve Chris' earlier quandary:
> [ in a world where Bio::Taxonomy::Node and Bio::Taxonomy::SpeciesNode
> exist, and given that Bio::DB::Taxonomy* currently directly make Node
> objects ]
> > The only problem I can foresee is which class to use with
> > Bio::DB::Taxonomy*?  I guess one could settle on one class by default
> > and have the option to use another Bio::Taxonomy::NodeI-implementing
> > class if you wanted more data/methods available...
> The way to do it is to have the Bio::DB::Taxonomy* modules return only
> the information that a Bio::Taxonomy::FactoryI would need to make a
> NodeI. The specific Factory that you use could generate whatever type of
> Node you wanted.

Yes, using an object factory here makes a lot of sense, returning the
correct object type based on the rank.  
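
Something like the following is roughly what I have in mind.  This is only
a sketch: the Sketch::* packages and the create_node() method are made-up
stand-ins for illustration, not existing BioPerl classes, and the list of
ranks treated as "species or below" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a plain taxonomy node (not a BioPerl class).
package Sketch::Node;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{-name}, rank => $args{-rank} }, $class;
}
sub name { $_[0]{name} }
sub rank { $_[0]{rank} }

# Hypothetical species-level node: is-a Node with room for extra methods.
package Sketch::Species;
our @ISA = ('Sketch::Node');

# Hypothetical factory that picks the class from the rank.
package Sketch::Factory;

# Assumed (incomplete) list of ranks at or below species.
my %SPECIES_LEVEL = map { $_ => 1 } qw(species subspecies varietas forma);

sub new { return bless {}, shift }

sub create_node {
    my ($self, %args) = @_;
    my $rank  = $args{-rank};
    my $class = (defined $rank && $SPECIES_LEVEL{$rank})
              ? 'Sketch::Species'
              : 'Sketch::Node';
    return $class->new(%args);
}

package main;

my $factory = Sketch::Factory->new;
my $sp = $factory->create_node(-name => 'Saccharomyces cerevisiae',
                               -rank => 'species');
my $gn = $factory->create_node(-name => 'Saccharomyces');  # rank undefined

print ref($sp), "\n";   # Sketch::Species
print ref($gn), "\n";   # Sketch::Node
```

A node with no rank falls back to the plain Node class here; whether that
is the right default is open for discussion.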

> Bio::Species differs from Bio::Taxonomy only so it contains all the
> legacy methods names that Bio::Species currently has, for backward
> compatibility. Setting $species->classification() would delete all nodes
> of self, use a GenbankFactory to make a new Bio::Species, then pull out
> all its Nodes and add them to self.

The idea is to replace Bio::Species with something that works well, so
having it implement a Node-like interface works since it is-a Node.  Having
it implement a Taxonomy-like interface, though, doesn't make a lot of sense
as a species is-not-a Taxonomy.  It should act just like a fancier node.

Using a factory in Bio::DB::Taxonomy should solve any issues about what
object type is returned, since that could simply be made based on the rank
itself (species rank or below == Bio::Taxonomy::Species, genus and above ==

> Unless anyone can think of a better way of doing things, I'll explore
> the above ideas and start writing code. To summarise: major changes to
> Bio::DB::Taxonomy* (make them factory slaves), implementation of some
> Bio::Taxonomy::FactoryIs, tweak Bio::Taxonomy::FactoryI and make
> Bio::TaxonomyI, make Bio::Species a Bio::TaxonomyI.

Nope.  Don't agree.  Sorry.  I can't see why you would force a Species to be
a Taxonomy when it isn't.  The object hierarchy doesn't make sense to me.

I would just have a simple interface for Node (NodeI), and either convert
Bio::Species to an abstract interface or place its methods in

I like the interface idea as Bio::Taxonomy::Node is-a NodeI only, while
Bio::Taxonomy::Species is-a NodeI and SpeciesI; these checks can be run
using the UNIVERSAL object method 'isa' when using a Factory.  
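
To illustrate the checks, here is a minimal sketch.  The Sketch::* names
and the binomial() method are hypothetical placeholders, not real BioPerl
packages or API; only the 'isa' method itself comes from UNIVERSAL.

```perl
use strict;
use warnings;

# Hypothetical interfaces: methods die until an implementing class overrides them.
package Sketch::NodeI;
sub name { die "name() not implemented\n" }

package Sketch::SpeciesI;
sub binomial { die "binomial() not implemented\n" }

# Plain node: implements NodeI only.
package Sketch::Node;
our @ISA = ('Sketch::NodeI');
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{-name} }, $class;
}
sub name { $_[0]{name} }

# Species node: is-a Node (hence NodeI) and also implements SpeciesI.
package Sketch::Species;
our @ISA = ('Sketch::Node', 'Sketch::SpeciesI');
sub binomial { $_[0]->name }

package main;

my $node    = Sketch::Node->new(-name => 'Saccharomyces');
my $species = Sketch::Species->new(-name => 'Saccharomyces cerevisiae');

for my $obj ($node, $species) {
    printf "%s: NodeI=%d SpeciesI=%d\n",
        ref($obj),
        ($obj->isa('Sketch::NodeI')    ? 1 : 0),
        ($obj->isa('Sketch::SpeciesI') ? 1 : 0);
}
```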

I'll repeat:  neither a Node nor a Species is-a Taxonomy.  A Taxonomy object
has-a Node or Species or combinations thereof; all would be
NodeI-implementing.  That's the reason that add_node() is there, which could
be modified to allow only objects for which ->isa('Bio::Taxonomy::NodeI') is
true (i.e. a Node or a Species).
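
A rough sketch of that kind of check follows.  The Sketch::* packages are
hypothetical stand-ins, and this is not the current Bio::Taxonomy
add_node() code, just one way the guard might look.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical interface and node classes (not BioPerl packages).
package Sketch::NodeI;
sub name { die "name() not implemented\n" }

package Sketch::Node;
our @ISA = ('Sketch::NodeI');
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{-name} }, $class;
}
sub name { $_[0]{name} }

package Sketch::Species;
our @ISA = ('Sketch::Node');

# Hypothetical taxonomy: has-a list of NodeI-implementing objects.
package Sketch::Taxonomy;
sub new { return bless { nodes => [] }, shift }

sub add_node {
    my ($self, $node) = @_;
    # Reject anything that is not an object implementing the node interface.
    die "add_node() requires a Sketch::NodeI-implementing object\n"
        unless Scalar::Util::blessed($node) && $node->isa('Sketch::NodeI');
    push @{ $self->{nodes} }, $node;
    return scalar @{ $self->{nodes} };
}

sub nodes { @{ $_[0]{nodes} } }

package main;

my $tax = Sketch::Taxonomy->new;
$tax->add_node(Sketch::Node->new(-name => 'Saccharomyces'));
$tax->add_node(Sketch::Species->new(-name => 'Saccharomyces cerevisiae'));

# A plain string is not a NodeI, so this add is refused.
my $ok = eval { $tax->add_node('not a node'); 1 };

my @nodes = $tax->nodes;
print scalar(@nodes), " nodes stored; bad add ",
      ($ok ? "accepted" : "rejected"), "\n";
```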

> Oh, Bio::Taxonomy might need some changes as well. It has a classify()
> method that does something with a Bio::Species, which would be all wrong
> in the new way of doing things.

We'll have to make eventual changes to anything referencing Bio::Species to
get them to work correctly.  Getting the object hierarchy finalized and
worked out is priority one.  Getting Bio::SeqIO modules switched over to
Bio::Taxonomy::Species (pretty commonly used) and making sure that
Bio::DB::Taxonomy returns the correct objects from the factory is a close
second.  Any small issues that pop up along the way can be taken care of
when they reveal themselves.

