[Bioperl-l] Bio::*Taxonomy* changes
Chris Fields
cjfields at uiuc.edu
Mon Jul 24 15:24:23 EDT 2006
> Hilmar Lapp wrote:
> > Sounds good to me, except there is no Bio::TaxonomyI yet,
>
> Indeed, I propose making one.
So, Node would implement this, correct? Naming it Bio::TaxonomyI makes me
think that Bio::Taxonomy implements TaxonomyI, not that Bio::Taxonomy::Node
implements it.
...
> Yes, which is why Bio::Taxonomy is appropriate here. Assuming that
> Bio::Species isa Bio::TaxonomyI:
>
> ...
> SOURCE Saccharomyces cerevisiae (baker's yeast)
> ORGANISM Saccharomyces cerevisiae
> Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
> Saccharomycetales; Saccharomycetaceae; Saccharomyces.
>
> ...
>
> ## the fully-manual way
> my $species = new Bio::Species;
> my $node = new Bio::Taxonomy::Node(-name => 'Saccharomyces cerevisiae',
> -rank => 'species', -object_id => 1,
> -parent_id => 2);
> my $n2 = new Bio::Taxonomy::Node(-name => 'Saccharomyces',
> -object_id => 2, -parent_id => 3);
> # (no assumption that 'Saccharomyces' is the genus, so rank() undefined)
> my $n3 = [etc]
> $species->add_node($node);
> $species->add_node($n2);
> [etc]
Hrmm... why would you add multiple nodes to a species object? A Species
is-a Node, not a full Bio::Taxonomy. Taxonomy has-a Node (hence the
add_node() method). So, you should be able to add a NodeI-implementing
object to a Taxonomy object (either a Node or a Species).
Not sure I agree with what you propose here; doesn't seem right...
...
> We also solve Chris' earlier quandary:
>
> [ in a world where Bio::Taxonomy::Node and Bio::Taxonomy::SpeciesNode
> exist, and given that Bio::DB::Taxonomy* currently directly make Node
> objects ]
> > The only problem I can foresee is which class to use with
> > Bio::DB::Taxonomy*? I guess one could settle on one class by default and
> > have the option to use another Bio::Taxonomy::NodeI-implementing class if
> > you wanted more data/methods available...
>
> The way to do it is to have the Bio::DB::Taxonomy* modules return only
> the information that a Bio::Taxonomy::FactoryI would need to make a
> NodeI. The specific Factory that you use could generate whatever type of
> Node you wanted.
Yes, using an object factory here makes a lot of sense, returning the
correct object type based on the rank.
...
> Bio::Species differs from Bio::Taxonomy only so it contains all the
> legacy method names that Bio::Species currently has, for backward
> compatibility. Setting $species->classification() would delete all nodes
> of self, use a GenbankFactory to make a new Bio::Species, then pull out
> all its Nodes and add them to self.
The idea is to replace Bio::Species with something that works well, so
having it implement a Node-like interface works since it is-a Node. Having
it implement a Taxonomy-like interface, though, doesn't make a lot of sense
as a species is-not-a Taxonomy. It should act just like a fancier node
object.
Using a factory in Bio::DB::Taxonomy should solve any issues about what
object type is returned, since the choice could simply be made from the rank
itself (species rank or below == Bio::Taxonomy::Species, genus and above ==
Bio::Taxonomy::Node).
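As a rough sketch of the rank-based factory idea: the Sketch::* packages
below are hypothetical stand-ins, not existing BioPerl classes, and the list
of "species or below" ranks is an illustrative assumption rather than a
complete one.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a plain taxonomy node.
package Sketch::Node;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{-name}, rank => $args{-rank} }, $class;
}
sub name { $_[0]{name} }
sub rank { $_[0]{rank} }

# Hypothetical stand-in for a species-level node; it is-a Node.
package Sketch::Species;
our @ISA = ('Sketch::Node');

# Hypothetical factory: picks the class to build from the rank alone.
package Sketch::Factory;

# Ranks treated as "species or below" (assumed, incomplete list).
my %SPECIES_LEVEL = map { $_ => 1 } qw(species subspecies varietas forma);

sub create_node {
    my ($class, %args) = @_;
    my $rank = $args{-rank};
    # An undefined or higher rank falls back to the plain node class.
    my $node_class = (defined $rank && $SPECIES_LEVEL{$rank})
        ? 'Sketch::Species'
        : 'Sketch::Node';
    return $node_class->new(%args);
}

package main;

my $sp = Sketch::Factory->create_node(
    -name => 'Saccharomyces cerevisiae', -rank => 'species');
my $ge = Sketch::Factory->create_node(
    -name => 'Saccharomyces', -rank => 'genus');

print ref($sp), "\n";   # Sketch::Species
print ref($ge), "\n";   # Sketch::Node
```

The database modules would then only need to hand over name, rank and ids;
swapping in a different factory changes the returned class without touching
them.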
> Unless anyone can think of a better way of doing things, I'll explore
> the above ideas and start writing code. To summarise: major changes to
> Bio::DB::Taxonomy* (make them factory slaves), implementation of some
> Bio::Taxonomy::FactoryIs, tweak Bio::Taxonomy::FactoryI and make
> Bio::TaxonomyI, make Bio::Species a Bio::TaxonomyI.
Nope. Don't agree. Sorry. I can't see why you would force a Species to be
a Taxonomy when it isn't. The object hierarchy doesn't make sense to me.
I would just have a simple interface for Node (NodeI), and either convert
Bio::Species to an abstract interface or place its methods in
Bio::Taxonomy::Species/SpeciesNode.
I like the interface idea as Bio::Taxonomy::Node is-a NodeI only, while
Bio::Taxonomy::Species is-a NodeI and SpeciesI; these checks can be run
using the UNIVERSAL object method 'isa' when using a Factory.
I'll repeat: neither a Node nor a Species is-a Taxonomy. A Taxonomy object
has-a Node or Species, or combinations thereof; all would be
NodeI-implementing. That's the reason add_node() is there, which could
be modified to allow only objects for which ->isa('Bio::Taxonomy::NodeI') is
true (i.e. a Node or a Species).
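A minimal sketch of that has-a relationship and the isa() check, again using
hypothetical Sketch::* packages in place of the real or proposed Bio::Taxonomy
classes (the NodeI/SpeciesI packages here are empty interface markers, an
assumption made only for illustration):

```perl
use strict;
use warnings;
use Scalar::Util ();

package Sketch::NodeI;      # hypothetical interface marker, no methods
package Sketch::SpeciesI;   # hypothetical interface marker, no methods

# A plain node implements NodeI only.
package Sketch::Node;
our @ISA = ('Sketch::NodeI');
sub new { my ($class, %args) = @_; return bless {%args}, $class }

# A species node is-a Node (hence NodeI) and also a SpeciesI.
package Sketch::Species;
our @ISA = ('Sketch::Node', 'Sketch::SpeciesI');

# The taxonomy has-a list of nodes; it is not itself a node.
package Sketch::Taxonomy;
sub new { return bless { nodes => [] }, shift }

sub add_node {
    my ($self, $node) = @_;
    # Accept only blessed objects that implement the node interface.
    die "add_node() needs a NodeI-implementing object\n"
        unless Scalar::Util::blessed($node) && $node->isa('Sketch::NodeI');
    push @{ $self->{nodes} }, $node;
    return scalar @{ $self->{nodes} };
}

package main;

my $tax = Sketch::Taxonomy->new;
$tax->add_node(Sketch::Node->new(-name => 'Saccharomyces'));
$tax->add_node(Sketch::Species->new(
    -name => 'Saccharomyces cerevisiae', -rank => 'species'));

# A non-object is rejected; eval traps the die.
my $ok = eval { $tax->add_node('not an object'); 1 };

print "nodes: ", scalar @{ $tax->{nodes} }, "\n";
print "rejected plain string\n" unless $ok;
```

Both node types go into the same Taxonomy, while a caller that needs the
species-only methods can still test for ->isa('Sketch::SpeciesI').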
> Oh, Bio::Taxonomy might need some changes as well. It has a classify()
> method that does something with a Bio::Species, which would be all wrong in
> the new way of doing things.
We'll eventually have to change anything that references Bio::Species to
get it working correctly. Getting the object hierarchy worked out and
finalized is priority one. Switching the Bio::SeqIO modules over to
Bio::Taxonomy::Species (pretty commonly used) and making sure that
Bio::DB::Taxonomy returns the correct objects from the factory is a close
second. Any small issues that pop up along the way can be taken care of
as they reveal themselves.
Chris