[Bioperl-l] Bio::*Taxonomy* changes

Sendu Bala bix at sendu.me.uk
Mon Jul 24 09:49:42 EDT 2006


Hilmar Lapp wrote:
> Sounds good to me, except there is no Bio::TaxonomyI yet,

Indeed, I propose making one.


> Bio::Species shouldn't fully depend on an internet connection or flat 
> file to do anything meaningful.
> 
> I.e., it should take advantage of a lookup database if there is one, but 
> in the absence of that one should also be able to statically set 
> attribute values to whatever one thinks can be gleaned from a parsed 
> text or whatever.

Yes, which is why Bio::Taxonomy is appropriate here. Assuming that 
Bio::Species isa Bio::TaxonomyI:

...
SOURCE      Saccharomyces cerevisiae (baker's yeast)
    ORGANISM  Saccharomyces cerevisiae
              Eukaryota; Fungi; Ascomycota; Saccharomycotina;
              Saccharomycetes;
              Saccharomycetales; Saccharomycetaceae; Saccharomyces.

...

## the fully-manual way
my $species = Bio::Species->new;
my $node = Bio::Taxonomy::Node->new(-name => 'Saccharomyces cerevisiae',
                                    -rank => 'species', -object_id => 1,
                                    -parent_id => 2);
my $n2 = Bio::Taxonomy::Node->new(-name => 'Saccharomyces',
                                  -object_id => 2, -parent_id => 3);
# (no assumption that 'Saccharomyces' is the genus, so rank() undefined)
my $n3 = [etc]
$species->add_node($node);
$species->add_node($n2);
[etc]

## Using a factory without db access
# assume that Bio::Taxonomy::GenbankFactory implements
# some modified Bio::Taxonomy::FactoryI
my $factory = Bio::Taxonomy::GenbankFactory->new();
my $species = $factory->generate(-classification => [
    'Saccharomyces cerevisiae', 'Saccharomyces', 'Saccharomycetaceae' ...]);
# the generate() method above just does the fully-manual way for you

## Using a factory with db access
# assume that Bio::Taxonomy::EntrezFactory implements some
# modified Bio::Taxonomy::FactoryI and uses Bio::DB::Taxonomy::entrez
# to get the nodes
my $factory = Bio::Taxonomy::EntrezFactory->new();
my $species = $factory->fetch(
    -scientific_name => 'Saccharomyces cerevisiae');

# (would probably want to come up with a more generic name for the
#  fetch() and generate() methods, so that all Factories use the
#  same method name)


It's very clean and flexible this way. Ultimately you always make your 
Bio::Species the same way - you add nodes to it. You can make those 
nodes yourself or use a factory.
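
To make the "you always just add nodes" point concrete, here is a
minimal runnable sketch. Everything in the Sketch:: namespace is a
hypothetical stand-in invented for illustration; none of it is the real
Bio::Species, Bio::Taxonomy::Node or any existing factory, and the way
ids and ranks are assigned is only an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a taxonomy node (NOT Bio::Taxonomy::Node).
package Sketch::Node;
sub new {
    my ($class, %args) = @_;
    my $self = {
        name      => $args{-name},
        rank      => $args{-rank},         # undef when the rank is unknown
        object_id => $args{-object_id},
        parent_id => $args{-parent_id},    # undef for the last known ancestor
    };
    return bless $self, $class;
}
sub name      { $_[0]{name} }
sub rank      { $_[0]{rank} }
sub object_id { $_[0]{object_id} }
sub parent_id { $_[0]{parent_id} }

# Hypothetical container: the only way to fill it is add_node().
package Sketch::Taxonomy;
sub new {
    my ($class) = @_;
    return bless { nodes => [] }, $class;
}
sub add_node {
    my ($self, $node) = @_;
    push @{ $self->{nodes} }, $node;
    return $self;
}
sub nodes          { @{ $_[0]{nodes} } }
sub classification { map { $_->name } $_[0]->nodes }

# Hypothetical factory: generate() just does the manual way for you.
package Sketch::ListFactory;
sub new {
    my ($class) = @_;
    return bless {}, $class;
}
sub generate {
    my ($self, %args) = @_;
    my @names    = @{ $args{-classification} };   # species first, then ancestors
    my $taxonomy = Sketch::Taxonomy->new;
    for my $i (0 .. $#names) {
        $taxonomy->add_node(Sketch::Node->new(
            -name      => $names[$i],
            # Assumption: only the first entry has a known rank.
            -rank      => ($i == 0 ? 'species' : undef),
            -object_id => $i + 1,
            -parent_id => ($i < $#names ? $i + 2 : undef),
        ));
    }
    return $taxonomy;
}

package main;

my $factory = Sketch::ListFactory->new;
my $species = $factory->generate(-classification => [
    'Saccharomyces cerevisiae', 'Saccharomyces', 'Saccharomycetaceae']);
print join('; ', $species->classification), "\n";
```

The factory never touches the container's internals; it only calls
add_node(), so nodes made by hand and nodes made by a factory end up
in the container through the same path.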

We also solve Chris' earlier quandary:

[ in a world where Bio::Taxonomy::Node and Bio::Taxonomy::SpeciesNode 
exist, and given that Bio::DB::Taxonomy* currently directly make Node 
objects ]
> The only problem I can foresee is which class to use with
> Bio::DB::Taxonomy*?  I guess one could settle on one class by default and
> have the option to use another Bio::Taxonomy::NodeI-implementing class if
> you wanted more data/methods available...

The way to do it is to have the Bio::DB::Taxonomy* modules return only 
the information that a Bio::Taxonomy::FactoryI would need to make a 
NodeI. The specific Factory that you use could generate whatever type of 
Node you wanted.

But actually I propose there is only one Node and the specific Factory 
that you use determines the kind of Bio::TaxonomyI made; GenbankFactory 
might make a Bio::Species, while EntrezFactory might make a Bio::Taxonomy.
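
A rough sketch of that split, under stated assumptions: the db side
hands back plain records, and the factory alone decides which container
class gets built. All Sketch:: names, the lineage() method and the
container_class() hook are hypothetical and exist only for this
illustration; they are not part of Bio::DB::Taxonomy or
Bio::Taxonomy::FactoryI.

```perl
use strict;
use warnings;

# Hypothetical db backend: returns raw records only, never objects.
package Sketch::StaticDB;
my %RECORDS = (
    1 => { name => 'Saccharomyces cerevisiae', rank => 'species', parent_id => 2 },
    2 => { name => 'Saccharomyces',            rank => 'genus',   parent_id => undef },
);
sub new {
    my ($class) = @_;
    return bless {}, $class;
}
sub lineage {
    my ($self, $id) = @_;
    my @rows;
    while (defined $id) {
        my $record = $RECORDS{$id} or last;
        push @rows, +{ %$record, object_id => $id };   # copy, plus its own id
        $id = $record->{parent_id};
    }
    return @rows;
}

# Hypothetical containers; nodes are plain hashes to keep this short.
package Sketch::Taxonomy;
sub new {
    my ($class) = @_;
    return bless { nodes => [] }, $class;
}
sub add_node {
    my ($self, $node) = @_;
    push @{ $self->{nodes} }, $node;
    return $self;
}
sub nodes { @{ $_[0]{nodes} } }

package Sketch::Species;
our @ISA = ('Sketch::Taxonomy');   # legacy methods would live here

# Hypothetical factories: same fetch(), different container class.
package Sketch::Factory;
sub new {
    my ($class, %args) = @_;
    return bless { db => $args{-db} }, $class;
}
sub container_class { 'Sketch::Taxonomy' }
sub fetch {
    my ($self, $id) = @_;
    my $container = $self->container_class->new;
    $container->add_node($_) for $self->{db}->lineage($id);
    return $container;
}

package Sketch::SpeciesFactory;
our @ISA = ('Sketch::Factory');
sub container_class { 'Sketch::Species' }

package main;

my $db       = Sketch::StaticDB->new;
my $taxonomy = Sketch::Factory->new(-db => $db)->fetch(1);
my $species  = Sketch::SpeciesFactory->new(-db => $db)->fetch(1);
for my $obj ($taxonomy, $species) {
    print ref($obj), ': ', join('; ', map { $_->{name} } $obj->nodes), "\n";
}
```

Both factories receive identical records from the same db object; only
the class of the returned container differs.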

Bio::Species differs from Bio::Taxonomy only in that it contains all the 
legacy method names that Bio::Species currently has, for backward 
compatibility. Setting $species->classification() would delete all nodes 
of self, use a GenbankFactory to make a new Bio::Species, then pull out 
all its Nodes and add them to self.
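
The classification() setter described above could look roughly like
this. Again a sketch only: the Sketch:: classes, remove_all_nodes() and
the hash-based nodes are hypothetical stand-ins, not the existing
Bio::Species code.

```perl
use strict;
use warnings;

# Hypothetical container (stand-in for a Bio::TaxonomyI implementation).
package Sketch::Taxonomy;
sub new {
    my ($class) = @_;
    return bless { nodes => [] }, $class;
}
sub add_node {
    my ($self, $node) = @_;
    push @{ $self->{nodes} }, $node;
    return $self;
}
sub nodes { @{ $_[0]{nodes} } }
sub remove_all_nodes {
    my ($self) = @_;
    $self->{nodes} = [];
    return $self;
}

# Hypothetical factory that builds a container from a list of names.
package Sketch::ListFactory;
sub new {
    my ($class) = @_;
    return bless {}, $class;
}
sub generate {
    my ($self, %args) = @_;
    my @names    = @{ $args{-classification} };
    my $taxonomy = Sketch::Taxonomy->new;
    for my $i (0 .. $#names) {
        # Nodes are plain hashes here to keep the sketch short.
        $taxonomy->add_node({
            name      => $names[$i],
            object_id => $i + 1,
            parent_id => ($i < $#names ? $i + 2 : undef),
        });
    }
    return $taxonomy;
}

# Hypothetical species class offering the legacy-style accessor.
package Sketch::Species;
our @ISA = ('Sketch::Taxonomy');

sub classification {
    my ($self, @names) = @_;
    if (@names) {
        $self->remove_all_nodes;                      # delete all nodes of self
        my $built = Sketch::ListFactory->new
            ->generate(-classification => \@names);   # let the factory build
        $self->add_node($_) for $built->nodes;        # pull its nodes into self
    }
    return map { $_->{name} } $self->nodes;
}

package main;

my $species = Sketch::Species->new;
$species->classification('Saccharomyces cerevisiae', 'Saccharomyces');
$species->classification('Homo sapiens', 'Homo', 'Hominidae');
print join('; ', $species->classification), "\n";
```

Calling the setter a second time replaces the earlier nodes rather than
appending to them, which is the behaviour legacy callers of
classification() would expect.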


Unless anyone can think of a better way of doing things, I'll explore 
the above ideas and start writing code. To summarise: major changes to 
Bio::DB::Taxonomy* (make them factory slaves), implementation of some 
Bio::Taxonomy::FactoryIs, tweak Bio::Taxonomy::FactoryI and make 
Bio::TaxonomyI, make Bio::Species a Bio::TaxonomyI.

Oh, Bio::Taxonomy might need some changes as well. It has a classify() 
method that does something with a Bio::Species, which would be all wrong in 
the new way of doing things.

