[Bioperl-l] Bio::*Taxonomy* changes
Sendu Bala
bix at sendu.me.uk
Mon Jul 24 09:49:42 EDT 2006
Hilmar Lapp wrote:
> Sounds good to me, except there is no Bio::TaxonomyI yet,
Indeed, I propose making one.
> Bio::Species shouldn't fully depend on an internet connection or flat
> file to do anything meaningful.
>
> I.e., it should take advantage of a lookup database if there is one, but
> in the absence of that one should also be able to statically set
> attribute values to whatever one thinks can be gleaned from a parsed
> text or whatever.
Yes, which is why Bio::Taxonomy is appropriate here. Assuming that
Bio::Species isa Bio::TaxonomyI:
...
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
            Saccharomycetales; Saccharomycetaceae; Saccharomyces.
...
## the fully-manual way
my $species = Bio::Species->new;
my $node = Bio::Taxonomy::Node->new(-name      => 'Saccharomyces cerevisiae',
                                    -rank      => 'species',
                                    -object_id => 1,
                                    -parent_id => 2);
my $n2 = Bio::Taxonomy::Node->new(-name      => 'Saccharomyces',
                                  -object_id => 2,
                                  -parent_id => 3);
# (no assumption that 'Saccharomyces' is the genus, so rank() undefined)
my $n3 = [etc]
$species->add_node($node);
$species->add_node($n2);
[etc]
## Using a factory without db access
# assume that Bio::Taxonomy::GenbankFactory implements
# some modified Bio::Taxonomy::FactoryI
my $factory = Bio::Taxonomy::GenbankFactory->new();
my $species = $factory->generate(-classification => ['Saccharomyces cerevisiae',
    'Saccharomyces', 'Saccharomycetaceae' ...]);
# the generate() method above just does the fully-manual way for you
## Using a factory with db access
# assume that Bio::Taxonomy::EntrezFactory implements some
# modified Bio::Taxonomy::FactoryI and uses Bio::DB::Taxonomy::entrez
# to get the nodes
my $factory = Bio::Taxonomy::EntrezFactory->new();
my $species = $factory->fetch(-scientific_name => 'Saccharomyces cerevisiae');
# (would probably want to come up with a more generic name for the
# fetch() and generate() methods, so that all Factories use the
# same method name)
It's very clean and flexible this way. Ultimately you always make your
Bio::Species the same way - you add nodes to it. You can make those
nodes yourself or use a factory.
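
To make the "you always add nodes" point concrete, here is a minimal standalone sketch. It does not use BioPerl at all: Sketch::Node, Sketch::Species and Sketch::Factory are hypothetical stand-ins for the proposed classes, and the argument names are assumptions made only for illustration. It shows that a factory's generate() can be nothing more than the manual route done in a loop.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for illustration only; NOT the real BioPerl classes.
package Sketch::Node;
sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
sub name { return $_[0]{name} }
sub rank { return $_[0]{rank} }

package Sketch::Species;
sub new      { return bless { nodes => [] }, shift }
sub add_node { my ($self, $node) = @_; push @{ $self->{nodes} }, $node; return }
sub nodes    { return @{ $_[0]{nodes} } }

package Sketch::Factory;
sub new { return bless {}, shift }

# generate() just automates the manual route: one node per name,
# each pointing at the next name in the list as its parent.
sub generate {
    my ($self, %args) = @_;
    my @names   = @{ $args{classification} };
    my $species = Sketch::Species->new;
    for my $i (0 .. $#names) {
        $species->add_node(Sketch::Node->new(
            name      => $names[$i],
            object_id => $i + 1,
            # the last name has no known parent in this sketch
            parent_id => ($i < $#names ? $i + 2 : undef),
        ));
    }
    return $species;
}

package main;

# Manual route: build the nodes yourself and add them.
my $manual = Sketch::Species->new;
$manual->add_node(Sketch::Node->new(
    name => 'Saccharomyces cerevisiae', rank => 'species',
    object_id => 1, parent_id => 2));
$manual->add_node(Sketch::Node->new(
    name => 'Saccharomyces', object_id => 2, parent_id => 3));

# Factory route: the same add_node() calls, made on your behalf.
my $made = Sketch::Factory->new->generate(
    classification => ['Saccharomyces cerevisiae', 'Saccharomyces']);

print join(', ', map { $_->name } $made->nodes), "\n";
```

Either way the resulting object holds the same kind of node list; only who built the nodes differs.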
We also solve Chris' earlier quandary:
[ in a world where Bio::Taxonomy::Node and Bio::Taxonomy::SpeciesNode
exist, and given that Bio::DB::Taxonomy* currently directly make Node
objects ]
> The only problem I can foresee is which class to use with
> Bio::DB::Taxonomy*? I guess one could settle on one class by default and
> have the option to use another Bio::Taxonomy::NodeI-implementing class if
> you wanted more data/methods available...
The way to do it is to have the Bio::DB::Taxonomy* modules return only
the information that a Bio::Taxonomy::FactoryI would need to make a
NodeI. The specific Factory that you use could generate whatever type of
Node you wanted.
But actually I propose there is only one Node and the specific Factory
that you use determines the kind of Bio::TaxonomyI made; GenbankFactory
might make a Bio::Species, while EntrezFactory might make a Bio::Taxonomy.
Bio::Species differs from Bio::Taxonomy only in that it contains all the
legacy method names that Bio::Species currently has, for backward
compatibility. Setting $species->classification() would delete all nodes
of self, use a GenbankFactory to make a new Bio::Species, then pull out
all its Nodes and add them to self.
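
The classification() behaviour described above could look roughly like this. Again this is a standalone sketch under assumptions, not BioPerl code: Sketch::Species and Sketch::Factory are hypothetical, remove_all_nodes() is a helper name invented here, and nodes are plain hashes to keep it short.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed Bio::Species; not real BioPerl code.
package Sketch::Species;
sub new      { return bless { nodes => [] }, shift }
sub add_node { my ($self, $node) = @_; push @{ $self->{nodes} }, $node; return }
sub nodes    { return @{ $_[0]{nodes} } }
sub remove_all_nodes { $_[0]{nodes} = []; return }   # invented helper name

# Legacy-style accessor: setting a classification rebuilds the node list.
sub classification {
    my ($self, @names) = @_;
    if (@names) {
        $self->remove_all_nodes;                            # 1. delete own nodes
        my $tmp = Sketch::Factory->new->generate(@names);   # 2. factory builds a new species
        $self->add_node($_) for $tmp->nodes;                # 3. move its nodes into self
    }
    return map { $_->{name} } $self->nodes;
}

# Hypothetical stand-in for something like the proposed GenbankFactory.
package Sketch::Factory;
sub new { return bless {}, shift }
sub generate {
    my ($self, @names) = @_;
    my $species = Sketch::Species->new;
    $species->add_node({ name => $_ }) for @names;   # nodes are plain hashes here
    return $species;
}

package main;

my $species = Sketch::Species->new;
$species->classification('Homo sapiens', 'Homo');
# Setting again replaces the previous nodes rather than appending to them.
$species->classification('Saccharomyces cerevisiae', 'Saccharomyces');
print join('; ', $species->classification), "\n";
```

Old callers keep using classification() as before, while the object underneath is only ever a list of nodes.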
Unless anyone can think of a better way of doing things, I'll explore
the above ideas and start writing code. To summarise: major changes to
Bio::DB::Taxonomy* (make them factory slaves), implementation of some
Bio::Taxonomy::FactoryIs, tweak Bio::Taxonomy::FactoryI and make
Bio::TaxonomyI, make Bio::Species a Bio::TaxonomyI.
Oh, Bio::Taxonomy might need some changes as well. It has a classify()
method that does something with a Bio::Species, which would be all wrong
in the new way of doing things.