[Bioperl-l] Bio::*Taxonomy* changes

Sendu Bala bix at sendu.me.uk
Tue Jul 25 03:05:23 EDT 2006


Chris Fields wrote:
>
> There is one thing I will make perfectly clear here: there should  
> never, ever be enforced lookups for SeqIO (even using caches), though  
> I have no problem having optional ones.  This is something I have  
> stated before and what you propose below steers dangerously in that  
> direction.  Where, for instance, do you store the lineage from a  
> GenBank file?  Do you want to do a series of Tax lookups to restore  
> that data?  I think that the number one complaint for sequence  
> parsing is speed, which would only get slower with lookups (even  
> cached).

I already gave a code example showing exactly how Bio::Taxonomy is 
perfect for storing the lineage data from a GenBank file, with or 
without a database lookup. I think that when you first read it you 
basically ignored it because you had trouble with the idea of adding 
nodes to a species. If you have been glossing over my argument, it may 
be instructive to go over what I've been saying with a clear eye. 
Anyway, here it is again. Remember that in this example, Bio::Species 
isa Bio::Taxonomy:


## the fully-manual way
my $species = Bio::Species->new();
my $node = Bio::Taxonomy::Node->new(-name => 'Saccharomyces cerevisiae',
                                    -rank => 'species', -object_id => 1,
                                    -parent_id => 2);
my $n2 = Bio::Taxonomy::Node->new(-name => 'Saccharomyces',
                                  -object_id => 2, -parent_id => 3);
# (no assumption that 'Saccharomyces' is the genus, so rank() undefined)
# my $n3 = [etc]
$species->add_node($node);
$species->add_node($n2);
# [etc]

## Using a factory without db access
# assume that Bio::Taxonomy::GenbankFactory implements
# some modified Bio::Taxonomy::FactoryI
my $factory = Bio::Taxonomy::GenbankFactory->new();
my $species = $factory->generate(-classification => [
    'Saccharomyces cerevisiae', 'Saccharomyces', 'Saccharomycetaceae',
    # ...
]);
# the generate() method above just does the fully-manual way for you

## Using a factory with db access
# assume that Bio::Taxonomy::EntrezFactory implements some
# modified Bio::Taxonomy::FactoryI and uses Bio::DB::Taxonomy::entrez
# to get the nodes
my $factory = Bio::Taxonomy::EntrezFactory->new();
my $species = $factory->fetch(
    -scientific_name => 'Saccharomyces cerevisiae');


So now do you see how we're able to do the GenBank no-db way and the 
db-using way with the same object model? We're able to do both in the 
same, sane way because a Node is just a node; you can make them yourself 
manually, or retrieve them from a database. Once you stick them in a 
Taxonomy you can then (potentially) ask all the questions of the data 
that you can with existing Bio::Species. No cruft is required anywhere 
at all. All the Taxonomy classes can be 'pure', while only Bio::Species 
has to have backward-compatibility methods.
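
To make the "a Node is just a node" point concrete, here is a minimal, 
self-contained sketch. The Sketch::* packages are hypothetical stand-ins 
written only for this illustration; they are not the Bio::Taxonomy 
classes and make no claim about the final API. The sketch only assumes 
that a node carries a name, an optional rank, an id and a parent id, and 
that a taxonomy can be asked for a lineage however its nodes were made.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a taxonomy node (NOT Bio::Taxonomy::Node).
package Sketch::Node;
sub new {
    my ($class, %args) = @_;
    # rank is optional: a flat GenBank lineage does not say which rank is which
    return bless { name      => $args{-name},
                   rank      => $args{-rank},
                   id        => $args{-object_id},
                   parent_id => $args{-parent_id} }, $class;
}
sub name      { $_[0]{name} }
sub rank      { $_[0]{rank} }
sub id        { $_[0]{id} }
sub parent_id { $_[0]{parent_id} }

# Hypothetical stand-in for the node container (NOT Bio::Taxonomy).
package Sketch::Taxonomy;
sub new { my ($class) = @_; return bless { nodes => {} }, $class; }
sub add_node {
    my ($self, $node) = @_;
    $self->{nodes}{ $node->id } = $node;
    return $self;
}
# Follow parent ids upward from $id; returns names, leaf first.
sub lineage {
    my ($self, $id) = @_;
    my (@names, %seen);
    while (defined $id) {
        my $node = $self->{nodes}{$id} or last;   # stop at an unknown parent
        last if $seen{$id}++;                     # guard against cycles
        push @names, $node->name;
        $id = $node->parent_id;
    }
    return @names;
}

# Hypothetical factory that needs no database: it does the manual way for you.
package Sketch::ListFactory;
sub generate {
    my ($class, %args) = @_;
    my @names = @{ $args{-classification} };      # leaf first, as in GenBank
    my $tax   = Sketch::Taxonomy->new;
    for my $i (0 .. $#names) {
        my $parent = $i < $#names ? $i + 2 : undef;
        $tax->add_node(Sketch::Node->new(
            -name      => $names[$i],
            -rank      => ($i == 0 ? 'species' : undef),
            -object_id => $i + 1,
            -parent_id => $parent));
    }
    return $tax;
}

package main;

# Nodes made by hand ...
my $manual = Sketch::Taxonomy->new;
$manual->add_node(Sketch::Node->new(-name => 'Saccharomyces cerevisiae',
                                    -rank => 'species', -object_id => 1,
                                    -parent_id => 2));
$manual->add_node(Sketch::Node->new(-name => 'Saccharomyces',
                                    -object_id => 2, -parent_id => 3));

# ... or made by a factory from a plain list; no lookup in either case.
my $generated = Sketch::ListFactory->generate(-classification =>
    ['Saccharomyces cerevisiae', 'Saccharomyces', 'Saccharomycetaceae']);

print join(' < ', $manual->lineage(1)), "\n";
print join(' < ', $generated->lineage(1)), "\n";
```

A database-backed factory would differ only in where its nodes come 
from; the lineage() call on the resulting object would stay the same.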

