[Bioperl-l] Bio::*Taxonomy* changes

Hilmar Lapp hlapp at gmx.net
Tue Jul 25 09:47:47 EDT 2006


On Jul 25, 2006, at 3:05 AM, Sendu Bala wrote:

> [...]
> ## the fully-manual way
> my $species = new Bio::Species;
> my $node = new Bio::Taxonomy::Node(-name => 'Saccharomyces cerevisiae',
>                                     -rank => 'species',
>                                     -object_id => 1,
>                                     -parent_id => 2);

If this is meant as an example for the use cases I enumerated, then  
you wouldn't have the parent_id from a Genbank file. However, you  
didn't have that before either, so no problem.

> my $n2 = new Bio::Taxonomy::Node(-name => 'Saccharomyces',
>                                   -object_id => 2, -parent_id => 3);
> # (no assumption that 'Saccharomyces' is the genus,
> # so rank() is undefined)

I think in a confident parse you want to assign 'genus' if there's
little doubt, for example for 'Saccharomyces cerevisiae'. I'm not sure,
though, whether there are odd viruses whose names look innocuous but
don't actually follow the binomial convention.
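
A minimal sketch of such a conservative check, assuming a hypothetical
helper guess_genus() that is not part of BioPerl. It only reports a genus
when the name is exactly a capitalized word followed by one lowercase
epithet, and otherwise leaves the rank undecided:

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return the genus only when
# the name looks like a plain Latin binomial, otherwise return undef.
sub guess_genus {
    my ($name) = @_;
    return unless defined $name;
    # Capitalized genus, one space, lowercase epithet, nothing else.
    return $1 if $name =~ /^([A-Z][a-z]+) [a-z]+$/;
    return;
}

for my $name ('Saccharomyces cerevisiae', 'Tobacco mosaic virus') {
    my $genus = guess_genus($name);
    printf "%-26s => %s\n", $name,
        defined $genus ? $genus : '(rank left undefined)';
}
```

The pattern is deliberately strict, so hyphenated epithets and multi-word
virus names fall through to "undefined" rather than being guessed.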

> my $n3 = [etc]
> $species->add_node($node);
> $species->add_node($n2);

I know why you are doing this, but people seeing this will hit a
mental snag. You should take Chris' refusal to see the sense in this
as an indication that many people down the road won't see the sense
either.

So instead, make the logical model in your design more obvious, which  
I think ultimately will help maintainability as well. For example:

my $taxonomy = Bio::Taxonomy->new();
my $node = Bio::Taxonomy::Node->new(-name => 'Saccharomyces cerevisiae',
                                    -rank => 'species',
                                    -object_id => 1,
                                    -parent_id => 2);
my $n2 = Bio::Taxonomy::Node->new(-name => 'Saccharomyces',
                                  -object_id => 2,
                                  -parent_id => 3);
$taxonomy->add_node($node);
$taxonomy->add_node($n2);

my $species = Bio::Species->new(-lineage => $taxonomy);
print $species->binomial();
print $species->genus();
# this may trigger a lookup if a taxonomy db handle has been set, e.g.:
# $taxonomy->db_handle(Bio::DB::Taxonomy->new(-source => 'entrez'));
print $species->classification();


> [etc]
>
> ## Using a factory without db access
> # assume that Bio::Taxonomy::GenbankFactory implements
> # some modified Bio::Taxonomy::FactoryI
> my $factory = Bio::Taxonomy::GenbankFactory->new();
> my $species = $factory->generate(-classification =>
>               ['Saccharomyces cerevisiae', 'Saccharomyces',
>                'Saccharomycetaceae' ...]);
> # the generate() method above just does the fully-manual way for you

Except that the method name would be create_object(), the parameter
would be a hash ref, and the return value would be a
Bio::TaxonomyI-compliant object:

my $taxonomy = $factory->create_object({-classification =>
    ['Saccharomyces cerevisiae', 'Saccharomyces',
     'Saccharomycetaceae' ...]});
my $species = Bio::Species->new(-lineage => $taxonomy);
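
To make the hash-ref calling convention concrete, here is a
self-contained sketch. Sketch::TaxonomyFactory is a hypothetical
stand-in, not a BioPerl class, and the sequential ids it assigns are
purely illustrative. It returns plain hashes rather than a real
Bio::TaxonomyI object:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a factory; not a BioPerl class.
package Sketch::TaxonomyFactory;

sub new { return bless {}, shift }

# Takes a hash ref of named parameters and returns an array ref of plain
# node hashes, ordered from most specific to least specific name.
sub create_object {
    my ($self, $params) = @_;
    my @names = @{ $params->{-classification} || [] };
    my @nodes;
    for my $i (0 .. $#names) {
        push @nodes, {
            name      => $names[$i],
            object_id => $i + 1,    # illustrative temporary id
            # The last (least specific) name has no known parent.
            parent_id => $i < $#names ? $i + 2 : undef,
        };
    }
    return \@nodes;
}

package main;

my $factory = Sketch::TaxonomyFactory->new();
my $lineage = $factory->create_object({
    -classification => ['Saccharomyces cerevisiae', 'Saccharomyces',
                        'Saccharomycetaceae'],
});
printf "%s (id %d)\n", $_->{name}, $_->{object_id} for @$lineage;
```

Passing a single hash ref keeps the factory signature uniform across
implementations; the caller decides which named parameters to supply.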


>
> ## Using a factory with db access
> # assume that Bio::Taxonomy::EntrezFactory implements some
> # modified Bio::Taxonomy::FactoryI and uses Bio::DB::Taxonomy::entrez
> # to get the nodes
> my $factory = Bio::Taxonomy::EntrezFactory->new();

The logic for deciding where to perform a lookup should not be
duplicated here. It belongs only under Bio::DB::Taxonomy::*.

> my $species = $factory->fetch(-scientific_name =>
>                               'Saccharomyces cerevisiae');

Likewise, use the methods defined in Bio::DB::Taxonomy, and again,  
the return type is Bio::Taxonomy, which you would pass to  
Bio::Species->new().

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================






