[Bioperl-l] Bio::*Taxonomy* changes

Sendu Bala bix at sendu.me.uk
Wed Jul 26 17:13:43 UTC 2006


Hilmar Lapp wrote:
> On Jul 26, 2006, at 6:00 AM, Sendu Bala wrote:
> 
>> Hilmar Lapp wrote:
>>> Instead, create something like
>>>
>>> 	# return a Bio::Taxonomy::Node:
>>> 	my $taxon = $seq->taxon();
>> Yes, but $seq->species() would also
> 
> $seq->species() would return a Bio::Species object which may not be  
> more than a thin shell anymore around an implementation that  
> delegates almost everything to a lineage object (Bio::Taxonomy).

I actually forgot to finish that sentence. I was going to suggest that
Bio::Species isa Bio::Taxonomy::Node, and that it would indeed delegate
most of its implementation to Node.


>>> 	# alternative approach: return a lineage (taxonomy)
>>> 	# this would be Bio::TaxonomyI compliant
>>> 	my $lineage = $seq->lineage();
>> I've since come to the conclusion that anything Taxonomy-ish would be
>> inappropriate - see recent post.
>
> The fact that it's confusing to return a taxonomy from a method called species()  
> doesn't mean it's equally bad to return a lineage (which is a limited  
> taxonomy) from a method called lineage().

You wouldn't need to though. If you want a lineage you could ask your 
node for its lineage. There's no point in having a whole other class 
that contains a node and all its ancestor nodes, when to get the 
ancestors of a node all you have to do is $node->get_Lineage_Nodes().
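To illustrate the claim that a lineage is just a node plus its ancestors, here is a plain-Perl sketch. It does not use BioPerl: the package Sketch::Node and its parent() and lineage_nodes() methods are hypothetical stand-ins for Bio::Taxonomy::Node and get_Lineage_Nodes(), assuming each node simply holds a reference to its parent.

```perl
use strict;
use warnings;

# Hypothetical sketch, not BioPerl code: a node is a name plus a parent link.
package Sketch::Node;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, parent => $args{parent} }, $class;
}

sub name   { return $_[0]{name} }
sub parent { return $_[0]{parent} }

# The lineage is this node followed by each ancestor up to the root;
# no separate lineage object is needed.
sub lineage_nodes {
    my ($self) = @_;
    my @nodes;
    for (my $node = $self; $node; $node = $node->parent) {
        push @nodes, $node;
    }
    return @nodes;
}

package main;

# Build Eukaryota -> Mammalia -> Homo -> Homo sapiens, root first.
my $parent;
for my $name ('Eukaryota', 'Mammalia', 'Homo', 'Homo sapiens') {
    $parent = Sketch::Node->new(name => $name, parent => $parent);
}
my $human = $parent;

print join(", ", map { $_->name } $human->lineage_nodes), "\n";
# prints "Homo sapiens, Homo, Mammalia, Eukaryota"
```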


>> My proposed solution is that bioperl's taxonomy model always lets you
>> answer the same questions regardless of your source for taxonomic
>> information - see recent post.
> 
> See above ... And I'd rather see some code or API examples

The fine details of the following may be slightly off, but it's just to 
provide an example. I'll use Test.pm syntax.

my @human = ('Homo sapiens', 'Homo', 'Mammalia', 'Eukaryota');
my @mouse = ('Mus musculus', 'Mus', 'Mammalia', 'Eukaryota');


Old way with Node
-----------------

my $h_node = new Bio::Taxonomy::Node(-classification => \@human);
my $m_node = new Bio::Taxonomy::Node(-classification => \@mouse);

@human = map { $_->scientific_name } $h_node->get_Lineage_Nodes;
ok scalar(@human), 0; # failure to work as expected: no lineage nodes
@human = $h_node->classification;
ok join(", ", @human), "Homo sapiens, Homo, Mammalia, Eukaryota";

my $lca = $h_node->get_LCA_Node($m_node);
ok $lca, undef; # failure to do anything useful because our lineage data
                 # is in an array, not in nodes

# try again with entrez - must make brand new objects
my $db = new Bio::DB::Taxonomy(-source => 'entrez');
$h_node = $db->get_Taxonomy_Node(-name => 'Homo sapiens');
$m_node = $db->get_Taxonomy_Node(-name => 'Mus musculus');

@human = map { $_->scientific_name } $h_node->get_Lineage_Nodes;
ok join(", ", @human),
   "Homo sapiens, Homo, Homo/Pan/Gorilla group, Hominidae, ..."; # now it works!

$lca = $h_node->get_LCA_Node($m_node);
ok $lca->scientific_name, 'Mammalia'; # and now this works!


Old way with Bio::Species
-------------------------

# forget about it, Species has nothing like a get_LCA_Node()


Proposed way with Node
----------------------

my $db = new Bio::DB::Taxonomy(-source => 'list', -lineage => \@human);
my $h_node = $db->get_Taxonomy_Node(-name => 'Homo sapiens');
$db->add_lineage(@mouse); # or make a new db
my $m_node = $db->get_Taxonomy_Node(-name => 'Mus musculus');

@human = map { $_->scientific_name } $h_node->get_Lineage_Nodes;
ok join(", ", @human), "Homo sapiens, Homo, Mammalia, Eukaryota";
# works as expected

my $lca = $h_node->get_LCA_Node($m_node);
ok $lca->scientific_name, 'Mammalia'; # works first time

# try again with entrez - just change the db_handle
$h_node->db_handle(new Bio::DB::Taxonomy(-source => 'entrez'));

@human = map { $_->scientific_name } $h_node->get_Lineage_Nodes;
ok join(", ", @human),
   "Homo sapiens, Homo, Homo/Pan/Gorilla group, Hominidae, ...";

$lca = $h_node->get_LCA_Node($m_node);
ok $lca->scientific_name, 'Mammalia';
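The reason get_LCA_Node() can work first time with a 'list' source is that lineages added to one store share nodes by name, so the human and mouse lineages meet at the same Mammalia node. The sketch below shows only that idea in plain Perl; Sketch::ListTaxonomy and its add_lineage(), lineage() and lca() methods are hypothetical and say nothing about how Bio::DB::Taxonomy would actually implement it. Lineages are passed species-first, like @human and @mouse.

```perl
use strict;
use warnings;

# Hypothetical sketch, not BioPerl code: an in-memory 'list' taxonomy in
# which every name maps to exactly one node.
package Sketch::ListTaxonomy;

sub new {
    my ($class) = @_;
    return bless { nodes => {} }, $class;
}

# Takes a lineage ordered species-first and links each name to the next one
# as its parent. Names seen before reuse their existing node, so lineages
# added separately merge into one tree.
sub add_lineage {
    my ($self, @lineage) = @_;
    my $child;
    for my $name (@lineage) {
        my $node = $self->{nodes}{$name} //= { name => $name, parent => undef };
        $child->{parent} //= $node if $child;
        $child = $node;
    }
    return $self;
}

# Names from the given node up to the root.
sub lineage {
    my ($self, $name) = @_;
    my @names;
    for (my $node = $self->{nodes}{$name}; $node; $node = $node->{parent}) {
        push @names, $node->{name};
    }
    return @names;
}

# Lowest common ancestor: the first name in the first lineage that also
# occurs in the second one.
sub lca {
    my ($self, $name_a, $name_b) = @_;
    my %in_b;
    $in_b{$_} = 1 for $self->lineage($name_b);
    for my $name ($self->lineage($name_a)) {
        return $name if $in_b{$name};
    }
    return;
}

package main;

my @human = ('Homo sapiens', 'Homo', 'Mammalia', 'Eukaryota');
my @mouse = ('Mus musculus', 'Mus', 'Mammalia', 'Eukaryota');

my $db = Sketch::ListTaxonomy->new;
$db->add_lineage(@human)->add_lineage(@mouse);

print join(", ", $db->lineage('Homo sapiens')), "\n";
# prints "Homo sapiens, Homo, Mammalia, Eukaryota"
print $db->lca('Homo sapiens', 'Mus musculus'), "\n";
# prints "Mammalia"
```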


Proposed way with Bio::Species
------------------------------
# (Bio::Species isa Bio::Taxonomy::Node, implements its methods like
#  above)

my $h_species = new Bio::Species(-classification => \@human);
my $m_species = new Bio::Species(-classification => \@mouse);

@human = map { $_->scientific_name } $h_species->get_Lineage_Nodes;
ok join(", ", @human), "Homo sapiens, Homo, Mammalia, Eukaryota";
@human = $h_species->classification;
ok join(", ", @human), "Homo sapiens, Homo, Mammalia, Eukaryota";

my $lca = $h_species->get_LCA_Node($m_species);
ok $lca->scientific_name, 'Mammalia';

# trying again with entrez behaves as per proposed Node, above
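"Bio::Species isa Bio::Taxonomy::Node" amounts to ordinary Perl inheritance: Species keeps its classification() interface but answers it from the node lineage, and inherits the LCA method. Below is a hypothetical plain-Perl sketch (Sketch::Node, Sketch::Species and lca() are illustrative names, not BioPerl code). It matches ancestors by name, an assumption made here so that separately built lineages can still meet.

```perl
use strict;
use warnings;

# Hypothetical sketch, not BioPerl code.
package Sketch::Node;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, parent => $args{parent} }, $class;
}

sub name   { return $_[0]{name} }
sub parent { return $_[0]{parent} }

# This node followed by each of its ancestors up to the root.
sub lineage_nodes {
    my ($self) = @_;
    my @nodes;
    for (my $node = $self; $node; $node = $node->parent) {
        push @nodes, $node;
    }
    return @nodes;
}

# Lowest common ancestor, matching ancestors by name (an assumption, so that
# lineages built independently can still meet).
sub lca {
    my ($self, $other) = @_;
    my %other_names;
    $other_names{ $_->name } = 1 for $other->lineage_nodes;
    for my $node ($self->lineage_nodes) {
        return $node if $other_names{ $node->name };
    }
    return;
}

# A species is a node; the old classification array builds its ancestor chain.
package Sketch::Species;
our @ISA = ('Sketch::Node');

sub new {
    my ($class, %args) = @_;
    my @classification = @{ $args{-classification} };    # species-first
    my $parent;
    for my $name (reverse @classification[1 .. $#classification]) {
        $parent = Sketch::Node->new(name => $name, parent => $parent);
    }
    return $class->SUPER::new(name => $classification[0], parent => $parent);
}

# The familiar interface, now derived from the lineage.
sub classification {
    my ($self) = @_;
    return map { $_->name } $self->lineage_nodes;
}

package main;

my @human = ('Homo sapiens', 'Homo', 'Mammalia', 'Eukaryota');
my @mouse = ('Mus musculus', 'Mus', 'Mammalia', 'Eukaryota');

my $h_species = Sketch::Species->new(-classification => \@human);
my $m_species = Sketch::Species->new(-classification => \@mouse);

print join(", ", $h_species->classification), "\n";
# prints "Homo sapiens, Homo, Mammalia, Eukaryota"
print $h_species->lca($m_species)->name, "\n";
# prints "Mammalia"
```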


