[Bioperl-l] Bio::*Taxonomy* changes

Sendu Bala bix at sendu.me.uk
Wed Jul 26 12:11:25 EDT 2006


Chris Fields wrote:
>> No. Lineage information must be in the form of Nodes or you can't answer
>> lineage-related taxonomic questions.
> 
> You must have a way to store the 'horrible lineage information' data, as is,
> for those users who do not care about taxonomy and just want to convert seq
> streams.  You shouldn't burden the everyday user with something that is
> pretty specialized, this being finding correct taxonomic information based
> on DB lookups for a particular reason (screening sequences, as Hilmar
> pointed out, was one possibility).  

I am certainly not requiring that anyone find 'correct taxonomic 
information'. The whole reason I am backing my current proposal is that 
it works equally well with or without access to NCBI's taxonomy 
database. Your proposals work poorly without such access.


> I don't care how, but store lineage information as it appears in the file
> (scalar string) or in a simple data structure (array, maybe?) capable of
> retaining the information in some way.  There are many many ways of doing
> this which I have previously pointed out; take your pick.

I've taken my pick.

To set:
my $db = Bio::DB::Taxonomy->new(-source => 'list', -lineage => \@lineage);
$node->db_handle($db);

To get:
@lineage = map { $_->scientific_name } $node->get_Lineage_Nodes;

That is as simple as it is going to get in a world where we have 'pure' 
Nodes or any other kind of pure taxonomic class.

If you want to hide the taxonomic complexity from end-users who want to 
make and store their own lineage of their species without having to know 
the details of how bioperl's taxonomy modules are supposed to work, tell 
them to use Bio::Species:

To set:
$species->classification(@lineage);

To get:
@lineage = $species->classification;

Of course in this example I propose that behind the scenes Bio::Species 
is a Bio::Taxonomy::Node and just implements classification() the pure 
Node way, given above.
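
To illustrate the idea of a species class that is a node underneath, here 
is a rough sketch in plain Perl. It is not bioperl code: Toy::ListDB, 
Toy::Node, Toy::Species and ancestors_of() are hypothetical names made up 
for this example, and it assumes the lineage is given root first with the 
species itself as the last element.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a list-backed taxonomy source (not a bioperl class).
package Toy::ListDB;
sub new {
    my ($class, %args) = @_;
    # The lineage arrives as an array reference, root first (an assumption).
    return bless { lineage => [ @{ $args{-lineage} } ] }, $class;
}
# Return every name listed before $name, root first.
sub ancestors_of {
    my ($self, $name) = @_;
    my @above;
    for my $n (@{ $self->{lineage} }) {
        last if $n eq $name;
        push @above, $n;
    }
    return @above;
}

# Hypothetical 'pure' node: it only knows its name and where to look things up.
package Toy::Node;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{-name}, db => $args{-db} }, $class;
}
sub scientific_name { $_[0]{name} }
sub db_handle {
    my ($self, $db) = @_;
    $self->{db} = $db if defined $db;
    return $self->{db};
}
# Ancestors as node objects, root first, excluding the node itself.
sub get_Lineage_Nodes {
    my $self = shift;
    my $db = $self->db_handle or return;
    return map { Toy::Node->new(-name => $_, -db => $db) }
           $db->ancestors_of($self->{name});
}

# Hypothetical convenience class: a node that hides the database handling.
package Toy::Species;
our @ISA = ('Toy::Node');
sub classification {
    my ($self, @lineage) = @_;
    if (@lineage) {
        # Setter: build a list-backed source behind the scenes.
        $self->{name} = $lineage[-1];
        $self->db_handle(Toy::ListDB->new(-lineage => \@lineage));
    }
    # Getter: implemented purely in terms of the node methods.
    return ((map { $_->scientific_name } $self->get_Lineage_Nodes),
            $self->scientific_name);
}

package main;
my $species = Toy::Species->new;
$species->classification(qw(Eukaryota Metazoa Chordata Mammalia Primates Homo),
                         'Homo sapiens');
my @lineage = $species->classification;
print join('; ', @lineage), "\n";
```

The end-user only ever calls classification(); the list-backed source and 
the node traversal stay out of sight.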


Let me make my requirement very clear: the solution must allow you to 
find the most recent common ancestor of two solution-objects without 
access to the NCBI taxonomy database, using exactly the same method call 
you would use if you /did/ have access to the NCBI taxonomy database. 
The method in question shouldn't need any special-case code depending on 
the presence or absence of the NCBI taxonomy database.

That's the litmus test. I'll tend to reject any solution that fails.
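
A toy version of that test, as a sketch in plain Perl rather than anything 
from bioperl: Toy::ListSource, Toy::TreeSource, Toy::Taxon, parent_of() and 
common_ancestor() are all hypothetical names. Toy::TreeSource merely stands 
in for a full taxonomy database, and taxa are matched by name, which is a 
simplifying assumption (a real implementation would more likely compare 
ids).

```perl
use strict;
use warnings;

# Backend 1 (hypothetical): built from one flat lineage, root first.
package Toy::ListSource;
sub new {
    my ($class, @lineage) = @_;
    my %parent;
    $parent{ $lineage[$_] } = $lineage[$_ - 1] for 1 .. $#lineage;
    return bless { parent => \%parent }, $class;
}
sub parent_of { $_[0]{parent}{ $_[1] } }

# Backend 2 (hypothetical): a child => parent table standing in for a
# complete taxonomy database.
package Toy::TreeSource;
sub new {
    my ($class, %parent) = @_;
    return bless { parent => \%parent }, $class;
}
sub parent_of { $_[0]{parent}{ $_[1] } }

# A taxon only talks to its source through parent_of().
package Toy::Taxon;
sub new {
    my ($class, $name, $source) = @_;
    return bless { name => $name, source => $source }, $class;
}
# Names from this taxon up to the root, this taxon first.
sub path_to_root {
    my $self = shift;
    my @path = ($self->{name});
    while (defined(my $p = $self->{source}->parent_of($path[-1]))) {
        push @path, $p;
    }
    return @path;
}
# The one method call: no branching on which kind of source is behind it.
sub common_ancestor {
    my ($self, $other) = @_;
    my %seen = map { $_ => 1 } $self->path_to_root;
    for my $name ($other->path_to_root) {
        return $name if $seen{$name};
    }
    return;
}

package main;

# Without a database: each taxon carries only its own lineage.
my $human = Toy::Taxon->new('Homo sapiens', Toy::ListSource->new(
    qw(Eukaryota Metazoa Chordata Mammalia Primates), 'Homo sapiens'));
my $mouse = Toy::Taxon->new('Mus musculus', Toy::ListSource->new(
    qw(Eukaryota Metazoa Chordata Mammalia Rodentia), 'Mus musculus'));
my $offline = $human->common_ancestor($mouse);

# With a (stand-in) database: both taxa share one tree-backed source.
my $tree = Toy::TreeSource->new(
    'Homo sapiens' => 'Primates', 'Mus musculus' => 'Rodentia',
    Primates => 'Mammalia', Rodentia => 'Mammalia',
    Mammalia => 'Chordata', Chordata => 'Metazoa', Metazoa => 'Eukaryota');
my $with_db = Toy::Taxon->new('Homo sapiens', $tree)
    ->common_ancestor(Toy::Taxon->new('Mus musculus', $tree));

print "$offline $with_db\n";
```

Both calls go through the same common_ancestor() code path; only the 
source object behind each taxon differs.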

