[Bioperl-l] Bio::*Taxonomy* changes
Sendu Bala
bix at sendu.me.uk
Wed Jul 26 12:11:25 EDT 2006
Chris Fields wrote:
>> No. Lineage information must be in the form of Nodes or you can't answer
>> lineage-related taxonomic questions.
>
> You must have a way to store the 'horrible lineage information' data, as is,
> for those users who do not care about taxonomy and just want to convert seq
> streams. You shouldn't burden the everyday user with something that is
> pretty specialized, this being finding correct taxonomic information based
> on DB lookups for a particular reason (screening sequences, as Hilmar
> pointed out, was one possibility).
I am certainly not requiring that anyone find 'correct taxonomic
information'. The whole reason I am backing my current proposal is that
it works equally well with or without access to NCBI's taxonomy
database. Your proposals work poorly without that access.
> I don't care how, but store lineage information as it appears in the file
> (scalar string) or in a simple data structure (array, maybe?) capable of
> retaining the information in some way. There are many many ways of doing
> this which I have previously pointed out; take your pick.
I've taken my pick.
To set:
# pass the lineage as an array reference so it is not flattened into the argument list
my $db = Bio::DB::Taxonomy->new(-source => 'list', -lineage => \@lineage);
$node->db_handle($db);
To get:
@lineage = map { $_->scientific_name } $node->get_Lineage_Nodes;
That is as simple as it is going to get in a world where we have 'pure'
Nodes or any other kind of pure taxonomic class.
If you want to hide the taxonomic complexity from end-users who want to
make and store their own lineage of their species without having to know
the details of how bioperl's taxonomy modules are supposed to work, tell
them to use Bio::Species:
To set:
$species->classification(@lineage);
To get:
@lineage = $species->classification;
Of course in this example I propose that behind the scenes Bio::Species
is a Bio::Taxonomy::Node and just implements classification() the pure
Node way, given above.
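
A minimal sketch of that idea, in plain Perl with no BioPerl dependency. Toy::Node and Toy::Species are hypothetical stand-ins, not the real Bio::Taxonomy::Node or Bio::Species, and the root-first ordering of the list is an assumption made only for this sketch:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a taxonomy node; not the BioPerl class.
package Toy::Node;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, parent => $args{parent} }, $class;
}
sub scientific_name { $_[0]{name} }
sub parent          { $_[0]{parent} }

# Ancestors of this node, root first, followed by the node itself.
sub lineage_nodes {
    my ($self) = @_;
    my @nodes = ($self);
    while (my $p = $nodes[0]->parent) { unshift @nodes, $p }
    return @nodes;
}

# A species is just a node with a convenience accessor layered on top.
package Toy::Species;
our @ISA = ('Toy::Node');

sub classification {
    my ($self, @names) = @_;
    if (@names) {
        # Setter: the last name is this species; earlier names become a
        # chain of parent nodes (assumed root-first order).
        my $parent;
        $parent = Toy::Node->new(name => $_, parent => $parent)
            for @names[0 .. $#names - 1];
        $self->{name}   = $names[-1];
        $self->{parent} = $parent;
    }
    # Getter: answered purely by walking the nodes.
    return map { $_->scientific_name } $self->lineage_nodes;
}

package main;

my $species = Toy::Species->new;
$species->classification('Eukaryota', 'Mammalia', 'Homo sapiens');
my @lineage = $species->classification;
print join('; ', @lineage), "\n";    # prints "Eukaryota; Mammalia; Homo sapiens"
```

The end-user only ever sees classification(); the nodes behind it remain available for anyone who does want to ask lineage questions.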
Let me make my requirement very clear: the solution must allow you to
find the most recent common ancestor of two solution-objects without
access to the NCBI taxonomy database, using exactly the same method call
you would use if you /did/ have access to the NCBI taxonomy database.
The method in question shouldn't need any special-case code depending on
the presence or absence of the NCBI taxonomy database.
That's the litmus test. I'll tend to reject any solution that fails.
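
To illustrate what passing that test could look like, here is a rough sketch in plain Perl. All the Sketch::* packages, the parent_of() method and common_ancestor() are hypothetical names invented for this example, not BioPerl API; Sketch::TableSource merely stands in for a real taxonomy database, and taxa are matched by name, which assumes identical names mean the same taxon:

```perl
use strict;
use warnings;

# Hypothetical lineage source built from flat, root-first lineages (no database).
package Sketch::ListSource;
sub new {
    my ($class, @lineages) = @_;
    my %parent;
    for my $lineage (@lineages) {
        # Each name's parent is the name just before it in the list.
        $parent{ $lineage->[$_] } = $lineage->[ $_ - 1 ] for 1 .. $#$lineage;
    }
    return bless { parent => \%parent }, $class;
}
sub parent_of { $_[0]{parent}{ $_[1] } }

# Hypothetical stand-in for a taxonomy database: a shared child => parent table.
package Sketch::TableSource;
sub new {
    my ($class, %parent) = @_;
    return bless { parent => {%parent} }, $class;
}
sub parent_of { $_[0]{parent}{ $_[1] } }

package Sketch::Node;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, source => $args{source} }, $class;
}

# Walk parent links up to the root; returns names root first, self last.
sub lineage {
    my ($self) = @_;
    my $current = $self->{name};
    my @names   = ($current);
    my %seen    = ($current => 1);
    while (1) {
        my $p = $self->{source}->parent_of($current);
        last if !defined $p || $seen{$p}++;    # stop at the root or on a cycle
        unshift @names, $p;
        $current = $p;
    }
    return @names;
}

# Most recent common ancestor: the last name shared by both root-first
# lineages. Note there is no branch on which kind of source is in use.
sub common_ancestor {
    my ($self, $other) = @_;
    my @a = $self->lineage;
    my @b = $other->lineage;
    my $mrca;
    while (@a && @b && $a[0] eq $b[0]) {
        $mrca = shift @a;
        shift @b;
    }
    return $mrca;    # undef if the lineages share nothing
}

package main;

my @human = ('Eukaryota', 'Mammalia', 'Primates', 'Homo sapiens');
my @mouse = ('Eukaryota', 'Mammalia', 'Rodentia', 'Mus musculus');

# Without a database: each node carries only the lineage from its own record.
my $h1 = Sketch::Node->new(name => 'Homo sapiens',
                           source => Sketch::ListSource->new(\@human));
my $m1 = Sketch::Node->new(name => 'Mus musculus',
                           source => Sketch::ListSource->new(\@mouse));

# With a "database": both nodes share one parent table.
my $table = Sketch::TableSource->new(
    'Homo sapiens' => 'Primates', 'Primates' => 'Mammalia',
    'Mus musculus' => 'Rodentia', 'Rodentia' => 'Mammalia',
    'Mammalia'     => 'Eukaryota',
);
my $h2 = Sketch::Node->new(name => 'Homo sapiens', source => $table);
my $m2 = Sketch::Node->new(name => 'Mus musculus', source => $table);

print $h1->common_ancestor($m1), "\n";    # prints "Mammalia"
print $h2->common_ancestor($m2), "\n";    # prints "Mammalia"
```

The calling code is identical in both cases; only the object handed to the node as its source differs.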