[Bioperl-l] Bio::*Taxonomy* changes

Chris Fields cjfields at uiuc.edu
Tue Jul 25 02:29:57 UTC 2006


Look, we're just going back and forth on this stupid little thing,
when the only point we are really divided on is which object type
should store certain items from a GenBank file (Bio::Species/
Bio::Tax::Node/Bio::Whatever).  In particular, the main sticking
point is the lineage.

We could go back and forth on what Jason really intended.   
Personally, I think his past statements are quite clear on what his  
intent was (he's very clear in the wiki on what Bio::Taxonomy::Node  
was built to replace, in two separate posts and within the last four  
months).  The reality is he's not here and you're willing to do the job.

There is one thing I will make perfectly clear here: there should  
never, ever be enforced lookups for SeqIO (even using caches), though  
I have no problem having optional ones.  This is something I have  
stated before and what you propose below steers dangerously in that  
direction.  Where, for instance, do you store the lineage from a  
GenBank file?  Do you want to do a series of Tax lookups to restore  
that data?  I think that the number one complaint for sequence  
parsing is speed, which would only get slower with lookups (even  
cached).
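
To illustrate the distinction between enforced and optional lookups, here is a minimal sketch in plain Perl. The function name parse_organism, the resolver option, and the hash layout are all hypothetical and are not part of Bio::SeqIO; the lineage shown is shortened for illustration. The point is only that the lineage is kept exactly as written in the record, and a taxonomy lookup happens solely when the caller asks for one.

```perl
use strict;
use warnings;

# Hypothetical parser step (not the Bio::SeqIO::genbank code): keep the
# lineage as written in the record; look things up only on request.
sub parse_organism {
    my ($block, %opt) = @_;

    my @lines = split /\n/, $block;
    s/^\s+|\s+$//g for @lines;          # trim each line in place

    my $name = shift @lines;            # first line: scientific name
    my $text = join ' ', @lines;        # remaining lines: lineage
    $text =~ s/\.$//;                   # drop the terminating period

    my %tax = (
        scientific_name => $name,
        lineage         => [ split /;\s*/, $text ],
    );

    # Optional lookup: skipped entirely unless a resolver is supplied.
    $tax{lookup} = $opt{resolver}->($name) if $opt{resolver};

    return \%tax;
}

my $block = "Homo sapiens\n"
          . "  Eukaryota; Metazoa; Chordata;\n"
          . "  Mammalia; Primates; Hominidae; Homo.";

# Default path: no lookup at all, so parsing speed is unaffected.
my $fast = parse_organism($block);

# Opt-in path: the caller passes a resolver (a stub here).
my $calls = 0;
my $rich  = parse_organism(
    $block,
    resolver => sub { $calls++; return { taxid => 9606 } },
);

print join('; ', @{ $fast->{lineage} }), "\n";
```

In this sketch the default call never touches a database or cache, while the second call pays the lookup cost only because it asked to.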

What I propose is that we make it as simple as possible.  Remove the
unnecessary genus/species/subspecies parsing in genbank.pm, and store
the scientific name, common names, and lineage in some easily
accessible way so that everyday users can get at them.  Tie that data
to Bio::Taxonomy in some way (I propose Node, as it contains almost all
the methods needed) so that you could retrieve more information by
moving up and down nodes.  I, personally, don't see the point in
having Bio::Species around after this discussion, as Node seems to do
the job adequately.
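
As a rough sketch of the kind of simple container described above, in plain Perl: the package name Sketch::TaxNode and its accessors are hypothetical stand-ins chosen for illustration, not the actual Bio::Taxonomy::Node interface, and the lineage is a shortened example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a taxonomy node; NOT the real
# Bio::Taxonomy::Node. It only holds what the parser read.
package Sketch::TaxNode;

sub new {
    my ($class, %args) = @_;
    my $self = {
        scientific_name => $args{scientific_name},
        common_names    => $args{common_names} || [],
        ncbi_taxid      => $args{ncbi_taxid},
        lineage         => $args{lineage} || [],   # root first
    };
    return bless $self, $class;
}

# Read-only accessors; list-valued ones return plain lists.
sub scientific_name { $_[0]{scientific_name} }
sub ncbi_taxid      { $_[0]{ncbi_taxid} }
sub common_names    { @{ $_[0]{common_names} } }
sub lineage         { @{ $_[0]{lineage} } }

package main;

# Values of the sort found in a GenBank SOURCE/ORGANISM block and
# the 'source' feature; no genus/species splitting is attempted.
my $node = Sketch::TaxNode->new(
    scientific_name => 'Homo sapiens',
    common_names    => ['human'],
    ncbi_taxid      => 9606,
    lineage         => [qw(Eukaryota Metazoa Chordata Mammalia
                           Primates Hominidae Homo)],
);

printf "%s (taxid %d): %s\n",
    $node->scientific_name, $node->ncbi_taxid,
    join('; ', $node->lineage);
```

Nothing here parses the name into genus and species; the object simply hands back what was in the file, which is the simplification being argued for.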

My last word (I will be exiting this discussion and the group for two  
weeks):

This would have been MUCH easier if all three of us could have gone  
to the local bar for a beer and discussed it.  We should just take  
the time out to videoconference next time.

Chris


> Chris Fields wrote:
>>
>> Also, I'm trying to follow the original idea as proposed by Jason
>> (this is from perldoc Bio::Taxonomy::Node):
>>
>> Which, to me, indicated that this would eventually replace Bio::Species
>
> Well, we don't really know that Jason didn't later change his mind, but
> in any case it doesn't make sense (anymore, given that we have
> Bio::Taxonomy).
>
> In a direct reply to me you point out specific passages in the current
> docs that explain why you have thought we should delegate or replace
> Bio::Species with Bio::Taxonomy::Node. With respect, the old plans are
> not something we are forced to blindly follow. We decide for ourselves
> if they make sense, we decide for ourselves if there is a better way of
> doing it, and then we do it the best way.
>
> So if you ignore what those old bits of documentation say, just pretend
> you never ever read them, would my proposals make sense or not? Since
> those old proposals were never implemented we have no reason to try and
> stick with them if there is a better proposal.
>
> And for the record, '...Bio::Species which is able to represent only
> species-level' can (correctly) be interpreted as 'Bio::Species is only
> supposed to be used for representing a taxonomy that includes the
> species-level'. You can't interpret it literally because Bio::Species is
> used for levels below species, and also represents all the levels above
> species-level as well. Either Jason got it wrong when he wrote that, or
> you have misinterpreted it.
>
> Likewise, let's play the interpretation game again: 'Previously all
> information was managed by a single object called Bio::Species. [the
> Bio::Taxonomy::Node] implementation allows representation of the
> intermediate nodes not just the species nodes'. Note the apposition of
> 'single object' vs implication of multiple Node objects to do the same
> job. I imagine at the time Jason wrote that there was no Bio::Taxonomy,
> no holder for multiple Nodes.
>
>
>> I had originally wanted to start delegating everything over to
>> Taxonomy::Node about a month ago, when I found that it was remarkably easy
>> to do so.  However, when Sendu proposed making changes to remove methods in
>> Bio::Taxonomy::Node and make sweeping changes to Taxonomy which would
>> prevent an easy transition over to Node,
>
> But an equally easy transition to Bio::Taxonomy instead. I don't know
> why you would care about the name of the class we switch to. My concern
> is that when the switch is made it makes sense.
>
>
>> If we think it would be better to completely toss all this out the window
>> and use only a bare-bones Node, then I'm fine with that.  But if we go that
>> route we should just get rid of the Bio::Species 'disease' completely and
>> have things be much simpler.  Simple is good!
>>
>> I think Node can still act as a viable container class for the tax data from
>> a GenBank file (its original purpose) as long as it has the very basic
>> methods for doing so.  That would require:
>>
>> scientific_name() - ORGANISM line data
>> common_names() - which could hold common names (in parentheses on the SOURCE
>> line) and the abbreviated name (from the SOURCE line)
>> ncbi_taxid() - from the 'source' seqfeature (already there).
>>
>> The lineage information and organelle information could be stored in Node or
>> in SimpleValue objects.  My vote is for the latter as there's no need for a
>> classification() container for Node, which you have repeatedly pointed out.
>
> No, this is the whole point. The lineage information can NOT be stored
> in a Node (unless you abuse Node by having all those crufty methods
> like genus() and classification()), and why would we store it in
> SimpleValue objects when we have Bio::Taxonomy?
>
> Bio::Taxonomy is completely perfect for storing the taxonomic
> information from a GenBank file. That's all you need to worry about. Can
> we represent the data correctly? Yes. Do we gain all the good things
> about a pure Bio::Taxonomy? Yes. Can we still do everything we used to
> be able to do? Yes.
>
>
>> I think we should just get rid of Bio::Species completely.
>
> There's no need to get rid of Bio::Species. It can be a Bio::Taxonomy
> with backward-compatible methods. No harm done, all good.
>
>
> I'll tell you what. This will be easier if I just write the code for my
> proposals, including whatever changes would be needed in
> Bio::SeqIO::genbank et al. You'll see how easy and appropriate it is,
> and hopefully everyone will be happy.
>
> Perhaps you could just hold off doing any similar-but-contradictory work
> until then.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





