[Bioperl-l] Bio::*Taxonomy* changes

Hilmar Lapp hlapp at gmx.net
Tue Jul 25 09:54:14 EDT 2006


We intend on having everyone who wants correct taxonomy parsing  
results for the entire kingdom of life define his/her  
authoritative taxonomy database, be it local or remote, be it queried  
over HTTP or SQL.

If you don't care about the correctness of the taxonomy parse, or if  
the taxonomy information in the flat file is trivially parseable  
because it conforms to standard binomial convention, then whatever is  
to be put in place needs to work fine regardless of whether a  
taxonomy database is defined or not.
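
To make that concrete, here is a minimal sketch of how the optional
database could combine with the strict-regex and dirty-bit idea quoted
further down. It is written in Python purely for illustration and is not
BioPerl code; every name in it (STRICT_BINOMIAL, ParsedSpecies,
parse_organism, genus_species, the lookup callable) is hypothetical, and
the regex is only one possible reading of "strict binomial convention".

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Assumed definition of a "strict binomial": a capitalized genus followed
# by a single lowercase species epithet, and nothing else.
STRICT_BINOMIAL = re.compile(r"([A-Z][a-z]+) ([a-z]+)")

# Hypothetical lookup interface: takes the raw organism text and returns
# (genus, species) from a taxonomy database, or None if it has no answer.
Lookup = Callable[[str], Optional[Tuple[str, str]]]


@dataclass
class ParsedSpecies:
    raw: str
    genus: Optional[str]
    species: Optional[str]
    trusted: bool  # False plays the role of the "dirty bit"


def parse_organism(raw: str) -> ParsedSpecies:
    """Parse without any lookup; only mark how confident the result is."""
    text = raw.strip()
    m = STRICT_BINOMIAL.fullmatch(text)
    if m:
        return ParsedSpecies(text, m.group(1), m.group(2), trusted=True)
    # Best-effort guess from the first two words, flagged as untrustworthy.
    parts = text.split()
    genus = parts[0] if parts else None
    species = parts[1] if len(parts) > 1 else None
    return ParsedSpecies(text, genus, species, trusted=False)


def genus_species(sp: ParsedSpecies,
                  lookup: Optional[Lookup] = None
                  ) -> Tuple[Optional[str], Optional[str]]:
    """Consult the database only for untrusted results, and only if one is defined."""
    if sp.trusted or lookup is None:
        return sp.genus, sp.species
    hit = lookup(sp.raw)
    return hit if hit is not None else (sp.genus, sp.species)
```

With no lookup passed in, genus_species() just returns the parser's
guess, so nothing breaks for people who never define a database; a
trusted result such as 'Homo sapiens' never triggers a lookup either way.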

	-hilmar

On Jul 25, 2006, at 1:53 AM, Chris Fields wrote:

> So do we intend on having everyone who installs bioperl have a local
> copy of the taxonomy dumpfile?  Or perform a remote lookup via
> Entrez?  Seems a bit extreme.
>
> I would like the option of not having the lookup run; as I mentioned
> to Sendu, one of the biggest complaints about bioperl is speed.
> Additional lookups won't help on that end.
>
> Chris
>
> On Jul 24, 2006, at 10:31 PM, Hilmar Lapp wrote:
>
>>
>> On Jul 24, 2006, at 10:29 PM, Chris Fields wrote:
>>
>>> [...]
>>> We could go back and forth on what Jason really intended. [...] The
>>> reality is he's not here and you're willing to do the job.
>>
>> Right. And, knowing Jason, I think he'd be perfectly fine with seeing
>> his original idea develop in a possibly different direction, provided
>> it will all work nicely in the end. I'm willing to take the beating
>> on me if that doesn't turn out to be true ...
>>
>>>
>>> There is one thing I will make perfectly clear here: there should
>>> never, ever be enforced lookups for SeqIO (even using caches),
>>
>> You certainly don't want taxonomy lookups during the parsing stage,
>> and also not for the client requesting properties of the species that
>> have been parsed with high confidence, i.e., genus and species for a
>> straightforward binomial like 'Homo sapiens'.
>>
>> Writing sequences, IMHO, doesn't have to be as fast. It may be better
>> to emit strict format a bit slower rather than sloppy format a bit
>> faster.
>>
>> Upon parsing, one idea could be for the flat file parser to set a
>> dirty bit in the parsed-out species if the parsed text didn't follow
>> strict binomial conventions. The parser may then have made a mistake,
>> and if a client requests the information it is better to look up the
>> correct values from a taxonomy database. I.e., you could try with a
>> strict regex first that would imply a high-confidence result. If that
>> fails you don't give up but mark the result as untrustworthy.
>>
>>
>>> [...]
>>> This would have been MUCH easier if all three of us could have gone
>>> to the local bar for a beer and discussed it. We should just take
>>> the time out to videoconference next time.
>>
>> You're not honestly suggesting that a videoconference is better than
>> having beer together?
>>
>> Enjoy your trip, and thanks for hanging in there in the discussion, I
>> appreciate it.
>>
>> 	-hilmar
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================






