[Bioperl-l] Bio::SearchIO::hmmer hsp behaviour

Thu Jun 29 13:27:00 UTC 2006

On Jun 29, 2006, at 2:02 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> Personally, I don't think right now is the time to think about
>> refactoring this particular module, esp. since I find it essentially
>> works.  I believe that energy is better spent elsewhere, such as
>> SeqIO::genbank/swiss/embl for instance, or refactoring SearchIO::blast
>> etc to use hashes instead of objects to speed things up.  Or creating
>> something yourself.  Or doing what you currently are doing (Bio::Map).
>> In other words, areas where use is high, code is aging, and
>> refactoring is more productive.
>
> Hmmer parsing happens to be important to me, in fact vital for my work.
> I've been using my own parser up till now, so didn't know what the
> Bioperl one was like. I'd like to use Bioperl for more things,
> preferably everything.

We're not deterring you from setting up your own parser, something
both Jason and I suggested.  I just don't see what the major issue
is; hmmpfam results never really contain the same number of hits
per query that BLAST results do (I get at the very most 30-40, and
that is usually because of repeats).  I believe the best place to
spend this energy first and foremost is fixing the bug.

>> I'll add that I'm not trying to dissuade you from trying to build your
>> own variation of a SearchIO HMMER parser; by all means go ahead.  The
>> above is how I feel.  You can build your own parser to do what you
>> want; you can even base it off the current SearchIO HMMER parser and
>> see if you can set it up to give you the results you want, using a
>> different handler and so on.  Just don't break the API or modify the
>> current code based strictly on what your opinion of how it should work
>> is.  It was probably set up this way for a particular reason.
>
> Well, I don't like the idea of there being multiple SearchIO parsers
> for the same thing.

See, here's the thing: if the community at large decides to use your
version of the parser then, by default, it will become the only HMMER
SearchIO parser and we'll deprecate the old one.  Modifying the
existing one in place just isn't the way I would go about it.  Jason
has mentioned that object instantiation is a bigger issue for parsing
speed than anything else; why not, if you plan on doing this, set up
a Handler to return hashes, or do it completely under the hood?  Have
it be the 'new, faster way to run SearchIO.'  Don't rehash (pardon
the bad pun) the way things were, esp. when proposals are out there
to improve the toolkit.

> [...]
>> And, frankly, it's not up to the user when using code they didn't
>> create.  You have to deal with it.  Or code something yourself to do
>> things the way you want.  You have the power to do that; most bioperl
>> users don't simply b/c they probably don't understand the class
>> structure and OO nature of Bioperl.  It's just a matter of where you
>> want to spend your energy: dealing with something that interests you
>> or fixing other people's broken code.
>
> My original question was essentially: does doing it my way make sense?
> And implicitly: would doing it my way be of any harm? Ie. can I go
> ahead and change how the parser reports results and groups them
> together? I don't think it will involve an API change, but the results
> it generates will obviously be very different.

And my point is that both ways make sense, at least to me (and it
sounds like to Jason, though I could be wrong).  Again, create a new
version of the parser based on what you want to do and accomplish.
Don't just modify something the community at large uses based on your
whims.  Make the changes to a new module and let the community
decide.  As an example, BioPerl, for the longest time, had several
BLAST parsers; we directed everybody over to SearchIO and most people
seem to like it; hence the others are deprecated.

And changing the results returned could be considered by some to be
an API change or a bug.  If someone using this module has an
automated pipeline set up for annotation using Pfam, hmmpfam,
Bioperl, and a database, and their setup expects single model/domain
pairs, yeah, your changes will break that.  The breakage may be
small, even inconsequential, but it's possible (and not just
hypothetical; many genome annotation pipelines are set up exactly as
I describe).

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign