[Bioperl-l] hmmer3/hmmscan parser

Wed May 26 16:17:50 UTC 2010

On May 26, 2010, at 10:25 AM, Thomas Sharpton wrote:

>> ...
>> I did not follow the IRC discussion, so I confess I'm not totally clear on what "integrate the hmmer.pm parser" means. I'm taking it to mean combining the code that parses HMMER2 with the code that parses HMMER3.
>>
>> But then "modularizing this a bit and loading the correct parser depending on the requested format" seems to contradict that assumption.
>> 
>> Perhaps you (or someone) could clarify a bit what the HMMER2 - HMMER3 integration would look like (and the goal of doing so) ?
>> 
> 
> I was not a part of that conversation either, and I'm operating under a similar assumption about what "integrating the hmmer.pm parser" means. I too am confused about the statement regarding modularization; I assume Kai meant that next_result would use the HMMER version number (which it already grabs) to choose how to parse the data file. Not thinking about this too carefully, it might be as simple as:
> 
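> sub next_result {
>     my $self    = shift;
>     # All three method names below are hypothetical placeholders.
>     my $version = $self->_hmmer_version;                # version already read from the report header
>     return $self->_next_result_v2 if $version == 2;     # parse V2 report file
>     return $self->_next_result_v3 if $version == 3;     # parse V3 report file
>     $self->throw("Unsupported HMMER version: $version");
> }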
> 
> To make the code a bit more manageable, the version-specific parsers could be split out into independent subroutines.
> 
> Kai, is this along the lines of what you were thinking?
> 
> If this is correct (that is, merging the H2 and H3 parsers into a single hmmer.pm module), I see one primary benefit - the end user need not specify which HMMER module they want to use, just Bio::SearchIO::hmmer - and one secondary benefit - there's enough similarity between H2 and H3 reports that some code from the H2 parser is duplicated in the H3 parser, and merging would remove that redundancy.  There are certainly other benefits that I'm overlooking.
> 
> The only real downside I see at the moment is that the hmmer.pm parser becomes a bit more complicated and bloated. But I suspect this can be remedied with careful partitioning of the code into appropriate subroutines and thorough documentation. I am a bit concerned about how the aforementioned H3 specific methods are incorporated, but that should be manageable.
> 
> I wonder if anyone involved in the IRC discussion cares to weigh in?
> 
> Regardless, I'd advocate getting the H3 version fully fleshed out to deal with the issues brought up in the first half of this message before attempting to merge the two modules, as the merging process may be affected by the structure of the H3 parser.
> 
> Best,
> Tom

That's essentially the idea, though it can be cleaner than that if we're expecting the entire stream of reports will be of the same version (set the proper next_result method at instantiation).  SearchIO::infernal does something like this.  Or it can call out to a handler, like SearchIO::blastxml.  YMMV.
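
A minimal sketch of the "set the proper next_result method at instantiation" variant, in case it helps. Everything here is hypothetical: Toy::HmmerReader, its `lines` argument, and the `_next_result_v2` / `_next_result_v3` stubs are made up for illustration and are not the Bio::SearchIO API. It also assumes the report header contains a line of the form "HMMER <version>" and that every report in the stream is the same version.

```perl
use strict;
use warnings;

package Toy::HmmerReader;    # hypothetical class, not part of BioPerl

# Map a major version number to the routine that parses that report flavour.
my %PARSER_FOR = (
    2 => \&_next_result_v2,
    3 => \&_next_result_v3,
);

sub new {
    my ($class, %args) = @_;
    my $self = bless { lines => [ @{ $args{lines} } ] }, $class;

    # Inspect the header once. Assumption: all reports in this stream
    # share the version found on the first "HMMER <version>" line.
    my ($major) = map { /HMMER\s+(\d+)/ ? $1 : () } @{ $self->{lines} };
    die "Could not find a HMMER version line\n" unless defined $major;

    $self->{parser} = $PARSER_FOR{$major}
        or die "Unsupported HMMER version: $major\n";
    return $self;
}

# Public entry point: no per-call version check, just delegate.
sub next_result {
    my ($self) = @_;
    return $self->{parser}->($self);
}

# Stubs standing in for the real version-specific parsing code.
sub _next_result_v2 { return { format => 'hmmer2' } }
sub _next_result_v3 { return { format => 'hmmer3' } }

package main;

my $reader = Toy::HmmerReader->new(
    lines => [
        '# hmmscan :: search sequence(s) against a profile database',
        '# HMMER 3.0 (March 2010); http://hmmer.org/',
    ],
);
my $result = $reader->next_result;
print "parsed as $result->{format}\n";
```

The version check happens once in the constructor, so next_result stays a one-line delegate and each version's parsing code can live in its own subroutine (or its own module).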

chris