[Bioperl-l] Bio::Tools::Glimmer

Mon Feb 12 23:13:09 UTC 2007

On 2/7/07, Mark Johnson <johnsonm at gmail.com> wrote:
>
>     Well, each format has some unique features.  If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice.  I'll take a look.
>     I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being.  I just can't stomach having the code that
> tightly coupled and hard to read.  In the end it'll probably be three
> functions.  GlimmerM/HMM are pretty close.  Maybe two, Glimmer2 and
> Glimmer3 aren't *that* different, either.

    I've got a 4-in-1 parser roughed in, per Chris Fields' suggestion, with two
actual parsing routines (prokaryotic and eukaryotic).  You can specify
-format as an arg to the constructor (Glimmer, GlimmerM, GlimmerHMM), or it
will look through the input until it can figure out what it is looking at.
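
To make the "look through the input" idea concrete, here is a minimal sketch of signature-based detection. It is not the code in the module: detect_format and @SIGNATURES are hypothetical names, and the only rule filled in is the GlimmerHMM one, since that is the only header line confirmed in this thread. The sample text is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical signature table. Only the GlimmerHMM rule is taken from the
# sample output discussed here; rules for the other formats would be added
# once their distinguishing lines are confirmed.
my @SIGNATURES = (
    [ qr/^GlimmerHMM\S*$/ => 'GlimmerHMM' ],
);

# Scan a seekable filehandle until a signature matches, then rewind so the
# real parser can start again from where the handle was.
# Returns the format name, or undef if no signature matched.
sub detect_format {
    my ($fh) = @_;
    my $start = tell($fh);
    my $format;
    LINE: while ( my $line = <$fh> ) {
        $line =~ s/\s+\z//;    # drop trailing newline / carriage return
        for my $sig (@SIGNATURES) {
            my ( $re, $name ) = @$sig;
            if ( $line =~ $re ) {
                $format = $name;
                last LINE;
            }
        }
    }
    seek( $fh, $start, 0 );    # rewind for the actual parse
    return $format;
}

# Made-up sample input: a GlimmerHMM header followed by a placeholder line.
my $sample = "GlimmerHMM\n(prediction lines would follow)\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my $detected = detect_format($fh);
print defined $detected ? "$detected\n" : "unknown\n";
```

Rewinding with seek only works on seekable handles, so piped input would need buffering instead.
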
    I've got one main issue to solve; the rest is just stuff like updating
the POD.  Torsten Seemann very helpfully added example output for all 4
formats to t/data.  Looking at GlimmerHMM.out, the first line is
'GlimmerHMM'.  However, I think there is a bug in the existing
_parse_predictions:

Shouldn't this:

} elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
            $source = $1;
            next;
        }

be this instead:

} elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version
            $source = $1;
            next;
        }

I lifted that bit of code to do format detection.  We don't have GlimmerHMM
installed locally, so I'm assuming Torsten's output is correct and the above
is a bug.  Guess I'll go check bugzilla...
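
For anyone wanting to see the difference between the two patterns, here is a small standalone check. The test strings are just illustrative single-token lines, not claimed to be real headers from any of the programs, and matches() is a helper made up for this example.

```perl
use strict;
use warnings;

# The pattern as it stands in _parse_predictions, and the narrower proposal.
my $broad  = qr/^(Glimmer\S*)$/;
my $narrow = qr/^(GlimmerHMM\S*)$/;

# Hypothetical helper: 1 if the line matches the pattern, 0 otherwise.
sub matches {
    my ( $re, $line ) = @_;
    return $line =~ $re ? 1 : 0;
}

# Illustrative inputs only.
for my $line ( 'GlimmerHMM', 'GlimmerM', 'Glimmer' ) {
    printf "%-12s broad=%d narrow=%d\n",
        $line, matches( $broad, $line ), matches( $narrow, $line );
}
```

The broad pattern accepts any bare token beginning with "Glimmer", while the narrow one accepts only tokens beginning with "GlimmerHMM".  That matters for format detection, where the match is used to decide which program produced the file.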