[Bioperl-l] Re: GuessSeqFormat problems

Andreas Kahari andreas.kahari at ebi.ac.uk
Tue Aug 16 05:42:20 EDT 2005


On Tue, Aug 16, 2005 at 10:50:03AM +0200, Albert Vilella wrote:
> On Mon, 15 Aug 2005 at 18:11 -0400, Jason Stajich wrote:
> > 
> > Albert - 
> > 
> > I think the new guessing changes for phylip are causing havoc.   Lots
> > of tests are failing in t/GuessSeqFeature.t.  Can you take a look?
> 
> Uops, sorry about that.
> 
> I was trying to make the match for phylip more generic in $lineno=2. In
> my case it was returning a nonexistent Bio::AlignIO::pir.
> 
> I have fixed it and now passes all the tests.
> 
> > I was looking over this module - it seems like we probably want to run
> > the tests in a particular order, as some matches are ambiguous and we
> > probably need to have a preferred order. At least we'll know, when
> > something fails, what the order was.
>
> As I understand from the DESCRIPTION, the more lines one checks, the
> better the format is determined, isn't it?
>
> Maybe it would help to add more line checks to some of the formats that
> are loosely constrained in their first lines.

Yes, in some cases.  For a format to "win", its test needs to
be the "last one standing" after all the others have failed.
This naturally means that adding more formats will make the
guessing more uncertain, and the test rules need to be more and
more specific for them to be really useful.  On the other hand,
adding rules (or-parts to the if-statement) might make the test
push other tests out of the competition even though they might
be better indicators of the actual format.
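
To make the "last one standing" idea concrete, here is a minimal
sketch of elimination-based guessing.  It is written in Python for
brevity and is not the module's code; the format tests, the TESTS
table and guess_format() are hypothetical names with made-up rules.

```python
import re

# Hypothetical per-format tests: (line, lineno) -> bool.
# Illustrative rules only, not the ones used in the real module.
TESTS = {
    "fasta": lambda line, n: n > 1 or line.startswith(">"),
    "phylip": lambda line, n: n > 1
    or re.match(r"^\s*\d+\s+\d+\s*$", line) is not None,
    "clustalw": lambda line, n: n > 1 or line.startswith("CLUSTAL"),
}


def guess_format(lines, tests=TESTS):
    """Return the single surviving format name, or None if undecided."""
    candidates = set(tests)
    lineno = 0
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no information
        lineno += 1
        # A format stays in the competition only while its test passes.
        candidates = {f for f in candidates if tests[f](line, lineno)}
        if len(candidates) <= 1:
            break
    return next(iter(candidates)) if len(candidates) == 1 else None
```

With these rules, an input starting with ">seq1" leaves only "fasta"
standing.  If a loose extra test that accepts every line is added to
the table, the same input ends with two survivors and the guess is
None, which is the uncertainty described above.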

I was playing around with some "scoring" of the formats, so that
one could write a format test that would be allowed to sometimes
fail in one rule without disqualifying that format as a possible
candidate.  This was too elaborate at the time and I settled for
a simple pass/fail system.
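
For illustration, a scoring variant could look roughly like the
sketch below (Python, hypothetical names and rules throughout; this
is not what the module does, which remains pass/fail).  Each format
gets one point per line that passes its test, and the unique highest
scorer wins.

```python
import re


def fasta_like(line, lineno):
    # Hypothetical rule: a ">" header first, then residue-only lines.
    if lineno == 1:
        return line.startswith(">")
    return re.fullmatch(r"[A-Za-z*-]+", line) is not None


def phylip_like(line, lineno):
    # Hypothetical rule: two integers first, then "name sequence" lines.
    if lineno == 1:
        return re.fullmatch(r"\s*\d+\s+\d+\s*", line) is not None
    return re.match(r"\S+\s+[A-Za-z-]+", line) is not None


def score_formats(lines, tests, max_lines=10):
    """Count, per format, how many of the first lines pass its test."""
    scores = {name: 0 for name in tests}
    lineno = 0
    for line in lines:
        if not line.strip():
            continue
        lineno += 1
        if lineno > max_lines:
            break
        for name, test in tests.items():
            if test(line, lineno):
                scores[name] += 1
    return scores


def best_guess(scores):
    """Return the unique top scorer, or None on a tie or all zeros."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

A FASTA-like input with one stray line containing a space would be
eliminated under strict pass/fail, but here it only loses one point
and can still come out on top.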

(Disclaimer :-) My aim in writing the module was to have a
*guessing* facility, not a routine that *determines* the format
of the input data.  I hope that this has been made clear.

> > Another thing is it uses open directly instead of allowing Root::IO to
> > open a filehandle.  If we went to using Root::IO, it would allow peeking
> > at not only a file but a filehandle/stream and then use _pushback
> > after we have peeked over the first few lines, guess the format, then
> > pass it along to the SeqIO/AlignIO handle appropriately.

This is a good suggestion.  I will not have time to do this now
though, so if no-one else wants to supply this patch I'll look
at it at a later stage.
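
As a rough sketch of the peek-then-push-back idea (in Python, with a
hypothetical PushbackReader class; this only mirrors the concept and
is not the Root::IO interface), the wrapper below lets a guesser
look at the first few lines of any stream, seekable or not, and then
hand the stream on with nothing consumed.

```python
import io
from collections import deque


class PushbackReader:
    """Wrap a file-like object and allow lines to be pushed back."""

    def __init__(self, handle):
        self._handle = handle
        self._buffer = deque()  # lines waiting to be re-read

    def readline(self):
        # Serve pushed-back lines first, then the underlying stream.
        if self._buffer:
            return self._buffer.popleft()
        return self._handle.readline()

    def pushback(self, line):
        # The most recently pushed-back line is returned first.
        self._buffer.appendleft(line)

    def peek_lines(self, count):
        """Read up to `count` lines, then push them all back in order."""
        lines = []
        for _ in range(count):
            line = self.readline()
            if line == "":
                break  # end of stream
            lines.append(line)
        for line in reversed(lines):
            self.pushback(line)
        return lines


# Usage: peek at the head of a stream, then read it from the start.
stream = PushbackReader(io.StringIO(">seq1\nACGT\n"))
head = stream.peek_lines(2)
first = stream.readline()
```

The parser that receives the wrapped stream afterwards sees the same
lines the guesser looked at, so no re-opening of the file is needed.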


Regards,
Andreas


-- 
Andreas Kähäri
EMBL-EBI/ensembl

---{ www.embl.org }---{ www.ebi.ac.uk }---{ www.ensembl.org }---

