[Bioperl-l] EMBL file with space before quoted, multi-line qualifier value

Hamish McWilliam hamish.mcwilliam at bioinfo-user.org.uk
Thu Apr 9 16:05:10 UTC 2015


On 9 April 2015 at 16:18, Fields, Christopher J <cjfields at illinois.edu>
wrote:

> On Apr 9, 2015, at 8:03 AM, Adam Sjøgren <adsj at novozymes.com> wrote:
> >
> > Adam writes:
> >
> >> -        my( $qualifier, $value ) = m{^/([^=]+)(?:=(.+))?}
> >> +        my( $qualifier, $value ) = m{^/([^=]+)*(?:=\s*(.+))?}
> >                                                 |
> > I got a '*' too many there ----------------------'
> >
> > I (only) meant to add the '\s*' after the '=':
> >
> > +        my( $qualifier, $value ) = m{^/([^=]+)(?:=\s*(.+))?}
> >
> > Sorry!
> >
> > --
> >                                                          Adam Sjøgren
> >                                                    adsj at novozymes.com
>
> As long as this passes current tests I don’t have a problem with adding it
> in.  I would suggest adding a simple test case for it; you could modify a
> current EMBL file in the data directory if needed for a test case, probably
> no need to add a new file.
>
> I’ve long felt there's a fine line between having a parser be a strict
> validation tool and having it be flexible enough to allow for
> idiosyncrasies from various tools (e.g. see any GenBank output from
> anywhere).  I tend to veer in the direction of flexibility within reason;
> having a test suite helps quite a bit.
>
>
In general I agree that having some wiggle room when reading is good.
However, it is also good to have the option of a stricter interpretation of
the data format specification, to catch errors like this and let users
inform the source of such data that its output needs to be adjusted to
match the format specification. This makes it easier to ensure that tools
which write these formats use stricter interpretations than those that
read them, an outcome which makes everyone happier.
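
As a rough illustration of combining the two approaches, here is a minimal
sketch built on the relaxed pattern from Adam's patch. The parse_qualifier
helper and its "strict" option are hypothetical names made up for this
example; they are not part of the existing EMBL parser.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only, not existing BioPerl code.
# Returns (qualifier, value). With strict => 1 it dies when whitespace
# follows the '=' instead of silently accepting it.
sub parse_qualifier {
    my ($line, %opt) = @_;

    # Relaxed pattern from this thread, with the optional whitespace
    # captured separately so that strict mode can detect it.
    my ($qualifier, $space, $value) =
        $line =~ m{^/([^=]+)(?:=(\s*)(.+))?};
    return unless defined $qualifier;

    if ($opt{strict} && defined $space && length $space) {
        die "Whitespace after '=' in qualifier /$qualifier\n";
    }
    return ($qualifier, $value);
}

# Lenient (default) read: the value starts at the opening quote.
my ($name, $value) = parse_qualifier('/note= "some text');
print "lenient: $name => $value\n";

# Strict read: the same line is rejected, so it can be reported upstream.
my $ok = eval { parse_qualifier('/note= "some text', strict => 1); 1 };
print "strict mode rejected the line: $@" unless $ok;
```

The default stays lenient, so existing input keeps parsing, while a caller
who wants to validate a file against the format specification can opt in
to the stricter behaviour.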

All the best,

Hamish


> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/bioperl-l
>



-- 
----
"Saying the internet has changed dramatically over the last five years is
cliché – the internet is always changing dramatically" - Craig Labovitz,
Arbor Networks.