[Bioperl-l] EMBL file with space before quoted, multi-line qualifier value

Peter Cock p.j.a.cock at googlemail.com
Thu Apr 9 17:50:43 UTC 2015


On Thu, Apr 9, 2015 at 5:05 PM, Hamish McWilliam
<hamish.mcwilliam at bioinfo-user.org.uk> wrote:
> On 9 April 2015 at 16:18, Fields, Christopher J <cjfields at illinois.edu> wrote:
>>
>> As long as this passes current tests I don’t have a problem with adding it
>> in.  I would suggest adding a simple test case for it; you could modify a
>> current EMBL file in the data directory if needed for a test case, probably
>> no need to add a new file.
>>
>> I’ve long felt there's a fine line between having a parser being a strict
>> validation tool and having it be flexible enough to allow for idiosyncrasies
>> from various tools (e.g. see any GenBank output from anywhere).  I tend to
>> veer in the direction of flexibility within reason; having a test suite
>> helps quite a bit.
>>
>
> In general I agree that having some wiggle room when reading is good.
> However it is also good to have the option of stricter interpretations of
> the data format specification, to catch errors like this and give users the
> option of informing the source of such data that their output needs to be
> adjusted to match the format specification. This makes it easier to ensure
> that tools which write these formats use stricter interpretations than those
> that read them, an outcome which makes everyone happier.
>
> All the best,
>
> Hamish

What we're doing with the Biopython GenBank/EMBL parser
(and others) on 'problematic' things where we think we can
parse them unambiguously, is to parse them but give a
warning. The Python warnings framework lets the user
silence all our parser warnings if they want to.

Where unambiguous parsing is a problem, I vote for an
error.
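
As a rough sketch of that policy (not the actual Biopython or
BioPerl code), using only the standard library warnings module:
the names ParserWarning and parse_qualifier are made up for
illustration, and only single-line qualifiers are handled.

```python
import warnings


class ParserWarning(UserWarning):
    """Hypothetical warning category for recoverable format problems."""


def parse_qualifier(line):
    """Parse a qualifier such as '/note="text"' into (key, value).

    Illustrative only: single-line values, no real EMBL handling.
    """
    if not line.startswith("/") or "=" not in line:
        # No sensible way to guess what was meant: refuse to parse.
        raise ValueError("Malformed qualifier line: %r" % line)
    key, value = line[1:].split("=", 1)
    if value != value.lstrip():
        # Off-spec, but the intent is unambiguous: parse it and warn.
        warnings.warn(
            "Whitespace before value of qualifier %r" % key, ParserWarning
        )
        value = value.lstrip()
    return key, value.strip('"')


# A user who does not care can silence just this category:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ParserWarning)
    key, value = parse_qualifier('/note= "some text"')
```

Giving the parser its own warning category is what lets users
filter it separately, or turn it into an exception with the
"error" filter action if they want strict behaviour.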

I've not checked this specific issue with the extra space in
a feature qualifier yet...

Peter
