[Biopython] Bug in GenBank/EMBL parser?

Peter biopython at maubp.freeserve.co.uk
Tue Apr 27 09:45:20 UTC 2010


On Thu, Apr 22, 2010 at 9:56 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Apr 22, 2010 at 2:07 AM, Uri Laserson <laserson at mit.edu> wrote:
>> Hi,
>>
>> I am trying to use the EMBL parser to parse the IMGT/LIGM flatfile (which
>> supposedly conforms to the EMBL standard).
>>
>> The short story is that whenever there is a feature, the parser checks
>> whether there are qualifiers in the feature with an assert statement, and
>> does not allow features with no qualifiers.  However, the IMGT flatfile is
>> full of entries that have features with no qualifiers (only coordinates).
>>
>> Who is wrong here?  Does the EMBL specification require that a feature have
>> qualifiers?  Or is this a bug to be fixed in the parser?
>
> Hi Uri,
>
> Thank you for your detailed report.
>
> Since you have raised this, I went back over the EMBL documentation.
> All their example features (and from personal experience all
> EMBL files from the EMBL and GenBank files from the NCBI) do have
> qualifiers. However, in Section 7.2 they are called "Optional qualifiers".
> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#7.2
>
> So it does look like an unwarranted assumption in the Biopython
> parser (even though it has been a safe assumption on "official" EMBL
> and GenBank files thus far), which we should fix.

Bug filed and now fixed:
http://bugzilla.open-bio.org/show_bug.cgi?id=3062

It turned out to be an invalid EMBL file where the features were over-
indented. Biopython was quite happy to parse valid EMBL or GenBank
files with features without qualifiers (although I don't recall seeing any
examples from EMBL or the NCBI like this).
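
For anyone wanting to check their own files for this kind of over-indentation,
here is a rough sketch. It is not Biopython's code and uses only the standard
library. It assumes the usual EMBL feature table layout (feature key starting
at column 6, location and qualifiers starting at column 22); the function name
find_misindented_features and the sample lines are made up for illustration.

```python
# Assumed EMBL feature-table layout (1-based columns): "FT" in columns 1-2,
# feature key starting at column 6, location/qualifiers starting at column 22.
KEY_COL = 5        # 0-based index of column 6
LOCATION_COL = 21  # 0-based index of column 22


def find_misindented_features(lines):
    """Return (line_number, line) pairs for FT lines with unexpected indentation."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.startswith("FT"):
            continue  # only feature-table lines are checked
        body = line[2:]
        if not body.strip():
            continue  # an "FT" line with no content
        # Index of the first non-space character after the "FT" prefix.
        first = 2 + len(body) - len(body.lstrip(" "))
        if first == KEY_COL:
            # Feature key line: the location should begin at column 22.
            key_end = line.find(" ", KEY_COL)
            if key_end == -1:
                continue  # key with nothing after it; nothing more to check
            rest = line[key_end:]
            if not rest.strip():
                continue
            loc_start = key_end + len(rest) - len(rest.lstrip(" "))
            if loc_start != LOCATION_COL:
                problems.append((number, line))
        elif first != LOCATION_COL:
            # Neither a feature key line nor a location/qualifier continuation.
            problems.append((number, line))
    return problems


if __name__ == "__main__":
    # Hypothetical sample: the second feature has its location pushed
    # four columns to the right of where it is expected.
    sample = [
        "FT   " + "source".ljust(16) + "1..1859",
        "FT" + " " * 19 + '/organism="Trifolium repens"',
        "FT   " + "CDS".ljust(20) + "14..1495",
    ]
    for number, line in find_misindented_features(sample):
        print(f"line {number}: unexpected indentation: {line!r}")
```

Very long feature keys or other layout variants are not handled, so treat any
hits as lines worth looking at by eye rather than as proof that the file is
invalid.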

Peter
