[Biopython] Bug in GenBank/EMBL parser?

Peter biopython at maubp.freeserve.co.uk
Thu Apr 22 08:56:52 UTC 2010


On Thu, Apr 22, 2010 at 2:07 AM, Uri Laserson <laserson at mit.edu> wrote:
> Hi,
>
> I am trying to use the EMBL parser to parse the IMGT/LIGM flatfile (which
> supposedly conforms to the EMBL standard).
>
> The short story is that whenever there is a feature, the parser checks
> whether there are qualifiers in the feature with an assert statement, and
> does not allow features with no qualifiers.  However, the IMGT flatfile is
> full of entries that have features with no qualifiers (only coordinates).
>
> Who is wrong here?  Does the EMBL specification require that a feature have
> qualifiers?  Or is this a bug to be fixed in the parser?

Hi Uri,

Thank you for your detailed report.

Since you have raised this, I went back over the EMBL documentation.
All their example features (and, from personal experience, all EMBL
files from the EMBL and GenBank files from the NCBI) do have
qualifiers. However, in Section 7.2 they are called "Optional qualifiers".
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#7.2

So it does look like an unwarranted assumption in the Biopython
parser (even though it has been a safe assumption on "official" EMBL
and GenBank files thus far), which we should fix.
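
To illustrate the idea, here is a minimal sketch of a feature table
reader that accepts features with no qualifiers. This is NOT the actual
Biopython scanner code; the function names (parse_feature_table, ft) and
the sample lines are made up for illustration, and the sketch assumes the
usual EMBL column layout (feature key in columns 6-20, location and
qualifiers from column 22).

```python
def parse_feature_table(lines):
    """Group EMBL 'FT' lines into features (hypothetical helper).

    Each feature is a dict with a key, a location string and a list of
    raw qualifier strings. The list may legitimately stay empty.
    """
    features = []
    for line in lines:
        if not line.startswith("FT"):
            continue  # ignore non-feature-table lines
        key = line[5:20].strip()   # feature key column (assumed layout)
        text = line[21:].strip()   # location or qualifier text
        if key:
            # A new feature starts here; do not require qualifiers.
            features.append({"key": key, "location": text, "qualifiers": []})
        elif not features:
            continue  # stray continuation line before any feature
        elif text.startswith("/"):
            features[-1]["qualifiers"].append(text)
        elif features[-1]["qualifiers"]:
            # Continuation of the previous (wrapped) qualifier value.
            features[-1]["qualifiers"][-1] += " " + text
        else:
            # Continuation of a wrapped location.
            features[-1]["location"] += text
    return features


def ft(key, text):
    """Build one FT line in the assumed column layout (sample data only)."""
    return "FT   %-15s %s" % (key, text)


# Made-up sample: one feature with a qualifier, two with coordinates only.
sample = [
    ft("source", "1..315"),
    ft("", '/organism="Homo sapiens"'),
    ft("V-REGION", "1..294"),
    ft("CDR1-IMGT", "76..99"),
]
features = parse_feature_table(sample)
```

The point of the sketch is only that an empty qualifier list is treated
as a normal outcome rather than something to assert against.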

Could you file a bug please?
http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython

This also affects Biopython 1.54b (the latest release) and the current
code in the repository. I would hope we can solve this before
Biopython 1.54 proper is released.

Regards,

Peter



