[Biopython] Bug in GenBank/EMBL parser?

Uri Laserson laserson at mit.edu
Wed Apr 28 21:38:52 UTC 2010


This fixed the main problem with parsing IMGT files that have increased
indentation.  I also filed an additional bug/enhancement with a proposed
patch, which should make Biopython compatible with IMGT and still conform to
the INSDC format: http://bugzilla.open-bio.org/show_bug.cgi?id=3069
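
To illustrate what "increased indentation" means here, below is a minimal
standalone sketch. It is not Biopython's scanner and not the patch in the bug
report. It assumes the usual EMBL layout (feature key starting in column 6,
location and qualifiers starting in column 22, i.e. 0-based offset 21). The
helper name parse_feature_table, the constant STANDARD_INDENT and the sample
lines are all made up for illustration, and the four extra columns in the
over-indented line are an invented value. The sketch groups FT lines into
features, accepts features with no qualifiers, and records the offset at which
each location starts.

```python
STANDARD_INDENT = 21  # assumed 0-based offset where EMBL locations/qualifiers start


def parse_feature_table(lines):
    """Group EMBL-style FT lines into features (hypothetical helper)."""
    features = []
    for raw in lines:
        line = raw.rstrip()
        if not line.startswith("FT") or not line[2:].strip():
            continue  # not a feature-table line, or an empty FT line
        if len(line) > 5 and line[5] != " ":
            # A feature key in column 6 opens a new feature.
            key = line[5:].split(None, 1)[0]
            rest = line[5 + len(key):]
            features.append({
                "key": key,
                "location": rest.strip(),
                # offset at which the location text starts on this line
                "indent": len(line) - len(rest.lstrip()),
                "qualifiers": [],  # stays empty if the feature has none
            })
        elif features:
            text = line[2:].strip()
            current = features[-1]
            if text.startswith("/"):
                current["qualifiers"].append(text)
            elif current["qualifiers"]:
                current["qualifiers"][-1] += " " + text  # wrapped qualifier value
            else:
                current["location"] += text  # wrapped location
    return features


# Made-up sample lines, built with format strings so the columns are exact.
standard = [
    "FT   %-16s%s" % ("source", "1..120"),
    "FT" + " " * 19 + '/organism="Homo sapiens"',
    "FT   %-16s%s" % ("V-REGION", "1..96"),  # feature with no qualifiers
]
over_indented = ["FT   %-20s%s" % ("V-REGION", "1..96")]

for feature in parse_feature_table(standard + over_indented):
    status = "ok" if feature["indent"] == STANDARD_INDENT else "over-indented"
    print(feature["key"], feature["location"], feature["indent"], status,
          feature["qualifiers"])
```

A parser that slices each line at a fixed offset would misread the last sample
line, while one that tolerates a wider indent handles both layouts.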

Uri

On Tue, Apr 27, 2010 at 05:45, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Thu, Apr 22, 2010 at 9:56 AM, Peter <biopython at maubp.freeserve.co.uk>
> wrote:
> > On Thu, Apr 22, 2010 at 2:07 AM, Uri Laserson <laserson at mit.edu> wrote:
> >> Hi,
> >>
> >> I am trying to use the EMBL parser to parse the IMGT/LIGM flatfile (which
> >> supposedly conforms to the EMBL standard).
> >>
> >> The short story is that whenever there is a feature, the parser checks
> >> whether there are qualifiers in the feature with an assert statement, and
> >> does not allow features with no qualifiers.  However, the IMGT flatfile is
> >> full of entries that have features with no qualifiers (only coordinates).
> >>
> >> Who is wrong here?  Does the EMBL specification require that a feature have
> >> qualifiers?  Or is this a bug to be fixed in the parser?
> >
> > Hi Uri,
> >
> > Thank you for your detailed report.
> >
> > Since you have raised this, I went back over the EMBL documentation.
> > All their example features (and, from personal experience, all EMBL
> > files from the EMBL and GenBank files from the NCBI) do have
> > qualifiers. However, in Section 7.2 they are called "Optional qualifiers".
> >
> > http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#7.2
> >
> > So it does look like an unwarranted assumption in the Biopython
> > parser (even though it has been a safe assumption on "official" EMBL
> > and GenBank files thus far), which we should fix.
>
> Bug filed and now fixed:
> http://bugzilla.open-bio.org/show_bug.cgi?id=3062
>
> It turned out to be an invalid EMBL file where the features were
> over-indented. Biopython was quite happy to parse valid EMBL or GenBank
> files with features without qualifiers (although I don't recall seeing any
> examples from EMBL or the NCBI like this).
>
> Peter
>



-- 
Uri Laserson
Graduate Student, Biomedical Engineering
Harvard-MIT Division of Health Sciences and Technology
M +1 917 742 8019
laserson at mit.edu


