[Biopython-dev] More relaxed parsing of wonky GenBank files

Tue Jan 8 10:28:31 UTC 2013

Hi folks,

I've recently pushed a new version of my software into production that
uses Biopython's parsers instead of our own hand-written ones.

One big thing we noticed is that Biopython is way more picky about
what a proper GenBank file is supposed to look like. Sadly, many of
our users seem to create their GenBank files with programs that have
only a rough understanding of what the file format is supposed to
look like. Most of the invalid input can safely be ignored, so I
propose extending the GenBank parser to cope with the most common
errors I'm seeing in day-to-day use.

I'm happy to provide the patches, but before starting this work I'd
like to make sure that they would be acceptable in principle. So, is
there any reason to blow up in our users' faces rather than try to
cope with invalid input?

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben