[Biopython-dev] More relaxed parsing of wonky GenBank files
Michiel de Hoon
mjldehoon at yahoo.com
Tue Jan 8 06:11:46 EST 2013
Entrez.parse has a "validate" argument that allows parsing of XML files containing tags that are not defined in the corresponding DTD. If validate is True, the parser raises an exception when it encounters a tag that is missing from the DTD. If validate is False, the parser ignores such tags.
Maybe SeqIO.parse could have a similar "validate" argument?
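To make the idea concrete, here is a minimal sketch of what such a switch could look like. It is a toy header parser, not Biopython code: the function name parse_header, the KNOWN_KEYS set, and the sample lines are all made up for illustration, and SeqIO.parse does not currently accept this argument.

```python
import warnings

# Hypothetical set of line types that the strict parser accepts.
KNOWN_KEYS = {"LOCUS", "DEFINITION", "ACCESSION", "VERSION"}


def parse_header(lines, validate=True):
    """Toy parser for "KEY value" header lines (illustration only).

    validate=True  -> raise ValueError on an unrecognised key.
    validate=False -> issue a warning, skip the line, and keep going.
    """
    record = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are harmless in either mode
        key, _, value = line.partition(" ")
        if key not in KNOWN_KEYS:
            message = f"line {lineno}: unrecognised key {key!r}"
            if validate:
                raise ValueError(message)
            warnings.warn(message)
            continue
        record[key] = value.strip()
    return record


# Made-up input with one line that a strict parser would reject.
wonky = [
    "LOCUS       AB000001",
    "FOOBAR      junk from some exporter",
    "ACCESSION   AB000001",
]

# Lenient mode skips the FOOBAR line; strict mode would raise instead.
lenient = parse_header(wonky, validate=False)
```

Keeping validate=True as the default would preserve the current strict behaviour, so only callers who explicitly ask for leniency would see malformed lines skipped.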
Best,
-Michiel.
--- On Tue, 1/8/13, Kai Blin <kai.blin at biotech.uni-tuebingen.de> wrote:
> From: Kai Blin <kai.blin at biotech.uni-tuebingen.de>
> Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
> To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Tuesday, January 8, 2013, 5:28 AM
> Hi folks,
>
> I've recently pushed into production use a new version of my software
> that uses Biopython parsers instead of our own hand-written parsers.
>
> One big thing we noticed is that Biopython is waaay more picky as to
> what a proper GenBank file is supposed to look like. Sadly, many of
> our users seem to be creating their GenBank files with programs that
> only have a rough understanding of what the file format is supposed to
> look like. Most of the invalid input can safely be ignored, and I
> would propose to extend the GenBank parser to cope with the most
> common errors I'm seeing in day-to-day use.
>
> I'm happy to provide the patches, but before starting this work I'd
> like to make sure that they would be acceptable in principle. So, is
> there any reason to blow up in our users' faces rather than try to
> cope with invalid input?
>
> Cheers,
> Kai
>
> --
> Dipl.-Inform. Kai Blin
> kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-Universität Tübingen
> Auf der Morgenstelle 28
> D-72076 Tübingen
> Germany
> Phone: ++49 7071 29-78841
> Fax: ++49 7071 29-5979
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>