[Biopython-dev] More relaxed parsing of wonky GenBank files

Michiel de Hoon mjldehoon at yahoo.com
Tue Jan 8 06:11:46 EST 2013


Entrez.parse has a "validate" argument to allow parsing of XML files that contain tags that are not represented in the corresponding DTD. If validate is True, the parser raises an exception when it encounters a tag that is not in the DTD. If validate is False, the parser ignores such tags.
Maybe SeqIO.parse could have a similar "validate" argument?
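
To make the idea concrete, here is a rough sketch of how such a switch might behave. It is a toy, self-contained parser and not Biopython code: the name parse_records, the KNOWN_KEYS set, and the simplified "KEY value" line format are all assumptions made up for illustration, and SeqIO.parse does not currently accept a "validate" argument.

```python
import warnings

# Hypothetical set of keywords this toy parser understands;
# this is NOT the real GenBank keyword list.
KNOWN_KEYS = {"LOCUS", "DEFINITION", "ACCESSION"}


def parse_records(lines, validate=True):
    """Yield one dict per record; a record ends at a '//' line.

    validate=True  -> raise ValueError on an unrecognized keyword.
    validate=False -> emit a warning and skip the offending line.
    """
    record = {}
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        if line == "//":
            yield record
            record = {}
            continue
        key, _, value = line.partition(" ")
        if key not in KNOWN_KEYS:
            message = "line %d: unexpected keyword %r" % (number, key)
            if validate:
                raise ValueError(message)
            warnings.warn(message)
            continue
        record[key] = value.strip()


# A "wonky" input with one line the parser does not know about.
wonky = ["LOCUS AB000001", "FOOBAR something odd", "ACCESSION AB000001", "//"]

# Relaxed mode: the unknown line is reported and skipped.
records = list(parse_records(wonky, validate=False))

# Strict mode: the same input is rejected.
try:
    list(parse_records(wonky, validate=True))
    strict_failed = False
except ValueError:
    strict_failed = True
```

Warning rather than silently dropping the line in relaxed mode keeps the problem visible to the user while still letting the rest of the file through.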

Best,
-Michiel.

--- On Tue, 1/8/13, Kai Blin <kai.blin at biotech.uni-tuebingen.de> wrote:

> From: Kai Blin <kai.blin at biotech.uni-tuebingen.de>
> Subject: [Biopython-dev] More relaxed parsing of wonky GenBank files
> To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Tuesday, January 8, 2013, 5:28 AM
> Hi folks,
> 
> I've recently pushed into production use a new version of my software
> that uses BioPython parsers instead of our own hand-written parsers.
> 
> One big thing we noticed is that BioPython is waaay more picky as to
> what a proper GenBank file is supposed to look like. Sadly, many of
> our users seem to be creating their GenBank files with programs that
> only have a rough understanding what the file format is supposed to
> look like. Most of the invalid input can safely be ignored, and I
> would propose to extend the GenBank parser to cope with the most
> common errors I'm seeing in day to day use.
> 
> I'm happy to provide the patches, but before starting this work I'd
> like to make sure that they would be acceptable in principle. So, any
> reason to rather blow up in our user's face than to try and cope with
> invalid input?
> 
> Cheers,
> Kai
> 
> -- 
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-Universität Tübingen
> Auf der Morgenstelle 28        Phone : ++49 7071 29-78841
> D-72076 Tübingen               Fax   : ++49 7071 29-5979
> Germany
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


