[Biopython] Can the GenBank/EMBL parser recover from errors?

Peter biopython at maubp.freeserve.co.uk
Wed May 5 14:09:44 EDT 2010


Peter wrote:
> Uri wrote:
>> This way, whenever there is a parsing error, I just reinitialize the
>> iterator at the current file position, and it seeks to the beginning of the
>> next record.  However, this requires me to write out the for loop manually
>> (using StopIteration).  Does anyone know of a cleaner/more elegant way
>> of doing this?
>>
>> Thanks!
>
> Hi Uri,
>
> There is no obvious way to handle this within the Bio.SeqIO.parse framework.
>
> I'd suggest you use Bio.SeqIO.index instead (assuming the file isn't
> so corrupt that it can't be scanned to identify each record). Just
> wrap each record access in an error handler.

That approach should now work with the latest code on the trunk.
Until recently, the EMBL index code did not pick up the AC line,
which the parser can use for the record.id. This didn't seem to
matter for the EMBL files in our unit tests, but it does for those
from IMGT:

http://github.com/biopython/biopython/commit/e3fb9f7b643099042cb7188f383f256b36befb52
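
For the record, here is a minimal sketch of the "index, then wrap each
record access in an error handler" idea. The helper names
(iter_good_records, load_embl) are made up for illustration and are not
part of Biopython. I am also assuming you don't know in advance which
exception a broken record will raise, so the handler catches Exception
broadly; narrow it if you know better.

```python
def iter_good_records(index, on_error=None):
    """Yield (key, record) for every entry that parses cleanly.

    `index` is any dict-like object whose item lookup does the parsing,
    such as the object returned by Bio.SeqIO.index(). Entries that fail
    are skipped; if `on_error` is given, it is called with (key, error).
    """
    for key in index:
        try:
            # With a SeqIO index, this lookup is where the record is parsed.
            record = index[key]
        except Exception as err:  # assumption: exception type not known in advance
            if on_error is not None:
                on_error(key, err)
            continue
        yield key, record


def load_embl(filename):
    """Hypothetical convenience wrapper: index an EMBL file by record id."""
    # Imported here so the helper above works without Biopython installed.
    from Bio import SeqIO
    return SeqIO.index(filename, "embl")
```

You would use it roughly like this: build the index with
load_embl("imgt.dat") (the filename is just a placeholder), create an
empty list called failed, and loop over
iter_good_records(index, lambda key, err: failed.append(key)). After the
loop, failed holds the ids of the records that could not be parsed. This
only helps if the file is intact enough for the indexing scan itself to
succeed.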

Peter


