[Biopython] Can the GenBank/EMBL parser recover from errors?

Uri Laserson laserson at mit.edu
Wed Apr 28 19:12:28 UTC 2010


Hi,

I am trying to parse a large file of EMBL records that I know contains some
errors.  Rather than having the parser break when it hits an error, I would
like it to skip the offending record and move on to the next one.  I was
wondering whether this functionality is already built in somewhere.  One way
I can do this is as follows:

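from Bio import SeqIO
# LocationParserError must be imported as well; which module provides it
# depends on the Biopython version.

# ip is an already-open handle to the EMBL file.  SeqIO.parse() already
# returns an iterator, so no explicit __iter__() call is needed.
iterator = SeqIO.parse(ip, 'embl')
while True:
    try:
        record = next(iterator)
    # Now I specify all the parsing errors I want to catch:
    except LocationParserError:
        # Reinitialize iterator at current file position. The iterator
        # then skips to the beginning of the next record and continues.
        iterator = SeqIO.parse(ip, 'embl')
    except StopIteration:
        break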

This way, whenever there is a parsing error, I just reinitialize the
iterator at the current file position, and it seeks to the beginning of the
next record.  However, this requires me to write out the for loop manually
(using StopIteration).  Does anyone know of a cleaner/more elegant way of
doing this?
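
One possibility might be to hide the manual loop inside a generator, so that
the calling code can go back to a plain for loop.  The following is only a
sketch under assumptions: skip_bad_records and toy_parse are hypothetical
names, and toy_parse is a made-up stand-in for SeqIO.parse (it reads a toy
"ID ... //" format and raises ValueError on a line containing BAD) so that
the example runs without Biopython.  It relies on the same behaviour as the
loop above, namely that a fresh parser started at the current file position
skips forward to the next record.

```python
import io


def toy_parse(handle):
    """Hypothetical stand-in for SeqIO.parse; yields record names."""
    while True:
        line = handle.readline()
        if not line:
            return
        if not line.startswith("ID "):
            continue  # not at a record start yet: skip forward
        name = line.split()[1]
        while True:
            line = handle.readline()
            if not line or line.startswith("//"):
                break  # end of this record
            if "BAD" in line:
                raise ValueError("cannot parse record " + name)
        yield name


def skip_bad_records(handle, make_parser, errors):
    """Yield records from make_parser(handle), skipping ones that fail.

    errors is a tuple of exception classes to treat as "bad record".
    """
    records = make_parser(handle)
    while True:
        try:
            record = next(records)
        except StopIteration:
            return
        except errors:
            # Restart the parser at the current file position.
            records = make_parser(handle)
            continue
        yield record


data = "ID a\nFT ok\n//\nID b\nFT BAD\nFT more\n//\nID c\nFT ok\n//\n"
good = list(skip_bad_records(io.StringIO(data), toy_parse, (ValueError,)))
print(good)
```

With Biopython, the call would presumably look something like
"for record in skip_bad_records(ip, lambda h: SeqIO.parse(h, 'embl'),
(LocationParserError,)):", though I have not checked that the real parser
always leaves the handle at a position it can recover from.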

Thanks!
Uri

-- 
Uri Laserson
Graduate Student, Biomedical Engineering
Harvard-MIT Division of Health Sciences and Technology
M +1 917 742 8019
laserson at mit.edu


