[BioPython] Problem parsing genbank file

Brad Chapman chapmanb at uga.edu
Sat Mar 20 11:32:02 EST 2004


Hi Andy;

> I just updated from cvs and got this error when trying to parse a genbank
> file that had multiple genbank files in it:
> Traceback (most recent call last):
[...]
> Martel.Parser.ParserPositionException: error parsing at or beyond character
> 20805

Thanks for sending me the file separately. It looks like one of the
records in the file, AF124045, was somehow corrupted. The region
where the parser fails looks like this:

     misc_feature    <38880..>39000
                     /note="putative breakpoint of recombination in orthologous
                     maize region, 
                     38875 bp is the end of homology, >38875-50877repeat_region   join(<38904..38924,38960..>39022)
                     /note="CT-rich stretches"
                     /evidence=not_experimental

whereas in the original record (from NCBI) it looks like this:

     misc_feature    <38880..>39000
                     /note="putative breakpoint of recombination in orthologous
                     maize region,
                     38875 bp is the end of homology, >38875-50877< region
                     missing in maize; Region: Breakpoint"
                     /evidence=not_experimental
     repeat_region   join(<38904..38924,38960..>39022)
                     /note="CT-rich stretches"
                     /evidence=not_experimental

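So it looks like the text from "< region missing in" up to the next
feature key (repeat_region) was somehow deleted. That leaves the first
/note value without its closing quote and puts the repeat_region key
in the middle of a qualifier line, which is where the parser gives up.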
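
A comparison like the one above can be automated. The following is
only a minimal sketch using the standard library's difflib; the
function name diff_records is made up for illustration, and it assumes
you already have both the local record and a fresh copy as strings.

```python
import difflib


def diff_records(local_text, fresh_text):
    """Return unified-diff lines between a local record and a fresh copy.

    Lines starting with '-' exist only in the local (possibly damaged)
    text; lines starting with '+' exist only in the fresh copy.
    """
    return list(
        difflib.unified_diff(
            local_text.splitlines(),
            fresh_text.splitlines(),
            fromfile="local",
            tofile="fresh",
            lineterm="",
            n=1,  # one line of context around each change
        )
    )
```

An empty list means the two texts are identical line for line.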

I've not seen something like this before, but the best solution
seems to be to re-download this record and try parsing it all again.
All of the other records in your file seem to parse fine.
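
To find out which record in a multi-record file is the damaged one,
it can help to split the file on the "//" terminator lines and parse
each record on its own. This is only a rough sketch in plain Python:
the helper names are hypothetical, and "parse" stands for whatever
parsing call you are using (it just needs to raise an exception on
failure).

```python
def split_genbank_records(text):
    """Split concatenated GenBank text into individual record strings.

    Each record is assumed to end with a line containing only '//'.
    """
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    # Keep trailing text that lacks a terminator so nothing is lost.
    if "".join(current).strip():
        records.append("".join(current))
    return records


def record_name(record):
    """Return the name from the LOCUS line, or '?' if there is none."""
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else "?"
    return "?"


def find_bad_records(text, parse):
    """Call parse(record_text) per record; return [(name, error message)]."""
    failures = []
    for record in split_genbank_records(text):
        try:
            parse(record)
        except Exception as exc:  # any parser failure is worth reporting
            failures.append((record_name(record), str(exc)))
    return failures
```

With a Biopython release that provides Bio.SeqIO (an assumption about
your installation), "parse" could be something like
lambda rec: SeqIO.read(io.StringIO(rec), "genbank"). Any record that
shows up in the result is a candidate for re-downloading.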

Hope this helps.
Brad

