[BioPython] GenBank Parser Errors (Repost)

Mon Nov 8 05:28:29 EST 2004

Hi,

(I'm sorry if you get this twice, but I sent it to the list last week
and didn't get a reply, so I'm hoping someone with a suggestion will see
it this time. Thanks.)

I'm trying to use Biopython to parse GenBank files, and it works
happily on some GenBank files but not on many others. So far the pattern
appears to be:

Prokaryotic complete genome => OK
Eukaryotic complete genome => failure

The failures are typically very early in the file, and the traceback
doesn't contain wonderfully useful information. It falls over in the
Martel parser with the error:

Martel.Parser.ParserPositionException: error parsing at or beyond
character 191

As this genome is a bit large to attach, I've just included the +/- 10
lines around character 191.

The full file, should you want it, is at:
<ftp://ftp.ensembl.org/pub/current_tetraodon/data/flatfiles/genbank/Tetraodon_nigroviridis.0.dat.gz>

Does anyone have any idea why this is failing? Is it just the joy of
tracking NCBI record formats, meaning I need to start looking at the
internals for a fix (or use something else), or is there another
explanation?

Is it worth trying Biopython from CVS?

Mac OS X 10.3.5

Python 2.3.4, up-to-date Biopython

Thanks,

Michael

-- 
Dr Michael Maibaum
Department of Biochemistry and Molecular Biology, UCL
email: maibaum at biochemistry.ucl.ac.uk