[BioPython] parsing error with GenBank.RecordParser

Peter biopython at maubp.freeserve.co.uk
Fri Jan 6 17:44:34 EST 2006


Hans Meier wrote:
>  Hi,
>  
>  parsing of NC_000913.gbk does not work.
>  
>  Greets, Harald

Sorry I didn't reply earlier, I was away for the New Year...

From the traceback you provided, I would guess that the old GenBank 
parser (included with BioPython 1.41) didn't like the embedded double 
quote in that note:

/note="2'-(5"-phosphoribosyl)-3'-dephospho-CoA...

Interestingly enough, in the most recent version of NC_000913.gbk dated 
Dec 2005 (check the first line, starting LOCUS), the NCBI have switched 
the embedded double quote to a single quote in the note (gene citX):

/note="2'-(5'-phosphoribosyl)-3'-dephospho-CoA...
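
If you want to check whether a given copy of the file still contains 
this kind of note, here is a rough sketch in Python. It is not part of 
BioPython; the function names are made up for illustration, and it 
assumes the usual GenBank layout (qualifiers indented by 21 spaces, and 
a literal double quote inside a value written as a doubled ""). It is a 
heuristic only, so treat any hits as lines to look at by hand.

```python
def has_stray_quote(value):
    """Return True if value contains a double quote that is not doubled.

    Assumption: a literal quote inside a qualifier value should be
    written as "" (two double quotes in a row).
    """
    return '"' in value.replace('""', '')


def find_suspect_qualifiers(lines):
    """Return [(line_number, qualifier_text)] for quoted qualifiers whose
    value contains a lone double quote.

    Hypothetical helper, heuristic only: a qualifier is taken to start on
    a line indented by 21 spaces that begins with "/", and to continue
    over following indented lines that do not begin with "/".
    """
    indent = " " * 21
    suspects = []
    current = None  # (line number, text fragments) of the open qualifier

    def check(qualifier):
        number, parts = qualifier
        text = " ".join(parts)
        _key, sep, value = text.partition('="')
        if not sep:
            return  # unquoted qualifier such as /codon_start=1
        if value.endswith('"'):
            value = value[:-1]  # drop the closing quote
        if has_stray_quote(value):
            suspects.append((number, text))

    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        indented = line.startswith(indent)
        if current and indented and stripped and not stripped.startswith("/"):
            current[1].append(stripped)  # continuation of the open qualifier
            continue
        if current:
            check(current)
            current = None
        if indented and stripped.startswith("/"):
            current = (number, [stripped])
    if current:
        check(current)
    return suspects
```

Something like find_suspect_qualifiers(open("NC_000913.gbk")) should 
then list the line numbers worth inspecting.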

If you download this revised NC_000913.gbk the problem should go away 
(but note that as the Escherichia coli GenBank file is 11 MB you might 
be better off updating the GenBank parser).
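
To see which revision of the file you already have without opening all 
11 MB in an editor, you can read the date off the LOCUS line. A small 
sketch, assuming the date is the last field on that line in DD-MON-YYYY 
form (locus_date is a made-up name, not a BioPython function):

```python
from datetime import date

# Month abbreviations as they are assumed to appear on the LOCUS line.
MONTHS = {
    name: number
    for number, name in enumerate(
        "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1
    )
}


def locus_date(first_line):
    """Return the date at the end of a LOCUS line, or None if not found."""
    fields = first_line.split()
    if len(fields) < 2 or fields[0] != "LOCUS":
        return None
    parts = fields[-1].split("-")
    if len(parts) != 3:
        return None
    day, month, year = parts
    try:
        return date(int(year), MONTHS[month.upper()], int(day))
    except (KeyError, ValueError):
        return None  # last field did not look like DD-MON-YYYY
```

For example, locus_date(open("NC_000913.gbk").readline()) reads only 
the first line of the file.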

The new GenBank parser (available in CVS now) should cope with either 
version of the file (and should use less memory, and be a lot faster too).

To try this, you just need to replace the file 
/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py with the latest 
version (but make a backup of the old one just in case).
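
For the backup step, a minimal sketch in Python (backup_file and the 
".bak" suffix are my own choices, nothing BioPython specific). It 
refuses to overwrite an existing backup, so running it twice cannot 
clobber the original parser:

```python
import shutil
from pathlib import Path


def backup_file(path, suffix=".bak"):
    """Copy path to path + suffix and return the backup's location.

    Raises FileExistsError rather than overwriting an earlier backup.
    """
    source = Path(path)
    backup = source.with_name(source.name + suffix)
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.copy2(source, backup)  # copy2 also keeps the timestamps
    return backup
```

You would call it on the installed __init__.py (probably as root, given 
the location) before copying the new file over it.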

Peter


