[BioPython] Cannot parse/convert embl formatted files

Peter (BioPython List) biopython at maubp.freeserve.co.uk
Thu Aug 17 14:41:54 UTC 2006


I've added a comment to the bug too:

http://bugzilla.open-bio.org/show_bug.cgi?id=2076

Martin MOKREJŠ wrote:
> No, the missing closing quotes should be added. Or better to say,
> the parser should terminate previous feature when it reaches beginning
> of the next feature. I wish this is feasible.

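Missing closing quotes are a tricky issue.  I have seen valid files with 
text like /word= inside a quoted entry, so a line that looks like the 
start of a new qualifier does not prove the previous quote was left open.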
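
For what it's worth, here is a rough sketch of how one might flag the 
suspect qualifiers before handing a file to the parser.  This is not 
Biopython code; the function name find_unterminated_qualifiers is made 
up for illustration.  It assumes the usual EMBL feature table layout 
(feature keys start in column 6 of an FT line) and that embedded quotes 
are doubled, so an odd number of quote characters on a line toggles the 
open/closed state.  Only the start of the next feature (or the end of 
the feature table) is treated as proof that a quote was never closed; a 
/word= line inside an open quote is deliberately not.

```python
def find_unterminated_qualifiers(lines):
    """Return (line_number, text) for each quoted qualifier that is still
    open when the next feature key (or a non-FT line) is reached."""
    problems = []
    open_at = None  # (line_number, text) of the qualifier whose quote is open
    for number, line in enumerate(lines, start=1):
        is_ft = line.startswith("FT")
        # Assumption: a feature key begins in column 6 (index 5) of an FT line.
        starts_feature = is_ft and len(line) > 5 and line[5] != " "
        if not is_ft or starts_feature:
            # The previous feature is over; an open quote here was never closed.
            if open_at is not None:
                problems.append(open_at)
                open_at = None
            continue
        content = line[2:].strip()
        odd_quotes = content.count('"') % 2 == 1
        if open_at is None:
            # Only a qualifier line seen outside a quote can open a new one.
            if content.startswith("/") and odd_quotes:
                open_at = (number, content)
        elif odd_quotes:
            # Continuation line that closes the open quote, even if it
            # happens to look like /word=.
            open_at = None
    if open_at is not None:
        problems.append(open_at)
    return problems


PAD = "FT" + " " * 19  # qualifier lines: "FT" followed by blanks
sample = [
    "FT   source          1..100",
    PAD + '/organism="Homo sapiens"',
    "FT   CDS             10..90",
    PAD + '/note="text with /word= inside',
    PAD + '/gene= still quoted"',
    PAD + '/product="broken',
    "FT   gene            10..90",
    PAD + '/gene="abc"',
]
print(find_unterminated_qualifiers(sample))
```

In the sample, the /note value that spans two lines is accepted, while 
the /product value is reported because the next feature starts before 
its quote is closed.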

> I think the recipe in
> http://biopython.org/DIST/docs/cookbook/genbank_to_fasta.html chokes on those
> unterminated lines.

The FormatIO system itself is very fragile with "broken" input files. 
It also doesn't work very well with large files.  We (the BioPython 
developers) have been talking about replacing it in a future release.

> Please add the missing import line to the above document. I have cleaned up
> my Trash so you have to get it from biopython archives from the very first
> message I think. ;)

Found it.  You pointed out that in addition to this line:

from Bio import formats

we also need:

from Bio.FormatIO import FormatIO

> Sorry for the confusion. It took me a while to re-create the broken files
> and figure out all the steps again.
> Martin

Thanks Martin.

Have you been in touch with the Italian group to ask them if they can 
include the closing quotes in the EMBL files?

Peter

