[Biopython] Problem parsing embl files

Jaime Tovar jmtc21 at bath.ac.uk
Thu May 30 19:48:59 UTC 2013


Hi all,

This is the first time I have tried to parse EMBL files with Biopython.
I'm trying to get the gene IDs and the start/end coordinates of each gene.

I thought it would be as straightforward as with other annotation files,
so I wrote a small script to test it.

from Bio import SeqIO

if __name__ == '__main__':
    # Open the EMBL file and print every record the parser yields
    with open("sctg_0.embl", "r") as handle:
        for record in SeqIO.parse(handle, "embl"):
            print(record)

But when I run the script I get an error, which may suggest that the
EMBL files have a problem:

ValueError: Premature end of features table, marker '//' found

I checked the parser's source code, and the error seems to point to a
problem with the EMBL file, but when I checked the files against the
EMBL file format they seem fine. I have a few thousand files formatted
the same way, so I can't think of any way to deal with the problem other
than parsing them.

The annotation files contain only annotation information, no sequences.
I have uploaded an example here:

http://depositfiles.com/files/481uob95e
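
If the parser cannot be made to accept sequence-less files, one possible
fallback is to read the FT (feature table) lines directly. The following
is only a minimal sketch under several assumptions: the function name
parse_embl_genes is hypothetical, the gene IDs are assumed to be stored
in /locus_tag or /gene qualifiers (not confirmed for these files), and
locations are assumed to be simple ones such as 100..900 or
complement(100..900), without references to other entries.

```python
import re


def parse_embl_genes(lines, feature_keys=("gene",)):
    """Return (gene_id, start, end, strand) tuples read from EMBL FT lines.

    Hypothetical helper: assumes simple locations and that the ID lives
    in a /locus_tag or /gene qualifier.
    """
    features = []
    current = None
    for line in lines:
        if not line.startswith("FT"):
            continue
        key = line[5:20].strip()    # feature key occupies columns 6-20
        value = line[21:].strip()   # location/qualifiers start at column 22
        if key:
            # A non-empty key marks the start of a new feature
            current = {"key": key, "location": value, "qualifiers": {}}
            features.append(current)
        elif current is not None and value.startswith("/"):
            # Qualifier line such as /locus_tag="abc"; keep the first value
            name, _, text = value[1:].partition("=")
            current["qualifiers"].setdefault(name, text.strip('"'))
        elif current is not None and not current["qualifiers"]:
            # Location wrapped onto a continuation line
            current["location"] += value

    genes = []
    for feature in features:
        if feature["key"] not in feature_keys:
            continue
        positions = [int(n) for n in re.findall(r"\d+", feature["location"])]
        if not positions:
            continue
        quals = feature["qualifiers"]
        gene_id = quals.get("locus_tag") or quals.get("gene") or "unknown"
        strand = "-" if "complement" in feature["location"] else "+"
        genes.append((gene_id, min(positions), max(positions), strand))
    return genes
```

The function accepts any iterable of lines, so an open file handle for
sctg_0.embl could be passed to it directly. It reports the outermost
coordinates of each feature, so a join(...) location would be collapsed
to a single start/end pair.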

I'm using Python 2.7.4 and Biopython 1.61 on a 64-bit Windows computer.

Any advice or suggestions would be greatly appreciated.

Jaime.




