[BioPython] Cannot parse/convert embl formatted files

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Wed Aug 9 11:22:51 UTC 2006


Hi,
  I am following the manual at
http://biopython.org/DIST/docs/cookbook/genbank_to_fasta.html
to convert an EMBL-formatted file to FASTA, and I see that at
the beginning of the document, after the line:

from Bio import formats

there should be one more line:

from Bio.FormatIO import FormatIO




Still, conversion from embl format does not work:

#!/usr/bin/python

input_handle = open('wgs_baad_pro.dat') # from ftp://ftp.embl.de/pub/databases/embl/release/
output_handle = open('wgs_baad_pro.fa', "w")
from Bio import formats
from Bio.FormatIO import FormatIO
formatter = FormatIO("SeqRecord", formats["embl"], formats["fasta"])
formatter.convert(input_handle, output_handle)


Traceback (most recent call last):
  File "convertembl.py", line 8, in ?
    formatter.convert(input_handle, output_handle)
  File "/usr/lib/python2.4/site-packages/Bio/FormatIO.py", line 146, in convert
    raise TypeError("Could not not determine file type")
TypeError: Could not not determine file type



It seems this has been known since
http://lists.open-bio.org/pipermail/biopython-dev/2006-April/002343.html
I use biopython-1.42 on Linux, so was no fix included in that release?



In principle, I do not need to convert the file; what I really need is
a parser for EMBL-formatted data from
ftp://bighost.ba.itb.cnr.it/pub/Embnet/Database/UTR/data/
to pick out records that carry a given feature. As I do not see an EMBL
parser in the Bio package, I believe it is not available, right?
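
To make the requirement concrete, here is a rough sketch in plain Python of
the kind of filtering I mean. It is not a real EMBL parser and uses nothing
from Biopython. It only assumes that each record ends with a "//" line and
that feature keys sit in columns 6-20 of the "FT" lines. The function names,
the ft_line helper and the two sample records are made up for illustration.

```python
import io


def iter_embl_records(handle):
    """Yield each record as a list of lines, split on the '//' terminator."""
    record = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if record:
                yield record
            record = []
        else:
            record.append(line)
    # Tolerate a final record that lacks a terminator line.
    if record:
        yield record


def feature_keys(record):
    """Return the set of feature keys found on the FT lines of one record."""
    keys = set()
    for line in record:
        if line.startswith("FT"):
            # Columns 6-20 hold the key; the field is blank on qualifier
            # and continuation lines, which are skipped here.
            key = line[5:20].strip()
            if key:
                keys.add(key)
    return keys


def records_with_feature(handle, wanted):
    """Yield only those records that contain the wanted feature key."""
    for record in iter_embl_records(handle):
        if wanted in feature_keys(record):
            yield record


def ft_line(key, location):
    """Hypothetical helper: build one FT line with the key padded to column 21."""
    return "FT   %-16s%s\n" % (key, location)


# Invented sample data, only to exercise the functions above.
SAMPLE = (
    "ID   REC1; SV 1; linear; mRNA; STD; HUM; 20 BP.\n"
    + ft_line("source", "1..20")
    + ft_line("", '/organism="Homo sapiens"')
    + ft_line("5'UTR", "1..20")
    + "//\n"
    + "ID   REC2; SV 1; linear; mRNA; STD; HUM; 20 BP.\n"
    + ft_line("source", "1..20")
    + "//\n"
)

hits = list(records_with_feature(io.StringIO(SAMPLE), "5'UTR"))
for record in hits:
    print(record[0])
```

With a real file, the StringIO object would be replaced by an open file
handle. Locations, qualifiers and the sequence itself are not interpreted at
all here, which is why I would prefer a proper parser.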


It seems there is also a parser for the EMBL format outside Biopython:
http://www.embl-heidelberg.de/~chenna/PySAT/
Has anybody used it?

Thanks for help,
martin
-- 
Dr. Martin Mokrejs
Faculty of Science, Charles University
Vinicna 5, 128 43 Prague, Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs


