[Biopython] Error parsing EMBL file

Nick Semenkovich semenko at alum.mit.edu
Mon Sep 17 17:01:00 UTC 2012


I'm trying to extract the peptide sequences from a large collection of
EMBL-formatted files (all phage & virus data from EBI).

EBI provides these as large, concatenated EMBL files, so I've been
using SeqIO.parse to read & then write the 'translation' key from
seq_feature.qualifiers.
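
For reference, the approach looks roughly like the sketch below. It is not the exact script from the traceback: the function names, the fallback naming scheme for features without a protein_id, and the choice to look only at CDS features are assumptions made for illustration.

```python
def iter_translations(records):
    """Yield (name, peptide) for every CDS feature that carries a /translation."""
    for record in records:
        for index, feature in enumerate(record.features):
            if feature.type != "CDS":
                continue
            # Qualifier values are lists of strings; CDS features without a
            # /translation qualifier yield nothing and are skipped.
            for peptide in feature.qualifiers.get("translation", []):
                protein_id = feature.qualifiers.get("protein_id", [None])[0]
                # Hypothetical fallback name when no protein_id is present.
                name = protein_id or "%s_feature%d" % (record.id, index)
                yield name, peptide


def embl_to_faa(in_path, out_path):
    """Write every translation found in an EMBL file to a FASTA (.faa) file."""
    # Imported here so iter_translations can be used without Biopython.
    from Bio import SeqIO

    with open(in_path) as in_handle, open(out_path, "w") as out_handle:
        for name, peptide in iter_translations(SeqIO.parse(in_handle, "embl")):
            out_handle.write(">%s\n%s\n" % (name, peptide))
```

Calling embl_to_faa("phage.embl", "phage.faa") (hypothetical file names) would write one FASTA entry per translated CDS.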


Unfortunately, it looks like the parser dies on one input file:

http://www.ebi.ac.uk/ena/data/view/BK000583&display=txt&expanded=true

Traceback (most recent call last):
  File "gbk_to_faa.py", line 7, in <module>
    for seq_record in SeqIO.parse(input_handle, "embl") :
  File "/usr/lib/pymodules/python2.7/Bio/SeqIO/__init__.py", line 541, in parse
    for r in i:
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 440, in parse_records
    record = self.parse(handle, do_features)
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 423, in parse
    if self.feed(handle, consumer, do_features):
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 391, in feed
    self._feed_header_lines(consumer, self.parse_header())
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 692, in _feed_header_lines
    consumer.reference_bases("(bases %s)" % "; ".join(parts))
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/__init__.py", line 740, in reference_bases
    locations = self._split_reference_locations(ref_base_info)
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/__init__.py", line 777, in _split_reference_locations
    start, end = base_info.split('to')
ValueError: need more than 1 value to unpack
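
The last frame can be reproduced in isolation. "need more than 1 value to unpack" means that split('to') returned a single piece, i.e. the reference-location string contained no "to" at all. The sketch below mimics only that unpacking step; the helper name and the sample strings are made up and are not taken from BK000583.

```python
def split_location(base_info):
    """Mimic the two-value unpacking from the last traceback frame."""
    start, end = base_info.split('to')
    return start.strip(), end.strip()


# A range written with "to" splits into exactly two pieces.
print(split_location("1 to 100"))

# A hypothetical value without "to" yields one piece, so unpacking fails.
try:
    split_location("100")
except ValueError as err:
    print("ValueError: %s" % err)
```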


* I might dig into this a bit more to patch, but does anyone more
familiar with EMBL files know what's going on?

* Also, is there a more straightforward (or even non-Biopython) way
to go from EMBL->FAA?
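
One possible stopgap for the first point, sketched under assumptions and not verified against the full EBI collection: since each EMBL record ends with a line containing only "//", the concatenated file can be cut into single records and each one parsed on its own, so that a record the parser rejects is reported and skipped instead of ending the whole run. The function names below are hypothetical.

```python
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3


def split_embl_records(lines):
    """Yield the text of each record; a record ends with a line that is just '//'."""
    current = []
    for line in lines:
        current.append(line)
        if line.rstrip() == "//":
            yield "".join(current)
            current = []
    # Keep any trailing text that lacks a terminator, unless it is blank.
    if "".join(current).strip():
        yield "".join(current)


def parse_tolerantly(handle):
    """Yield SeqRecords one at a time, reporting records the parser rejects."""
    from Bio import SeqIO  # requires Biopython

    for number, text in enumerate(split_embl_records(handle), start=1):
        try:
            yield SeqIO.read(StringIO(text), "embl")
        except ValueError as err:
            print("Skipping record %d: %s" % (number, err))
```

Only one record is held in memory at a time, so this should still cope with the large concatenated files.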


Best,
Nick

-- 
Nick Semenkovich
Laboratory of Dr. Jeffrey I. Gordon
Medical Scientist Training Program
School of Medicine
Washington University in St. Louis
314.362.3963 (Lab)
http://web.mit.edu/semenko/


