[BioPython] Cannot parse ApE plasmid editor GenBank file

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Tue Jun 5 15:24:05 UTC 2007



Peter wrote:
> Martin MOKREJŠ wrote:
>> Hi,
>>  I am trying to parse a GenBank file created by ApE plasmid editor 
>> (see Google for details) with biopython-1.43 and I get:
> 
> ...
> 
>> AssertionError: Did not recognise the LOCUS line layout:
>> LOCUS               6499 bp ds-DNA     linear       02-AUG-2006
>>
>> Is the number of spaces wrong?
> 
> Yes - fields don't line up with either of the GenBank variants Biopython 
> expects.  I suspect their files don't follow the current NCBI standard 
> for the locus line...
> 
> Could you make a set of different files (for different sequences) and 
> check if the spacing changes or is preserved?

OK, there are two types of errors: the first is caused by files generated by VectorNTI,
the second by files produced by the ApE editor:

>>> fhandle = open('/mnt/smartmedia/utrophinA/p-cmvbGalCAT.gb','r')
>>> genbank_entry = parser.parse(fhandle)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 187, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 361, in feed
    self._feed_header_lines(consumer, self.parse_header())
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 967, in _feed_header_lines
    getattr(consumer, consumer_dict[line_type])(data)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 409, in source
    if content[-1] == '.':
IndexError: string index out of range
>>> 
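
My guess at the first failure: `content[-1]` raises exactly this IndexError when
`content` is an empty string, so the VectorNTI file probably has a SOURCE line with
nothing after the keyword. Below is a minimal sketch of a check that tolerates empty
content. It is not Biopython's code, and `strip_trailing_period` is a hypothetical
helper name used only for illustration.

```python
def strip_trailing_period(content):
    """Hypothetical helper: drop one trailing '.' if present.

    Uses str.endswith(), which is safe on an empty string,
    instead of indexing with content[-1].
    """
    if content.endswith('.'):
        return content[:-1]
    return content


# Reproduce the failure mode from the traceback: indexing an empty string.
try:
    ''[-1]
except IndexError as err:
    failure = str(err)
```

With a guard like this, an empty SOURCE value would pass through unchanged instead of
stopping the parse.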


>>> fhandle = open('/mnt/smartmedia/nrf/ok/PBCRLucPFLuc.gb','r')
>>> genbank_entry = parser.parse(fhandle)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 187, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
    self._feed_first_line(consumer, self.line)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 835, in _feed_first_line
    assert False, \
AssertionError: Did not recognise the LOCUS line layout:
LOCUS               6988 bp ds-DNA     linear       20-DEC-2006

>>> 
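
One thing visible in the rejected line itself is that the token right after LOCUS is
already the sequence length, so the entry appears to carry no locus name. Whether the
scanner trips over that or over the column spacing I cannot tell. The sketch below only
splits the line on whitespace to show which fields are present. `describe_locus_tokens`
is a hypothetical helper, not part of Biopython, and it assumes the length is the token
directly before "bp".

```python
locus_line = "LOCUS               6988 bp ds-DNA     linear       20-DEC-2006"


def describe_locus_tokens(line):
    """Hypothetical helper: split a LOCUS line on whitespace and label the fields."""
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    fields = tokens[1:]
    # Assumption: the sequence length is the token just before the 'bp' unit.
    bp_index = fields.index("bp")
    if bp_index < 1:
        raise ValueError("no length before 'bp'")
    return {
        # A name, if present, would sit one token before the length.
        "name": fields[bp_index - 2] if bp_index >= 2 else None,
        "length": int(fields[bp_index - 1]),
        "rest": fields[bp_index + 1:],
    }


info = describe_locus_tokens(locus_line)
```

For the ApE line above, `info["name"]` comes back as None, while the length and the
remaining fields are all found.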


I would appreciate it if you could tell me what exactly is wrong with the files
generated by the ApE editor (author Cc:ed).

Hope this helps,
Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: genbank-formatted-testcases.zip
Type: application/zip
Size: 32571 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20070605/d942bf78/attachment-0002.zip>

