[BioPython] Cannot parse ApE plasmid editor GenBank file

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Wed Jun 27 11:22:53 EDT 2007


Hi Peter,

Peter wrote:
> Martin MOKREJŠ wrote:
>> OK, I found that the spacing problem in my LOCUS lines was still there,
>> and after some scripting I got the lines fixed.
> 
> Excellent. I've been away for a few days and haven't had a chance to 
> look at this yet.

Thanks! No problem, I was busy as well. ;-)

> 
>> The file starts with:
>>
>> LOCUS       pBL-RLuc-GBB+3-III      5391 bp    ds-DNA  circular SYN 14-JUN-2007
>> DEFINITION  .
>> ACCESSION   .
>> VERSION     .
>> SOURCE      .
>>    ORGANISM  .
>> COMMENT     COMMENT     ApEinfo:methylated:0
>> FEATURES             Location/Qualifiers
>>
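The LOCUS line is a fixed-column record, so one quick way to find spacing problems like the ones mentioned above is to slice each field by column and check it. This is a minimal sketch, assuming the column layout given in the GenBank release notes (name in columns 13-28, length right-justified in 30-40, `bp` in 42-43, strand in 45-47, topology in 56-63, division in 65-67, date in 69-79). `check_locus_line` and `cols` are hypothetical helpers, not part of Biopython.

```python
import re

# 1-based, inclusive column spans of LOCUS-line fields. Assumed from the
# GenBank release notes layout; verify against the current release notes.
LENGTH_COLS = (30, 40)
BP_COLS = (42, 43)
STRAND_COLS = (45, 47)
TOPOLOGY_COLS = (56, 63)
DIVISION_COLS = (65, 67)
DATE_COLS = (69, 79)


def cols(line, span):
    """Return the text in 1-based inclusive columns span[0]..span[1]."""
    start, end = span
    return line[start - 1:end]


def check_locus_line(line):
    """Return a list of column-layout problems found in a LOCUS line."""
    problems = []
    line = line.rstrip("\r\n")
    if not line.startswith("LOCUS" + " " * 7):
        problems.append("LOCUS keyword not followed by spaces up to column 12")
    # Column 29 must be the blank separating the 16-character name field
    # (columns 13-28) from the length field.
    if cols(line, (29, 29)).strip():
        problems.append("locus name runs past column 28")
    if not cols(line, LENGTH_COLS).strip().isdigit():
        problems.append("length not right-justified in columns 30-40")
    if cols(line, BP_COLS) != "bp":
        problems.append("'bp' not in columns 42-43")
    if cols(line, STRAND_COLS) not in ("   ", "ss-", "ds-", "ms-"):
        problems.append("unexpected strand text in columns 45-47")
    if cols(line, TOPOLOGY_COLS).strip() not in ("linear", "circular"):
        problems.append("topology not in columns 56-63")
    if not re.match(r"^[A-Z]{3}$", cols(line, DIVISION_COLS)):
        problems.append("division code not in columns 65-67")
    if not re.match(r"^\d{2}-[A-Z]{3}-\d{4}$", cols(line, DATE_COLS)):
        problems.append("date not in columns 69-79")
    return problems
```

The 18-character name `pBL-RLuc-GBB+3-III` is longer than the 16-column name field, so a strict column check like this reports "locus name runs past column 28" for the LOCUS line above; whether a particular parser tolerates that overflow is a separate question.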
> 
> The ORGANISM line looks wrong (three leading spaces rather than two, so 
> the dot is pushed one column to the right).
> 
> There is a blank COMMENT line which is also odd.
> 
> Some of this may just be an email formatting issue, but I would expect 
> this instead:
> 
> ...
> DEFINITION  .
> ACCESSION   .
> VERSION     .
> SOURCE      .
>   ORGANISM  .
> COMMENT     ApEinfo:methylated:0
> FEATURES             Location/Qualifiers
> ...
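The expected layout above follows the usual GenBank header convention: top-level keywords start in column 1, sub-keywords such as ORGANISM are indented by two spaces, and the value starts in column 13. As a sketch of the kind of clean-up script involved, the function below re-emits a header line in that layout and also drops a doubled keyword like the `COMMENT     COMMENT` line quoted above. `normalize_header_line`, `SUB_KEYWORDS` and `SKIP_KEYWORDS` are hypothetical names for illustration; the sketch is meant only for keyword lines between LOCUS and FEATURES and leaves LOCUS, FEATURES and continuation lines (12 leading spaces) untouched.

```python
# Sub-keywords that GenBank indents by two spaces (sub-entries of SOURCE and
# REFERENCE). Assumed list for illustration, not exhaustive.
SUB_KEYWORDS = {"ORGANISM", "AUTHORS", "CONSRTM", "TITLE", "JOURNAL",
                "MEDLINE", "PUBMED", "REMARK"}

# Lines with their own fixed layout that this sketch must not touch.
SKIP_KEYWORDS = {"LOCUS", "FEATURES"}


def normalize_header_line(line):
    """Return a GenBank header line with the keyword field padded to
    12 columns, so that the value starts in column 13."""
    line = line.rstrip("\r\n")
    # Continuation lines (e.g. the lineage under ORGANISM) already start
    # with 12 spaces; blank lines have nothing to fix.
    if not line.strip() or line.startswith(" " * 12):
        return line
    parts = line.split(None, 1)  # leading spaces are ignored here
    keyword = parts[0]
    if keyword in SKIP_KEYWORDS:
        return line
    value = parts[1].strip() if len(parts) > 1 else ""
    # Drop a doubled keyword such as "COMMENT     COMMENT     ...".
    # Caveat: this would also drop a value that happens to start with
    # its own keyword.
    rest = value.split(None, 1)
    if rest and rest[0] == keyword:
        value = rest[1] if len(rest) > 1 else ""
    indent = "  " if keyword in SUB_KEYWORDS else ""
    # Keyword field is columns 1-12; pad it, then append the value.
    return ((indent + keyword).ljust(12) + value).rstrip()
```

Applied line by line to the quoted header, it turns `   ORGANISM  .` into `  ORGANISM  .` and `COMMENT     COMMENT     ApEinfo:methylated:0` into `COMMENT     ApEinfo:methylated:0`.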

OK, I have removed the COMMENT lines altogether and have fixed the ORGANISM
line. Still, I get:

python generate_image_from_genbank.py 
Traceback (most recent call last):
  File "generate_image_from_genbank.py", line 7, in ?
    genbank_entry = parser.parse(fhandle)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 187, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 361, in feed
    self._feed_header_lines(consumer, self.parse_header())
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 978, in _feed_header_lines
    consumer.taxonomy(data.strip())
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 419, in taxonomy
    self.data.annotations['taxonomy'] = self._split_taxonomy(content)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 250, in _split_taxonomy
    if taxonomy_string[-1] == '.':
IndexError: string index out of range

The file now starts with:

LOCUS       pBL-RLuc-GBB+3-III      5391 bp    ds-DNA  circular SYN 14-JUN-2007
DEFINITION  .
ACCESSION   .
VERSION     .
SOURCE      .
  ORGANISM  .
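The traceback ends in `_split_taxonomy`, and `taxonomy_string[-1]` on a Python string can only raise `IndexError` when the string is empty. With `ORGANISM  .` and no lineage lines after it, the scanner therefore appears to pass an empty taxonomy string to `consumer.taxonomy()`. Below is a minimal sketch of a split that checks for that case first; `split_taxonomy` is a hypothetical standalone function written for illustration, not the Biopython code in the traceback, and treating an empty or lone `.` lineage as "no taxonomy" is an assumption.

```python
def split_taxonomy(taxonomy_string):
    """Split a GenBank lineage such as 'Eukaryota; Metazoa; Chordata.'
    into a list of names.

    Hypothetical helper: unlike the code in the traceback, it handles an
    empty lineage before looking at the last character.
    """
    text = taxonomy_string.strip()
    # Guard first: indexing text[-1] on an empty string is exactly what
    # raised IndexError in the traceback.
    if not text or text == ".":
        return []
    # Drop the terminating period of the lineage, if present.
    if text.endswith("."):
        text = text[:-1]
    # Lineage entries are separated by ';' and may wrap across lines.
    names = []
    for chunk in text.split(";"):
        name = " ".join(chunk.split())  # collapse newlines and runs of spaces
        if name:
            names.append(name)
    return names
```

With a guard like this, a record whose ORGANISM is just `.` gets an empty taxonomy list instead of an exception.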

Thanks for your help,
M.



