[Biopython] Is this a valid Genbank feature description or a Biopython bug?

Marc Saric marc.saric at gmx.de
Wed Apr 18 16:58:18 EDT 2012


Hi all,

Sorry for cross-posting (this has also been posted on Stack Overflow
<http://stackoverflow.com/questions/10195198/is-this-a-valid-genbank-feature-description-or-a-biopython-bug>):


I stumbled upon a GenBank-formatted file (shown here as a minimal
dummy example) which contains a feature with a nested location like this:

FEATURES             Location/Qualifiers
     xxxx_domain     complement(complement(1..145))

Such a feature crashes the GenBank parser in the current Biopython
release (1.59), but it apparently did not in earlier releases (e.g. 1.55).
The crash apparently already occurs in 1.57.

From the Biopython bug tracker, it seems that the old location parser
code was removed in 1.56:

From what I could deduce from the format descriptions at
ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt and
http://www.insdc.org/documents/feature_table.html#3.4.2, such a nested
location is most likely invalid.
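
To check whether other features in a file carry the same construct, something like the following sketch could flag the affected lines. It uses only the standard library; the helper name `find_nested_complements` and the pattern are my own assumptions for illustration, not part of Biopython or of the INSDC specification, and the check says nothing about whether the construct is actually valid.

```python
import re

# Assumption: a "nested" location is a complement( that directly wraps
# another complement(, as in the example feature above.
NESTED_COMPLEMENT = re.compile(r"complement\(\s*complement\(")


def find_nested_complements(genbank_text):
    """Return (line_number, stripped_line) pairs that contain a nested complement."""
    hits = []
    for number, line in enumerate(genbank_text.splitlines(), start=1):
        if NESTED_COMPLEMENT.search(line):
            hits.append((number, line.strip()))
    return hits


sample = (
    "FEATURES             Location/Qualifiers\n"
    "     xxxx_domain     complement(complement(1..145))\n"
    '                     /note="A note"\n'
)
hits = find_nested_complements(sample)
for number, line in hits:
    print("line %d: %s" % (number, line))
```

This only looks at the text line by line, so a location that is wrapped across several lines would need to be joined first.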

Can someone comment on this, i.e. is this a glitch in Biopython or in
the format of the GenBank file?

A full demo file:

LOCUS       XXXXXXXXXXXXXX         240 bp    DNA     circular     17-JAN-2012
DEFINITION  xxxxxx.
KEYWORDS    xx.
SOURCE
  ORGANISM
FEATURES             Location/Qualifiers
     xxxx_domain     complement(complement(1..145))
                     /vntifkey="1"
                     /label=A label
                     /note="A note"
BASE COUNT       75 a        57 c        42 g        66 t
ORIGIN
        1 tttacaaaac gcattttcaa accttgggta ctaccccctt ttaaatatcc gaatacacta
       61 ataaacgctc tttcctttta ggtaaacccg ccaatatata ctgatacaca ctgatagttt
      121 aaactagatg cagtggccga ccatcagatc tagtaggaaa cagctatgac catgattacg
      181 cattacttat ttaagatcaa ccgtaccagt ataccctgcc agcatgatgg aaacctccct
//

A minimal demo program to show the error (assumes Biopython 1.59 and
Python 2.7 are installed and the above-mentioned file is available as
"test.gb"):

#!/usr/bin/env python
from Bio import SeqIO
# Parse the single record in test.gb; this call raises the error shown below.
s = SeqIO.read(open("test.gb", "r"), "genbank")

This crashes with

    raise LocationParserError(location_line)
Bio.GenBank.LocationParserError: complement(1..145)
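
As a possible stopgap while the question is open, the file text could be preprocessed before parsing. The sketch below assumes that complement(complement(X)) denotes the same span as X (complementing twice returns to the original strand); that reading is my assumption, not something the format description confirms. The helper name `collapse_double_complement` is hypothetical, and the pattern only handles a simple inner location without parentheses, so something like a join(...) inside is left untouched.

```python
import re

# Matches complement(complement(X)) where X itself contains no parentheses.
DOUBLE_COMPLEMENT = re.compile(r"complement\(\s*complement\(([^()]*)\)\s*\)")


def collapse_double_complement(text):
    """Replace each complement(complement(X)) with X until nothing changes."""
    previous = None
    while previous != text:
        previous = text
        text = DOUBLE_COMPLEMENT.sub(r"\1", text)
    return text


line = "     xxxx_domain     complement(complement(1..145))"
cleaned_line = collapse_double_complement(line)
print(cleaned_line)
```

The cleaned text of the whole file could then be wrapped in a file-like object (for example `io.StringIO` on Python 3) and handed to `SeqIO.read(handle, "genbank")`. Note that the substitution runs over the whole text, so it would also rewrite a matching string inside a qualifier value.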



-- 
Bye,
Marc Saric  http://www.marcsaric.de