[BioPython] genbank parser breaking on huge genbank file

Scott T. Kelley kelleys@ucsu.colorado.edu
Wed, 15 Aug 2001 16:12:07 -0700


Hello biopythoneers,

I'm running Biopython on Windows 95 and have had a lot of success using the
GenBank and FASTA parsers on some really big GenBank files. However, I've
found one file that seems to break the code. It is a whole microbial genome,
and things break down when I use GenBank.Iterator.

The GenBank file is the Aeropyrum pernix genome, which I downloaded from:
ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Aeropyrum_pernix/BA000002.gbk

(Not included in this e-mail for obvious reasons...;-)

The following is part of the code I have been running on this huge file; the
same code works on other files:

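----------------
from Bio import GenBank

def get_gene(genbank_file_name, gene):
    gb_handle = open(genbank_file_name, "r")
    # Parser that builds each record together with its feature table.
    feature_parser = GenBank.FeatureParser()
    iterator = GenBank.Iterator(gb_handle, feature_parser)

    while 1:
        # Returns the next parsed record, or None at end of file.
        cur_entry = iterator.next()
        if cur_entry is None:
            break
        # ... search cur_entry for `gene` (rest of function omitted) ...

    gb_handle.close()
---------------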

But when the code reaches the line "cur_entry = iterator.next()", I get this
(long) error:

Traceback (most recent call last):
  File "<pyshell#9>", line 1, in ?
    get_gene("Apy_genome", "APE0001")
  File "C:/Program Files/Python21/Splice/GenbankCOG.py", line 23, in get_COG
    cur_entry = iterator.next()
  File "C:\Program Files\Python21\Bio\GenBank\__init__.py", line 182, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Program Files\Python21\Bio\GenBank\__init__.py", line 260, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Program Files\Python21\Bio\GenBank\__init__.py", line 1108, in feed
    self._parser.parseFile(handle)
  File "C:\Program Files\Python21\Martel\Parser.py", line 205, in parseFile
    self.parseString(fileobj.read())
  File "C:\Program Files\Python21\Martel\Parser.py", line 233, in parseString
    self._err_handler.fatalError(result)
  File "C:\PROGRAM FILES\PYTHON21\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
ParserPositionException: error parsing at or beyond character 42
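
The traceback hints at how the failure arises: the iterator's next() reads the text of one record and hands it to the parser as a string (self._parser.parse(File.StringHandle(data))), and Martel then reads that whole string at once (self.parseString(fileobj.read())). The splitting step can be sketched with the standard library alone, assuming each record ends with a line consisting of "//", the GenBank record terminator. iter_records is a hypothetical name; this illustrates the idea and is not Biopython's actual implementation.

```python
import io
from typing import Iterator, TextIO


def iter_records(handle: TextIO) -> Iterator[str]:
    """Yield the raw text of each GenBank record in ``handle``.

    Hypothetical helper: assumes every record ends with a line that is
    exactly "//", as in GenBank flat files.
    """
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip("\r\n") == "//":
            # End of one record: hand back everything collected so far.
            yield "".join(lines)
            lines = []
    # Non-blank text left over without a terminator is yielded as-is,
    # so a truncated final record is not silently dropped.
    leftover = "".join(lines)
    if leftover.strip():
        yield leftover


sample = "LOCUS  A\nORIGIN\n//\nLOCUS  B\nORIGIN\n//\n"
print(len(list(iter_records(io.StringIO(sample)))))  # 2
```

If the genome is stored as a single entry, the one record this yields is essentially the whole file, so the parser receives it as one very large string.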


I confess that I have no idea what is going on here or how to fix it. Is
this file too big? Is there something funny about it? How exactly is the code
breaking?
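
The error reports only a character offset (42). Since the parser is handed a single record's text, the offset is presumably counted from the start of that record rather than from the start of the file. The hypothetical helper below, locate_offset, treats the offset as 0-based (an assumption about Martel's convention) and converts it into a 1-based line and column plus the text of that line, which shows which field the parser stopped on.

```python
from typing import Tuple


def locate_offset(text: str, offset: int) -> Tuple[int, int, str]:
    """Map a 0-based character offset in ``text`` to (line, column, line_text).

    Line and column are 1-based. Hypothetical helper; assumes the offset
    reported by the parser counts characters from the start of ``text``.
    """
    if not 0 <= offset <= len(text):
        raise ValueError(f"offset {offset} outside text of length {len(text)}")
    # Newlines before the offset give the 0-based line index.
    line_index = text.count("\n", 0, offset)
    # The line starts just after the last newline before the offset.
    line_start = text.rfind("\n", 0, offset) + 1
    line_end = text.find("\n", offset)
    if line_end == -1:
        line_end = len(text)
    column = offset - line_start + 1
    return line_index + 1, column, text[line_start:line_end]


demo = "first line\nsecond line\n"
print(locate_offset(demo, 14))  # (2, 4, 'second line')
```

If the failing record is the first one in the file, an offset of 42 would fall inside its opening LOCUS line, which in the usual GenBank layout runs well past column 42, so that line would be the first place to look.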

If anyone has any ideas on how to deal with this issue, I would be very
grateful for your help.

Thanks! -Scott

-------------------
Scott T. Kelley, Ph.D.
Campus Box 347
MCD Biology
University of Colorado
Boulder, CO 80309-0347
Phone: (303) 735-1808
Fax: (303) 492-7744
E-mail: Scott.Kelley@Colorado.edu