[BioPython] Fw: genbank parser breaking on huge genbank file

Scott T. Kelley kelleys@ucsu.colorado.edu
Fri, 17 Aug 2001 17:45:19 -0700


I'll try sending this again since I never received a biopython digest
containing this message. -Scott
> Hello biopythoneers,
>
> I'm running Biopython on Windows 95 and I've had a lot of success using the
> genbank and fasta parsers on some really big genbank files. However, I've
> found a particular file that seems to break the code. This is a whole
> microbial genome and things break down when I use the GenBank.Iterator.
>
> The genbank file is the Aeropyrum pernix genome I downloaded from:
>
> ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Aeropyrum_pernix/BA000002.gbk
>
> (Not included in this e-mail for obvious reasons...;-)
>
> The following is part of the code I have been trying on this huge file;
> it works for other files:
>
> ----------------
> from Bio import GenBank
>
> def get_gene(genbank_file_name, gene):
>     gb_handle = open(genbank_file_name, "r")
>     feature_parser = GenBank.FeatureParser()
>     iterator = GenBank.Iterator(gb_handle, feature_parser)
>
>     while 1:
>         cur_entry = iterator.next()
>
>         if cur_entry is None:
>             break
> ---------------
>
> But when the code gets to the line "cur_entry=iterator.next()" I get this
> (big) error:
>
> Traceback (most recent call last):
>   File "<pyshell#9>", line 1, in ?
>     get_gene("Apy_genome", "APE0001")
>   File "C:/Program Files/Python21/Splice/GenbankCOG.py", line 23, in get_COG
>     cur_entry = iterator.next()
>   File "C:\Program Files\Python21\Bio\GenBank\__init__.py", line 182, in next
>     return self._parser.parse(File.StringHandle(data))
>   File "C:\Program Files\Python21\Bio\GenBank\__init__.py", line 260, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "C:\Program Files\Python21\Bio\GenBank\__init__.py", line 1108, in feed
>     self._parser.parseFile(handle)
>   File "C:\Program Files\Python21\Martel\Parser.py", line 205, in parseFile
>     self.parseString(fileobj.read())
>   File "C:\Program Files\Python21\Martel\Parser.py", line 233, in parseString
>     self._err_handler.fatalError(result)
>   File "C:\PROGRAM FILES\PYTHON21\lib\xml\sax\handler.py", line 38, in fatalError
>     raise exception
> ParserPositionException: error parsing at or beyond character 42
>
>
> I confess that I have no idea what is going on here or how to fix it. Is
> this file too big? Is there something funny about it? How exactly is the
> code breaking?
>
> If anyone has any ideas on how to deal with this issue, I would be very
> grateful for your help.
>
> Thanks! -Scott
>
> -------------------
> Scott T. Kelley, Ph.D.
> Campus Box 347
> MCD Biology
> University of Colorado
> Boulder, CO 80309-0347
> Phone: (303) 735-1808
> Fax: (303) 492-7744
> E-mail: Scott.Kelley@Colorado.edu
>
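
One way to narrow down the "character 42" error quoted above might be to look at the raw text around that position. The sketch below uses only the standard library and makes no Biopython calls. It assumes the reported offset is counted from the start of the text handed to the parser, which for the first record would be the start of the file; that is a guess, not something the traceback confirms. The helper names `read_head` and `show_context` are hypothetical.

```python
def read_head(path, n_chars=200):
    """Return the first n_chars characters of the file at `path`."""
    with open(path, "r") as handle:
        return handle.read(n_chars)


def show_context(text, offset, width=20):
    """Return the text just before and just after `offset` as repr() strings.

    repr() makes tabs, carriage returns and other control characters
    visible, which is often where a strict parser trips.
    """
    start = max(0, offset - width)
    end = min(len(text), offset + width)
    return repr(text[start:offset]), repr(text[offset:end])
```

Calling `show_context(read_head("Apy_genome"), 42)` and printing the two returned strings would show what sits on either side of the reported position, so it can be compared by eye with the same region of a file that parses cleanly.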