[BioPython] Why would this GenBank file choke the GB parser?

Chris Lasher chris.lasher at gmail.com
Thu Aug 25 18:03:24 EDT 2005


Hello,

  I have a GenBank file, AY499671.gb, and 21 others like it that I
would like to process with BioPython (I am using BioPython 1.40b
on Windows), but I am running into trouble. The GenBank parser seems
to be choking on something in the files themselves, and I could
really use help determining what that is and how to fix it. The
error is raised by the Martel parser, but exactly what causes it is
beyond my knowledge and experience.

  I obtained the files from GenBank via the NCBI Entrez web pages, for
example:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=41080113
From a page like this one, I selected "File" in the "Send to" dialog
box and saved the file. I also tried obtaining the files via BioEdit
and saving those, but the parser had difficulty with them as well.

  I am attaching the script, "gbtoseq.py", that I am using to process
my GB files. The script works, repeatably, on other sequences obtained
from GenBank in the manner described above, and I am including one of
those, AFU75647. I am also attaching the error output produced when it
chokes on the 22 most recent sequences I've obtained.

  I sincerely appreciate any help anyone has to offer.

Thanks very much in advance,
Chris Lasher
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AY499671.gb
Type: pubmed/text
Size: 2663 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050825/78c4dd26/AY499671-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AFU75647.gb
Type: pubmed/text
Size: 3096 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050825/78c4dd26/AFU75647-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gbtoseq.py
Type: text/x-python
Size: 1134 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050825/78c4dd26/gbtoseq-0001.py
-------------- next part --------------
C:\Documents and Settings\chris\My Documents\scripts\pythonscripts\gbtoseq>gbtoseq.py
        Now on AFU75647.gb
        Writing to AFU75647.seq
        Now on AY499671.gb
Traceback (most recent call last):
  File "C:\Documents and Settings\chris\My Documents\scripts\pythonscripts\gbtoseq\gbtoseq.py", line 30, in ?
    parserecord = gbiterator.next()
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 129, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 219, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 1259, in feed
    self._parser.parseFile(handle)
  File "C:\Python24\Lib\site-packages\Martel\Parser.py", line 328, in parseFile
    self.parseString(fileobj.read())
  File "C:\Python24\Lib\site-packages\Martel\Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "C:\Python24\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 64
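
The exception reports only a character position, so one way to narrow
the problem down might be to look at the text around that position in
the failing file and compare it with the same region of a file that
parses. The following is a minimal sketch, not part of BioPython or
Martel: `locate_offset` is a hypothetical helper name, the code is
written for a current Python 3 rather than the Python 2.4 shown in the
traceback, and it assumes the reported position is a zero-based offset
into the raw file contents, which the error message does not confirm.

```python
def locate_offset(text, offset, context=20):
    """Return (line_number, column, snippet) for a character offset.

    line_number and column are 1-based; snippet is the text within
    `context` characters on either side of the offset.
    """
    # Clamp the offset so out-of-range values do not raise.
    offset = max(0, min(offset, len(text)))
    # Lines are counted by the newlines that occur before the offset.
    line_number = text.count("\n", 0, offset) + 1
    # rfind returns -1 when there is no earlier newline, giving start 0.
    line_start = text.rfind("\n", 0, offset) + 1
    column = offset - line_start + 1
    snippet = text[max(0, offset - context):offset + context]
    return line_number, column, snippet


# Hypothetical usage against one of the failing files. Opening with
# newline="" leaves any "\r\n" line endings untranslated, so they are
# visible when the snippet is shown with repr():
#
#   with open("AY499671.gb", newline="") as handle:
#       line, column, snippet = locate_offset(handle.read(), 64)
#   print(line, column, repr(snippet))
```

Running the same call on AFU75647.gb and on AY499671.gb and comparing
the two snippets side by side may show which field differs between the
file that parses and the one that does not.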

