[BioPython] GenBank parser stops at keyword-like entries in /note block
Wolfgang Schueler
wolfgang@proceryon.at
Wed, 10 Oct 2001 16:07:07 +0200
Hi BioPy(th)oneers,
Processing the NCBI GenBank nucleotide entry AL138972 (GI:6946668) with
the GenBank parser like this:
from Bio import GenBank

feature_parser = GenBank.FeatureParser()
gb_iterator = GenBank.Iterator(gb_file, feature_parser)
while 1:
    cur_record = gb_iterator.next()
    ...
results in the following error (other entries parse fine):
Traceback (most recent call last):
  File "/home/people/wolfgang/gag/EXT_DATA/bio_wsDB.py", line 683, in ?
    present(extract_GEN_summary_by_keywords(sys.argv[2],sys.argv[3:]))
  File "/home/people/wolfgang/gag/EXT_DATA/bio_wsDB.py", line 565, in extract_GEN_summary_by_keywords
    cur_record = gb_iterator.next()
  File "/home/people/wolfgang/lib/python/Bio/GenBank/__init__.py", line 182, in next
    return self._parser.parse(File.StringHandle(data))
  File "/home/people/wolfgang/lib/python/Bio/GenBank/__init__.py", line 260, in parse
    self._scanner.feed(handle, self._consumer)
  File "/home/people/wolfgang/lib/python/Bio/GenBank/__init__.py", line 1108, in feed
    self._parser.parseFile(handle)
  File "/home/people/wolfgang/lib/python/Martel/Parser.py", line 226, in parseFile
    self.parseString(fileobj.read())
  File "/home/people/wolfgang/lib/python/Martel/Parser.py", line 254, in parseString
    self._err_handler.fatalError(result)
  File "/var/tmp/python-root//usr/lib/python2.0/xml/sax/handler.py", line 38, in fatalError
Martel.Parser.ParserPositionException: error parsing at or beyond character 3217
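To see which part of the record the parser choked on, the character offset can be mapped back to a line and column. Judging from the traceback, the offset appears to refer to the text of the single record handed to Martel's parseString, and I assume it is 0-based. `locate_offset` below is a hypothetical helper, not part of Biopython or Martel.

```python
def locate_offset(text, offset):
    """Map a 0-based character offset in `text` to (line, column, line_text).

    Hypothetical helper; line and column are 1-based, as an editor shows them.
    """
    if not 0 <= offset < len(text):
        raise ValueError("offset %d is outside text of length %d" % (offset, len(text)))
    line_start = text.rfind("\n", 0, offset) + 1   # 0 when offset is on line 1
    line_end = text.find("\n", offset)
    if line_end == -1:                             # offset is on the last line
        line_end = len(text)
    line_number = text.count("\n", 0, offset) + 1
    column = offset - line_start + 1
    return line_number, column, text[line_start:line_end]
```

Since the message says "at or beyond", the line returned by `locate_offset(record_text, 3217)` (with `record_text` holding the raw text of the failing entry) may sit slightly before the actual culprit.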
Examination of the record shows that the double-quoted, multiline value of
the /note qualifier contains keyword-like entries such as /prediction and
/match at the beginning of some lines. In this position the parser reads
them as new qualifier keywords and stops, whereas if these lines are
shifted to the right, e.g. by adding one blank, the keyword-like entries
are read as text within the /note.
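The manual workaround of adding a blank can be automated by pre-processing the record text before it reaches the parser. A minimal sketch, assuming the usual GenBank layout in which qualifier text starts at column 22 (21 leading blanks); `shift_slash_lines_in_quotes` is a hypothetical helper, not part of Biopython. It tracks whether a double-quoted qualifier value is still open and only then indents lines beginning with '/' by one blank.

```python
QUALIFIER_INDENT = 21  # assumption: qualifier text starts at column 22 in the feature table

def shift_slash_lines_in_quotes(record_text):
    """Indent by one blank every line that starts with '/' at column 22
    while a double-quoted qualifier value is still open.

    Hypothetical pre-processing helper (not part of Biopython): it mimics the
    manual workaround of adding a blank in front of such lines.
    """
    prefix = " " * QUALIFIER_INDENT
    in_quotes = False
    out = []
    for line in record_text.split("\n"):
        if line.startswith(prefix):
            if in_quotes and line.startswith(prefix + "/"):
                line = " " + line  # '/' now sits at column 23
            # GenBank writes a literal '"' inside a value as '""', so only an
            # odd number of quotes on a line opens or closes a quoted value.
            if line.count('"') % 2 == 1:
                in_quotes = not in_quotes
        out.append(line)
    return "\n".join(out)
```

Only lines inside an open quoted value are touched, so genuine qualifiers such as /protein_id keep their position.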
I couldn't figure out how to fix this in the code (these are my first
steps with Python and Biopython), so I would be grateful for advice.
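For a fix inside the parser itself, the key rule is that a '/' at column 22 may start a new qualifier only when the previous qualifier's quoted value has been closed. The sketch below applies that rule to qualifier lines that have already been cut at column 22; `split_qualifiers` is a hypothetical function, not the Martel grammar Biopython actually uses, and joining continuation lines with a single blank is a simplification (a /translation value would normally be joined without blanks).

```python
def split_qualifiers(qualifier_lines):
    """Group a feature's qualifier lines (text from column 22 on) into qualifiers.

    Hypothetical sketch: a line starting with '/' begins a new qualifier only
    when no double-quoted value is open; otherwise it is continuation text.
    """
    qualifiers = []
    in_quotes = False
    for text in qualifier_lines:
        if text.startswith("/") and not in_quotes:
            qualifiers.append(text)
        elif qualifiers:
            # simplification: continuation lines are joined with one blank
            qualifiers[-1] += " " + text
        else:
            raise ValueError("continuation line before any qualifier: %r" % text)
        # an odd number of '"' opens or closes a value ('""' is an escaped quote)
        if text.count('"') % 2 == 1:
            in_quotes = not in_quotes
    return qualifiers
```

With this rule the /prediction and /match lines of the entry below stay part of the /note value instead of ending it.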
Wolfgang
Attached below: part of the FEATURES section of the entry mentioned above.
...
FEATURES             Location/Qualifiers
     source          1..154329
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /clone="BAC BACR25B3"
     gene            complement(22148..27773)
                     /gene="EG:BACR25B3.11"
     CDS             complement(join(22148..22299,22375..22791,22860..23560,
                     23630..24555,24616..24888,25024..25178,26677..27009,
                     27623..27773))
                     /gene="EG:BACR25B3.11"
                     /note="/prediction=(method:''genefinder'',
=================== version:''084'', score:''105.71'');
parsing stops here=> /prediction=(method:''genscan'', version:''1.0'');
=================== /match=(desc:''BASEMENT MEMBRANE-SPECIFIC HEPARAN
                     SULFATE
                     PROTEOGLYCAN CORE PROTEIN PRECURSOR (HSPG)
                     (PERLECAN)
                     (PLC)'', species:''Homo sapiens (Human)'',
                     ranges:(query:24292..24549,
                     target:SWISS-PROT::P98160:3713..3628,
                     score:''201.00''),
                     (query:24016..24291,
                     target:SWISS-PROT::P98160:3815..3724,
                     score:''139.00''), (query:23857..24006,
...
                     /protein_id="CAB72284.1"
                     /db_xref="GI:6946669"
                     /translation="MACNCNQSMIYQSNERRDYNCPGAPQYPYNRFKGGVSLKDTPCM
...