[Biopython-dev] bug: swissprot record parser errors
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Mon Aug 19 14:08:54 EDT 2002
On Mon, 19 Aug 2002, louis_coilliot wrote:
> This little program:
>
>
>
> #!/usr/bin/env python
> # reading a SwissProt entry from a file
>
> from Bio.SwissProt import SProt
> from sys import *
>
> try:
>     handle = open(argv[1])
>     sp = SProt.Iterator(handle, SProt.RecordParser())
>     record = sp.next()
>     print record.entry_name
>     print
> except:
>     print "error"
>
>
>
> doesn't work with some records, for example:
> http://www.expasy.ch/cgi-bin/get-sprot-raw.pl?O75398
> http://www.expasy.ch/cgi-bin/get-sprot-raw.pl?P41964
>
> I don't know why. Any idea ?
I see the following error message when trying your test program on
O75398, using BioPython 1.0:
###
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/tmp/python-26514l6q", line 10, in getRecordName
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 168, in next
    return self._parser.parse(File.StringHandle(data))
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 289, in parse
    self._scanner.feed(handle, self._consumer)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 332, in feed
    self._scan_record(uhandle, consumer)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 337, in _scan_record
    fn(self, uhandle, consumer)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 411, in _scan_reference
    self._scan_ra(uhandle, consumer)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 433, in _scan_ra
    one_or_more=1)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 359, in _scan_line
    read_and_call(uhandle, event_fn, start=line_type)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/ParserSupport.py", line 326, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'RA':
RP LYS-304.
###
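As an aside, the bare "except:" in the test program replaces this traceback with the single word "error", which is why the cause was not visible from the script alone. A minimal sketch of one way to surface it, written for a current Python 3 rather than the 2.1 interpreter shown above; run_and_report and failing_step are hypothetical names used only for illustration, not part of Biopython:

```python
import traceback


def run_and_report(step):
    """Call step() and return its result; on failure, show the traceback.

    Hypothetical helper for illustration only.
    """
    try:
        return step()
    except Exception:
        # Print the full traceback to stderr instead of a bare "error".
        traceback.print_exc()
        return None


def failing_step():
    # Stand-in for the parsing call; raises the same kind of error as above.
    raise SyntaxError("Line does not start with 'RA'")


result = run_and_report(failing_step)
```

Catching Exception rather than using a bare "except:" also lets KeyboardInterrupt and SystemExit pass through.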
Record O75398 has two RP lines in its first reference:
###
RN [1]
RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 3), AND MUTAGENESIS OF ARG-302 AND
RP LYS-304.
###
I'm staring at SProt.py's parser now to see how it handles consecutive
RP's. Hmmm.... ah!
This problem has been fixed in CVS already. Before, the parser tried
scanning RP's like this:
### Biopython 1.0
def _scan_rp(self, uhandle, consumer):
    self._scan_line('RP', uhandle, consumer.reference_position,
                    exactly_one=1)
###
but in CVS, this has been corrected to:
### Biopython CVS
def _scan_rp(self, uhandle, consumer):
    self._scan_line('RP', uhandle, consumer.reference_position,
                    one_or_more=1)
###
to account for multiple RP lines. Try checking BioPython out from CVS:
your program should work then.
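To make the difference between the two flags concrete, here is a small standalone sketch. It is not Biopython's _scan_line: scan_lines is a hypothetical helper that only mimics the exactly_one / one_or_more behaviour described above, it is written for Python 3, and the RA line is a placeholder because the record's author line is not quoted in this message.

```python
def scan_lines(lines, pos, line_type, exactly_one=False, one_or_more=False):
    """Consume consecutive lines of one type, starting at lines[pos].

    Hypothetical helper for illustration only; it mimics the two flags
    discussed above and is not Biopython's _scan_line.
    Returns (matched_lines, next_pos).
    """
    if exactly_one == one_or_more:
        raise ValueError("pass exactly one of exactly_one / one_or_more")
    if pos >= len(lines) or not lines[pos].startswith(line_type):
        found = lines[pos] if pos < len(lines) else "<end of input>"
        raise SyntaxError("Line does not start with %r:\n%s" % (line_type, found))
    end = pos + 1
    if one_or_more:
        # Keep consuming while the following lines carry the same line code.
        while end < len(lines) and lines[end].startswith(line_type):
            end += 1
    return lines[pos:end], end


# First reference of O75398 as quoted above, using the flat-file layout of a
# two-letter line code followed by three spaces. The RA text is a placeholder.
reference = [
    "RN   [1]",
    "RP   SEQUENCE FROM N.A. (ISOFORMS 1 AND 3), AND MUTAGENESIS OF ARG-302 AND",
    "RP   LYS-304.",
    "RA   (author line, not quoted in this message)",
]

# one_or_more consumes both RP lines, so the RA scan starts at the RA line.
rp_lines, pos = scan_lines(reference, 1, "RP", one_or_more=True)
ra_lines, pos = scan_lines(reference, pos, "RA", one_or_more=True)
print(len(rp_lines), len(ra_lines))
```

With exactly_one=True the RP scan stops after the first RP line, so the RA scan that follows lands on the second RP line and raises, which matches the failure in the traceback.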
Good luck to you!