[BioPython] Parsing a remote Blast output
Raul Guerra
colochera at gmail.com
Tue May 20 23:28:27 UTC 2008
Thank you to everyone who replied to my last post. I am sorry to bother you
again with a question. Thank you in advance for your time.
I am trying to parse the output from:
result_handle = NCBIWWW.qblast("blastp", "nr", fastaStr,
entrez_query='"Arabidopsis thaliana" [ORGN]')
where fastaStr is a string in FASTA format. When I run the following
code:
b_parser = NCBIWWW.BlastParser()
blast_records = b_parser.parse(result_handle)
I get the following error:
ValueError: Unexpected end of stream.
I think that result_handle is a cStringIO.StringI data structure. I thought
the code for some reason was trying to invoke the readline() method on the
cStringIO.StringI data structure and maybe that was what was causing the
error. However, I already saved result_handle.read() in a file and opened it
with the open() function, so that I would get a file object with a
readline() function. But the code still did not work.
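The handle reasoning above can be checked without any network access. This is only a minimal sketch: io.StringIO stands in for the cStringIO handle, and the XML text is a hypothetical placeholder, not real qblast output. It shows that an in-memory handle already has readline(), that read() exhausts the handle, and that saving the text to a file and reopening it gives a fresh handle:

```python
import io
import os
import tempfile

# Hypothetical stand-in for the text returned by the server.
xml_text = u"<?xml version=\"1.0\"?>\n<BlastOutput>\n</BlastOutput>\n"

# An in-memory handle already supports readline().
memory_handle = io.StringIO(xml_text)
first_line = memory_handle.readline()

# read() returns whatever is left and leaves the handle exhausted,
# so a second read() yields an empty string.
rest = memory_handle.read()
after_exhaustion = memory_handle.read()

# Saving the text to disk and reopening it gives a handle at position 0.
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as out:
    out.write(xml_text)
with open(path) as file_handle:
    file_first_line = file_handle.readline()
os.remove(path)

print(repr(first_line))
print(repr(after_exhaustion))
print(first_line == file_first_line)
```

If this holds for the real handle as well, the missing readline() method is unlikely to be the cause, although a handle that has already been consumed by read() would look empty to any parser that receives it afterwards.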
I tried to follow the logic of the program and I found that NCBIWWW.qblast()
is outputting an XML file, and for some reason NCBIWWW.BlastParser() is
expecting an HTML file. That is my guess as to what is going wrong. So what I
did was to use the parser in NCBIXML. I ran the following:
result_handle = NCBIWWW.qblast("blastp", "nr", fastaStr,
entrez_query='"Arabidopsis thaliana" [ORGN]')
blast_records = NCBIXML.parse(result_handle)
and it works fine (at least I do not get errors), but I have no idea what
type of object blast_records is. I tried the following:
next = blast_records.next()
and got the following error:
Traceback (most recent call last):
  File "/home/rguerra/workspace/Summer2008/src/testingBioPython.py", line 28, in <module>
    next = blast_records.next()
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 574, in parse
    expat_parser.Parse(text, False)
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 98, in endElement
    eval("self.%s()" % method)
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 214, in _end_BlastOutput_version
    self._header.date = self._value.split()[2][1:-1]
IndexError: list index out of range
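The last frame of the traceback can be reproduced in isolation. The sketch below copies only the expression from line 214; the helper name extract_date and both version strings are hypothetical, chosen to illustrate a version field with and without a bracketed date. It assumes the server returned a version string with fewer than three whitespace-separated fields, which I have not confirmed against the actual XML:

```python
def extract_date(version_text):
    """Mimic the failing line: take the third whitespace-separated
    field and strip its first and last characters (the brackets)."""
    return version_text.split()[2][1:-1]

# Hypothetical version string with a bracketed date as the third field.
with_date = "blastp 2.2.18 [Mar-02-2008]"
# Hypothetical version string that has no date field at all.
without_date = "BLASTP 2.2.18+"

print(extract_date(with_date))

try:
    extract_date(without_date)
except IndexError as err:
    # Only two fields exist, so index 2 is out of range.
    print("IndexError: %s" % err)
```

If that assumption is right, the error comes from the layout of the BlastOutput_version element in the returned XML rather than from the query itself.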
I have not been able to understand what is going on here. I just want to
parse the results I get from:
result_handle = NCBIWWW.qblast("blastp", "nr", fastaStr,
entrez_query='"Arabidopsis thaliana" [ORGN]')
Any ideas?
Raul Guerra