[BioPython] Parsing a remote Blast output

Raul Guerra colochera at gmail.com
Tue May 20 23:28:27 UTC 2008


Thank you to everyone who replied to my last post. I am sorry to bother you
again with a question. Thank you in advance for your time.

I am trying to parse the output from:

result_handle = NCBIWWW.qblast("blastp", "nr", fastaStr,
                               entrez_query='"Arabidopsis thaliana" [ORGN]')

where fastaStr is a string in FASTA format. When I run the following
code:

b_parser = NCBIWWW.BlastParser()
blast_records = b_parser.parse(result_handle)

I get the following error:

ValueError: Unexpected end of stream.

I think that result_handle is a cStringIO.StringI object. I thought that
the code was for some reason trying to call the readline() method on that
object and that this might be what was causing the error. However, I also
saved result_handle.read() to a file and opened it with the open()
function, so that I would get a file object with a readline() method, and
the code still did not work.
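
A small sketch of how such an in-memory handle behaves, for anyone
following along. It uses io.StringIO as a stand-in for the
cStringIO.StringI object, on the assumption that the real handle behaves
the same way, and the XML text is invented for illustration. It shows
that readline() is available, that the first line hints at the format,
and that read() leaves the handle at the end until it is rewound.

```python
import io

# Stand-in for the handle returned by qblast(); this XML text is invented.
fake_output = u'<?xml version="1.0"?>\n<BlastOutput>\n</BlastOutput>\n'
result_handle = io.StringIO(fake_output)

# In-memory handles do support readline(); the first line hints at the format.
first_line = result_handle.readline()
looks_like_xml = first_line.lstrip().startswith(u"<?xml")

# read() returns whatever is left and leaves the handle at the end.
rest = result_handle.read()
leftover = result_handle.read()   # empty string: nothing left to read

# Rewind before handing the same handle to a parser.
result_handle.seek(0)
first_line_again = result_handle.readline()
```

If this holds for the real handle, a missing readline() is unlikely to be
the cause, but a handle that has already been read to the end would look
empty to any parser that receives it afterwards.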

I tried to follow the logic of the program and I found that NCBIWWW.qblast()
is outputting an XML file, while for some reason NCBIWWW.BlastParser() is
expecting an HTML file. That is my guess as to what is going wrong. So what
I did was to use the parser in NCBIXML instead. I ran the following:

    result_handle = NCBIWWW.qblast("blastp", "nr", fastaStr,
                                   entrez_query='"Arabidopsis thaliana" [ORGN]')

    blast_records = NCBIXML.parse(result_handle)

and it works fine (at least I do not get any errors), but I have no idea
what type of object blast_records is. I tried the following:

next = blast_records.next()

and got the following error:

Traceback (most recent call last):
  File "/home/rguerra/workspace/Summer2008/src/testingBioPython.py", line 28, in <module>
    next = blast_records.next()
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 574, in parse
    expat_parser.Parse(text, False)
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 98, in endElement
    eval("self.%s()" % method)
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 214, in _end_BlastOutput_version
    self._header.date = self._value.split()[2][1:-1]
IndexError: list index out of range
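
One possible reading of this traceback, offered as a guess rather than a
diagnosis: the parse frame only runs once next() is called, which is how a
generator behaves, and the failing expression expects the version text to
contain at least three whitespace-separated pieces. The sketch below
reuses that expression from line 214; the helper names header_date and
lazy_dates are hypothetical, and both version strings are invented, since
the actual text returned by the server is not shown here.

```python
def header_date(version_text):
    # Same expression as NCBIXML.py line 214 in the traceback.
    return version_text.split()[2][1:-1]


def lazy_dates(version_texts):
    # Generator: the body does not run until next() is called on it.
    for text in version_texts:
        yield header_date(text)


# Both strings are made up for illustration.
with_date = "BLASTP 2.2.17 [Aug-26-2007]"
without_date = "BLASTP 2.2.18+"

date = header_date(with_date)          # third piece, brackets stripped

records = lazy_dates([without_date])   # creating the generator raises nothing
try:
    next(records)                      # the error only surfaces here
    failed = False
except IndexError:
    failed = True
```

Under these assumptions, creating blast_records would succeed without any
error and the IndexError would appear only on the first call to next(),
which matches what is seen above.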


I have not been able to understand what is going on here. I just want to
parse the results I get from:


result_handle = NCBIWWW.qblast("blastp", "nr", fastaStr,
                               entrez_query='"Arabidopsis thaliana" [ORGN]')

Any ideas?

Raul Guerra


