[BioPython] Help a clueless newbie with BLAST queries?

bp dubious@2xtreme.net (bp)
Fri, 1 Jun 2001 12:37:36 -0700


I got tired of manually sending my queries through the webpage...
The biopython project gives me the perfect excuse to finally break down and
learn Python, so I'm now trying to whip up a script to send a series of
fasta sequences (piped in from stdin) to ncbi and get the results for me.

I'm running into two problems though.  Forgive me for posting the source 
code, but there's not much to it - I'll append it to this message.

The two problems I'm having are:

1)The parser (b_parser) craps out with an "unexpected end of stream"

2)I can't seem to get the optional settings to work (I'm trying to limit
the number of 'hits' returned from the BLAST search).  Setting alignment=10
and descriptions=10 in the NCBIWWW.blast instantiation seems to be ignored,
and trying to set an "expect" value causes the script to die with "no PID 
found".

Getting advice from someone that actually knows what they're doing seems
to be my only hope now :-)  I'm using the 1.00a release of biopython.

Thanks!

Here's what it looks like when I run it:
-------------------------------------------------------------------
bash-2.04$ cat sequences.seq | python blasttest.py Opened file for reading  
Defined f_iterator  
read first f_record entering While loop  
Getting BLAST results  
Saving results to my_blast.out  
copying the file handle to save data with  
writing from handle copy to output file  
closing my_blast.out  
creating b_parser  
reading b_record from b_parser analysis of b_results  
Traceback (most recent call last):   File "blasttest.py", line 29, in ?     
b_record = b_parser.parse(b_results)   File 
"/usr/lib/python2.0/site-packages/Bio/Blast/NCBIWWW.py", line 45, in parse    
 self._scanner.feed(handle, self._consumer)   File 
"/usr/lib/python2.0/site-packages/Bio/Blast/NCBIWWW.py", line 94, in feed     
contains='BLAST')   File 
"/usr/lib/python2.0/site-packages/Bio/ParserSupport.py", line 187, in 
read_and_call_until     line = safe_readline(uhandle)   File 
"/usr/lib/python2.0/site-packages/Bio/ParserSupport.py", line 263, in 
safe_readline     raise SyntaxError, "Unexpected end of stream." SyntaxError: 
Unexpected end of stream. 
bash-2.04$
-------------------------------------------------------------------

Here's the script I've got so far
(borrowing heavily from the tutorial):
--------------------------------------------------------------------
from Bio import Fasta
from Bio.Blast import NCBIWWW
import copy
import sys

#file_for_blast = open('sequences.seq', 'r')
file_for_blast=sys.stdin
print 'Opened file for reading\n'
f_iterator = Fasta.Iterator(file_for_blast)
print 'Defined f_iterator\n'
f_record = f_iterator.next()
print 'read first f_record'
while f_record is not None: 
	print 'entering While loop\n'
	print 'Getting BLAST results\n'
	b_results = NCBIWWW.blast('blastn', 'nr', f_record, 
descriptions='10',alignments='10')
	print 'Saving results to my_blast.out\n'
	save_file = open('my_blast.out', 'w')
	#copy the handle so we can write it out and still have something for parsing
	print 'copying the file handle to save data with\n'
	save_handle = copy.deepcopy(b_results)
	print 'writing from handle copy to output file\n'
	save_file.write(save_handle.read())
	print 'closing my_blast.out\n'
	save_file.close()
	print 'creating b_parser\n'
	b_parser = NCBIWWW.BlastParser()
	print 'reading b_record from b_parser analysis of b_results\n'
	b_record = b_parser.parse(b_results)
	E_VALUE_THRESH = 0.0
	print 'threshold set to '+E_VALUE_THRESH+' and beginning iteration through 
alignments\n'
	for alignment in b_record.alignments:
		for hsp in alignment.hsps:
	    		if hsp.expect < E_VALUE_THRESH:
				print '****Alignment****'
	         		print 'sequence:', alignment.title
	         		print 'length:', alignment.length
	         		print 'e value:', hsp.expect
	         		print hsp.query[0:75] + '...'
	         		print hsp.match[0:75] + '...'
	         		print hsp.sbjct[0:75] + '...'
	print 'reading next record from f_iterator'		
	f_record = f_iterator.next()
		
else:
	print 'no more f_records.  Done.'
--------------------------------------------------------------------------------------
-- 
"Given the pace of technology, I propose we leave math to the machines
and go play outside." - Calvin