[BioPython] blastall does not flush buffers due to biopython buffering?
Martin MOKREJŠ
mmokrejs at ribosome.natur.cuni.cz
Mon May 12 06:10:07 EDT 2008
Michiel de Hoon wrote:
> Can you show an example script that causes the UndoHandle to block? Just to understand better what is going on.
Maybe it is related to it, but sometimes blast process is probably close to exit:
mmokrejs 3343 3329 0 10:37 pts/2 00:00:00 [blastall] <defunct>
But sometimes I see it is still running with all its arguments and I can attach to it
by strace(1). So there are two issues.
I have hacked NCBIStandalone.blastall() like this:
+ print "Executing %s" % " ".join([blastcmd] + params)
w, r, e = os.popen3(" ".join([blastcmd] + params))
w.close()
return File.UndoHandle(r), File.UndoHandle(e)
$ python test38a.py
Testing Bio.Blast.NCBIStandalone.blastall() functionality
Executing /usr/bin/blastall -p blastn -d /tmp/sequence_nucleic_acids_all.fa -i /tmp/blast_8zjEnKfa -m 0 -M NUC4.2 -S 1 -e 1 -W 4 -E 1 -G 2 -b 10 -v 10
BLASTN 2.2.18 [Mar-02-2008]
...
$ python test38b.py
Testing blasttest.blastall() functionality
Executing /usr/bin/blastall -p blastn -d /tmp/sequence_nucleic_acids_all.fa -i /tmp/blast_TR0YHefa -m 0 -M NUC.4.2 -S 1 -e 1000 -W 4 -E 1 -G 1 -b 9999999 -v 999
Fetching blast results
Traceback (most recent call last):
File "test38b.py", line 15, in <module>
print ''.join(_error_info.readlines())
File "/usr/lib/python2.5/site-packages/Bio/File.py", line 37, in readlines
lines = self._saved + self._handle.readlines(*args,**keywds)
KeyboardInterrupt
$ python test38b.py
Testing blasttest.blastall() functionality
Executing /usr/bin/blastall -p blastn -d /tmp/sequence_nucleic_acids_all.fa -i /tmp/blast_meaeD7fa -m 0 -M NUC.4.2 -S 1 -e 1 -W 4 -E 1 -G 2 -b 10 -v 10
Fetching blast results
BLASTN 2.2.18 [Mar-02-2008]
...
$ python test38b.py
Testing blasttest.blastall() functionality
Executing /usr/bin/blastall -p blastn -d /tmp/sequence_nucleic_acids_all.fa -i /tmp/blast_MRrSAWfa -m 0 -M NUC.4.2 -S 1 -e 1 -W 4 -E 1 -G 1 -b 9999999 -v 999
Fetching blast results
BLASTN 2.2.18 [Mar-02-2008]
...
$ python test38b.py
Testing blasttest.blastall() functionality
Executing /usr/bin/blastall -p blastn -d /tmp/sequence_nucleic_acids_all.fa -i /tmp/blast_VtMzeDfa -m 0 -M NUC.4.2 -S 1 -e 1000 -W 4 -E 1 -G 1 -b 9999999 -v 999
Fetching blast results
BLASTN 2.2.18 [Mar-02-2008]
...
So, it is not much reproducible. However, it is clear it has something
to do with the length of the output. As I have already said, I saw
by strace(1) many time that blastall did not write all its output.
I propose to ensure that the open3() or equivalent does not use buffered
outputs ("man perlopentut") and the UndoHandle avoided altogether.
$ cat test38a.py
#! /usr/bin/env python
from sys import path, argv
path.append('..')
from Bio.Blast import NCBIStandalone
print "Testing Bio.Blast.NCBIStandalone.blastall() functionality"
_we_got_some_results = 0
_blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall', 'blastn', '/tmp/sequence_nucleic_acids_all.fa', '/tmp/blast_8zjEnKfa', matrix='NUC4.2', wordsize=4, gap_open=2, gap_extend=1, strands=1, alignments=9999999, descriptions=999, expectation=1, align_view=0)
while 1:
_line = _blast_out.readline()
if _line:
print _line,
_we_got_some_results = 1
else:
break
if _we_got_some_results:
while 1:
_line = _error_info.readline()
if _line:
print _line,
else:
break
$ cat test38b.py
#! /usr/bin/env python
import os
from sys import path, argv
path.append('..')
print "Testing blasttest.blastall() functionality"
import blasttest
_blast_out, _error_info, _blast_file = blasttest.blastall('/tmp/sequence_nucleic_acids_all.fa', 'CCGCCGTCGCGGGCAGTGTCTAGCCAGGCCTTGACAAGCTA', '/tmp', 'sequence')
# we have to read first from stdout of blast otherwise reading from stderr blocks
print "Fetching blast results"
print ''.join(_blast_out.readlines())
print "Fetching blast error messages"
print ''.join(_error_info.readlines())
os.remove(_blast_file)
$ cat blasttest.py
#! /usr/bin/env python
import os
import file_io
import tempfile
from Bio.Blast import NCBIStandalone
def blastall(blast_db, blast_query_string, tmpdir, mode, align_view=0, matrix='NUC.4.2'):
_fd, _blast_file = tempfile.mkstemp(suffix='fa', prefix='blast_', dir=tmpdir, text=False)
if blast_query_string[0] != '>':
os.write(_fd, '>your_query\n' + blast_query_string.replace('\n','') + '\n')
else:
os.write(_fd, blast_query_string + '\n')
os.close(_fd)
del(_fd)
if not file_io.file_size(_blast_file):
raise ValueError, "No input sequence provided by user"
return None, None
if mode == 'sequence':
_wordsize = 4
_gap_open = 1
_gap_extend = 1
_strands = 1
_alignments = 9999999 # -b
_descriptions = 999 # -v
_expectation = 1000 # -e 1e-20
_blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall', 'blastn', blast_db, _blast_file, matrix=matrix, wordsize=_wordsize, gap_open=_gap_open, gap_extend=_gap_extend, strands=_strands, alignments=_alignments, descriptions=_descriptions, expectation=_expectation, align_view=align_view)
return _blast_out, _error_info, _blast_file
Martin
More information about the BioPython
mailing list