[BioPython] Problems with the standalone blast parser.
humberto@hpcf.upr.edu
humberto@hpcf.upr.edu
Thu, 04 Apr 2002 15:25:24 -0400
--==_Exmh_84453588P
Content-Type: text/plain; charset=us-ascii
I've got a problem with the Bio.Blast.NCBIStandalone module in
biopython-1.00a4 on a redhat 6.2 system. It was crashing partway through a
script that runs a platefull of fasta formatted sequences through blast, I've
tracked down the error to a single sequence. What I would like is for my
script to continue, perhaps printing an error message, instead of crashing.
The following nucleotide sequence:
>A6BFLYA09
ACCCTCCCCCCCCCCTCCTTGCATACTGATTATCCCCACCTGTTACCCTC
CCCCCACACCCTCCCCCCCATCCCACCNCACACCCCACACCNCTNCTCCC
TCCACCACCCCCACNCCCCCCACCCACCCCCCCACCCCCCCACCCN
when run through this command line:
/usr/local/blast/blastall -p blastx -d /usr/local/blast/nr -i badsequence -o
badblast.out
produces these warnings and errors:
[blastall] WARNING: [000.000] A6BFLYA09: SetUpBlastSearch failed.
[blastall] ERROR: [000.000] A6BFLYA09: BLASTSetUpSearch: Unable to calculate
Karlin-Altschul params, check query sequence
[blastall] ERROR: [000.000] A6BFLYA09: BLASTSetUpSearch: Unable to calculate
Karlin-Altschul params, check query sequence
NCBI has this to say about the error message:
Q: Why do I get the message "ERROR: BLASTSetUpSearch: Unable to calculate
Karlin-Altschul params, check query sequence" ?
This will happen if your entire query sequence has been masked by low
complexity filtering. You
will need to turn filtering off to get hits. For further information on
filtering, please read the
sections of the BLAST FAQs on Q: What is low-complexity sequence? and
also Q: After
running a search why do I see a string of "X"s (or "N"s) in my query
sequence that I did not put
there?
But turning off the filtering gives me a bunch of bad quality hits.
the badblast.out file is produced, and is missing a section at the end:
--
BLASTX 2.2.2 [Dec-14-2001]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= A6BFLYA09
(146 letters)
Database: /usr/local/blast/nr
855,904 sequences; 269,501,083 total letters
Searchingdone
***** No hits found ******
Database: /usr/local/blast/nr
Posted date: Jan 22, 2002 7:32 AM
Number of letters in database: 269,501,083
Number of sequences in database: 855,904
Lambda K H
0.311 0.135 0.440
Gapped
Lambda K H
0.267 0.0410 0.140
--
Running this blast output through a simple biopython script crashes with a
parse error:
python2.2 justparse badblast.out
Traceback (most recent call last):
File "justparse", line 42, in ?
justparse(my_blast_file)
File "justparse", line 25, in justparse
b_record = b_iterator.next()
File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py",
line 1353, in next
return self._parser.parse(File.StringHandle(data))
File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py",
line 515, in parse
self._scanner.feed(handle, self._consumer)
File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py",
line 85, in feed
self._scan_database_report(uhandle, consumer)
File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py",
line 408, in _scan_database_report
read_and_call_while(uhandle, consumer.noevent, blank=1)
File "/usr/local/lib/python2.2/site-packages/Bio/ParserSupport.py", line
345, in read_and_call_while
line = safe_readline(uhandle)
File "/usr/local/lib/python2.2/site-packages/Bio/ParserSupport.py", line
442, in safe_readline
raise SyntaxError, "Unexpected end of stream."
SyntaxError: Unexpected end of stream.
This is the script:
#!/usr/bin/env python2.2
# standard library
import sys
# biopython
from Bio.Blast import NCBIStandalone
my_blast_file = sys.argv[1]
def justparse(blastfile):
blast_out = open(blastfile, "r")
b_parser = NCBIStandalone.BlastParser()
b_iterator = NCBIStandalone.Iterator(blast_out, b_parser)
while 1:
b_record = b_iterator.next()
if b_record is None:
break
if len(b_record.alignments) == 0:
print "No hits found"
else:
print "Hits"
# Main
justparse(my_blast_file)
--
Is there a way of writing this script so that it skips the bad reports instead
of crashing?
When I run my blast inside biopython:
blast_out, error_info = NCBIStandalone.blastall(my_blast_exe, progname,
blastdb, blastfile,
descriptions=3, alignments=3,
show_gi="T", strands=1)
I guess the errors are in the error_info variable, but the biopython tutorial
doesn't show how to recover from an error in the parser (actually, it avoids
getting into the topic).
--
Humberto Ortiz Zuazaga
Programmer-Archaeologist
High Performance Computing facility
University of Puerto Rico
http://www.hpcf.upr.edu/~humberto/
--==_Exmh_84453588P
Content-Type: application/pgp-signature
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Exmh version 2.3 01/15/2001
iD8DBQE8rKikO8aX8Tqx8vgRAsiuAJ9Co6n23+GOFX5bf2MSW1tfZ23oFQCeNTiE
3Vm3+PtNDv8/NQkYDUYxp6A=
=ri8V
-----END PGP SIGNATURE-----
--==_Exmh_84453588P--