[BioPython] blastpgp

Jeffrey Chang jchang@SMI.Stanford.EDU
Mon, 13 Dec 1999 21:46:14 -0800 (PST)


Speaking of PSI-BLAST, I've been hitting a bus error when running
blastpgp on certain inputs.  Has anyone else seen this?  The email I sent
to NCBI follows...

Thanks,
Jeff




---------- Forwarded message ----------
Date: Wed, 24 Nov 1999 12:44:49 -0800
From: Jeffrey Chang <jefftc@leland.Stanford.EDU>
To: blast-help@ncbi.nlm.nih.gov
Newsgroups: bionet.software
Subject: NCBI blastpgp bus error

Hello,

I am encountering bus errors when running the NCBI-compiled blastpgp
2.0.10 binaries for Solaris.  It seems to be the most current version, and
the VERSION file included in the distribution reads: 
Sun Sep 19 11:16:40 EDT 1999

My machine is an Ultra-4 running Solaris 2.6 with 1 GB of RAM.

I am running blastpgp with 10 iterations on the swissprot database made
available by NCBI, formatted with formatdb using default parameters.  

taiyang:~/projects/parsers> ./blastpgp -i test.seq -d data/swissprot -j 10

On certain input sequences, blastpgp will encounter a bus error while it
is on the "Searching" line.  For some sequences, this happens after
blastpgp has displayed a string of .'s.  On others, it happens immediately
after the "Searching" message:
Searching..................................................Bus error
SearchingBus error

The point at which the error occurs appears to be reproducible, i.e. the
same sequence and database seem to always cause the bus error at the same
point in the search.
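
One way to check that the failure point really is the same from run to run is to count the progress dots printed after "Searching" and compare the counts. A minimal sketch, assuming the progress line has the form shown above; the function name progress_dots is made up for illustration:

```python
import re


def progress_dots(line):
    """Return the number of progress dots printed after 'Searching'.

    Returns None if the line does not start with 'Searching'.
    """
    match = re.match(r"Searching(\.*)", line)
    if match is None:
        return None
    return len(match.group(1))
```

If two runs of the same query and database give the same count, the crash is happening at the same stage of the search, as far as the progress output can tell.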

Here is a brief list of the input sequences that cause bus errors:
sp:P07175
sp:P32823
sp:P54929
sp:P06564
sp:P12311
sp:Q03069
sp:Q05203
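
To find which inputs fail without watching each run by hand, the exit status of each blastpgp run can be checked from a script. This is only a sketch: it assumes each query has been saved to its own file (the file names are hypothetical) and reuses the -i, -d and -j options from the command line above. On Unix, Python's subprocess module reports a process killed by a signal as a negative return code.

```python
import signal
import subprocess


def classify_exit(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGBUS:
        return "bus error"
    if returncode < 0:
        return "killed by signal %d" % -returncode
    return "exit status %d" % returncode


def blastpgp_command(query_file, database="data/swissprot", iterations=10):
    """Build the same command line as in the report above."""
    return ["./blastpgp", "-i", query_file, "-d", database, "-j", str(iterations)]


def run_and_classify(command):
    """Run a command, discard its output, and describe how it ended."""
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return classify_exit(completed.returncode)
```

Calling run_and_classify(blastpgp_command("P07175.seq")) for each saved query (P07175.seq being a hypothetical file name) would then return "bus error" for the inputs that crash.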

Unfortunately, this leaves no core file, so I am unable to trace the
problem without doing a careful audit of the code.  I'm hoping someone
more familiar with it has encountered this problem and can help me
diagnose it.
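
One common reason a crash leaves no core file is a core size limit of zero in the calling environment. Whether that is the cause here is not known; the sketch below only shows how the limit could be inspected and raised from Python before launching the program. The helper names are made up for illustration.

```python
import resource


def core_limit():
    """Return the (soft, hard) core file size limits for this process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Render a limit value as text."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes" % limit


def allow_core_dumps():
    """Raise the soft core limit to the hard limit.

    Can be passed as preexec_fn to subprocess.Popen so that only the
    child process is affected.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
```

If the hard limit itself is zero, this will not help and the limit would have to be changed outside the script.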


Just for fun, I tried running some of these queries against the PSI-BLAST
web server at NCBI.  This is currently running a slightly older version of
blastpgp (BLASTP 2.0.6).
http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-psi_blast

I ran some sequences on this server against swissprot and did not get
complete results for any of them.  For example, I went to the web site,
pasted in a sequence:
>gi|132349|sp|P15394|REPA_AGRTU REPLICATING PROTEIN
MFQQIGAVQAKSGTDEPAHPCEKFPPERKCEAVFWKPLPRHEAREILLAARKYELAMKQPGKRTGPLGHV
ALEVLDYLTNLVDFGNGRLDPSISTIMEKIGRAESCMSALYPNRQRGRRPAGAADKQRLPLEPAGARPRA
LLGKYVRKAAPLPDDAAQARQERHDTIKAHMDSLSPADRLRETVEDRTRAEQLAGYVERAAQNRPSGPRK
AARRRQQSRCSFTTPNRPRRTLPSSHPQKFGGTKGRKAFE

chose to run it against swissprot, and pressed "Submit Query".  I only got
back part of the results.  The header came through fine, but there was no
graphical overview, descriptions, or alignments.

=======

Commencing search, please wait for results.

BLASTP 2.0.6 [Sept-16-1998]


Reference:
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|132349|sp|P15394|REPA_AGRTU REPLICATING PROTEIN
         (250 letters)

Database: Non-redundant SwissProt sequences
           82,258 sequences; 29,652,561 total letters

Searching..................................................



E-value threshold for inclusion in PSI-Blast iteration 1: 0.001 
E-value threshold for inclusion in PSI-Blast iteration 2: 

======


I suspect blastpgp may be crashing on the server end as well.  The one on
the web server was broken on every query I tried against swissprot,
whereas the one I have locally was only broken for selected inputs. 
Turning off the graphical overview option did not help.

However, if I run the same query sequence against nr or pdb, I do get the
graphical overview, descriptions, and alignments, as expected.

Could there be something in swissprot that is triggering a failure in
blastpgp?  I believe I am using the same swissprot database that is
installed at NCBI's server.  Both contain 82258 sequences and 29652561
total letters.
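
The database size comparison can be done mechanically by pulling the two counts out of each report's "Database:" block. A minimal sketch, assuming the counts appear in the form "82,258 sequences; 29,652,561 total letters" as in the server output above; the function name database_size is made up for illustration:

```python
import re


def database_size(report_text):
    """Return (sequences, letters) parsed from a BLAST report, or None."""
    match = re.search(r"([\d,]+) sequences; ([\d,]+) total letters", report_text)
    if match is None:
        return None
    # Strip the thousands separators before converting to integers.
    return tuple(int(group.replace(",", "")) for group in match.groups())
```

Matching counts suggest, but do not prove, that the two databases are the same release.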

Has anyone encountered these errors before?

Thanks,
Jeff