[Biojava-dev] NullPointerException from BlastSAXParser.java
W. Eric Trull
wetrull at yahoo.com
Thu Oct 6 11:01:01 EDT 2005
Hello Mark,
Here is what I've done, using NCBI BLAST 2.0.11 on Windows XP with JDK 1.4.2:
1. Downloaded the PDB's pdb_seqres.txt
2. Created a blast database (after changing the deflines):
C:\blast-2.0.11\formatdb.exe
-t "PDB"
-i blast\pdb_seqres.txt
-l blast\pdb_formatdb.log
-o T
-n blast\pdb
3. BLASTed 26SPS9_Hs:
C:\blast-2.0.11\blastall.exe
-p blastp
-d blast\pdb
-i 26SPS9_Hs.fasta
-o 26SPS9_Hs.blast
4. Tried to parse 26SPS9_Hs.blast using the class shown in BioJava in Anger
and BlastEcho, both of which give me the NullPointerException. The beginning
of the 26SPS9_Hs.blast file is shown below; the entire file is attached.
Please let me know if you see anything obviously wrong with the way I'm doing
the BLAST. I'm going to cvs checkout the BioJava source code and have a look
at the JUnit test later today.
Thanks!
-Eric Trull
-------- 26SPS9_Hs.blast --------
BLASTP 2.0.11 [Jan-20-2000]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= 26SPS9_Hs
(176 letters)
Database: PDB
78,094 sequences; 17,596,117 total letters
                                                        Score     E
Sequences producing significant alignments:             (bits)  Value
pdb|1UFM|A Cop9 Complex Subunit 4                          39   0.003
.
.
.
-------- 26SPS9_Hs.blast --------
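Since the quoted reply below notes that the BioJava test files all contain a "Searching...." line, one quick check is to scan a report for the section markers a text-format parser might key on. The following is a minimal sketch that does not use BioJava at all; the class name BlastMarkerScan and the choice of markers are illustrative assumptions, not part of any library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reports where some BLAST section markers first occur. */
class BlastMarkerScan {

    // Markers picked for illustration; "Searching" is the one discussed in this thread.
    static final String[] MARKERS = {
        "Query=", "Database:", "Searching", "Sequences producing significant alignments"
    };

    /** Returns marker -> 1-based line number of its first occurrence, or -1 if absent. */
    static Map<String, Integer> scan(Reader in) {
        Map<String, Integer> found = new LinkedHashMap<>();
        for (String marker : MARKERS) {
            found.put(marker, -1);
        }
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                for (String marker : MARKERS) {
                    // Record only the first line on which each marker appears.
                    if (found.get(marker) == -1 && line.contains(marker)) {
                        found.put(marker, lineNo);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Abbreviated header modelled on the excerpt above.
        String header = "BLASTP 2.0.11 [Jan-20-2000]\n"
                + "Query= 26SPS9_Hs\n"
                + "Database: PDB\n"
                + "Sequences producing significant alignments: (bits) Value\n";
        System.out.println(scan(new StringReader(header)));
    }
}
```

A value of -1 for "Searching" would confirm that the report never contains that string, independent of any parser.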
--- mark.schreiber at novartis.com wrote:
> Hello -
>
> This is very odd.
>
> The JUnit tests currently pass using the files in
> /tests/files/org/biojava/bio/programs/ssbind. These BLAST files all
> contain the string "Searching....". Maybe there is a variation in the
> Windows output?
>
> Can you post at least the header of your output to the list (preferably an
> entire example output)?
>
> - Mark
>
> "W. Eric Trull" <wetrull at yahoo.com>
> Sent by: biojava-dev-bounces at portal.open-bio.org
> 10/06/2005 06:11 AM
>
>
> To: biojava-dev at biojava.org
> cc: (bcc: Mark Schreiber/GP/Novartis)
> Subject: [Biojava-dev] NullPointerException from
> BlastSAXParser.java
>
>
> Hello all,
>
> I'm new to the list, but have done as much archive searching, Google
> searching, and debugging as I can on the problem I describe here.
>
> I'm trying to parse NCBI BLAST output (as shown in BioJava in Anger), but
> keep getting a NullPointerException. One of my searches turned up using
> BlastEcho to debug the problem, but that also throws the
> NullPointerException:
>
> startSearch
> SearchProp: program: ncbi-blastp
> SearchProp: version: 2.0.11
> java.lang.NullPointerException
>     at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:215)
>     at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
>     at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:311)
>     at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:274)
>     at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:160)
>     at com.pfizer.search.sequence.BlastEcho.echo(BlastEcho.java:42)
>     at com.pfizer.search.sequence.BlastEcho.main(BlastEcho.java:88)
> Exception in thread "main"
>
> Stepping through the code in a debugger shows that the while loop added in
> revision 1.13 of
> /biojava-live/src/org/biojava/bio/program/sax/BlastSAXParser.java (fixed
> truncation of database id) reads all the lines without ever matching the
> "Searching" string. At first I thought it was because I was using a later
> version of BLAST, but then I tried 2.0.11 and 2.2.3 (supported version),
> and they also result in a NullPointerException. In the BLAST output for
> the various versions I never see a "Searching" string anywhere. I've tried
> all the -m options as well, without success.
>
> Is there an NCBI BLAST option that I need to be using? I'm running on
> Windows XP (during development) - is the UNIX version output different?
>
> Thanks.
>
> -Eric Trull
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at biojava.org
> http://biojava.org/mailman/listinfo/biojava-dev
>
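The failure described in the original report, a loop that keeps reading until it sees "Searching" and instead runs off the end of the file, can be reproduced without BioJava. The sketch below is not the BlastSAXParser source; the class and method names are hypothetical. It only illustrates that BufferedReader.readLine() returns null at end of input, so an unguarded loop throws a NullPointerException, while a guarded loop can report the missing line explicitly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical illustration only; not the actual BlastSAXParser code. */
class SearchingLoopSketch {

    /** Unguarded: uses the result of readLine() without checking for null. */
    static String skipToSearchingUnguarded(BufferedReader in) {
        try {
            String line = in.readLine();
            // Throws NullPointerException once readLine() has returned null.
            while (!line.startsWith("Searching")) {
                line = in.readLine();
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Guarded: stops at end of input and states what was missing. */
    static String skipToSearchingGuarded(BufferedReader in) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("Searching")) {
                    return line;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        throw new IllegalStateException(
                "No line starting with \"Searching\" before end of input");
    }

    public static void main(String[] args) {
        // A report header with no "Searching" line, as in this thread.
        String report = "BLASTP 2.0.11 [Jan-20-2000]\nQuery= 26SPS9_Hs\nDatabase: PDB\n";
        try {
            skipToSearchingUnguarded(new BufferedReader(new StringReader(report)));
        } catch (NullPointerException e) {
            System.out.println("unguarded loop: NullPointerException");
        }
        try {
            skipToSearchingGuarded(new BufferedReader(new StringReader(report)));
        } catch (IllegalStateException e) {
            System.out.println("guarded loop: " + e.getMessage());
        }
    }
}
```

Whether the parser should fail with a clearer message or simply tolerate reports without a "Searching" line is a design decision for the BioJava developers; the sketch takes no position on that.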
-------------- next part --------------
BLASTP 2.0.11 [Jan-20-2000]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= 26SPS9_Hs
(176 letters)
Database: PDB
78,094 sequences; 17,596,117 total letters
                                                        Score     E
Sequences producing significant alignments:             (bits)  Value
pdb|1UFM|A Cop9 Complex Subunit 4                          39   0.003
pdb|1YM7|D Beta-Adrenergic Receptor Kinase 1               29   3.3
pdb|1YM7|C Beta-Adrenergic Receptor Kinase 1               29   3.3
pdb|1YM7|B Beta-Adrenergic Receptor Kinase 1               29   3.3
pdb|1YM7|A Beta-Adrenergic Receptor Kinase 1               29   3.3
pdb|1OMW|A G-Protein Coupled Receptor Kinase 2             29   3.3
>pdb|1UFM|A Cop9 Complex Subunit 4
Length = 84
Score = 39.1 bits (89), Expect = 0.003
Identities = 15/56 (26%), Positives = 35/56 (61%)
Query: 114 LLEQNLIRVIEPFSRVQIEHISSLIKLSKADVERKLSQMILDKKFHGILDQGEGVL 169
++E NL+ + ++ + E + +L+++ A E+ SQMI + + +G +DQ +G++
Sbjct: 16 VIEHNLLSASKLYNNITFEELGALLEIPAAKAEKIASQMITEGRMNGFIDQIDGIV 71
>pdb|1YM7|D Beta-Adrenergic Receptor Kinase 1
Length = 689
Score = 29.0 bits (63), Expect = 3.3
Identities = 18/85 (21%), Positives = 41/85 (48%), Gaps = 5/85 (5%)
Query: 73 CVAQASKNRSLADFEKALTDY-RAELRDDPIISTHLAKLYDNLLEQNLIRVIEPFSRVQI 131
C+ + + L +F + + Y + E ++ ++ + +++D + + L+ PFS+ I
Sbjct: 72 CLKHLEEAKPLVEFYEEIKKYEKLETEEERLVCSR--EIFDTYIMKELLACSHPFSKSAI 129
Query: 132 EHISSLIKLSKADVERKLSQMILDK 156
EH+ L K V L Q +++
Sbjct: 130 EHVQG--HLVKKQVPPDLFQPYIEE 152
>pdb|1YM7|C Beta-Adrenergic Receptor Kinase 1
Length = 689
Score = 29.0 bits (63), Expect = 3.3
Identities = 18/85 (21%), Positives = 41/85 (48%), Gaps = 5/85 (5%)
Query: 73 CVAQASKNRSLADFEKALTDY-RAELRDDPIISTHLAKLYDNLLEQNLIRVIEPFSRVQI 131
C+ + + L +F + + Y + E ++ ++ + +++D + + L+ PFS+ I
Sbjct: 72 CLKHLEEAKPLVEFYEEIKKYEKLETEEERLVCSR--EIFDTYIMKELLACSHPFSKSAI 129
Query: 132 EHISSLIKLSKADVERKLSQMILDK 156
EH+ L K V L Q +++
Sbjct: 130 EHVQG--HLVKKQVPPDLFQPYIEE 152
>pdb|1YM7|B Beta-Adrenergic Receptor Kinase 1
Length = 689
Score = 29.0 bits (63), Expect = 3.3
Identities = 18/85 (21%), Positives = 41/85 (48%), Gaps = 5/85 (5%)
Query: 73 CVAQASKNRSLADFEKALTDY-RAELRDDPIISTHLAKLYDNLLEQNLIRVIEPFSRVQI 131
C+ + + L +F + + Y + E ++ ++ + +++D + + L+ PFS+ I
Sbjct: 72 CLKHLEEAKPLVEFYEEIKKYEKLETEEERLVCSR--EIFDTYIMKELLACSHPFSKSAI 129
Query: 132 EHISSLIKLSKADVERKLSQMILDK 156
EH+ L K V L Q +++
Sbjct: 130 EHVQG--HLVKKQVPPDLFQPYIEE 152
>pdb|1YM7|A Beta-Adrenergic Receptor Kinase 1
Length = 689
Score = 29.0 bits (63), Expect = 3.3
Identities = 18/85 (21%), Positives = 41/85 (48%), Gaps = 5/85 (5%)
Query: 73 CVAQASKNRSLADFEKALTDY-RAELRDDPIISTHLAKLYDNLLEQNLIRVIEPFSRVQI 131
C+ + + L +F + + Y + E ++ ++ + +++D + + L+ PFS+ I
Sbjct: 72 CLKHLEEAKPLVEFYEEIKKYEKLETEEERLVCSR--EIFDTYIMKELLACSHPFSKSAI 129
Query: 132 EHISSLIKLSKADVERKLSQMILDK 156
EH+ L K V L Q +++
Sbjct: 130 EHVQG--HLVKKQVPPDLFQPYIEE 152
>pdb|1OMW|A G-Protein Coupled Receptor Kinase 2
Length = 689
Score = 29.0 bits (63), Expect = 3.3
Identities = 18/85 (21%), Positives = 41/85 (48%), Gaps = 5/85 (5%)
Query: 73 CVAQASKNRSLADFEKALTDY-RAELRDDPIISTHLAKLYDNLLEQNLIRVIEPFSRVQI 131
C+ + + L +F + + Y + E ++ ++ + +++D + + L+ PFS+ I
Sbjct: 72 CLKHLEEAKPLVEFYEEIKKYEKLETEEERLVCSR--EIFDTYIMKELLACSHPFSKSAI 129
Query: 132 EHISSLIKLSKADVERKLSQMILDK 156
EH+ L K V L Q +++
Sbjct: 130 EHVQG--HLVKKQVPPDLFQPYIEE 152
Database: PDB
Posted date: Oct 6, 2005 7:42 AM
Number of letters in database: 17,596,117
Number of sequences in database: 78,094
Lambda K H
0.319 0.136 0.379
Gapped
Lambda K H
0.270 0.0470 0.230
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 5635599
Number of Sequences: 78094
Number of extensions: 193971
Number of successful extensions: 758
Number of sequences better than 10.0: 6
Number of HSP's better than 10.0 without gapping: 1
Number of HSP's successfully gapped in prelim test: 5
Number of HSP's that attempted gapping in prelim test: 757
Number of HSP's gapped (non-prelim): 6
length of query: 176
length of database: 17,596,117
effective HSP length: 50
effective length of query: 126
effective length of database: 13,691,417
effective search space: 1725118542
effective search space used: 1725118542
T: 11
A: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.8 bits)
X3: 64 (24.9 bits)
S1: 41 (21.7 bits)
S2: 59 (27.4 bits)