[Biojava-l] SAXException with BLAST errors
mark.schreiber at novartis.com
Mon Dec 12 20:37:59 EST 2005
I'm not exactly sure what the problem is here, but it looks like your
input is not in FASTA format, so that might be causing the problem.
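
For what it's worth, here is a minimal sketch of turning a pasted,
space-separated sequence into a single FASTA record before it is written
to the file passed to blastall -i. The class name FastaNormalizer, the
record id and the 60-column line width are illustrative assumptions; none
of them come from BioJava or BLAST.

```java
public class FastaNormalizer {

    /**
     * Strips all whitespace from a pasted sequence and emits one FASTA
     * record: a ">" header line followed by fixed-width residue lines.
     */
    public static String toFasta(String id, String rawSequence, int lineWidth) {
        String residues = rawSequence.replaceAll("\\s+", "").toUpperCase();
        StringBuilder fasta = new StringBuilder(">").append(id).append('\n');
        for (int i = 0; i < residues.length(); i += lineWidth) {
            int end = Math.min(i + lineWidth, residues.length());
            fasta.append(residues, i, end).append('\n');
        }
        return fasta.toString();
    }

    public static void main(String[] args) {
        // A short fragment in the same blocks-of-ten layout as the query.
        String pasted = "MLPRETDEEP EEPGRRGSFV\nEMVDNLRGKS";
        System.out.print(toFasta("query1", pasted, 60));
    }
}
```

Whether the missing header and embedded spaces are what actually trips
blastall here is not confirmed; this only rules that possibility out.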
"W. Eric Trull" <wetrull at yahoo.com>
Sent by: biojava-l-bounces at portal.open-bio.org
12/13/2005 08:22 AM
To: biojava-l at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] SAXException with BLAST errors
Hello all,
Some of you may remember that I've been creating a Java application to
front a BLAST web service. Everything is working great, except that a
user found a random sequence that causes problems (gotta love those
users). I'm using the BlastXMLParserFacade to parse NCBI BLAST (2.2.12)
XML output. I think I have two problems: one is an NCBI BLAST problem
and the other is with BioJava's BlastXMLParserFacade. Any help or advice
would be appreciated, especially if I have to explain the problem to
NCBI, since biology is not my strong suit.
Here is the relevant BioJava stack trace:
org.xml.sax.SAXException: <Hsp> is non-compliant.
    at org.biojava.bio.program.sax.blastxml.HspHandler.endElementHandler(HspHandler.java:362)
    at org.biojava.bio.program.sax.blastxml.StAXFeatureHandler.endElement(StAXFeatureHandler.java:235)
    at org.biojava.utils.stax.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:153)
    at org.apache.xerces.parsers.SAXParser.endElement(SAXParser.java:1403)
    at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1456)
    at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1260)
    at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
    at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
    at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:180)
Here is STDERR from NCBI BLAST on Sun Solaris:
[blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[blastall] ERROR: [065.106] : /var/tmp/blast39961.tmpOutput
BlastOutput.iterations.E.hits.E.hsps.E.<hseq>
Invalid value(s) [-3] in VisibleString
[ýýýýýýýýýýýýýýýýý----------ýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýý
...]
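
A note on the "[-3]" above: one plausible reading, and it is an
assumption rather than something the BLAST log states, is that the value
is a signed byte. A signed -3 is the unsigned byte 0xFD, which displays
as "ý" in ISO-8859-1 and falls outside the printable ASCII range that an
ASN.1 VisibleString allows. A minimal sketch of that arithmetic (the
class and method names are illustrative only):

```java
import java.nio.charset.StandardCharsets;

public class InvalidVisibleStringByte {

    /** Decodes one signed byte value as an ISO-8859-1 character. */
    public static char latin1Char(int signedValue) {
        byte b = (byte) signedValue;
        return new String(new byte[] { b }, StandardCharsets.ISO_8859_1).charAt(0);
    }

    /** True if the byte lies in printable ASCII (space through tilde). */
    public static boolean isPrintableAscii(int signedValue) {
        int unsigned = signedValue & 0xFF;
        return unsigned >= 0x20 && unsigned <= 0x7E;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(-3 & 0xFF)); // unsigned form of -3
        System.out.println(latin1Char(-3));                 // the character a Latin-1 terminal shows
        System.out.println(isPrintableAscii(-3));           // whether it is printable ASCII
    }
}
```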
Here is what I get from NCBI BLAST on Windows XP:
[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(280) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(313) >= len(256)
Here is how I started BLAST:
/home/etrull/developer/blast-sparc64-solaris-2.2.12/bin/blastall -p blastp -d /home/etrull/developer/blast/current/pdb -i /var/tmp/fasta39960.tmp -m 7 -o /var/tmp/blast39961.tmp -b 0
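
Since the errors only appear on STDERR, a front end could capture that
stream and look for "ERROR:" lines before handing the XML to the parser.
The following is a minimal sketch using only the JDK; the class
BlastallCommand and its methods are hypothetical, and the flags are
simply the ones from the command line above. Nothing here is known about
blastall's exit code in this situation, so the sketch inspects the log
text instead.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class BlastallCommand {

    /** Builds the same argument list as the command line shown above. */
    public static List<String> build(String blastall, String db, String in, String out) {
        return Arrays.asList(blastall, "-p", "blastp", "-d", db,
                "-i", in, "-m", "7", "-o", out, "-b", "0");
    }

    /** Prepares the process so that STDERR is written to a log file. */
    public static ProcessBuilder prepare(List<String> command, File stderrLog) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectError(stderrLog);
        return pb;
    }

    /** True if any captured STDERR line carries an "ERROR:" marker. */
    public static boolean hasErrors(String stderrText) {
        for (String line : stderrText.split("\\R")) {
            if (line.contains("ERROR:")) {
                return true;
            }
        }
        return false;
    }
}
```

After prepare(...).start().waitFor(), the caller would read the log file
and skip XML parsing (or flag the result) when hasErrors returns true.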
Here is my input sequence:
MLPRETDEEP EEPGRRGSFV EMVDNLRGKS GQGYYVEMTV GSPPQTLNIL VDTGSSNFAV GAAPHPFLHR
YYQRQLSSTY RDLRKGVYVP YTQGAWAGEL GTDLVSIPHG PNVTVRANIA AITESDKFFI NGSNWEGILG
LAYAEIARPD DSLEPFFDSL VKQTHVPNLF SLQLCGAGFP LNQSEVLASV GGSMIIGGID HSLYTGSLWY
TPIRREWYYE VIIVRVEING QDLKMDCKEY NYDKSIVDSG TTNLRLPKKV FEAAVKSIKA ASSTEKFPDG
FWLGEQLVCW QAGTTPWNIF PVISLYLMGE VTNQSFRITI LPQQYLRPVE DVATSQDDCY KFAISQSSTG
TVMGAVIMEG FYVVFDRARK RIGFAVSACH VHDEFRTAAV EGPFVTLDME DCGYN
Here is the regular BLAST output for pdb|1ML5|E. It seems odd to me that
the identities and positives are both zero. Why is this even showing up
as a similar sequence?
>pdb|1ML5|E 30S Ribosomal Protein S2
Length = 256
Score = 28.1 bits (61), Expect = 5.8
Identities = 0/71 (0%), Positives = 0/71 (0%), Gaps = 10/71 (14%)
Query: 99 ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFD 158
Sbjct: 264 ---------- 313
Query: 159 SLVKQTHVPNL 169
Sbjct: 314 324
Here is the XML BLAST output for pdb|1ML5|E. Notice the second <Hsp_hseq>
has a bunch of "#" signs. Is this valid in BioJava?
<Hit>
  <Hit_num>146</Hit_num>
  <Hit_id>pdb|1ML5|E</Hit_id>
  <Hit_def>30S Ribosomal Protein S2</Hit_def>
  <Hit_accession>1ML5_E</Hit_accession>
  <Hit_len>256</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>28.1054</Hsp_bit-score>
      <Hsp_score>61</Hsp_score>
      <Hsp_evalue>5.76848</Hsp_evalue>
      <Hsp_query-from>99</Hsp_query-from>
      <Hsp_query-to>169</Hsp_query-to>
      <Hsp_hit-from>264</Hsp_hit-from>
      <Hsp_hit-to>324</Hsp_hit-to>
      <Hsp_query-frame>1</Hsp_query-frame>
      <Hsp_hit-frame>1</Hsp_hit-frame>
      <Hsp_gaps>10</Hsp_gaps>
      <Hsp_align-len>71</Hsp_align-len>
      <Hsp_qseq>ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNL</Hsp_qseq>
      <Hsp_hseq>#################----------############################################</Hsp_hseq>
      <Hsp_midline>
      </Hsp_midline>
    </Hsp>
  </Hit_hsps>
</Hit>
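
Two things in this hit can be checked mechanically before the XML
reaches BioJava: the hit range (264 to 324) runs past Hit_len (256), and
Hsp_hseq contains characters that are neither residue letters nor gaps.
Below is a minimal sketch of such a pre-check using only the JDK's DOM
parser. HspSanityCheck is a hypothetical helper, not a BioJava class,
and the set of allowed characters (letters, "*" and "-") is an
assumption. It expects a bare <Hit> fragment; a full BLAST report with
its DOCTYPE would need extra handling that is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HspSanityCheck {

    /** Returns the trimmed text of the first descendant with the given tag. */
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    /** Lists HSPs whose hit range exceeds Hit_len or whose hseq looks corrupt. */
    public static List<String> findProblems(String hitXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(hitXml)));
        List<String> problems = new ArrayList<>();
        NodeList hits = doc.getElementsByTagName("Hit");
        for (int i = 0; i < hits.getLength(); i++) {
            Element hit = (Element) hits.item(i);
            String id = text(hit, "Hit_id");
            int hitLen = Integer.parseInt(text(hit, "Hit_len"));
            NodeList hsps = hit.getElementsByTagName("Hsp");
            for (int j = 0; j < hsps.getLength(); j++) {
                Element hsp = (Element) hsps.item(j);
                int from = Integer.parseInt(text(hsp, "Hsp_hit-from"));
                int to = Integer.parseInt(text(hsp, "Hsp_hit-to"));
                if (Math.max(from, to) > hitLen) {
                    problems.add(id + ": hit range " + from + "-" + to
                            + " exceeds Hit_len " + hitLen);
                }
                // Assumed alphabet: letters, '*' (stop) and '-' (gap).
                if (!text(hsp, "Hsp_hseq").matches("[A-Za-z*-]+")) {
                    problems.add(id + ": Hsp_hseq contains non-residue characters");
                }
            }
        }
        return problems;
    }
}
```

Run against the fragment above, both checks would fire for pdb|1ML5|E,
which gives the application a chance to report a BLAST-side problem
instead of surfacing the SAXException.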
Thanks.
-Eric Trull