[Biojava-l] SAXException with BLAST errors
W. Eric Trull
wetrull at yahoo.com
Mon Dec 12 20:42:30 EST 2005
No, I use BioJava to write the user's query sequence as a fasta file before
feeding it to BLAST. I just copied a differently formatted sequence into my
post.
Thanks.
-Eric Trull
--- mark.schreiber at novartis.com wrote:
> Not exactly sure what the problem is here but it looks like your input is
> not in FASTA format so that might be causing a problem??
>
> "W. Eric Trull" <wetrull at yahoo.com>
> Sent by: biojava-l-bounces at portal.open-bio.org
> 12/13/2005 08:22 AM
>
>
> To: biojava-l at biojava.org
> cc: (bcc: Mark Schreiber/GP/Novartis)
> Subject: [Biojava-l] SAXException with BLAST errors
>
>
> Hello all,
>
> Some of you may remember that I've been creating a Java application to front
> a BLAST web service. Everything is working great, except that some user found
> a random sequence that causes problems (gotta love those users). I'm using the
> BlastXMLParserFacade to parse NCBI BLAST (2.2.12) XML output. I think I have
> two problems: one is an NCBI BLAST problem and the other is with BioJava's
> BlastXMLParserFacade. Any help/advice would be appreciated, especially if I
> have to explain the problem to NCBI - biology is not my strong suit.
>
> Here is the relevant BioJava stack trace:
>
> org.xml.sax.SAXException: <Hsp> is non-compliant.
>     at org.biojava.bio.program.sax.blastxml.HspHandler.endElementHandler(HspHandler.java:362)
>     at org.biojava.bio.program.sax.blastxml.StAXFeatureHandler.endElement(StAXFeatureHandler.java:235)
>     at org.biojava.utils.stax.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:153)
>     at org.apache.xerces.parsers.SAXParser.endElement(SAXParser.java:1403)
>     at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1456)
>     at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1260)
>     at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
>     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
>     at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:180)
>
> Here is STDERR from NCBI BLAST on Sun Solaris:
>
> [blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [blastall] ERROR: [065.106] : /var/tmp/blast39961.tmpOutput
> BlastOutput.iterations.E.hits.E.hsps.E.<hseq>
> Invalid value(s) [-3] in VisibleString
> [ýýýýýýýýýýýýýýýýý----------ýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýý
> ...]
>
> Here is what I get from NCBI BLAST on Windows XP:
>
> [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(280) >= len(256)
> [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(313) >= len(256)
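
For anyone wanting to catch this condition before the XML ever reaches the parser, here is a minimal sketch that scans captured blastall STDERR text for the SeqPortNew messages shown above. The class and method names (BlastStderrScan, outOfRangeHits) are made up for illustration, and the pattern is inferred only from the messages in this post, so other BLAST versions may word them differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava or NCBI BLAST.
class BlastStderrScan {
    // Matches messages like "SeqPortNew: pdb|1ML5|E start(263) >= len(256)".
    // The \s* runs tolerate line breaks inside a message.
    private static final Pattern OUT_OF_RANGE = Pattern.compile(
        "SeqPortNew:\\s+(\\S+)\\s+start\\((\\d+)\\)\\s*>=\\s*len\\((\\d+)\\)");

    /** Returns one "id start=N len=M" summary per out-of-range message found. */
    static List<String> outOfRangeHits(String stderr) {
        List<String> found = new ArrayList<>();
        Matcher m = OUT_OF_RANGE.matcher(stderr);
        while (m.find()) {
            found.add(m.group(1) + " start=" + m.group(2) + " len=" + m.group(3));
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample text copied from the STDERR output quoted in this post.
        String stderr = "[blastall] ERROR: ncbiapi [000.000] : "
            + "SeqPortNew: pdb|1ML5|E start(263) >= len(256)\n";
        for (String hit : outOfRangeHits(stderr)) {
            System.out.println(hit);
        }
    }
}
```

A non-empty result would be a reasonable signal to log the offending hit and skip or report the result rather than hand the XML to BlastXMLParserFacade.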
>
> Here is how I started BLAST:
>
> /home/etrull/developer/blast-sparc64-solaris-2.2.12/bin/blastall -p blastp -d /home/etrull/developer/blast/current/pdb -i /var/tmp/fasta39960.tmp -m 7 -o /var/tmp/blast39961.tmp -b 0
>
> Here is my input sequence:
>
> MLPRETDEEP EEPGRRGSFV EMVDNLRGKS GQGYYVEMTV GSPPQTLNIL VDTGSSNFAV GAAPHPFLHR
> YYQRQLSSTY RDLRKGVYVP YTQGAWAGEL GTDLVSIPHG PNVTVRANIA AITESDKFFI NGSNWEGILG
> LAYAEIARPD DSLEPFFDSL VKQTHVPNLF SLQLCGAGFP LNQSEVLASV GGSMIIGGID HSLYTGSLWY
> TPIRREWYYE VIIVRVEING QDLKMDCKEY NYDKSIVDSG TTNLRLPKKV FEAAVKSIKA ASSTEKFPDG
> FWLGEQLVCW QAGTTPWNIF PVISLYLMGE VTNQSFRITI LPQQYLRPVE DVATSQDDCY KFAISQSSTG
> TVMGAVIMEG FYVVFDRARK RIGFAVSACH VHDEFRTAAV EGPFVTLDME DCGYN
>
> Here is the regular BLAST output for pdb|1ML5|E. It seems odd to me that the
> identities and positives are both zero - why is this even showing up as a
> similar sequence?
>
> >pdb|1ML5|E 30S Ribosomal Protein S2
> Length = 256
>
> Score = 28.1 bits (61), Expect = 5.8
> Identities = 0/71 (0%), Positives = 0/71 (0%), Gaps = 10/71 (14%)
>
> Query: 99 ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFD 158
>
> Sbjct: 264 ---------- 313
>
> Query: 159 SLVKQTHVPNL 169
>
> Sbjct: 314 324
>
>
> Here is the XML BLAST output for pdb|1ML5|E. Notice the second <Hsp_hseq>
> has a bunch of "#" signs. Is this valid in BioJava?
>
> <Hit>
> <Hit_num>146</Hit_num>
> <Hit_id>pdb|1ML5|E</Hit_id>
> <Hit_def>30S Ribosomal Protein S2</Hit_def>
> <Hit_accession>1ML5_E</Hit_accession>
> <Hit_len>256</Hit_len>
> <Hit_hsps>
> <Hsp>
> <Hsp_num>1</Hsp_num>
> <Hsp_bit-score>28.1054</Hsp_bit-score>
> <Hsp_score>61</Hsp_score>
> <Hsp_evalue>5.76848</Hsp_evalue>
> <Hsp_query-from>99</Hsp_query-from>
> <Hsp_query-to>169</Hsp_query-to>
> <Hsp_hit-from>264</Hsp_hit-from>
> <Hsp_hit-to>324</Hsp_hit-to>
> <Hsp_query-frame>1</Hsp_query-frame>
> <Hsp_hit-frame>1</Hsp_hit-frame>
> <Hsp_gaps>10</Hsp_gaps>
> <Hsp_align-len>71</Hsp_align-len>
> <Hsp_qseq>ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNL</Hsp_qseq>
> <Hsp_hseq>#################----------############################################</Hsp_hseq>
> <Hsp_midline>
> </Hsp_midline>
> </Hsp>
> </Hit_hsps>
> </Hit>
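
As a rough way to spot hits like this one before parsing, the sketch below uses only the JDK's SAX parser to collect every <Hsp_hseq> value that contains anything other than letters, '-' or '*'. The class name HseqCheck and the allowed character set are assumptions made for illustration; this is not a statement of what BioJava's HspHandler actually accepts.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical pre-check, independent of BioJava.
class HseqCheck {
    /** Returns each <Hsp_hseq> value containing characters other than letters, '-' or '*'. */
    static List<String> suspiciousHseqs(String blastXml) {
        final List<String> bad = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <Hsp_hseq>

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("Hsp_hseq".equals(qName)) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("Hsp_hseq".equals(qName)) {
                    String seq = text.toString();
                    // Assumed alphabet: residue letters, gap '-', stop '*'.
                    if (!seq.matches("[A-Za-z*-]*")) bad.add(seq);
                    text = null;
                }
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Return an empty source so an external DTD is not fetched.
                return new InputSource(new StringReader(""));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(blastXml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse BLAST XML", e);
        }
        return bad;
    }

    public static void main(String[] args) {
        String xml = "<Hsp><Hsp_qseq>ELGTD</Hsp_qseq><Hsp_hseq>##---##</Hsp_hseq></Hsp>";
        System.out.println(suspiciousHseqs(xml));
    }
}
```

Running this over the -m 7 output file contents first would at least separate "BLAST wrote a bad subject sequence" from "the parser choked on valid output".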
>
> Thanks.
>
> -Eric Trull
> _______________________________________________
> Biojava-l mailing list - Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
Thanks.
-W. Eric Trull