[Biojava-l] SAXException with BLAST errors

W. Eric Trull wetrull at yahoo.com
Mon Dec 12 20:42:30 EST 2005


No, I use BioJava to write the user's query sequence as a FASTA file before
feeding it to BLAST.  I just copied a differently formatted sequence into my
post.
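
For anyone who wants to reproduce the input file, here is a minimal sketch of
turning a pasted, space-grouped sequence into a single FASTA record.  It uses
plain string handling rather than BioJava's own FASTA writer, and the class
name, method name, and "query" identifier are hypothetical, chosen only for
illustration.

```java
public class FastaSketch {
    /**
     * Strips whitespace and digits from pasted sequence text and returns it
     * as one FASTA record, wrapped at the given line width.
     * (Hypothetical helper; not part of BioJava.)
     */
    public static String toFasta(String id, String pasted, int width) {
        // Remove the block spacing, line breaks, and any position numbers.
        String residues = pasted.replaceAll("[\\s\\d]", "");
        StringBuilder sb = new StringBuilder(">").append(id).append('\n');
        for (int i = 0; i < residues.length(); i += width) {
            int end = Math.min(i + width, residues.length());
            sb.append(residues, i, end).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First three blocks of the sequence from the original post.
        System.out.print(toFasta("query", "MLPRETDEEP EEPGRRGSFV\nEMVDNLRGKS", 60));
    }
}
```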

Thanks.

-Eric Trull

--- mark.schreiber at novartis.com wrote:

> I'm not exactly sure what the problem is here, but it looks like your input
> is not in FASTA format, which might be causing the problem.
> 
> "W. Eric Trull" <wetrull at yahoo.com>
> Sent by: biojava-l-bounces at portal.open-bio.org
> 12/13/2005 08:22 AM
> 
>  
>         To:     biojava-l at biojava.org
>         cc:     (bcc: Mark Schreiber/GP/Novartis)
>         Subject:        [Biojava-l] SAXException with BLAST errors
> 
> 
> Hello all,
> 
> Some of you may remember that I've been creating a Java application to front
> a BLAST web service.  Everything is working great, except that a user found a
> sequence that causes problems (gotta love those users).  I'm using the
> BlastXMLParserFacade to parse NCBI BLAST (2.2.12) XML output.  I think I have
> two problems: one is an NCBI BLAST problem and the other is with BioJava's
> BlastXMLParserFacade.  Any help/advice would be appreciated, especially if I
> have to explain the problem to NCBI - biology is not my strong suit.
> 
> Here is the relevant BioJava stack trace:
> 
> org.xml.sax.SAXException: <Hsp> is non-compliant.
>     at org.biojava.bio.program.sax.blastxml.HspHandler.endElementHandler(HspHandler.java:362)
>     at org.biojava.bio.program.sax.blastxml.StAXFeatureHandler.endElement(StAXFeatureHandler.java:235)
>     at org.biojava.utils.stax.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:153)
>     at org.apache.xerces.parsers.SAXParser.endElement(SAXParser.java:1403)
>     at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1456)
>     at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1260)
>     at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
>     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
>     at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:180)
> 
> Here is STDERR from NCBI BLAST on Sun Solaris:
> 
> [blastall] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [blastall] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [blastall] ERROR:  [065.106]  : /var/tmp/blast39961.tmpOutput
> BlastOutput.iterations.E.hits.E.hsps.E.<hseq>
> Invalid value(s) [-3] in VisibleString
> [ýýýýýýýýýýýýýýýýý----------ýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýý 
> ...]
> 
> Here is what I get from NCBI BLAST on Windows XP:
> 
> [NULL_Caption] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [NULL_Caption] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [NULL_Caption] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(280) >= len(256)
> [NULL_Caption] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(313) >= len(256)
> 
> Here is how I started BLAST:
> 
> /home/etrull/developer/blast-sparc64-solaris-2.2.12/bin/blastall -p blastp \
>     -d /home/etrull/developer/blast/current/pdb -i /var/tmp/fasta39960.tmp \
>     -m 7 -o /var/tmp/blast39961.tmp -b 0
> 
> Here is my input sequence:
> 
> MLPRETDEEP EEPGRRGSFV EMVDNLRGKS GQGYYVEMTV GSPPQTLNIL VDTGSSNFAV GAAPHPFLHR
> YYQRQLSSTY RDLRKGVYVP YTQGAWAGEL GTDLVSIPHG PNVTVRANIA AITESDKFFI NGSNWEGILG
> LAYAEIARPD DSLEPFFDSL VKQTHVPNLF SLQLCGAGFP LNQSEVLASV GGSMIIGGID HSLYTGSLWY
> TPIRREWYYE VIIVRVEING QDLKMDCKEY NYDKSIVDSG TTNLRLPKKV FEAAVKSIKA ASSTEKFPDG
> FWLGEQLVCW QAGTTPWNIF PVISLYLMGE VTNQSFRITI LPQQYLRPVE DVATSQDDCY KFAISQSSTG
> TVMGAVIMEG FYVVFDRARK RIGFAVSACH VHDEFRTAAV EGPFVTLDME DCGYN
> 
> Here is the regular BLAST output for pdb|1ML5|E.  It seems odd to me that the
> identities and positives are both zero - why is this even showing up as a
> similar sequence?
> 
> >pdb|1ML5|E 30S Ribosomal Protein S2
>           Length = 256
> 
>  Score = 28.1 bits (61), Expect = 5.8
>  Identities = 0/71 (0%), Positives = 0/71 (0%), Gaps = 10/71 (14%)
> 
> Query: 99  ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFD 158
> 
> Sbjct: 264 ---------- 313
> 
> Query: 159 SLVKQTHVPNL 169
> 
> Sbjct: 314             324
> 
> 
> Here is the XML BLAST output for pdb|1ML5|E.  Notice the second <Hsp_hseq>
> has a bunch of "#" signs.  Is this valid in BioJava?
> 
>         <Hit>
>           <Hit_num>146</Hit_num>
>           <Hit_id>pdb|1ML5|E</Hit_id>
>           <Hit_def>30S Ribosomal Protein S2</Hit_def>
>           <Hit_accession>1ML5_E</Hit_accession>
>           <Hit_len>256</Hit_len>
>           <Hit_hsps>
>             <Hsp>
>               <Hsp_num>1</Hsp_num>
>               <Hsp_bit-score>28.1054</Hsp_bit-score>
>               <Hsp_score>61</Hsp_score>
>               <Hsp_evalue>5.76848</Hsp_evalue>
>               <Hsp_query-from>99</Hsp_query-from>
>               <Hsp_query-to>169</Hsp_query-to>
>               <Hsp_hit-from>264</Hsp_hit-from>
>               <Hsp_hit-to>324</Hsp_hit-to>
>               <Hsp_query-frame>1</Hsp_query-frame>
>               <Hsp_hit-frame>1</Hsp_hit-frame>
>               <Hsp_gaps>10</Hsp_gaps>
>               <Hsp_align-len>71</Hsp_align-len>
>               <Hsp_qseq>ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNL</Hsp_qseq>
>               <Hsp_hseq>#################----------############################################</Hsp_hseq>
>               <Hsp_midline>  
>                     </Hsp_midline>
>             </Hsp>
>           </Hit_hsps>
>         </Hit>
> 
> Thanks.
> 
> -Eric Trull
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
> 
> 
> 
> 
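
The quoted question about "#" in <Hsp_hseq> is not answered in this thread.
One way to catch such hits before they reach the BioJava parser might be a
pre-scan of the XML, sketched below.  This uses only the JDK's standard SAX API
(javax.xml.parsers), not BlastXMLParserFacade.  The class and method names are
hypothetical, and the set of characters treated as acceptable (letters, "-",
and "*") is an assumption, not something taken from the BLAST or BioJava
documentation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class HseqCheckSketch {
    /**
     * Returns the Hit_id of each Hsp whose Hsp_hseq contains characters
     * other than letters, '-' or '*' (an assumed alphabet, see above).
     * A hit with several such Hsps is listed once per Hsp.
     */
    public static List<String> suspectHits(String blastXml) throws Exception {
        final List<String> suspects = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String hitId = "";

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                text.setLength(0);  // start collecting this element's text
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                String value = text.toString().trim();
                if (qName.equals("Hit_id")) {
                    hitId = value;  // remember which hit we are inside
                } else if (qName.equals("Hsp_hseq")
                        && !value.matches("[A-Za-z*-]*")) {
                    suspects.add(hitId);
                }
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(blastXml)), handler);
        return suspects;
    }

    public static void main(String[] args) throws Exception {
        // Cut-down fragment modelled on the <Hit> quoted above.
        String xml = "<Hit><Hit_id>pdb|1ML5|E</Hit_id><Hit_hsps><Hsp>"
                + "<Hsp_qseq>ELGTD</Hsp_qseq><Hsp_hseq>##--#</Hsp_hseq>"
                + "</Hsp></Hit_hsps></Hit>";
        System.out.println(suspectHits(xml));
    }
}
```

The sketch reads the XML from a string with no DOCTYPE.  Real blastall output
starts with a DOCTYPE declaration, so a version that reads the output file
would also need to decide how to resolve or skip the DTD.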


Thanks.

-W. Eric Trull

