[Biojava-l] SAXException with BLAST errors
W. Eric Trull
wetrull at yahoo.com
Wed Dec 14 11:58:28 EST 2005
Thanks for the suggestion, Mark. I emailed NCBI, and the gist of the reply
was:
These SeqPortNew errors usually indicate a problem in the formatting process;
the #'s are certainly not normal. Is this the only database entry that
generates errors?
So I dug a little deeper into 1ML5 and discovered that it has both a chain 'e'
and a chain 'E'. When I created the FASTA file to feed to formatdb, I wrote the
deflines in the form pdb|<id>|<chain>, but in uppercase, so I ended up with two
entries that share the defline pdb|1ML5|E but have different sequences. I think
this is my problem, and I am working on fixing it now.
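
For anyone who wants to check a FASTA file for this kind of collision before
running formatdb, here is a minimal sketch in Python. It is an illustration
only, not part of BioJava or the NCBI tools: the function name and the example
records (including their sequences and two of the descriptions) are made up,
and it assumes the sequence ID is the first whitespace-delimited token on each
defline.

```python
from collections import defaultdict


def find_colliding_deflines(fasta_lines):
    """Map each uppercased ID to the original IDs that collapse onto it.

    Only IDs that occur more than once after uppercasing are returned.
    (Hypothetical helper, written for illustration.)
    """
    seen = defaultdict(list)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence data, not a defline
        tokens = line[1:].split()
        if not tokens:
            continue  # a bare '>' with no ID
        seq_id = tokens[0]
        seen[seq_id.upper()].append(seq_id)
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


# Made-up records: only the IDs mirror the 1ML5 case described above.
example = [
    ">pdb|1ML5|e placeholder description",
    "MKV",
    ">pdb|1ML5|E 30S Ribosomal Protein S2",
    "MAT",
    ">pdb|1ABC|A placeholder description",
    "GGS",
]

collisions = find_colliding_deflines(example)
print(collisions)
```

Any ID reported here would have to be made unique in some other way than by
case before the file is formatted as a BLAST database.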
Thanks.
-Eric Trull
--- mark.schreiber at novartis.com wrote:
> I would send NCBI your test sequence, the blast output and the version of
> BLAST and ask them if this is "normal". I have found them to be very
> responsive in the past. If it is normal then we need to fix biojava to
> cope.
>
> - Mark
>
>
>
>
>
> "W. Eric Trull" <wetrull at yahoo.com>
> 12/13/2005 09:42 AM
>
>
> To: Mark Schreiber/GP/Novartis at PH
> cc: biojava-l at biojava.org,
> biojava-l-bounces at portal.open-bio.org
> Subject: Re: [Biojava-l] SAXException with BLAST errors
>
>
> No, I use BioJava to write the user's query sequence as a FASTA file before
> feeding it to BLAST. I just copied a differently formatted sequence into my
> post.
>
> Thanks.
>
> -Eric Trull
>
> --- mark.schreiber at novartis.com wrote:
>
> > Not exactly sure what the problem is here, but it looks like your input is
> > not in FASTA format, so that might be causing a problem?
> >
> >
> >
> >
> >
> > "W. Eric Trull" <wetrull at yahoo.com>
> > Sent by: biojava-l-bounces at portal.open-bio.org
> > 12/13/2005 08:22 AM
> >
> >
> > To: biojava-l at biojava.org
> > cc: (bcc: Mark Schreiber/GP/Novartis)
> > Subject: [Biojava-l] SAXException with BLAST errors
> >
> >
> > Hello all,
> >
> > Some of you may remember that I've been creating a Java application to front
> > a BLAST web service. Everything is working great, except that a user found a
> > random sequence that causes problems (gotta love those users). I'm using the
> > BlastXMLParserFacade to parse NCBI BLAST (2.2.12) XML output. I think I have
> > two problems: one is an NCBI BLAST problem and the other is with BioJava's
> > BlastXMLParserFacade. Any help/advice would be appreciated, especially if I
> > have to explain the problem to NCBI - biology is not my strong suit.
> >
> > Here is the relevant BioJava stack trace:
> >
> > org.xml.sax.SAXException: <Hsp> is non-compliant.
> >     at org.biojava.bio.program.sax.blastxml.HspHandler.endElementHandler(HspHandler.java:362)
> >     at org.biojava.bio.program.sax.blastxml.StAXFeatureHandler.endElement(StAXFeatureHandler.java:235)
> >     at org.biojava.utils.stax.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:153)
> >     at org.apache.xerces.parsers.SAXParser.endElement(SAXParser.java:1403)
> >     at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1456)
> >     at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1260)
> >     at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
> >     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
> >     at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:180)
> >
> > Here is STDERR from NCBI BLAST on Sun Solaris:
> >
> > [blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> > [blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> > [blastall] ERROR: [065.106] : /var/tmp/blast39961.tmpOutput
> > BlastOutput.iterations.E.hits.E.hsps.E.<hseq>
> > Invalid value(s) [-3] in VisibleString
> > [ýýýýýýýýýýýýýýýýý----------ýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýý
>
> > ...]
> >
> > Here is what I get from NCBI BLAST on Windows XP:
> >
> > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(280) >= len(256)
> > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(313) >= len(256)
> >
> > Here is how I started BLAST:
> >
> > /home/etrull/developer/blast-sparc64-solaris-2.2.12/bin/blastall -p blastp \
> >   -d /home/etrull/developer/blast/current/pdb -i /var/tmp/fasta39960.tmp \
> >   -m 7 -o /var/tmp/blast39961.tmp -b 0
> >
> > Here is my input sequence:
> >
> > MLPRETDEEP EEPGRRGSFV EMVDNLRGKS GQGYYVEMTV GSPPQTLNIL VDTGSSNFAV GAAPHPFLHR
> > YYQRQLSSTY RDLRKGVYVP YTQGAWAGEL GTDLVSIPHG PNVTVRANIA AITESDKFFI NGSNWEGILG
> > LAYAEIARPD DSLEPFFDSL VKQTHVPNLF SLQLCGAGFP LNQSEVLASV GGSMIIGGID HSLYTGSLWY
> > TPIRREWYYE VIIVRVEING QDLKMDCKEY NYDKSIVDSG TTNLRLPKKV FEAAVKSIKA ASSTEKFPDG
> > FWLGEQLVCW QAGTTPWNIF PVISLYLMGE VTNQSFRITI LPQQYLRPVE DVATSQDDCY KFAISQSSTG
> > TVMGAVIMEG FYVVFDRARK RIGFAVSACH VHDEFRTAAV EGPFVTLDME DCGYN
> >
> > Here is the regular BLAST output for pdb|1ML5|E. It seems odd to me that the
> > identities and positives are both zero - why is this even showing up as a
> > similar sequence?
> >
> > >pdb|1ML5|E 30S Ribosomal Protein S2
> > Length = 256
> >
> > Score = 28.1 bits (61), Expect = 5.8
> > Identities = 0/71 (0%), Positives = 0/71 (0%), Gaps = 10/71 (14%)
> >
> > Query: 99 ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFD 158
> >
> > Sbjct: 264 ---------- 313
> >
> > Query: 159 SLVKQTHVPNL 169
> >
> > Sbjct: 314 324
> >
> >
> > Here is the XML BLAST output for pdb|1ML5|E. Notice the second <Hsp_hseq>
> > has a bunch of "#" signs. Is this valid in BioJava?
> >
> > <Hit>
> > <Hit_num>146</Hit_num>
> > <Hit_id>pdb|1ML5|E</Hit_id>
> > <Hit_def>30S Ribosomal Protein S2</Hit_def>
> > <Hit_accession>1ML5_E</Hit_accession>
> > <Hit_len>256</Hit_len>
> > <Hit_hsps>
> > <Hsp>
> > <Hsp_num>1</Hsp_num>
> > <Hsp_bit-score>28.1054</Hsp_bit-score>
> > <Hsp_score>61</Hsp_score>
> > <Hsp_evalue>5.76848</Hsp_evalue>
> > <Hsp_query-from>99</Hsp_query-from>
>
=== message truncated ===