[Biojava-dev] Error parsing XEMBL
Sicotte, Hugues (NIH/NCI)
sicotteh at mail.nih.gov
Mon Aug 11 11:15:52 EDT 2003
You're not going to like my answer.
The only easy fix is for the parser to produce a more explicit
error message, or to report this under a separate error category.
The feature parser does not know how to handle segmented sets
(i.e. annotations that refer to sequence outside the current
record). Segmented sets are a horror that has plagued
bioinformatics for years.
mRNA join(<1..197,Y13288.1:41..148,Y13289.1:41..140,
Y13290.1:41..175,Y13291.1:41..239,Y13292.1:41..172,
Y13293.1:41..140,Y13294.1:41..212,Y13295.1:41..185,
Y13296.1:41..95,Y13297.1:41..>971)
There is no easy fix for this in a parser (e.g. how is the parser
supposed to figure out where the missing files are?), but there
should be some logic in the parser to reject such features and exit
more gracefully.
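As a rough illustration of the kind of check I mean, here is a minimal
sketch in plain Java (no BioJava classes involved). It scans a location
string like the join above and reports which segments point at another
record. The class and method names are made up for this example, and
the pattern is a simplification of the feature-table location syntax:
it only assumes that a remote segment is written as an accession,
optionally with a version, followed by a colon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
final class RemoteSegmentCheck {

    // Assumption: a remote segment looks like ACCESSION[.VERSION]:start..end,
    // so an identifier immediately followed by ':' marks another record.
    private static final Pattern REMOTE =
        Pattern.compile("([A-Za-z][A-Za-z0-9_]*(?:\\.\\d+)?):");

    /** Returns the accessions of all segments that live outside the current record. */
    static List<String> remoteAccessions(String location) {
        List<String> found = new ArrayList<>();
        Matcher m = REMOTE.matcher(location);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    /** True when every segment of the location refers to the current record. */
    static boolean isLocalOnly(String location) {
        return remoteAccessions(location).isEmpty();
    }

    public static void main(String[] args) {
        String loc = "join(<1..197,Y13288.1:41..148,Y13289.1:41..140)";
        if (!isLocalOnly(loc)) {
            // A parser could skip the feature here, or raise a dedicated error.
            System.err.println("Feature refers to other records: "
                + remoteAccessions(loc));
        }
    }
}
```

A parser that ran a check like this before building the feature could
skip it, or fail with a message naming the missing records, instead of
tripping over an out-of-range coordinate later on.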
------------------------------------------
Here is the long-term, NCBI-style solution. To deal with this at NCBI we
had something called the seg-set. I wrote network code that would package
all such records together in a single ASN.1 file (the same could be done
in XML), so that the sequence information would be "local" to the feature
parser.
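To make the idea concrete, here is a toy sketch in plain Java. It is
not the NCBI code and not a BioJava API; the class and method names are
invented for illustration. It only shows the principle: once all the
records are held together, a segment such as Y13288.1:41..148 can be
resolved against the bundle instead of against the current sequence.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a bundle of related records (a "seg-set").
final class SegSetBundle {

    // Accession (with version) mapped to the residues of that record.
    private final Map<String, String> residues = new LinkedHashMap<>();

    void add(String accession, String sequence) {
        residues.put(accession, sequence);
    }

    /** Returns the residues for a 1-based, inclusive range on a bundled record. */
    String slice(String accession, int start, int end) {
        String seq = residues.get(accession);
        if (seq == null) {
            throw new IllegalArgumentException("Record not in bundle: " + accession);
        }
        if (start < 1 || end > seq.length() || start > end) {
            throw new IllegalArgumentException("Location [" + start + "," + end
                + "] is outside 1.." + seq.length() + " of " + accession);
        }
        return seq.substring(start - 1, end);
    }

    public static void main(String[] args) {
        SegSetBundle bundle = new SegSetBundle();
        bundle.add("A.1", "acgtacgt");   // made-up records for the demo
        bundle.add("B.1", "ttttgggg");
        // Each segment is resolved against its own record.
        System.out.println(bundle.slice("A.1", 2, 4) + bundle.slice("B.1", 5, 8));
    }
}
```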
Hugues
-----Original Message-----
From: Carlisia P. Campos [mailto:carlisia at bu.edu]
Sent: Monday, August 11, 2003 8:47 AM
To: biojava-dev at biojava.org
Subject: [Biojava-dev] Error parsing XEMBL
Hello there,
I have encountered what seems to be a problem with BioJava. I adapted a
piece of code offered by Mark Schreiber to parse the XEMBL XML string
that corresponds to the accession number "Y13287" and got the error
below. The code is also included below. If anyone knows of a workaround,
please let me know.
Best,
--Carlisia
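import java.io.IOException;
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.io.SeqIOAdapter;
import org.biojava.bio.seq.io.SeqIOListener;
import org.biojava.bio.seq.io.agave.AGAVEHandler;
import org.biojava.bio.seq.io.agave.SAX2StAXAdaptor;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public static String convert(String agaveString) {
    String seq_S = "";
    try {
        // Route SAX events from the XML parser into the AGAVE handler.
        AGAVEHandler handler = new AGAVEHandler();
        SeqIOListener siol = new SeqIOAdapter();
        handler.setFeatureListener(siol);
        SAX2StAXAdaptor adaptor = new SAX2StAXAdaptor(handler);

        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        XMLReader xmlReader = saxParser.getXMLReader();

        // Parse the XML held in the string rather than reading from a file.
        InputSource is = new InputSource(new StringReader(agaveString));
        xmlReader.setContentHandler(adaptor);
        xmlReader.parse(is);

        // If several sequences are returned, the last one wins.
        for (Iterator i = handler.getSequences(); i.hasNext();) {
            Sequence s = (Sequence) i.next();
            seq_S = s.seqString();
        }
    }
    catch (SAXException se) {
        se.printStackTrace();
    }
    catch (ParserConfigurationException pce) {
        pce.printStackTrace();
    }
    catch (IOException ioe) {
        ioe.printStackTrace();
    }
    return seq_S;
}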
This happens once the program reaches the line that contains:
xmlReader.parse(is):
java.lang.IllegalArgumentException: Location [41,239] is outside 1..237
    at org.biojava.bio.seq.impl.SimpleFeature.<init>(SimpleFeature.java:306)
    at org.biojava.bio.seq.impl.SimpleStrandedFeature.<init>(SimpleStrandedFeature.java:74)
    at java.lang.reflect.Constructor.newInstance(Native Method)
    at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:138)
rethrown as org.biojava.bio.BioException: Couldn't realize feature
    at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:144)
    at org.biojava.bio.seq.SimpleFeatureRealizer.realizeFeature(SimpleFeatureRealizer.java:94)
    at org.biojava.bio.seq.impl.SimpleSequence.realizeFeature(SimpleSequence.java:198)
    at org.biojava.bio.seq.impl.SimpleFeature.realizeFeature(SimpleFeature.java:328)
    at org.biojava.bio.seq.impl.SimpleFeature.createFeature(SimpleFeature.java:337)
    at org.biojava.bio.seq.io.agave.StAXFeatureHandler.realizeSubFeatures(StAXFeatureHandler.java:349)
    at org.biojava.bio.seq.io.agave.StAXFeatureHandler.addFeatureToSequence(StAXFeatureHandler.java:377)
    at org.biojava.bio.seq.io.agave.AGAVEBioSeqHandler.endElementHandler(AGAVEBioSeqHandler.java:192)
    at org.biojava.bio.seq.io.agave.StAXFeatureHandler.endElement(StAXFeatureHandler.java:807)
    at org.biojava.bio.seq.io.agave.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:161)
    at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java:1203)
    at oracle.xml.parser.v2.NonValidatingParser.parseRootElement(NonValidatingParser.java:294)
    at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:261)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:147)
    at com.carlisia.ws.bio.XEMBL2Sequence.convert(XEMBL2Sequence.java:78)
    at com.carlisia.ws.bio.XEMBL2Sequence.getSequence(XEMBL2Sequence.java:42)
    at com.carlisia.ws.bio.TranscribeDNAtoRNA.transcribeDNAtoRNAEMBL(TranscribeDNAtoRNA.java:63)
    at com.carlisia.ws.bio.TranscribeDNAtoRNA.main(TranscribeDNAtoRNA.java:71)
Debugger disconnected from local process.
ggcggtcggtctcgccttgtcgccagctccattttcctctctttctcttcccctttccttcgcgcccaagag
cgcctcccagcctcgtagggtggtcacggagcccctgcgccttttccttgctcgggtcctgcgtccgcgcct
gccccgccatgaatgaggagtacgacgtgatcgtgctgggcaccggcctgacggtgggcgccagggctgagg
ggccggggctgagcagccggg
Process exited with exit code 0.
_______________________________________________
biojava-dev mailing list
biojava-dev at biojava.org
http://biojava.org/mailman/listinfo/biojava-dev