[Biojava-l] [biojavax] EMBL parser error
Morgane THOMAS-CHOLLIER
mthomasc at vub.ac.be
Fri Apr 7 12:18:36 UTC 2006
I now get a different error message with the same file:
Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
Caused by: java.lang.IndexOutOfBoundsException: No group 5
    at java.util.regex.Matcher.group(Matcher.java:355)
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:271)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more
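
For reference, "No group 5" is what java.util.regex.Matcher.group(int) reports when the compiled pattern contains fewer than five capturing groups. The following is a minimal sketch of that behaviour using only the JDK. The pattern and the class name GroupCountDemo are made up for illustration and are not taken from EMBLFormat.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Hypothetical pattern with exactly four capturing groups
    // (day, month, year, release). Not the pattern used in EMBLFormat.
    static final Pattern FOUR_GROUPS =
        Pattern.compile("(\\d+)-(\\w+)-(\\d+) \\(Rel\\. (\\d+), .*\\)");

    /** Returns group n of a successful match, or null if the pattern has no such group. */
    static String safeGroup(Matcher m, int n) {
        return n <= m.groupCount() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        Matcher m = FOUR_GROUPS.matcher("19-JAN-2006 (Rel. 86, Created)");
        if (!m.matches()) {
            throw new IllegalStateException("unexpected: no match");
        }
        System.out.println("group 4 = " + safeGroup(m, 4));
        System.out.println("group 5 = " + safeGroup(m, 5));
        try {
            m.group(5); // index beyond groupCount()
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Checking groupCount() before asking for a group index is one way to avoid the exception; whether that is appropriate for the real parser depends on how its pattern is written.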
Here is the complete file, for info:
ID DQ158013 standard; genomic DNA; VRT; 118 BP.
XX
AC DQ158013;
XX
SV DQ158013.1
XX
DT 19-JAN-2006 (Rel. 86, Created)
DT 19-JAN-2006 (Rel. 86, Last updated, Version 1)
XX
DE Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
XX
KW .
XX
OS Triturus helveticus (palmate newt)
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia;
OC Batrachia; Caudata; Salamandroidea; Salamandridae; Triturus.
XX
RN [1]
RP 1-118
RX DOI; 10.1016/j.ympev.2005.08.012.
RX PUBMED; 16198128.
RA Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT "A PCR survey for posterior Hox genes in amphibians";
RL Mol. Phylogenet. Evol. 38(2):449-458(2006).
XX
RN [2]
RP 1-118
RA Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT ;
RL Submitted (09-AUG-2005) to the EMBL/GenBank/DDBJ databases.
RL Biology Department, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050,
RL Belgium
XX
FH Key Location/Qualifiers
FH
FT source 1..118
FT /organism="Triturus helveticus"
FT /mol_type="genomic DNA"
FT /clone="Thel.b9"
FT /db_xref="taxon:256425"
FT gene <1..>118
FT /gene="Hoxb9"
FT /note="Hoxb-9"
FT mRNA <1..>118
FT /gene="Hoxb9"
FT /product="HOXB9"
FT CDS <1..>118
FT /codon_start=2
FT /gene="Hoxb9"
FT /product="HOXB9"
FT /db_xref="UniProtKB/TrEMBL:Q2LK47"
FT /protein_id="ABA39736.1"
FT /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
XX
SQ Sequence 118 BP; 28 A; 35 C; 37 G; 18 T; 0 other;
caaataccag acgctggagc tggagaagga gttcctgttc aacatgtacc tcacccggga 60
ccgcaggcac gaggtggccc ggctgctgaa cctcagcgag cgccaggtca agatctgg 118
//
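
The two DT lines in this entry have different shapes: the "Created" line carries only a release number, while the "Last updated" line also carries an entry version. Below is a minimal sketch of a pattern that accepts both by making the version group optional. It assumes the DT layout exactly as it appears in this file; the class name DtLineSketch and the group layout are hypothetical, and this is not the change committed to EMBLFormat.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DtLineSketch {
    // Groups: 1 = date, 2 = release number, 3 = event type,
    // 4 = entry version (optional, null when the line has none).
    static final Pattern DT = Pattern.compile(
        "DT\\s+(\\d{2}-[A-Z]{3}-\\d{4}) \\(Rel\\. (\\d+), "
        + "(Created|Last updated)(?:, Version (\\d+))?\\)");

    /** Returns {date, release, type, version}; version is null when absent. */
    static String[] parse(String line) {
        Matcher m = DT.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised DT line: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "DT 19-JAN-2006 (Rel. 86, Created)",
            "DT 19-JAN-2006 (Rel. 86, Last updated, Version 1)"
        };
        for (String line : lines) {
            String[] f = parse(line);
            System.out.println(f[0] + " | rel " + f[1] + " | " + f[2] + " | version " + f[3]);
        }
    }
}
```

Because the version sits in an optional group that is still part of the pattern, group(4) is always a valid index and simply returns null for the "Created" line.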
Thanks for helping,
Morgane.
Richard Holland wrote:
>That was indeed a bug. I have made a change to the date parsing in
>EMBLFormat and committed it to CVS. Could you test it for me please?
>
>cheers,
>Richard
>
>On Fri, 2006-04-07 at 11:20 +0200, Morgane THOMAS-CHOLLIER wrote:
>
>
>>Hello,
>>
>>I am currently using biojavax that I checked out today from CVS to parse
>>an EMBL file, exported from EBI SRS server.
>>
>>I ran into this error :
>>
>>Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
>>    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>>    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
>>Caused by: org.biojava.bio.seq.io.ParseException: Bad date type found: 86
>>    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:278)
>>    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>>    ... 1 more
>>
>>The EMBL file is :
>>
>>ID DQ158013 standard; genomic DNA; VRT; 118 BP.
>>XX
>>AC DQ158013;
>>XX
>>SV DQ158013.1
>>XX
>>DT 19-JAN-2006 (Rel. 86, Created)
>>DT 19-JAN-2006 (Rel. 86, Last updated, Version 1)
>>XX
>>DE Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
>>
>>Removing the two lines that comprise the date information resolves the
>>problem.
>>
>>Thanks,
>>
>>Morgane.
>>
>>
>>
--
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium