[Biojava-l] Genbank parser error [biojavax]
Morgane THOMAS-CHOLLIER
mthomasc at vub.ac.be
Mon Feb 13 15:36:59 EST 2006
Hello,
I tried biojavax today with a view to using the Genbank file parser.
My test file is a Genbank-formatted file produced by the Ensembl
export system.
The head of the file is as follows:
LOCUS 6 489671 bp DNA HTG 13-FEB-2006
DEFINITION Mus musculus chromosome 6 NCBIM34 partial sequence
52296503..52786173 reannotated via EnsEMBL
ACCESSION chromosome:NCBIM34:6:52296503:52786173:1
VERSION chromosome:NCBIM34:6:52296503:52786173:1
I used the code provided in the biojavax docbook to parse this file,
and I get the following error:
Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 6 489671 bp DNA HTG 13-FEB-2006
    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more
I had a look at GenbankFormat.java, and I guess the problem comes from
the regular expression, which does not recognize this LOCUS line as a
standard Genbank LOCUS line.
Am I wrong? Has the biojavax Genbank parser been tested on
Ensembl-exported files?
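
To illustrate the kind of mismatch suspected above, here is a minimal,
self-contained sketch. Both patterns are hypothetical and are NOT the
regular expression used in GenbankFormat.java, which may differ. The
sketch assumes, as one possible cause only, that a strict pattern expects
a topology token (linear or circular) between the molecule type and the
division, which the Ensembl LOCUS line does not contain. The NCBI-style
line and its accession are made up for comparison.

```java
import java.util.regex.Pattern;

public class LocusLineCheck {
    // Hypothetical strict pattern (an assumption, not the biojavax regex):
    // name, length, "bp", molecule type, mandatory topology, division, date.
    static final Pattern STRICT = Pattern.compile(
        "^(\\S+)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+(linear|circular)\\s+(\\S+)\\s+(\\S+)$");

    // Hypothetical lenient variant: same fields, but the topology is optional.
    static final Pattern LENIENT = Pattern.compile(
        "^(\\S+)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+(?:(linear|circular)\\s+)?(\\S+)\\s+(\\S+)$");

    // Returns true if the text following the LOCUS keyword matches the pattern.
    static boolean matches(Pattern pattern, String locusBody) {
        return pattern.matcher(locusBody.trim()).matches();
    }

    public static void main(String[] args) {
        // Text reported in the "Bad locus line found" message.
        String ensembl = "6 489671 bp DNA HTG 13-FEB-2006";
        // Made-up NCBI-style line that includes a topology token.
        String ncbiStyle = "AB000001 1234 bp DNA linear PLN 01-JAN-2006";

        System.out.println("strict,  Ensembl line: " + matches(STRICT, ensembl));
        System.out.println("lenient, Ensembl line: " + matches(LENIENT, ensembl));
        System.out.println("strict,  NCBI-style:   " + matches(STRICT, ncbiStyle));
        System.out.println("lenient, NCBI-style:   " + matches(LENIENT, ncbiStyle));
    }
}
```

Under these assumptions, only the strict pattern rejects the Ensembl
line. Whether this is the actual reason for the exception would have to
be checked against the pattern in GenbankFormat.java.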
Morgane.
--
*************************************
Morgane THOMAS-CHOLLIER, PHD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium