[Biojava-l] Genbank parser error [biojavax]
mark.schreiber at novartis.com
Mon Feb 13 20:11:07 EST 2006
Hi Morgane -
I have to say that doesn't look much like Genbank : )
The biojavax parsers are possibly a bit brittle due to their use of regexps
to recognize key elements. It should be fixable. I think the problem is
that the parser expects a word after LOCUS, not a number. This may not be
the only problem, though. Could you post the entire file? Or, if it is large,
a smaller representative file.
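
To illustrate the suspected failure, here is a minimal sketch. Both patterns
are hypothetical and are NOT the actual regular expression used in
GenbankFormat.java; they only assume that a strict parser wants the name
after LOCUS to start with a letter, while a relaxed one accepts any token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocusLineSketch {
    // Hypothetical strict pattern: the name after LOCUS must begin with a letter.
    static final Pattern STRICT =
        Pattern.compile("^LOCUS\\s+([A-Za-z]\\S*)\\s+(\\d+)\\s+bp\\b.*$");

    // Hypothetical relaxed pattern: the name may be any run of non-whitespace.
    static final Pattern RELAXED =
        Pattern.compile("^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+bp\\b.*$");

    // Returns the locus name if the whole line matches, otherwise null.
    static String locusName(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // LOCUS line as it appears in the Ensembl export quoted below.
        String ensembl = "LOCUS 6 489671 bp DNA HTG 13-FEB-2006";
        System.out.println("strict:  " + locusName(STRICT, ensembl));   // strict:  null
        System.out.println("relaxed: " + locusName(RELAXED, ensembl));  // relaxed: 6
    }
}
```

With the strict pattern, the numeric name "6" is rejected, which is the kind
of mismatch that would lead to a "Bad locus line" error. Whether relaxing the
name group is enough for Ensembl exports is untested here.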
- Mark
Morgane THOMAS-CHOLLIER <mthomasc at vub.ac.be>
Sent by: biojava-l-bounces at portal.open-bio.org
02/14/2006 04:36 AM
To: biojava-l at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] Genbank parser error [biojavax]
Hello,
I tried biojavax today with a view to using the Genbank file parser.
My test file is a Genbank-formatted file produced by the
Ensembl export system.
The head of the file is as follows:
LOCUS 6 489671 bp DNA HTG 13-FEB-2006
DEFINITION Mus musculus chromosome 6 NCBIM34 partial sequence
52296503..52786173 reannotated via EnsEMBL
ACCESSION chromosome:NCBIM34:6:52296503:52786173:1
VERSION chromosome:NCBIM34:6:52296503:52786173:1
I used the code provided in the biojavax docbook to parse this file.
I get the following error:
Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found:
6 489671 bp DNA HTG 13-FEB-2006
    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more
I had a look at GenbankFormat.java, and I guess the problem comes from
the regular expression, which does not recognize this LOCUS line as a
standard Genbank LOCUS tag.
Am I wrong? Has the biojavax Genbank parser been tested on Ensembl-exported
files?
Morgane.
--
*************************************
Morgane THOMAS-CHOLLIER, PHD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium
_______________________________________________
Biojava-l mailing list - Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l