[Biojava-l] Ensembl gene parsing
Stein Aerts
stein.aerts at esat.kuleuven.ac.be
Wed Jan 29 09:57:10 EST 2003
Hi,
When I parse an exported sequence of an Ensembl mouse gene (using the
Export Data function at www.ensembl.org), I run into three problems.
I tried to attach an example of an exported sequence of the Igf1 gene,
but the message was bounced because of a suspicious header, so I quote
the relevant lines instead:
1. Some of the exon locations start with ".0:".
I think this is a bug in the EMBL formatting at Ensembl?
FT exon .0:44020..44364
FT /exon_id="ENSMUSE00000233709"
FT /start_phase=0
FT /end_phase=0
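A possible workaround for problem 1 is to clean each line before it reaches the parser. The following is only a minimal sketch: it assumes the ".0:" prefix carries no information that is needed and can simply be dropped, which I have not confirmed. The class name EnsemblLocationFix and the method stripDotZeroPrefix are made up for illustration and are not part of BioJava.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
class EnsemblLocationFix {
    // An FT line with a feature key followed by a location that begins with ".0:".
    // Qualifier lines (starting with "/") do not match, because nothing
    // follows the qualifier on the same line.
    private static final Pattern DOT_ZERO =
            Pattern.compile("^(FT\\s+\\S+\\s+)\\.0:");

    // Assumption: the ".0:" prefix is spurious and safe to remove.
    static String stripDotZeroPrefix(String line) {
        return DOT_ZERO.matcher(line).replaceFirst("$1");
    }

    public static void main(String[] args) {
        System.out.println(stripDotZeroPrefix("FT exon .0:44020..44364"));
        System.out.println(stripDotZeroPrefix("FT /start_phase=0"));
    }
}
```

Lines without the prefix, including qualifier lines, pass through unchanged, so the filter could be applied to every line of the exported file.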
2. The first annotation of a CDS feature is written on the line after
the CDS key, and the EMBL parser does not find it.
I think this is also a bug at Ensembl?
FT CDS
FT /gene="ENSMUSG00000020053"
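To find out how many features are affected by problem 2, the file could be scanned for FT lines that hold only a feature key. This is a rough heuristic sketch, not a fix: the class name MissingLocationCheck and the method lacksLocation are hypothetical, and a one-word continuation line of a long qualifier value would also be flagged.

```java
import java.util.regex.Pattern;

// Hypothetical diagnostic, not part of BioJava.
class MissingLocationCheck {
    // "FT", whitespace, one token that does not start with "/",
    // then nothing but optional trailing whitespace.
    private static final Pattern KEY_ONLY =
            Pattern.compile("FT\\s+[^/\\s]\\S*\\s*");

    // True when the line names a feature key but gives no location.
    static boolean lacksLocation(String line) {
        return KEY_ONLY.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {
            "FT CDS",
            "FT /gene=\"ENSMUSG00000020053\"",
            "FT exon 2001..2159"
        };
        for (String line : lines) {
            System.out.println(lacksLocation(line) + "  " + line);
        }
    }
}
```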
3. Some of the lines cannot be parsed. For example, the parser writes to
System.out: "This line could not be parsed: exon 2001..2159"
I don't understand this one, as I cannot see a problem with these features:
FT exon 2001..2159
FT /exon_id="ENSMUSE00000248454"
FT /start_phase=0
FT /end_phase=0
Thank you in advance!
Stein.
--
Stein Aerts BioI at SISTA
K.U.Leuven ESAT-SCD Belgium
http://www.esat.kuleuven.ac.be/~dna/BioI