[Biojava-l] [biojavax] EMBL parser : features parsing
Morgane THOMAS-CHOLLIER
mthomasc at vub.ac.be
Wed Apr 12 08:34:43 UTC 2006
Hello again,
I am currently using biojavax to parse EMBL files exported from the Ensembl
website.
Compared to the EBI files I have, their feature lines differ:
sometimes a feature carries only one "/word" qualifier, e.g.:
EBI file:
FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"
Ensembl file:
FT   gene            complement(1..3218)
FT                   /gene="ENSMUSG00000038227"
The problem I encounter is that the parser correctly converts the "/word"
into a Note, but the Note is then attached to the immediately following
feature (e.g. mRNA) instead of the feature it belongs to.
The current gene feature thus ends up with no annotation.
This behavior can be reproduced by removing one "/word" from an EBI file.
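
To make the expected behavior concrete, here is a minimal standalone sketch of
how I would expect qualifiers to be grouped. It is not biojavax code: the class
FeatureGroupingSketch, the Feature holder and the group() method are
hypothetical names, the mRNA line and its qualifier are made-up sample data,
and continuation lines (multi-line locations or values) are ignored for
brevity. Each "/word" line is attached to the most recently opened feature, so
a feature with a single qualifier still keeps it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not the biojavax parser.
final class FeatureGroupingSketch {

    // Simple holder for one feature and the qualifiers that belong to it.
    static final class Feature {
        final String key;
        final String location;
        final List<String> qualifiers = new ArrayList<String>();

        Feature(String key, String location) {
            this.key = key;
            this.location = location;
        }
    }

    // Groups FT lines into features. A line whose content starts with "/"
    // is a qualifier of the feature opened most recently; any other
    // non-empty FT line opens a new feature.
    static List<Feature> group(List<String> lines) {
        List<Feature> features = new ArrayList<Feature>();
        Feature current = null;
        for (String line : lines) {
            if (!line.startsWith("FT")) {
                continue; // not a feature table line
            }
            String body = line.substring(2).trim();
            if (body.isEmpty()) {
                continue;
            }
            if (body.startsWith("/")) {
                if (current != null) {
                    current.qualifiers.add(body);
                }
            } else {
                String[] parts = body.split("\\s+", 2);
                current = new Feature(parts[0], parts.length > 1 ? parts[1] : "");
                features.add(current);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // The gene lines come from the Ensembl example above; the mRNA
        // lines are invented to show a following feature.
        List<String> lines = Arrays.asList(
            "FT   gene            complement(1..3218)",
            "FT                   /gene=\"ENSMUSG00000038227\"",
            "FT   mRNA            complement(1..3218)",
            "FT                   /note=\"sample\"");
        for (Feature f : group(lines)) {
            System.out.println(f.key + " " + f.location + " " + f.qualifiers);
        }
    }
}
```

With this grouping, the gene feature keeps its single /gene qualifier and the
mRNA feature only receives its own.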
Apart from this issue, I noticed that Ensembl EMBL files use "=" inside a
qualifier value (e.g. /note="transcript_id=ENSMUST00000048680"), which ends
up as an incomplete Note, as the parser seems to split on every "=" to
separate the Key from the Value.
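
As an illustration of the splitting I would expect, here is a small standalone
sketch that cuts the qualifier at the first "=" only, so any "=" inside the
value is preserved. It is not taken from biojavax: the class
QualifierSplitSketch and the method splitQualifier() are hypothetical names,
and I am assuming that the value is optionally wrapped in double quotes.

```java
// Hypothetical illustration only; this is not the biojavax parser.
final class QualifierSplitSketch {

    // Splits a qualifier such as /note="a=b" into { key, value }.
    // Only the first "=" separates key from value; later ones stay in
    // the value. Surrounding double quotes are removed if present.
    static String[] splitQualifier(String qualifier) {
        String body = qualifier.startsWith("/") ? qualifier.substring(1) : qualifier;
        String[] parts = body.split("=", 2); // limit 2: at most one split
        String key = parts[0];
        String value = parts.length > 1 ? parts[1] : "";
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        String[] kv = splitQualifier("/note=\"transcript_id=ENSMUST00000048680\"");
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```

The limit argument of String.split is what keeps the second "=" from being
treated as another separator.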
Thanks for your help,
Morgane.
--
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium