[Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
Jolyon Holdstock
jolyon.holdstock at ogt.co.uk
Thu Apr 13 16:42:36 UTC 2006
Hi Morgane,
I have amended the EmblFormat readSection method as below and the
parsing seems to work; please test it.
I think that the last bit of annotation of a feature is still pending
when the next feature line is read, so it gets carried over into the next
feature. Before adding the new feature I now dump the pending annotation
and reset currentTag and currentVal.
    if (!line.startsWith(" ")) {
        //--------- new code starts ---------------------------
        if (currentTag != null) {
            section.add(new String[]{currentTag, currentVal.toString()});
            currentTag = null;
            currentVal = null;
        }
        //--------- new code ends -----------------------------
        // case 1 : word value - splits into key-value on its own
        section.add(line.split("\\s+"));
    }
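To show the idea in isolation, here is a minimal self-contained sketch of the
same flush-before-new-feature logic. It is not the actual EmblFormat code: the
class and method names are made up for illustration, and it assumes the
leading "FT" tag and its padding have already been stripped, so that feature
lines start with the key and qualifier lines start with a space.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of BioJavaX.
final class FeatureSectionSketch {

    // Assumption: each input line is a feature-table line with the leading
    // "FT   " removed. Returns key/value pairs in document order.
    static List<String[]> readFeatures(List<String> lines) {
        List<String[]> section = new ArrayList<>();
        String currentTag = null;
        StringBuilder currentVal = null;

        for (String line : lines) {
            if (!line.startsWith(" ")) {
                // A new feature starts: flush the qualifier that is still
                // pending, so it stays with the previous feature.
                if (currentTag != null) {
                    section.add(new String[] {currentTag, currentVal.toString()});
                    currentTag = null;
                    currentVal = null;
                }
                // Feature line: key and location separated by whitespace.
                section.add(line.split("\\s+"));
            } else {
                String text = line.trim();
                if (text.startsWith("/")) {
                    // A new qualifier starts: flush the previous one first.
                    if (currentTag != null) {
                        section.add(new String[] {currentTag, currentVal.toString()});
                    }
                    int eq = text.indexOf('=');
                    currentTag = eq < 0 ? text.substring(1) : text.substring(1, eq);
                    currentVal = new StringBuilder(eq < 0 ? "" : text.substring(eq + 1));
                } else if (currentTag != null) {
                    // Continuation of a multi-line qualifier value.
                    currentVal.append(' ').append(text);
                }
            }
        }
        // Flush whatever is pending at the end of the section.
        if (currentTag != null) {
            section.add(new String[] {currentTag, currentVal.toString()});
        }
        return section;
    }
}
```

With a gene feature carrying a single /gene qualifier followed by an mRNA
feature, the /gene pair should now appear before the mRNA entry rather than
after it.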
Cheers,
Jolyon
-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org
[mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane
THOMAS-CHOLLIER
Sent: 12 April 2006 09:35
To: biojava-l at open-bio.org
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
Hello again,
I am currently using biojavax to parse EMBL files exported from Ensembl
website.
Compared to the EBI files I have, they differ in the feature lines:
sometimes only one "/word" is present, e.g.:
EBI file:
FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"
Ensembl file:
FT   gene            complement(1..3218)
FT                   /gene="ENSMUSG00000038227"
The problem I encounter is that the parser correctly converts the "/word"
into a Note, but the Note is then attached to the immediately following
feature (e.g. mRNA).
The current gene feature thus has no annotation.
This behavior is reproducible by removing one "/word" from an EBI file.
Apart from this issue, I noted that Ensembl EMBL files use "=" inside a
feature value (e.g. /note="transcript_id=ENSMUST00000048680"), which ends
up as an incomplete Note, as the parser seems to split on "=" to separate
the Key and the Value.
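One possible way to keep the full value would be to split only at the first
"=" of the qualifier. A minimal sketch of that idea follows; splitQualifier
and its class are hypothetical names used for illustration, not the BioJavaX
API, and stripping the surrounding quotes is an assumption about the desired
result.

```java
// Hypothetical illustration class, not part of BioJavaX.
final class QualifierSplitSketch {

    // Splits a qualifier such as /note="a=b" into {key, value}, cutting at
    // the first '=' only so that any later '=' stays inside the value.
    static String[] splitQualifier(String qualifier) {
        String text = qualifier.trim();
        if (text.startsWith("/")) {
            text = text.substring(1);
        }
        int eq = text.indexOf('=');
        if (eq < 0) {
            // Qualifier without a value, e.g. /pseudo
            return new String[] {text, ""};
        }
        String key = text.substring(0, eq);
        String value = text.substring(eq + 1);
        // Assumption: surrounding double quotes are not part of the value.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new String[] {key, value};
    }
}
```

The same effect could be had with String.split("=", 2), which limits the
result to two parts.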
Thanks for your help,
Morgane.
--
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium