From anderson.moura at telemar-rj.com.br  Mon Apr  3 10:09:23 2006
From: anderson.moura at telemar-rj.com.br (Anderson Moura da Silva)
Date: Mon, 3 Apr 2006 11:09:23 -0300
Subject: [Biojava-l] Get a sequence from internet
Message-ID: <3C39C09ED334F243838953854BE43FB6025A1F01@MAILBX02.telemar.corp.net>

Is it possible to get a sequence from one of the sources on the internet, such as Swiss-Prot, using BioJava?
 
Can anybody help?
 
Thanks,




From anderson.moura at telemar-rj.com.br  Mon Apr  3 11:54:01 2006
From: anderson.moura at telemar-rj.com.br (Anderson Moura da Silva)
Date: Mon, 3 Apr 2006 12:54:01 -0300
Subject: [Biojava-l] RES:  Get a sequence from internet
Message-ID: <3C39C09ED334F243838953854BE43FB6025C7F40@MAILBX02.telemar.corp.net>

Nice!!

Does it work only with the sequence ID? Can I search by the name of the sequence?

Thanks a lot!



From guedes at unisul.br  Mon Apr  3 11:09:43 2006
From: guedes at unisul.br (Dickson S. Guedes)
Date: Mon, 03 Apr 2006 12:09:43 -0300
Subject: [Biojava-l] Get a sequence from internet
In-Reply-To: <3C39C09ED334F243838953854BE43FB6025A1F01@MAILBX02.telemar.corp.net>
References: <3C39C09ED334F243838953854BE43FB6025A1F01@MAILBX02.telemar.corp.net>
Message-ID: <44313AB7.7080309@unisul.br>

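Hi Anderson,

Yes, you can use NCBISequenceDB:


(...)

// Needs org.biojava.bio.seq.Sequence and org.biojava.bio.seq.db.NCBISequenceDB
NCBISequenceDB ncbiDB = new NCBISequenceDB();
// getSequence() can throw BioException, so call it inside try/catch
Sequence sequenceFromGenbank = ncbiDB.getSequence("sequence_id");

System.out.println(sequenceFromGenbank.getName());

(...)

Replace "sequence_id" with an ID from GenBank.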

:)



-- 
Dickson S. Guedes
/*
  * UNISUL - Universidade do Sul de Santa Catarina
  * ATI - Assessoria de Tecnologia da Informação
  * (0xx48) 621-3200 - http://www.unisul.br
  *
  *    "Quis custodiet ipsos custodes?"
  */

From wendy.wong at gmail.com  Tue Apr  4 14:22:00 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Tue, 4 Apr 2006 19:22:00 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To: <200603311805.25861.matthew.pocock@ncl.ac.uk>
References: <e554425b0603300741j755ecd33m36f04275a3811f86@mail.gmail.com>
	<5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>
	<200603311805.25861.matthew.pocock@ncl.ac.uk>
Message-ID: <e554425b0604041122y2cbf4012g57f3d2069e7ca219@mail.gmail.com>

Thanks for your advice! I am able to train a subset of transition
probabilities now!

I found something strange. First I changed my emission distributions
to UntrainableDistribution, and the trainer didn't seem to be doing
anything: all cycles had the same score. I then changed them back to
SimpleDistribution (still keeping my getWeightImpl in my custom
distribution). This time it works, and it doesn't seem to be modifying
my emission probabilities. So it works for me; I am just curious whether
it is a bug or whether I was doing something wrong.

Thanks again!
wendy


On 3/31/06, Matthew Pocock <matthew.pocock at ncl.ac.uk> wrote:
> > The DP code does some caching of probabilities, I don't think there's
> > any way to turn this off without modifying the DP implementations.
> >
> >            Thomas.
>
> My recollection is that if you did turn this off, the algorithm would run
> very, very much more slowly. Internally to the DP objects, the distribution
> probabilities (in fact, they aren't even probabilities by this stage) are
> stored in a data-structure optimized for the type of lookups performed during
> the dynamic programming recursions.
>
> Matthew
>



From mthomasc at vub.ac.be  Fri Apr  7 05:20:33 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Fri, 07 Apr 2006 11:20:33 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
Message-ID: <44362EE1.5060804@vub.ac.be>

Hello,

I am currently using biojavax that I checked out today from CVS to parse 
an EMBL file, exported from EBI SRS server.

I ran into this error:

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
Caused by: org.biojava.bio.seq.io.ParseException: Bad date type found: 86
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:278)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

The EMBL file is:

ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
XX
AC   DQ158013;
XX
SV   DQ158013.1
XX
DT   19-JAN-2006 (Rel. 86, Created)
DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
XX
DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.

Removing the two DT lines, which carry the date information, works
around the problem.

Thanks,

Morgane.

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium


From richard.holland at ebi.ac.uk  Fri Apr  7 05:56:57 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 10:56:57 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <44362EE1.5060804@vub.ac.be>
References: <44362EE1.5060804@vub.ac.be>
Message-ID: <1144403817.3958.30.camel@texas.ebi.ac.uk>

That was indeed a bug. I have made a change to the date parsing in
EMBLFormat and committed it to CVS. Could you test it for me please?
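
For illustration, the two DT lines from the report can be split with a single regular expression. This is only a sketch: the class name EmblDateLine and the pattern are hypothetical and are not the code in EMBLFormat. It assumes the "Rel. N, type" layout shown in the report, with ", Version N" optional.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for EMBL DT lines; not the BioJava EMBLFormat code. */
final class EmblDateLine {
    // Groups: 1 = date, 2 = release number, 3 = date type, 4 = version (optional)
    private static final Pattern DT = Pattern.compile(
        "DT\\s+(\\d{2}-[A-Z]{3}-\\d{4})\\s+\\(Rel\\.\\s+(\\d+),\\s+([^,)]+)"
        + "(?:,\\s+Version\\s+(\\d+))?\\)\\s*");

    final String date;
    final int release;
    final String type;
    final Integer version; // null when the line carries no version

    private EmblDateLine(String date, int release, String type, Integer version) {
        this.date = date;
        this.release = release;
        this.type = type;
        this.version = version;
    }

    static EmblDateLine parse(String line) {
        Matcher m = DT.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a DT line: " + line);
        }
        // group(4) is null when the optional ", Version N" part did not match
        String v = m.group(4);
        return new EmblDateLine(m.group(1), Integer.parseInt(m.group(2)),
                                m.group(3).trim(), v == null ? null : Integer.valueOf(v));
    }

    public static void main(String[] args) {
        EmblDateLine created = parse("DT   19-JAN-2006 (Rel. 86, Created)");
        EmblDateLine updated = parse("DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)");
        System.out.println(created.date + " rel=" + created.release + " type=" + created.type);
        System.out.println(updated.date + " rel=" + updated.release + " type=" + updated.type
                           + " version=" + updated.version);
    }
}
```

Keeping the release number and the date type in separate groups means the release number ("86" here) cannot be mistaken for the date type.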

cheers,
Richard

-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From mthomasc at vub.ac.be  Fri Apr  7 08:18:36 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Fri, 07 Apr 2006 14:18:36 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <1144403817.3958.30.camel@texas.ebi.ac.uk>
References: <44362EE1.5060804@vub.ac.be>
	<1144403817.3958.30.camel@texas.ebi.ac.uk>
Message-ID: <4436589C.8010501@vub.ac.be>

I now get another error message with the same file:

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
Caused by: java.lang.IndexOutOfBoundsException: No group 5
    at java.util.regex.Matcher.group(Matcher.java:355)
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:271)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

Here is the complete file, for info:

ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
XX
AC   DQ158013;
XX
SV   DQ158013.1
XX
DT   19-JAN-2006 (Rel. 86, Created)
DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
XX
DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
XX
KW   .
XX
OS   Triturus helveticus (palmate newt)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia;
OC   Batrachia; Caudata; Salamandroidea; Salamandridae; Triturus.
XX
RN   [1]
RP   1-118
RX   DOI; 10.1016/j.ympev.2005.08.012.
RX   PUBMED; 16198128.
RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT   "A PCR survey for posterior Hox genes in amphibians";
RL   Mol. Phylogenet. Evol. 38(2):449-458(2006).
XX
RN   [2]
RP   1-118
RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT   ;
RL   Submitted (09-AUG-2005) to the EMBL/GenBank/DDBJ databases.
RL   Biology Department, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050,
RL   Belgium
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..118
FT                   /organism="Triturus helveticus"
FT                   /mol_type="genomic DNA"
FT                   /clone="Thel.b9"
FT                   /db_xref="taxon:256425"
FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"
FT   mRNA            <1..>118
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT   CDS             <1..>118
FT                   /codon_start=2
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
FT                   /protein_id="ABA39736.1"
FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
XX
SQ   Sequence 118 BP; 28 A; 35 C; 37 G; 18 T; 0 other;
     caaataccag acgctggagc tggagaagga gttcctgttc aacatgtacc tcacccggga        60
     ccgcaggcac gaggtggccc ggctgctgaa cctcagcgag cgccaggtca agatctgg         118
//

Thanks for helping,

Morgane.

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium


From richard.holland at ebi.ac.uk  Fri Apr  7 08:48:46 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 13:48:46 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <4436589C.8010501@vub.ac.be>
References: <44362EE1.5060804@vub.ac.be>
	<1144403817.3958.30.camel@texas.ebi.ac.uk> <4436589C.8010501@vub.ac.be>
Message-ID: <1144414126.3958.32.camel@texas.ebi.ac.uk>

Sorry, my bad. An off-by-one error... 

Check it out again and see if it works now.
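
As a general illustration of this kind of off-by-one (a sketch, not the EMBLFormat code): capturing groups in java.util.regex are numbered from 1, group 0 is the whole match, and asking for an index above Matcher.groupCount() throws IndexOutOfBoundsException. The class GroupIndexDemo, the helper safeGroup and the pattern below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo of regex group indexing; not taken from BioJava. */
final class GroupIndexDemo {
    /**
     * Returns group n of an already matched Matcher, or null when the
     * pattern defines no group with that index.
     */
    static String safeGroup(Matcher m, int n) {
        return (n >= 0 && n <= m.groupCount()) ? m.group(n) : null;
    }

    public static void main(String[] args) {
        // Illustrative pattern with exactly four capturing groups
        Pattern p = Pattern.compile("(\\d+)-(\\w+)-(\\d+) \\(Rel\\. (\\d+)\\)");
        Matcher m = p.matcher("19-JAN-2006 (Rel. 86)");
        if (m.matches()) {
            System.out.println("groups defined: " + m.groupCount());
            System.out.println("last group:     " + m.group(m.groupCount()));
            try {
                m.group(m.groupCount() + 1); // one past the last valid index
            } catch (IndexOutOfBoundsException e) {
                System.out.println("out of range:   " + e.getMessage());
            }
            System.out.println("guarded lookup: " + safeGroup(m, 5));
        }
    }
}
```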

cheers,
Richard

PS. I don't have any EMBL files to test with at the moment otherwise I'd
check it myself... :)


-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From richard.holland at ebi.ac.uk  Fri Apr  7 09:42:10 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 14:42:10 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <44366419.4050505@dbm.ulb.ac.be>
References: <44362EE1.5060804@vub.ac.be>
	<1144403817.3958.30.camel@texas.ebi.ac.uk> <4436589C.8010501@vub.ac.be>
	<1144414126.3958.32.camel@texas.ebi.ac.uk>
	<44366419.4050505@dbm.ulb.ac.be>
Message-ID: <1144417330.3958.34.camel@texas.ebi.ac.uk>

Hi. Someone else had checked in a change to a different class, but that
change was incorrect and didn't compile. It should compile now.
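
For anyone who meets the same javac message ("exception java.io.IOException is never thrown in body of corresponding try statement"): javac rejects a catch clause for a checked exception that nothing in the try body can throw. The sketch below is hypothetical and is not the code from StringTools.java; it only shows a catch that compiles because the body really can throw IOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical demo; not the BioJava StringTools class. */
final class CheckedCatchDemo {
    // This variant would be rejected by javac, because StringBuilder.append
    // does not declare IOException:
    //
    //   try { sb.append(text); } catch (IOException e) { ... }

    /** Counts lines; the catch is legal because readLine() declares IOException. */
    static int countLines(String text) {
        int n = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            while (reader.readLine() != null) {
                n++;
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrow unchecked
            throw new IllegalStateException("unexpected I/O failure", e);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countLines("first\nsecond\nthird"));
    }
}
```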

cheers,
Richard

PS. Note to all those who commit changes - PLEASE check your code
compiles first before committing it! 

On Fri, 2006-04-07 at 15:07 +0200, Morgane THOMAS-CHOLLIER wrote:
> I tried to checkout biojava-live but it seems I cannot build it anymore. 
> I get the following error :
> 
> compile-biojava:
>     [javac] Compiling 1321 source files to /Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/ant-build/classes/biojava
>     [javac] /Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/src/org/biojavax/utils/StringTools.java:97: exception java.io.IOException is never thrown in body of corresponding try statement
>     [javac]           } catch (IOException e) {
>     [javac]             ^
>     [javac] Note: Some input files use or override a deprecated API.
>     [javac] Note: Recompile with -deprecation for details.
>     [javac] 1 error
> 
> I use Mac OS X 10.3.9, java 1.4.2.
> 
> Hope you could help,
> 
> Cheers,
> 
> Morgane.
> 
-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From andreas.draeger at clever-telefonieren.de  Fri Apr  7 11:43:35 2006
From: andreas.draeger at clever-telefonieren.de (Andreas Dräger)
Date: Fri, 07 Apr 2006 17:43:35 +0200
Subject: [Biojava-l] Senseless assignment
Message-ID: <443688A7.1000203@clever-telefonieren.de>

Hi,

This assignment has no effect in class 
org.biojavax.ontology.SimpleComparableTriple:

    // Hibernate requirement - not for public use.
    private void setOntology(ComparableOntology descriptors) { this.ontology = ontology; }

I do not know why this is necessary.

Andreas

-- 
==================================
Andreas Dräger
PhD student
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Phone: +49-7071-29-70436
==================================


From richard.holland at ebi.ac.uk  Mon Apr 10 05:26:51 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Mon, 10 Apr 2006 10:26:51 +0100
Subject: [Biojava-l] Senseless assignment
In-Reply-To: <443688A7.1000203@clever-telefonieren.de>
References: <443688A7.1000203@clever-telefonieren.de>
Message-ID: <1144661211.3951.9.camel@texas.ebi.ac.uk>

It's a typo. The method declaration should read:

  	// Hibernate requirement - not for public use.
	private void setOntology(ComparableOntology ontology) {
		this.ontology = ontology;
	}

I have fixed it in CVS.
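
To see why the original version had no effect, here is a small self-contained sketch. The classes Broken and Fixed are hypothetical, and String stands in for ComparableOntology. When the parameter has a different name, the bare name "ontology" on the right-hand side resolves to the field, so the field is assigned to itself.

```java
/** Hypothetical demo of the self-assignment bug; String stands in for ComparableOntology. */
final class SetterDemo {
    static final class Broken {
        private String ontology = "initial";

        // The parameter is named 'descriptors', so 'ontology' below is the field:
        // the statement assigns the field to itself and ignores the argument.
        void setOntology(String descriptors) { this.ontology = ontology; }

        String getOntology() { return ontology; }
    }

    static final class Fixed {
        private String ontology = "initial";

        // The parameter shadows the field, so the argument is stored.
        void setOntology(String ontology) { this.ontology = ontology; }

        String getOntology() { return ontology; }
    }

    public static void main(String[] args) {
        Broken broken = new Broken();
        broken.setOntology("updated");
        System.out.println("broken: " + broken.getOntology());

        Fixed fixed = new Fixed();
        fixed.setOntology("updated");
        System.out.println("fixed:  " + fixed.getOntology());
    }
}
```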

cheers,
Richard

-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From mthomasc at vub.ac.be  Sat Apr  8 04:20:47 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Sat, 08 Apr 2006 10:20:47 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <1144417330.3958.34.camel@texas.ebi.ac.uk>
References: <44362EE1.5060804@vub.ac.be>	
	<1144403817.3958.30.camel@texas.ebi.ac.uk>
	<4436589C.8010501@vub.ac.be>	
	<1144414126.3958.32.camel@texas.ebi.ac.uk>
	<44366419.4050505@dbm.ulb.ac.be>
	<1144417330.3958.34.camel@texas.ebi.ac.uk>
Message-ID: <4437725F.9000503@vub.ac.be>

It works fine now !

Thanks for your help,

cheers,

Morgane.


Richard Holland wrote:

>Hi. Someone else had checked in a change to a different class, but that
>change was incorrect and didn't compile. It should compile now.
>
>cheers,
>Richard
>
>PS. Note to all those who commit changes - PLEASE check your code
>compiles first before committing it! 
>
>On Fri, 2006-04-07 at 15:07 +0200, Morgane THOMAS-CHOLLIER wrote:
>  
>
>>I tried to checkout biojava-live but it seems I cannot build it anymore. 
>>I get the following error :
>>
>>compile-biojava:
>>    [javac] Compiling 1321 source files to 
>>/Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/ant-build/classes/biojava
>>    [javac] 
>>/Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/src/org/biojavax/utils/StringTools.java:97: 
>>exception java.io.IOException is never thrown in body of corresponding 
>>try statement
>>    [javac]           } catch (IOException e) {
>>    [javac]             ^
>>    [javac] Note: Some input files use or override a deprecated API.
>>    [javac] Note: Recompile with -deprecation for details.
>>    [javac] 1 error
>>
>>I use Mac OS X 10.3.9, java 1.4.2.
>>
>>Hope you could help,
>>
>>Cheers,
>>
>>Morgane.
>>
>>
>>Richard Holland wrote:
>>
>>    
>>
>>>Sorry, my bad. An off-by-one error... 
>>>
>>>Check it out again and see if it works now.
>>>
>>>cheers,
>>>Richard
>>>
>>>PS. I don't have any EMBL files to test with at the moment otherwise I'd
>>>check it myself... :)
>>>
>>>
>>>On Fri, 2006-04-07 at 14:18 +0200, Morgane THOMAS-CHOLLIER wrote:
>>>>I now get another error message with the same file :
>>>>
>>>>Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
>>>>   at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>>>>   at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
>>>>Caused by: java.lang.IndexOutOfBoundsException: No group 5
>>>>   at java.util.regex.Matcher.group(Matcher.java:355)
>>>>   at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:271)
>>>>   at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>>>>   ... 1 more
>>>>
>>>>Here is the complete file, for info:
>>>>
>>>>ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
>>>>XX
>>>>AC   DQ158013;
>>>>XX
>>>>SV   DQ158013.1
>>>>XX
>>>>DT   19-JAN-2006 (Rel. 86, Created)
>>>>DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
>>>>XX
>>>>DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
>>>>XX
>>>>KW   .
>>>>XX
>>>>OS   Triturus helveticus (palmate newt)
>>>>OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia;
>>>>OC   Batrachia; Caudata; Salamandroidea; Salamandridae; Triturus.
>>>>XX
>>>>RN   [1]
>>>>RP   1-118
>>>>RX   DOI; 10.1016/j.ympev.2005.08.012.
>>>>RX   PUBMED; 16198128.
>>>>RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
>>>>RT   "A PCR survey for posterior Hox genes in amphibians";
>>>>RL   Mol. Phylogenet. Evol. 38(2):449-458(2006).
>>>>XX
>>>>RN   [2]
>>>>RP   1-118
>>>>RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
>>>>RT   ;
>>>>RL   Submitted (09-AUG-2005) to the EMBL/GenBank/DDBJ databases.
>>>>RL   Biology Department, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050,
>>>>RL   Belgium
>>>>XX
>>>>FH   Key             Location/Qualifiers
>>>>FH
>>>>FT   source          1..118
>>>>FT                   /organism="Triturus helveticus"
>>>>FT                   /mol_type="genomic DNA"
>>>>FT                   /clone="Thel.b9"
>>>>FT                   /db_xref="taxon:256425"
>>>>FT   gene            <1..>118
>>>>FT                   /gene="Hoxb9"
>>>>FT                   /note="Hoxb-9"
>>>>FT   mRNA            <1..>118
>>>>FT                   /gene="Hoxb9"
>>>>FT                   /product="HOXB9"
>>>>FT   CDS             <1..>118
>>>>FT                   /codon_start=2
>>>>FT                   /gene="Hoxb9"
>>>>FT                   /product="HOXB9"
>>>>FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
>>>>FT                   /protein_id="ABA39736.1"
>>>>FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
>>>>XX
>>>>SQ   Sequence 118 BP; 28 A; 35 C; 37 G; 18 T; 0 other;
>>>>    caaataccag acgctggagc tggagaagga gttcctgttc aacatgtacc tcacccggga        60
>>>>    ccgcaggcac gaggtggccc ggctgctgaa cctcagcgag cgccaggtca agatctgg         118
>>>>//
>>>>
>>>>Thanks for helping,
>>>>
>>>>Morgane.
>>>>
>>>>Richard Holland wrote:
>>>>
>>>>>That was indeed a bug. I have made a change to the date parsing in
>>>>>EMBLFormat and committed it to CVS. Could you test it for me please?
>>>>>
>>>>>cheers,
>>>>>Richard
>>>>>
>>>>>On Fri, 2006-04-07 at 11:20 +0200, Morgane THOMAS-CHOLLIER wrote:
>>>>>>Hello,
>>>>>>
>>>>>>I am currently using biojavax that I checked out today from CVS to parse 
>>>>>>an EMBL file, exported from EBI SRS server.
>>>>>>
>>>>>>I ran into this error :
>>>>>>
>>>>>>Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
>>>>>>  at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>>>>>>  at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
>>>>>>Caused by: org.biojava.bio.seq.io.ParseException: Bad date type found: 86
>>>>>>  at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:278)
>>>>>>  at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>>>>>>  ... 1 more
>>>>>>
>>>>>>The EMBL file is :
>>>>>>
>>>>>>ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
>>>>>>XX
>>>>>>AC   DQ158013;
>>>>>>XX
>>>>>>SV   DQ158013.1
>>>>>>XX
>>>>>>DT   19-JAN-2006 (Rel. 86, Created)
>>>>>>DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
>>>>>>XX
>>>>>>DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
>>>>>>
>>>>>>Removing the two lines that comprise the date information resolves the 
>>>>>>problem.
>>>>>>
>>>>>>Thanks,
>>>>>>
>>>>>>Morgane.
>>>>>>


-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium


From mthomasc at vub.ac.be  Wed Apr 12 04:34:43 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Wed, 12 Apr 2006 10:34:43 +0200
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing
Message-ID: <443CBBA3.9070101@vub.ac.be>

Hello again,

I am currently using biojavax to parse EMBL files exported from the
Ensembl website.

Compared to the EBI files I have, they differ in the feature lines:
sometimes only one "/word" qualifier is present, e.g.:

EBI file :

FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"

Ensembl file :

FT   gene         complement(1..3218)
FT                   /gene="ENSMUSG00000038227"

The problem I encounter is that the parser correctly converts the "/word"
into a Note, but the Note is then attached to the immediately following
feature (e.g. mRNA).
The current gene feature thus has no annotation.

This behavior is reproducible by removing one "/word" from an EBI file.

Apart from this issue, I noticed that Ensembl EMBL files use "=" inside a
qualifier value (e.g. /note="transcript_id=ENSMUST00000048680"), which ends
up as an incomplete Note, as the parser seems to split on "=" to separate
the key and the value.
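
A minimal sketch of the second issue, assuming the key and the value really
are separated by splitting on every "=": limiting the split to the first "="
keeps the rest of the value intact. The class and method names below
(QualifierSplit, splitQualifier) are hypothetical and are not part of
biojavax; only the standard String.split(regex, limit) call is relied on.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not biojavax code.
class QualifierSplit {

    /**
     * Splits a feature qualifier such as /note="a=b" into {key, value}.
     * Only the first '=' separates key from value; any later '='
     * characters stay inside the value.
     */
    static String[] splitQualifier(String qualifier) {
        // Drop the leading '/' if present.
        String body = qualifier.startsWith("/") ? qualifier.substring(1) : qualifier;
        // A limit of 2 means the string is split at most once.
        String[] parts = body.split("=", 2);
        String key = parts[0];
        String value = parts.length > 1 ? parts[1] : "";
        // Strip one pair of surrounding double quotes, if both are present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        String[] kv = splitQualifier("/note=\"transcript_id=ENSMUST00000048680\"");
        System.out.println(Arrays.toString(kv));
    }
}
```

With a limit of 2, "transcript_id=ENSMUST00000048680" stays in the value as
a whole instead of being cut at the second "=".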

Thanks for your help,

Morgane.

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium


From jolyon.holdstock at ogt.co.uk  Thu Apr 13 12:42:36 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu, 13 Apr 2006 17:42:36 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>

Hi Morgane,

I have amended the EMBLFormat readSection method as below and the
parsing seems to work; please test it.

I think that the last qualifier of a feature is carried over into the next
feature, so before adding the new feature I flush the pending annotation
and reset currentTag and currentVal.

if (!line.startsWith(" ")) {
//--------- new code starts ---------------------------
  if (currentTag!=null) {
    section.add(new String[]{currentTag,currentVal.toString()});
    currentTag = null;
    currentVal = null;
  }
//--------- new code ends -----------------------------
// case 1 : word value - splits into key-value on its own
  section.add(line.split("\\s+"));
}
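
The idea behind the fragment above can be sketched in isolation as follows.
This is a simplified, hypothetical stand-in and not the actual EMBLFormat
code: the class name FeatureSectionSketch is made up, the input lines are
assumed to have the leading "FT   " already removed (so feature keys start
in column one and qualifier lines start with spaces), and quote handling is
deliberately naive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a feature-table section reader.
class FeatureSectionSketch {

    static List<String[]> readSection(List<String> lines) {
        List<String[]> section = new ArrayList<>();
        String currentTag = null;          // qualifier being accumulated
        StringBuilder currentVal = null;   // its value so far

        for (String line : lines) {
            if (!line.startsWith(" ")) {
                // New feature key: flush the pending qualifier FIRST, so it
                // stays with the feature it belongs to.
                if (currentTag != null) {
                    section.add(new String[] { currentTag, currentVal.toString() });
                    currentTag = null;
                    currentVal = null;
                }
                section.add(line.trim().split("\\s+"));
            } else {
                String text = line.trim();
                if (text.startsWith("/")) {
                    // New qualifier: flush the previous one, then start over.
                    if (currentTag != null) {
                        section.add(new String[] { currentTag, currentVal.toString() });
                    }
                    String[] kv = text.substring(1).split("=", 2);
                    currentTag = kv[0];
                    currentVal = new StringBuilder(
                            kv.length > 1 ? kv[1].replace("\"", "") : "");
                } else if (currentVal != null) {
                    // Continuation line of a long qualifier value.
                    currentVal.append(text.replace("\"", ""));
                }
            }
        }
        // Flush exactly once at the end of the section.
        if (currentTag != null) {
            section.add(new String[] { currentTag, currentVal.toString() });
        }
        return section;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "gene            <1..>118",
                "                /gene=\"Hoxb9\"",
                "mRNA            <1..>118",
                "                /gene=\"Hoxb9\"",
                "                /product=\"HOXB9\"");
        for (String[] entry : readSection(lines)) {
            System.out.println(String.join(" -> ", entry));
        }
    }
}
```

Without the flush in the first branch, the single /gene qualifier of the
gene feature would only be added after the mRNA key line, which matches the
behaviour Morgane described.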

Cheers,

Jolyon



_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l




From david at autohandle.com  Fri Apr 14 17:29:51 2006
From: david at autohandle.com (David Scott)
Date: Fri, 14 Apr 2006 14:29:51 -0700
Subject: [Biojava-l] BioJavaX.html
Message-ID: <4440144F.7010603@autohandle.com>

Is BioJavaX.html posted somewhere? I am getting an
ArrayIndexOutOfBoundsException on the build.

thanks

From david at autohandle.com  Fri Apr 14 17:20:47 2006
From: david at autohandle.com (David Scott)
Date: Fri, 14 Apr 2006 14:20:47 -0700
Subject: [Biojava-l] BioJavaX.html
Message-ID: <4440122F.2080809@autohandle.com>

Is it possible to post BioJavaX.html somewhere? I am getting an
ArrayIndexOutOfBoundsException on the docbook build. I searched with
Google but could not locate it.

thanks-


From mark.schreiber at novartis.com  Sat Apr 15 19:19:13 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Sun, 16 Apr 2006 07:19:13 +0800
Subject: [Biojava-l] BioJavaX.html
Message-ID: <OF7DC239B8.70535343-ON48257151.007FC85C-48257151.00801C4C@EU.novartis.net>

Could someone post the text to the wiki site temporarily? Actually, it may
be more sensible for this document to be hosted as a wiki page. The wiki
was not available at the time Richard wrote it, so moving it may be a
good idea. Any objections?

Additionally some platforms have trouble building docbook html from ant 
(especially platforms developed in Redmond WA which we don't speak of).

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910




From richard.holland at ebi.ac.uk  Tue Apr 18 05:21:49 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 18 Apr 2006 10:21:49 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>
Message-ID: <1145352109.4188.3.camel@texas.ebi.ac.uk>

I have committed an UNTESTED patch based on Jolyon's suggestion, and
also attempted to fix the split-on-equals problem Morgane observed. 

Please let me know if there are any problems with it.

As this problem affected the UniProt parser in a similar manner (much of
the code is identical), the same fixes were applied there too.

cheers,
Richard

-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From richard.holland at ebi.ac.uk  Tue Apr 18 04:20:44 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 18 Apr 2006 09:20:44 +0100
Subject: [Biojava-l] BioJavaX.html
In-Reply-To: <OF7DC239B8.70535343-ON48257151.007FC85C-48257151.00801C4C@EU.novartis.net>
References: <OF7DC239B8.70535343-ON48257151.007FC85C-48257151.00801C4C@EU.novartis.net>
Message-ID: <1145348444.4188.0.camel@texas.ebi.ac.uk>

HTML version attached. I've created a placeholder on the BioJava website
- could someone who has the time convert it? :)

cheers,
Richard


-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/biojava-l/attachments/20060418/f6e5bb6b/attachment-0001.html 

From J.L.Sharman at sms.ed.ac.uk  Wed Apr 19 05:35:14 2006
From: J.L.Sharman at sms.ed.ac.uk (Joanna Sharman)
Date: Wed, 19 Apr 2006 10:35:14 +0100
Subject: [Biojava-l] Pairwise Alignment
Message-ID: <20060419103514.rwtqmzy00k0ogog8@www.sms.ed.ac.uk>

Hello,

I'm new to BioJava so I'm sorry if this question has been asked several
times before.

This is actually sort of in reply to this message from last month:

http://lists.open-bio.org/pipermail/biojava-l/2006-March/005365.html

I'd like to perform a simple pairwise alignment using the
Smith-Waterman class I saw described here:

http://www.biojava.org/wiki/BioJava:CookBook:DP:PairWise2

but I can't find the classes it mentions anywhere on the cvs.  Can you
point me to where they are?

Also, I'm just wondering why the HMM method is preferred to the
Smith-Waterman (or others)?  It seems quite complicated to me, and like
it might require more memory, or am I wrong? :)

Cheers,
Joanna


From mthomasc at vub.ac.be  Thu Apr 20 05:35:54 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Thu, 20 Apr 2006 11:35:54 +0200
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing
In-Reply-To: <1145352109.4188.3.camel@texas.ebi.ac.uk>
References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>
	<1145352109.4188.3.camel@texas.ebi.ac.uk>
Message-ID: <444755FA.7030009@vub.ac.be>

Hi,

I have tested today's version from CVS.

Both EBI and Ensembl files now behave the same way.
However, the last annotation of a feature is still attached to the
immediately following feature.
e.g. :

FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"
FT   mRNA            <1..>118
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT   CDS             <1..>118

/note="Hoxb-9" is attached to mRNA
/product="HOXB9" is attached to CDS

Concerning the split-on-equals problem, I still observe it :

 [(#2) biojavax:note: transcript_i]

for this annotation :  /note="transcript_id=ENSMUST00000048680"

Thanks for helping,

Cheers,

Morgane.


-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student (mthomasc at vub.ac.be)

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

Tel : +32 2 629 15 22
**********************************************************
Stop Using Internet Explorer, choose FIREFOX !


From richard.holland at ebi.ac.uk  Thu Apr 20 08:05:00 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 20 Apr 2006 13:05:00 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing
In-Reply-To: <444755FA.7030009@vub.ac.be>
References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>
	<1145352109.4188.3.camel@texas.ebi.ac.uk> <444755FA.7030009@vub.ac.be>
Message-ID: <1145534700.4188.28.camel@texas.ebi.ac.uk>

Hi.

I made some small changes to the code, although nothing that would fix
this kind of problem, committed it back to CVS, checked it out again,
compiled, and ran a test program that read in an EMBL file with the
feature table you describe below, and output it in EMBL format to
another file. I then compared the two files... and found no differences!
The split-on-equals problem didn't occur, and all notes appeared
alongside their correct features.

Could there be a problem maybe with the script you are using?

I've really no idea what the problem is as I can't reproduce it based on
the current CVS contents!

cheers,
Richard

On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote:
> Hi,
> 
> I have tested today's version from CVS.
> 
> Both EBI and Ensembl files now react the same way.
> The last annotation of a feature is nevertheless related to its 
> immediate following feature.
> e.g. :
> 
> FT   gene            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /note="Hoxb-9"
> FT   mRNA            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /product="HOXB9"
> FT   CDS             <1..>118
> 
> /note="Hoxb-9" is related to mRNA
> /product="HOXB9" is related to CDS
> 
> Concerning the split-on-equals problem, I still observe the problem :
> 
>  [(#2) biojavax:note: transcript_i]
> 
> for this annotation :  /note="transcript_id=ENSMUST00000048680"
> 
> Thanks for helping,
> 
> Cheers,
> 
> Morgane.
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From jolyon.holdstock at ogt.co.uk  Thu Apr 20 08:08:40 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu, 20 Apr 2006 13:08:40 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com>

I've run the sequence through the parser and it seems to work OK. I
iterate through the features and then, for each feature, iterate through
its annotations.

Based on the input....

FT   source          1..118
FT                   /organism="Triturus helveticus"
FT                   /mol_type="genomic DNA"
FT                   /clone="Thel.b9"
FT                   /db_xref="taxon:256425"
FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"
FT   mRNA            <1..>118
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT   CDS             <1..>118
FT                   /codon_start=2
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
FT                   /protein_id="ABA39736.1"
FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"

The output is....

========================================
Feature: (#0) lcl:DQ158013/DQ158013.1:source,EMBL(1..118)
Note: (#0) biojavax:mol_type: genomic DNA
Note: (#1) biojavax:clone: Thel.b9
========================================
Feature: (#1) lcl:DQ158013/DQ158013.1:gene,EMBL(<1..118>)
Note: (#2) biojavax:gene: Hoxb9
Note: (#3) biojavax:note: Hoxb-9
========================================
Feature: (#2) lcl:DQ158013/DQ158013.1:mRNA,EMBL(<1..118>)
Note: (#4) biojavax:gene: Hoxb9
Note: (#5) biojavax:product: HOXB9
========================================
Feature: (#3) lcl:DQ158013/DQ158013.1:CDS,EMBL(<1..118>)
Note: (#6) biojavax:codon_start: 2
Note: (#7) biojavax:gene: Hoxb9
Note: (#8) biojavax:product: HOXB9
Note: (#9) biojavax:protein_id: ABA39736.1
Note: (#10) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW
Note: (#11) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW
=============================================

This looks OK, the one thing I've just noticed is that the last piece of
annotation of the last feature is assigned twice.

Jolyon


-----Original Message-----
From: Richard Holland [mailto:richard.holland at ebi.ac.uk] 
Sent: 20 April 2006 13:05
To: mthomas at dbm.ulb.ac.be
Cc: Jolyon Holdstock; biojava-l at open-bio.org
Subject: Re: [Biojava-l] [biojavax] EMBL parser : features
parsing[Scanned]

Hi.

I made some small changes to the code, although nothing that would fix
this kind of problem, committed it back to CVS, checked it out again,
compiled, and ran a test program that read in an EMBL file with the
feature table you describe below, and output it in EMBL format to
another file. I then compared the two files... and found no differences!
The split-on-equals problem didn't occur, and all notes appeared
alongside their correct features.

Could there be a problem maybe with the script you are using?

I've really no idea what the problem is as I can't reproduce it based on
the current CVS contents!

cheers,
Richard

On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote:
> Hi,
> 
> I have tested today's version from CVS.
> 
> Both EBI and Ensembl files now react the same way.
> The last annotation of a feature is nevertheless related to its 
> immediate following feature.
> e.g. :
> 
> FT   gene            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /note="Hoxb-9"
> FT   mRNA            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /product="HOXB9"
> FT   CDS             <1..>118
> 
> /note="Hoxb-9" is related to mRNA
> /product="HOXB9" is related to CDS
> 
> Concerning the split-on-equals problem, I still observe the problem :
> 
>  [(#2) biojavax:note: transcript_i]
> 
> for this annotation :  /note="transcript_id=ENSMUST00000048680"
> 
> Thanks for helping,
> 
> Cheers,
> 
> Morgane.
> 
> Richard Holland wrote:
> > I have committed an UNTESTED patch based on Jolyon's suggestion, and
> > also attempted to fix the split-on-equals problem Morgane observed. 
> >
> > Please let me know if there are any problems with it.
> >
> > As this problem affected the UniProt parser in a similar manner
(much of
> > the code is identical), the same fixes were applied there too.
> >
> > cheers,
> > Richard
> >
> > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote:
> >   
> >> Hi Morgane,
> >>
> >> I have amended the EmblFormat readSection method as below and the
> >> parsing seems to work; please test it.
> >>
> >> I think that the last bit of annotation is carried over into the next
> >> feature so before adding the new feature I dump the annotation and reset
> >> currentTag and currentVal.
> >>
> >> if (!line.startsWith(" ")) {
> >> //--------- new code starts ---------------------------
> >>   if (currentTag!=null) {
> >>     section.add(new String[]{currentTag,currentVal.toString()});
> >>     currentTag = null;
> >>     currentVal = null;
> >>   }
> >> //--------- new code ends -----------------------------
> >> // case 1 : word value - splits into key-value on its own
> >>   section.add(line.split("\\s+"));
> >> }
> >>
> >> Cheers,
> >>
> >> Jolyon
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: biojava-l-bounces at lists.open-bio.org
> >> [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane
> >> THOMAS-CHOLLIER
> >> Sent: 12 April 2006 09:35
> >> To: biojava-l at open-bio.org
> >> Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
> >>
> >> Hello again,
> >>
> >> I am currently using biojavax to parse EMBL files exported from Ensembl
> >> website.
> >>
> >> Compared to the EBI files I have, they show a difference in the Features
> >> lines :
> >>
> >> sometimes, only one "/word" is present. ie:
> >>
> >> EBI file :
> >>
> >> FT   gene            <1..>118
> >> FT                   /gene="Hoxb9"
> >> FT                   /note="Hoxb-9"
> >>
> >> Ensembl file;
> >>
> >> FT   gene         complement(1..3218)
> >> FT                   /gene="ENSMUSG00000038227"
> >>
> >> The problem I encounter is that the parser correctly convert the "/word"
> >> into a Note, but the Note is then in relation with the immediate 
> >> following feature (ie: mRNA).
> >> The current gene feature thus has no annotation.
> >>
> >> This behavior is reproducible when removing one "/word" of an EBI file.
> >>
> >> Apart from this issue, I noted that Ensembl EMBL files uses "=" inside a
> >> feature (ie: /note="transcript_id=ENSMUST00000048680") which ends up
> >> with an incomplete Note, as the parser seems to split on "=" to separate
> >> the Key and the Value.
> >>
> >> Thanks for your help,
> >>
> >> Morgane.
> >>
> >>     
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


This email has been scanned by Oxford Gene Technology Group of Companies
Security Systems.
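
For readers following the thread, the carry-over fix Jolyon describes in the quoted message can be sketched roughly as below. This is a simplified, hypothetical stand-in and not the actual EmblFormat code: the class name FeatureSectionSketch, the assumption that the leading "FT   " has already been stripped from each line, and the choice to keep the leading "/" on qualifier tags are all illustrative. The point it tries to show is only that the pending tag/value pair is emitted and reset before a new feature key is added.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a readSection-style loop over feature-table lines. */
final class FeatureSectionSketch {

    /** Assumes each line is a feature-table line with the leading "FT   " already removed. */
    static List<String[]> readSection(List<String> lines) {
        List<String[]> section = new ArrayList<>();
        String currentTag = null;
        StringBuilder currentVal = null;

        for (String line : lines) {
            if (!line.startsWith(" ")) {
                // New feature key: first emit the qualifier still pending from the
                // previous feature, so it cannot attach to the feature that follows.
                if (currentTag != null) {
                    section.add(new String[] {currentTag, currentVal.toString()});
                    currentTag = null;
                    currentVal = null;
                }
                section.add(line.split("\\s+"));
            } else {
                String text = line.trim();
                if (text.startsWith("/")) {
                    // New qualifier: emit the previous one, then start collecting this one.
                    if (currentTag != null) {
                        section.add(new String[] {currentTag, currentVal.toString()});
                    }
                    int eq = text.indexOf('=');
                    currentTag = eq < 0 ? text : text.substring(0, eq);
                    currentVal = new StringBuilder(eq < 0 ? "" : text.substring(eq + 1));
                } else if (currentVal != null) {
                    // Continuation line of a multi-line qualifier value.
                    currentVal.append(text);
                }
            }
        }
        // Emit the last pending qualifier exactly once at the end of the section.
        if (currentTag != null) {
            section.add(new String[] {currentTag, currentVal.toString()});
        }
        return section;
    }
}
```

With the gene/mRNA example from the thread as input, each qualifier entry lands after its own feature entry and before the next feature key.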


From richard.holland at ebi.ac.uk  Thu Apr 20 08:16:00 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 20 Apr 2006 13:16:00 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com>
Message-ID: <1145535361.4188.33.camel@texas.ebi.ac.uk>

Did you use the latest CVS version? (I committed a change that I think
should have fixed that about 1 minute before my previous email).


On Thu, 2006-04-20 at 13:08 +0100, Jolyon Holdstock wrote:
> I've run the sequence through the parser and it seems to work OK. I
> iterate through the features and then iterate through the annotations of
> that feature
> 
> Based on the input....
> 
> FT   source          1..118
> FT                   /organism="Triturus helveticus"
> FT                   /mol_type="genomic DNA"
> FT                   /clone="Thel.b9"
> FT                   /db_xref="taxon:256425"
> FT   gene            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /note="Hoxb-9"
> FT   mRNA            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /product="HOXB9"
> FT   CDS             <1..>118
> FT                   /codon_start=2
> FT                   /gene="Hoxb9"
> FT                   /product="HOXB9"
> FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
> FT                   /protein_id="ABA39736.1"
> FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
> 
> The output is....
> 
> ========================================
> Feature: (#0) lcl:DQ158013/DQ158013.1:source,EMBL(1..118)
> Note: (#0) biojavax:mol_type: genomic DNA
> Note: (#1) biojavax:clone: Thel.b9
> ========================================
> Feature: (#1) lcl:DQ158013/DQ158013.1:gene,EMBL(<1..118>)
> Note: (#2) biojavax:gene: Hoxb9
> Note: (#3) biojavax:note: Hoxb-9
> ========================================
> Feature: (#2) lcl:DQ158013/DQ158013.1:mRNA,EMBL(<1..118>)
> Note: (#4) biojavax:gene: Hoxb9
> Note: (#5) biojavax:product: HOXB9
> ========================================
> Feature: (#3) lcl:DQ158013/DQ158013.1:CDS,EMBL(<1..118>)
> Note: (#6) biojavax:codon_start: 2
> Note: (#7) biojavax:gene: Hoxb9
> Note: (#8) biojavax:product: HOXB9
> Note: (#9) biojavax:protein_id: ABA39736.1
> Note: (#10) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW
> Note: (#11) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW
> =============================================
> 
> This looks OK, the one thing I've just noticed is that the last piece of
> annotation of the last feature is assigned twice.
> 
> Jolyon
> 
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From mthomasc at vub.ac.be  Thu Apr 20 08:30:10 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Thu, 20 Apr 2006 14:30:10 +0200
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Resolved]
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com>
Message-ID: <44477ED2.2010200@vub.ac.be>

I've just updated my sources few minutes ago and everything works fine 
now (both annotations and split-on-equals problem).

I've tested both the EBI file and Ensembl file.

Thanks for fixing the problems !!

Cheers,

Morgane

Jolyon Holdstock wrote:
> No, I'll update my source.
>
> Thanks,
>
> Jolyon
>


-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student (mthomasc at vub.ac.be)

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium
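
As a footnote to the split-on-equals problem discussed in this thread: one way to keep a value such as transcript_id=ENSMUST00000048680 intact is to split the qualifier at the first "=" only. The sketch below uses the standard String.split(regex, limit) overload. The class and method names are hypothetical, and it is not claimed to be the change that was committed to CVS.

```java
/** Hypothetical helper showing a split at the first '=' only. */
final class QualifierSplitSketch {

    /** Turns /tag="value" into {tag, value}; any further '=' stays inside the value. */
    static String[] splitQualifier(String qualifier) {
        String text = qualifier.startsWith("/") ? qualifier.substring(1) : qualifier;
        // A limit of 2 yields at most two parts, so only the first '=' separates tag and value.
        String[] parts = text.split("=", 2);
        String tag = parts[0];
        String value = parts.length > 1 ? parts[1] : "";
        // Strip one pair of surrounding double quotes, if present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new String[] {tag, value};
    }

    public static void main(String[] args) {
        String[] kv = splitQualifier("/note=\"transcript_id=ENSMUST00000048680\"");
        System.out.println(kv[0] + " -> " + kv[1]);
        // prints: note -> transcript_id=ENSMUST00000048680
    }
}
```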


From jolyon.holdstock at ogt.co.uk  Thu Apr 20 08:18:21 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu, 20 Apr 2006 13:18:21 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com>

No, I'll update my source.

Thanks,

Jolyon




From mark.schreiber at novartis.com  Tue Apr 25 02:07:59 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 25 Apr 2006 14:07:59 +0800
Subject: [Biojava-l] Pairwise Alignment
Message-ID: <OF646C368F.AE5F0486-ON4825715B.00212D15-4825715B.0021B0D4@EU.novartis.net>

Hi -

The appropriate classes for SW and NW pairwise alignment are in the 
org.biojava.bio.alignment package in the CVS (see 
http://code.open-bio.org/cgi/viewcvs.cgi/biojava-live/src/org/biojava/bio/alignment/?cvsroot=biojava).

While SW and NW are simple they are not as flexible as the pairwise 
architectures that can be made with HMMs. For a standard pairwise 
alignment I would think that the SW and NW algorithms are fine.

I'm not sure about comparative speed or memory requirements.

- Mark


Joanna Sharman <J.L.Sharman at sms.ed.ac.uk>
Sent by: biojava-l-bounces at lists.open-bio.org
04/19/2006 05:35 PM

 
        To:     biojava-l at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] Pairwise Alignment


Hello,

I'm new to BioJava so I'm sorry if this question has been asked several
times before.

This is actually sort of in reply to this message from last month:

http://lists.open-bio.org/pipermail/biojava-l/2006-March/005365.html

I'd like to perform a simple pairwise alignment using the
Smith-Waterman class I saw described here:

http://www.biojava.org/wiki/BioJava:CookBook:DP:PairWise2

but I can't find the classes it mentions anywhere on the cvs.  Can you
point me to where they are?

Also, I'm just wondering why the HMM method is preferred to the
Smith-Waterman (or others)?  It seems quite complicated to me, and like
it might require more memory, or am I wrong? :)

Cheers,
Joanna

_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l
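
To give a feel for why Smith-Waterman counts as the simple option in this discussion, here is a minimal, self-contained sketch of the local alignment score with a linear gap penalty. It deliberately does not use the org.biojava.bio.alignment classes, whose exact constructors are not shown in this thread. The class name, the scoring parameters and the linear gap model are assumptions made for illustration only.

```java
/** Hypothetical stand-alone sketch; not the BioJava implementation. */
final class SmithWatermanSketch {

    /**
     * Returns the best local alignment score of a and b.
     * match is added for equal characters, mismatch and gap are expected to be negative.
     */
    static int score(String a, String b, int match, int mismatch, int gap) {
        // Full dynamic-programming matrix: (a.length() + 1) * (b.length() + 1) ints.
        int[][] h = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = h[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = h[i - 1][j] + gap;   // gap in b
                int left = h[i][j - 1] + gap; // gap in a
                // The floor at 0 is what makes the alignment local rather than global.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }
}
```

On the memory question: this version stores the whole matrix, so memory grows with the product of the two sequence lengths. If only the score is needed, two rows are enough; recovering the alignment itself needs the traceback information.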


From e.willighagen at science.ru.nl  Wed Apr 26 12:03:47 2006
From: e.willighagen at science.ru.nl (Egon Willighagen)
Date: Wed, 26 Apr 2006 18:03:47 +0200
Subject: [Biojava-l] org.biojava.bio.gui.glyph classes?
Message-ID: <200604261803.47333.e.willighagen@science.ru.nl>


Hi all, 

in the wiki I saw mention of the org.biojava.bio.gui.glyph package, which does 
not seem to be part of BioJava 1.4.

Where can I download the code classes in that package?

Egon

-- 
Radboud University Nijmegen
http://www.cac.science.ru.nl/
blog: http://chem-bla-ics.blogspot.com/

From mark.schreiber at novartis.com  Wed Apr 26 21:14:38 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 27 Apr 2006 09:14:38 +0800
Subject: [Biojava-l] org.biojava.bio.gui.glyph classes?
Message-ID: <OFEC3139B0.EDAB2C44-ON4825715D.0006BCE4-4825715D.0006D569@EU.novartis.net>

Hi -

They are in biojava-live, which is the development version available for 
download via cvs. Take a look at the instructions on www.biojava.org.

- Mark


Egon Willighagen <e.willighagen at science.ru.nl>
Sent by: biojava-l-bounces at lists.open-bio.org
04/27/2006 12:03 AM

 
        To:     biojava-l at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] org.biojava.bio.gui.glyph classes?


Hi all, 

in the wiki I saw mention of the org.biojava.bio.gui.glyph package, which does 
not seem to be part of BioJava 1.4.

Where can I download the code classes in that package?

Egon

-- 
Radboud University Nijmegen
http://www.cac.science.ru.nl/
blog: http://chem-bla-ics.blogspot.com/


From heatkent at gmail.com  Wed Apr 26 19:22:46 2006
From: heatkent at gmail.com (Heather Kent)
Date: Wed, 26 Apr 2006 18:22:46 -0500
Subject: [Biojava-l] chromatogram viewer
Message-ID: <de8b3c810604261622s6af56c5aoc6fa4456607d4a9@mail.gmail.com>

I'm wondering if anyone can help me locate some source code for Swing
components for viewing chromatograms. I read a 2003 biojava forum post in
which Rhett Sutphin mentioned he would make some source code for a
chromatogram viewer (using the ChromatogramGraphic class) available, but I
can't seem to find it anywhere. I'm trying to build scroll bars for my
chromatogram viewer that scroll through the image and also scale the
chromatogram vertically and horizontally. I have some code from an old
viewer that performs all these functions, but it doesn't use any of the
biojava classes or Swing components.

thanx
heather


From russ at kepler-eng.com  Thu Apr 27 00:24:19 2006
From: russ at kepler-eng.com (Russ Kepler)
Date: Wed, 26 Apr 2006 22:24:19 -0600
Subject: [Biojava-l] chromatogram viewer
In-Reply-To: <de8b3c810604261622s6af56c5aoc6fa4456607d4a9@mail.gmail.com>
References: <de8b3c810604261622s6af56c5aoc6fa4456607d4a9@mail.gmail.com>
Message-ID: <200604262224.19525.russ@kepler-eng.com>

On Wednesday 26 April 2006 05:22 pm, Heather Kent wrote:
> I'm wondering if anyone can help me locate some source code for Swing
> components for viewing chromatograms. I read a 2003 BioJava forum thread
> where Rhett Sutphin mentioned he would make some source code for a
> chromatogram viewer (using the ChromatogramGraphic class) available, but I
> can't seem to find it anywhere. I'm trying to add scroll bars to my
> chromatogram viewer that scroll through the image and scale the
> chromatogram vertically and horizontally. I have some code from an old
> viewer that performs all these functions but doesn't use any of the
> BioJava classes or Swing components.

There's org.biojava.bio.gui.sequence.ABITraceRenderer with demo code in 
seqviewer.TraceViewer  

It should give you a start.

From n.haigh at sheffield.ac.uk  Thu Apr 27 09:48:59 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 14:48:59 +0100
Subject: [Biojava-l] Sun One Studio+Biojava
Message-ID: <002301c66a01$5637d910$9f5ea78f@bmbpc196>

I'm totally new to Java and Biojava as I'm trying to defect from Bioperl!
I'm trying to use Sun One Studio for editing my java files - at least
initially. I don't know how to set up Sun One Studio to find my
biojava-1.4.jar file, and I'm not even sure how to test whether it can find
it correctly. Any help on these issues would be gratefully received. As I
said, I'm a newbie - bear with me!

Cheers
Nathan

----------------------------------------------------------------------------------
Dr. Nathan S. Haigh
Bioinformatics PostDoctoral Research Associate

Room B2 211                                Tel: +44 (0)114 22 20112
Department of Animal and Plant Sciences    Mob: +44 (0)7742 533 569
University of Sheffield                    Fax: +44 (0)114 22 20002
Western Bank                               Web: www.bioinf.shef.ac.uk
Sheffield                                       www.petraea.shef.ac.uk
S10 2TN
----------------------------------------------------------------------------------



From richard.holland at ebi.ac.uk  Thu Apr 27 10:51:23 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 27 Apr 2006 15:51:23 +0100
Subject: [Biojava-l] Sun One Studio+Biojava
In-Reply-To: <002301c66a01$5637d910$9f5ea78f@bmbpc196>
References: <002301c66a01$5637d910$9f5ea78f@bmbpc196>
Message-ID: <1146149483.3955.7.camel@texas.ebi.ac.uk>

Sun One Studio is built on NetBeans, which is what I use to develop bits
of BioJava with, so I think what works for me should work for you. Here
goes...:

If you are working with BioJava in apps you are developing yourself, you
need to set up BioJava as a library in NetBeans. Do this by going to the
Library Manager (Tools menu), creating a new library called BioJava,
then using the buttons provided to locate and add the biojava-1.4.jar
file to the library. You can then associate this library with any
project you are working on by right-clicking on that project, choosing
Properties, then clicking on Libraries in the tree on the left of the
window that appears and using this to add the BioJava library.
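
On the question of how to test whether the jar is actually being found: one
rough way is to ask the JVM to load a BioJava class by name and see whether
that succeeds. The sketch below assumes that org.biojava.bio.seq.DNATools is
present in biojava-1.4.jar; the class name ClasspathCheck and its helper
method are made up for illustration.

```java
class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class but do not run its static initialisers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed probe class; any class known to live in the jar would do.
        String probe = "org.biojava.bio.seq.DNATools";
        if (isOnClasspath(probe)) {
            System.out.println(probe + " found, the library is set up.");
        } else {
            System.out.println(probe + " NOT found, check the project's Libraries.");
        }
    }
}
```

If this is run from inside the project and reports that the class was not
found, the library is probably not attached to that project yet.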

If you are intending to develop BioJava itself, you need to check out
the entire biojava-live project from CVS. You can then set up
development in NetBeans by creating a "new project from existing Ant
script", and telling it where the build.xml file can be found within the
BioJava project. It'll do the rest for you. 

Hope this helps.

cheers,
Richard

-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From n.haigh at sheffield.ac.uk  Thu Apr 27 11:01:56 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 16:01:56 +0100
Subject: [Biojava-l] Sun One Studio+Biojava
In-Reply-To: <1146149483.3955.7.camel@texas.ebi.ac.uk>
Message-ID: <003601c66a0b$86b289f0$9f5ea78f@bmbpc196>

Thanks for the info - the fog is starting to lift! :o)

I think I'll leave actual Biojava development for now - see how I go with
actually learning Java first :o) I have a steep learning curve, as I have an
application written in Perl which uses Bioperl modules and Perl/Tk for the
GUI. So I'm trying to rewrite this application in Java while trying to think
about OO programming. I'm sure I'll send some really simple questions to
the list over the coming weeks/months, but hopefully there won't be too many
nightmares along the way!

Thanks
Nathan




From n.haigh at sheffield.ac.uk  Thu Apr 27 11:12:34 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 16:12:34 +0100
Subject: [Biojava-l] Creating my own classes
Message-ID: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196>

I'm trying to learn/think about OO programming as I'm learning Java and porting
a Perl app into Java - could you tell me if this sounds reasonable for
writing some of my own classes?

My application essentially defines sets of positions from an alignment - I
call them CHARSETs as they are analogous to CHARSETs in the Nexus file
format. I believe in Biojava the Locations object/interface (sorry, not
familiar enough with correct terminology yet) is essentially the same sort
of thing. In my app, the user can use several approaches to define a CHARSET
e.g. a CHARSET containing just invariable sites, or a CHARSET containing
sites above a given % identity.

My question is this, if I were to create a class called Charset, and I
create several subclasses called e.g. Invariable etc is this reasonable? Or
should the class Charset contain many methods for creating a different type
of CHARSET?

In my app, a CHARSET needs to be associated with a particular alignment and
the settings used to define the CHARSET, so my Charset class would have
variables such as an Alignment object, Locations objects etc. I'd like to
write a method that returns a subalignment based on the CHARSET's associated
alignment object and Locations object but I'm not sure how to do this.

Thanks for any help/comments/corrections/critiques
Nathan


----------------------------------------------------------------------------------
Dr. Nathan S. Haigh
Bioinformatics PostDoctoral Research Associate

Room B2 211                                Tel: +44 (0)114 22 20112
Department of Animal and Plant Sciences    Mob: +44 (0)7742 533 569
University of Sheffield                    Fax: +44 (0)114 22 20002
Western Bank                               Web: www.bioinf.shef.ac.uk
Sheffield                                       www.petraea.shef.ac.uk
S10 2TN
----------------------------------------------------------------------------------



From richard.holland at ebi.ac.uk  Thu Apr 27 11:36:51 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 27 Apr 2006 16:36:51 +0100
Subject: [Biojava-l] Creating my own classes
In-Reply-To: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196>
References: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196>
Message-ID: <1146152212.3955.24.camel@texas.ebi.ac.uk>

On Thu, 2006-04-27 at 16:12 +0100, Nathan S. Haigh wrote:
> My application essentially defines sets of positions from an alignment - I
> call them CHARSETs as they are analogous to CHARSETs in the Nexus file
> format. I believe in Biojava the Locations object/interface (sorry, not
> familiar enough with correct terminology yet) is essentially the same sort
> of thing. In my app, the user can use several approaches to define a CHARSET
> e.g. a CHARSET containing just invariable sites, or a CHARSET containing
> sites above a given % identity.

You'd be right there. A Location in BioJava represents a range of
positions.

> My question is this, if I were to create a class called Charset, and I
> create several subclasses called e.g. Invariable etc is this reasonable? Or
> should the class Charset contain many methods for creating a different type
> of CHARSET?

My suggestion would be create an interface called Charset, which defines
behaviour which you expect all types of Charset to exhibit. Then,
implement a number of classes which implement this interface, one for
each type of Charset you have, which each add their own methods or
special behaviour. If a lot of the behaviour is common, you can define
an abstract class called something like AbstractCharset which defines
this common behaviour, and have the others extend it.
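
As a rough illustration of that layout, here is a minimal sketch. To stay
self-contained it models the alignment as a list of equal-length strings
rather than using BioJava's Alignment and Location types, and every name in
it (positions(), accepts(), InvariableCharset, CharsetDemo) is hypothetical
rather than part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Behaviour every kind of CHARSET is expected to offer. */
interface Charset {
    /** The 1-based alignment columns that belong to this CHARSET. */
    SortedSet<Integer> positions();
}

/** Common behaviour; subclasses only decide which columns qualify. */
abstract class AbstractCharset implements Charset {
    // One aligned sequence per entry, all assumed to have the same length.
    protected final List<String> rows;

    protected AbstractCharset(List<String> rows) {
        this.rows = new ArrayList<String>(rows);
    }

    /** Decide whether a single 1-based column belongs to the CHARSET. */
    protected abstract boolean accepts(int column);

    @Override
    public SortedSet<Integer> positions() {
        SortedSet<Integer> result = new TreeSet<Integer>();
        int length = rows.isEmpty() ? 0 : rows.get(0).length();
        for (int col = 1; col <= length; col++) {
            if (accepts(col)) {
                result.add(col);
            }
        }
        return result;
    }
}

/** Columns in which every sequence carries the same character. */
class InvariableCharset extends AbstractCharset {
    InvariableCharset(List<String> rows) {
        super(rows);
    }

    @Override
    protected boolean accepts(int column) {
        char first = rows.get(0).charAt(column - 1);
        for (String row : rows) {
            if (row.charAt(column - 1) != first) {
                return false;
            }
        }
        return true;
    }
}

class CharsetDemo {
    public static void main(String[] args) {
        Charset invariable = new InvariableCharset(Arrays.asList("ACGT", "ACTT", "ACGT"));
        System.out.println(invariable.positions()); // prints [1, 2, 4]
    }
}
```

A CHARSET based on % identity would then be one more small subclass of
AbstractCharset with its own accepts() rule, and code that uses CHARSETs only
needs to know about the Charset interface.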

> In my app, a CHARSET needs to be associated with a particular alignment, and
> settings used to define the CHARSET, so my Charset class have variables such
> as an Alignment object, Locations objects etc. I'd like to write a method
> that returns a subalignment based on the CHARSETs associated alignment
> object and Locations object but I'm not sure how to do this.

BioJava Alignment objects implement the SymbolList interface, which
means you can use all the methods from SymbolList to work with the
Alignment, including the subList() method.

cheers,
Richard

-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From n.haigh at sheffield.ac.uk  Thu Apr 27 11:44:05 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 16:44:05 +0100
Subject: [Biojava-l] Creating my own classes
In-Reply-To: <1146152212.3955.24.camel@texas.ebi.ac.uk>
Message-ID: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196>

Thanks Richard,

I'll think about this and try to do some deciphering. The only thing I'm in
need of help for is possibly some actual code that would take an Alignment
object and return a subalignment based on the positions specified in a
Locations object - it's difficult to make sense of a new language until you
start to pick up some of the basics.

Thanks
Nathan




From richard.holland at ebi.ac.uk  Thu Apr 27 11:55:39 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 27 Apr 2006 16:55:39 +0100
Subject: [Biojava-l] Creating my own classes
In-Reply-To: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196>
References: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196>
Message-ID: <1146153339.3955.30.camel@texas.ebi.ac.uk>

Given some existing Location object (let's call it 'loc') and an
existing Alignment (hypothetically called 'algn'), you can do this:

	// Requires imports of java.util.Set and java.util.HashSet.
	// Collect the labels of all the sequences in the alignment.
	Set labels = new HashSet(algn.getLabels());
	// Obtain a sub-alignment that covers 'loc' and includes all the
	// sequences in the original alignment.
	Alignment subAlignment = algn.subAlignment(labels, loc);
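
To get a feel for what a sub-alignment over a set of positions contains
without needing BioJava on the classpath, here is a plain-Java model of the
idea. It is only a sketch under stated assumptions: the alignment is a map
from label to aligned string, the positions are a list of 1-based columns,
and SubAlignmentSketch and its method are hypothetical names, not BioJava
API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SubAlignmentSketch {

    /** Keep only the given 1-based columns of every labelled row. */
    static Map<String, String> subAlignment(Map<String, String> rows, List<Integer> columns) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            StringBuilder kept = new StringBuilder();
            for (int col : columns) {
                kept.append(row.getValue().charAt(col - 1));
            }
            result.put(row.getKey(), kept.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> rows = new LinkedHashMap<String, String>();
        rows.put("seq1", "ACGTAC");
        rows.put("seq2", "AC-TAG");
        // Columns 1, 2 and 6 stand in for a non-contiguous set of positions.
        System.out.println(subAlignment(rows, Arrays.asList(1, 2, 6)));
        // prints {seq1=ACC, seq2=ACG}
    }
}
```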

cheers,
Richard


-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From n.haigh at sheffield.ac.uk  Thu Apr 27 12:00:09 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 17:00:09 +0100
Subject: [Biojava-l] Creating my own classes
In-Reply-To: <1146153339.3955.30.camel@texas.ebi.ac.uk>
Message-ID: <000d01c66a13$a8b51380$9f5ea78f@bmbpc196>

Fantastic stuff - again, I'll look into this over the coming weeks (I
actually have annual leave for a week, so my flurry of e-mail will have to
stop for now).

Thanks again!
Nathan




From david at autohandle.com  Thu Apr 27 13:10:08 2006
From: david at autohandle.com (David Scott)
Date: Thu, 27 Apr 2006 10:10:08 -0700
Subject: [Biojava-l] hibernate-xml mapping
Message-ID: <4450FAF0.9070206@autohandle.com>

What is the XML mapping in the Hibernate files based on?


From mark.schreiber at novartis.com  Thu Apr 27 22:05:44 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 28 Apr 2006 10:05:44 +0800
Subject: [Biojava-l] Creating my own classes
Message-ID: <OF27AEDEFF.D5D8985D-ON4825715E.000B5C4C-4825715E.000B8305@EU.novartis.net>

An excellent book on OO and Java is Thinking in Java by Bruce Eckel. If
you come from a C or Perl background it will change the way you think
about programming.

You can get online versions for free, and most good bookstores have
hardcopies as well.

- Mark


"Nathan S. Haigh" <n.haigh at sheffield.ac.uk>
Sent by: biojava-l-bounces at lists.open-bio.org
04/28/2006 12:00 AM
Please respond to n.haigh

 
        To:     "'Richard Holland'" <richard.holland at ebi.ac.uk>
        cc:     biojava-l at lists.open-bio.org, (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [Biojava-l] Creating my own classes


Fantastic stuff - again, I'll look into this over the coming weeks (I
actually have annual leave for a week, so my flurry of e-mail will have to
stop for now).

Thanks again!
Nathan

> -----Original Message-----
> From: Richard Holland [mailto:richard.holland at ebi.ac.uk]
> Sent: 27 April 2006 16:56
> To: n.haigh at sheffield.ac.uk
> Cc: biojava-l at lists.open-bio.org
> Subject: RE: [Biojava-l] Creating my own classes
> 
> Given some existing Location object (let's called it 'loc'), and an
> existing Alignment (hypothetically called 'algn'), you can do this:
> 
>                // Obtain the labels of all the sequences in the 
alignment.
>                Set labels = new HashSet();
>                labels.addAll(algn.getLabels());
>                // Obtain a sub-alignment including all the sequences in 
the
>                // original alignment.
>         Alignment subAlignment = algn.subAlignment(labels, loc);
> 
> cheers,
> Richard
> 
> 
> On Thu, 2006-04-27 at 16:44 +0100, Nathan S. Haigh wrote:
> > Thanks Richard,
> >
> > I'll think about this and try to do some deciphering. The only thing 
I'm
> in
> > need of help for is possibly some actual code that would take an
> Alignment
> > object and return a subalignment based on the positions specified in a
> > Locations object - it's difficult to make sense of a new language 
until
> you
> > start to pick up some of the basics.
> >
> > Thanks
> > Nathan
> >
> > > -----Original Message-----
> > > From: Richard Holland [mailto:richard.holland at ebi.ac.uk]
> > > Sent: 27 April 2006 16:37
> > > To: n.haigh at sheffield.ac.uk
> > > Cc: biojava-l at lists.open-bio.org
> > > Subject: Re: [Biojava-l] Creating my own classes
> > >
> > > On Thu, 2006-04-27 at 16:12 +0100, Nathan S. Haigh wrote:
> > > > My application essentially defines sets of positions from an
> alignment -
> > > I
> > > > call them CHARSETs as they are analogous to CHARSETs in the Nexus
> file
> > > > format. I believe in Biojava the Locations object/interface 
(sorry,
> not
> > > > familiar enough with correct terminology yet) is essentially the
> same
> > > sort
> > > > of thing. In my app, the user can use several approaches to define 
a
> > > CHARSET
> > > > e.g. a CHARSET containing just invariable sites, or a CHARSET
> containing
> > > > sites above a given % identity.
> > >
> > > You'd be right there. A Location in BioJava represents a range of
> > > positions.
> > >
> > > > My question is this, if I were to create a class called Charset, 
and
> I
> > > > create several subclasses called e.g. Invariable etc is this
> reasonable?
> > > Or
> > > > should the class Charset contain many methods for creating a
> different
> > > type
> > > > of CHARSET?
> > >
> > > My suggestion would be create an interface called Charset, which
> defines
> > > behaviour which you expect all types of Charset to exhibit. Then,
> > > implement a number of classes which implement this interface, one 
for
> > > each type of Charset you have, which each add their own methods or
> > > special behaviour. If a lot of the behaviour is common, you can 
define
> > > an abstract class called something like AbstractCharset which 
defines
> > > this common behaviour, and have the others extend it.
> > >
> > > > In my app, a CHARSET needs to be associated with a particular
> alignment,
> > > and
> > > > settings used to define the CHARSET, so my Charset class have
> variables
> > > such
> > > > as an Alignment object, Locations objects etc. I'd like to write a
> > > method
> > > > that returns a subalignment based on the CHARSETs associated
> alignment
> > > > object and Locations object but I'm not sure how to do this.
> > >
> > > BioJava Alignment objects implement the SymbolList interface, which
> > > means you can use all the methods from SymbolList to work with the
> > > Alignment, including the subList() method.
> > >
> > > cheers,
> > > Richard
> > >
> > > --
> > > Richard Holland (BioMart Team)
> > > EMBL-EBI
> > > Wellcome Trust Genome Campus
> > > Hinxton
> > > Cambridge CB10 1SD
> > > UNITED KINGDOM
> > > Tel: +44-(0)1223-494416
> >
> --
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416
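
The interface plus abstract base class layout suggested in the quoted reply
could be sketched roughly as below. This is only an illustration under
simplifying assumptions: the alignment is modelled as a list of equal-length
strings rather than a BioJava Alignment, columns are 1-based, and every name
other than Charset and AbstractCharset (InvariableCharset, getColumns,
subAlignment, includes, CharsetDemo) is invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Behaviour every kind of CHARSET is expected to offer.
interface Charset {
    List<Integer> getColumns();   // 1-based alignment columns in this CHARSET
    List<String> subAlignment();  // the alignment restricted to those columns
}

// Common behaviour shared by all CHARSET types (assumption: rows are
// plain strings of equal length, not BioJava Alignment objects).
abstract class AbstractCharset implements Charset {
    protected final List<String> rows;

    protected AbstractCharset(List<String> rows) {
        this.rows = rows;
    }

    // Each concrete CHARSET decides which columns belong to it.
    protected abstract boolean includes(int column);

    public List<Integer> getColumns() {
        List<Integer> cols = new ArrayList<>();
        int length = rows.isEmpty() ? 0 : rows.get(0).length();
        for (int c = 1; c <= length; c++) {
            if (includes(c)) {
                cols.add(c);
            }
        }
        return cols;
    }

    public List<String> subAlignment() {
        List<Integer> cols = getColumns();
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            StringBuilder sb = new StringBuilder();
            for (int c : cols) {
                sb.append(row.charAt(c - 1));
            }
            out.add(sb.toString());
        }
        return out;
    }
}

// One concrete CHARSET: columns where every row has the same character.
final class InvariableCharset extends AbstractCharset {
    InvariableCharset(List<String> rows) {
        super(rows);
    }

    protected boolean includes(int column) {
        char first = rows.get(0).charAt(column - 1);
        for (String row : rows) {
            if (row.charAt(column - 1) != first) {
                return false;
            }
        }
        return true;
    }
}

class CharsetDemo {
    public static void main(String[] args) {
        Charset cs = new InvariableCharset(Arrays.asList("ACGT", "ACCT", "AGGT"));
        System.out.println(cs.getColumns());
        System.out.println(cs.subAlignment());
    }
}
```

A CHARSET based on percent identity would then be another small subclass that
only overrides includes(). With a real BioJava Alignment the column selection
would presumably go through Location objects and subList() instead, as
described in the quoted reply.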


_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l


From mark.schreiber at novartis.com  Thu Apr 27 22:06:31 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 28 Apr 2006 10:06:31 +0800
Subject: [Biojava-l] hibernate-xml mapping
Message-ID: <OF76803BFB.0A810F33-ON4825715E.000B8871-4825715E.000B9557@EU.novartis.net>

It is based on the BioSQL schema

- Mark


David Scott <david at autohandle.com>
Sent by: biojava-l-bounces at lists.open-bio.org
04/28/2006 01:10 AM

 
        To:     Biojava-l at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] hibernate-xml mapping


What is the XML mapping in the Hibernate files based on?


From ilhami.visne at gmail.com  Fri Apr 28 05:09:56 2006
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Fri, 28 Apr 2006 11:09:56 +0200
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
Message-ID: <ce6b4d120604280209y5d82c954s417116f8c1c93a29@mail.gmail.com>

I have a file in FASTA format which is not encoded in ANSI, but it seems OK.
It can be downloaded here: http://stud3.tuwien.ac.at/~e0125935/try3.fasta
I tried to read it with SeqIOTools.readFastaDNA and this exception was
thrown:

org.biojava.bio.BioException: Could not read sequence
    at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:104)
..............
..............
Caused by: java.io.IOException: Stream does not appear to contain FASTA formatted data: ??>
    at org.biojava.bio.seq.io.FastaFormat.readSequence(FastaFormat.java:112)
    at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:101)

The file has no visible line containing "??>", so these characters seem to
be hidden.

How should I handle such files?

Thanks in advance.


From richard.holland at ebi.ac.uk  Fri Apr 28 06:37:35 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 28 Apr 2006 11:37:35 +0100
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
In-Reply-To: <ce6b4d120604280209y5d82c954s417116f8c1c93a29@mail.gmail.com>
References: <ce6b4d120604280209y5d82c954s417116f8c1c93a29@mail.gmail.com>
Message-ID: <1146220656.3955.46.camel@texas.ebi.ac.uk>

I've no idea what binary format that file is in - it contains some very
strange characters. It appears to contain _some_ ANSI data but with
extra binary bits added to the start and end. I think you need to check
the program that generated the file as it is obviously not doing what it
is supposed to.

Your best bet is to convert the file to ANSI or some other format
understood out-of-the-box by Java.

cheers,
Richard

On Fri, 2006-04-28 at 11:09 +0200, Ilhami Visne wrote:
> i got a file in fasta format, which is not encoded in ansi. but it seems ok.
> it can be downloaded here: http://stud3.tuwien.ac.at/~e0125935/try3.fasta
> i tried to read it with SeqIOTools.readFastaDNA and this exception was
> thrown:
> 
> org.biojava.bio.BioException: Could not read sequence
>     at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java
> :104)
> ..............
> ..............
> Caused by: java.io.IOException: Stream does not appear to contain FASTA
> formatted data: ??>
> org.biojava.bio.seq.io.FastaFormat.readSequence(FastaFormat.java:112)
>  at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:101)
> 
> "??>" there is no row like this but it seems it is hidden.
> 
> How should i handle such files?
> 
> thax in advance.
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From richard.holland at ebi.ac.uk  Fri Apr 28 09:19:30 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 28 Apr 2006 14:19:30 +0100
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
In-Reply-To: <ce6b4d120604280600q2be638f8mfac84220420f25d5@mail.gmail.com>
References: <ce6b4d120604280209y5d82c954s417116f8c1c93a29@mail.gmail.com>
	<1146220656.3955.46.camel@texas.ebi.ac.uk>
	<ce6b4d120604280600q2be638f8mfac84220420f25d5@mail.gmail.com>
Message-ID: <1146230371.3955.59.camel@texas.ebi.ac.uk>

Thinking about this a bit more, I think you meant ASCII when you said
ANSI?

FASTA format is very strictly defined. It is a file containing a number of
sequences, each with its own header, which starts with a '>' symbol.
You can indeed use any character you like within the header, which ends
at the first new-line after the '>' (newline is ASCII 10 or 13, or both,
depending on your OS). No whitespace is allowed at the start or end of
the file or between or within sequences.

The problem with your file is that the unusual characters are appearing
at the start of the file before the first header, and maybe also during
the sequence itself although I didn't look that closely. Hence it breaks
the FASTA format specification.

The problem here lies with the program that is generating your FASTA
file. BioJava is behaving correctly.
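
A note on the hidden "??>": two unexpected bytes in front of the first '>'
are what a byte order mark (BOM) would look like, for example when a text
editor has saved the file as "Unicode" (UTF-16). That is only a guess about
this particular file, whose real encoding was not identified in this thread.
Under that assumption, a minimal Java sketch that sniffs a BOM and decodes
the stream accordingly could look like this. The names BomAwareReader, open
and firstLine are made up for illustration and are not part of BioJava.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomAwareReader {

    // Opens a reader that skips a leading UTF-8 or UTF-16 BOM, if present,
    // and decodes the rest with the matching charset. Without a BOM the
    // stream is decoded as plain ASCII.
    static BufferedReader open(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {               // read up to 3 bytes
            int r = pin.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        Charset cs = StandardCharsets.US_ASCII;
        int bomLength = 0;
        if (n >= 3 && u(head[0]) == 0xEF && u(head[1]) == 0xBB && u(head[2]) == 0xBF) {
            cs = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && u(head[0]) == 0xFF && u(head[1]) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            bomLength = 2;
        } else if (n >= 2 && u(head[0]) == 0xFE && u(head[1]) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            bomLength = 2;
        }

        // Push back everything that was read beyond the BOM.
        if (n > bomLength) {
            pin.unread(head, bomLength, n - bomLength);
        }
        return new BufferedReader(new InputStreamReader(pin, cs));
    }

    // Convenience helper for quick checks: first decoded line of a byte array.
    static String firstLine(byte[] data) {
        try (BufferedReader r = open(new ByteArrayInputStream(data))) {
            return r.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static int u(byte b) {
        return b & 0xFF;
    }
}
```

The resulting BufferedReader could then be handed to a parser that accepts
one. If the file has no BOM but still contains stray bytes, this will not
help, and fixing the program that generates the file remains the better
route.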

cheers,
Richard

On Fri, 2006-04-28 at 15:00 +0200, Ilhami Visne wrote:
> I had already thought of converting the file to ANSI. The sequence part
> must contain only ANSI characters, but the header and other annotation
> need not. If I convert the file to ANSI, might I lose some data?
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From guedes at unisul.br  Mon Apr  3 15:09:43 2006
From: guedes at unisul.br (Dickson S. Guedes)
Date: Mon, 03 Apr 2006 12:09:43 -0300
Subject: [Biojava-l] Get a sequence from internet
In-Reply-To: <3C39C09ED334F243838953854BE43FB6025A1F01@MAILBX02.telemar.corp.net>
References: <3C39C09ED334F243838953854BE43FB6025A1F01@MAILBX02.telemar.corp.net>
Message-ID: <44313AB7.7080309@unisul.br>

Yes,
Hi Anderson,

You can use the NCBISequenceDB:


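import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.db.NCBISequenceDB;

(...)

// Fetch a GenBank record by its identifier (needs network access).
NCBISequenceDB ncbiDB = new NCBISequenceDB();
Sequence sequenceFromGenbank = ncbiDB.getSequence("sequence_id");

System.out.println(sequenceFromGenbank.getName());

(...)

Replace "sequence_id" with an ID from GenBank.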

:)


Anderson Moura da Silva wrote:
> Is possible to get a sequence from one of the sources on the internet, like from the Swissprot, using biojava?
>  
> Can anybody help?
>  
> Thanks,


-- 
Dickson S. Guedes
/*
  * UNISUL - Universidade do Sul de Santa Catarina
  * ATI - Assessoria de Tecnologia da Informação
  * (0xx48) 621-3200 - http://www.unisul.br
  *
  *    "Quis custodiet ipsos custodes?"
  */


From wendy.wong at gmail.com  Tue Apr  4 18:22:00 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Tue, 4 Apr 2006 19:22:00 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To: <200603311805.25861.matthew.pocock@ncl.ac.uk>
References: <e554425b0603300741j755ecd33m36f04275a3811f86@mail.gmail.com>
	<5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>
	<200603311805.25861.matthew.pocock@ncl.ac.uk>
Message-ID: <e554425b0604041122y2cbf4012g57f3d2069e7ca219@mail.gmail.com>

Thanks for your advice! I am able to train a subset of transition
probabilities now!

I found something strange. First I changed my emission distributions
to UntrainableDistributions, and the trainer didn't seem to be doing
anything: all cycles had the same score. I then changed them back to
SimpleDistribution (still keeping my getWeightImp in my custom
distribution). This time it works and it doesn't seem to be modifying
my emission probabilities. So it works for me; I am just curious whether
it is a bug or whether I was doing something wrong?

Thanks again!
wendy


On 3/31/06, Matthew Pocock <matthew.pocock at ncl.ac.uk> wrote:
> > The DP code does some caching of probabilities, I don't think there's
> > any way to turn this off without modifying the DP implementations.
> >
> >            Thomas.
>
> My recollection is that if you did turn this off, the algorithm would run
> very, very much more slowly. Internally to the DP objects, the distribution
> probabilities (in fact, they aren't even probabilities by this stage) are
> stored in a data-structure optimized for the type of lookups performed during
> the dynamic programming recursions.
>
> Matthew
>


From mthomasc at vub.ac.be  Fri Apr  7 09:20:33 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Fri, 07 Apr 2006 11:20:33 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
Message-ID: <44362EE1.5060804@vub.ac.be>

Hello,

I am currently using biojavax that I checked out today from CVS to parse 
an EMBL file, exported from EBI SRS server.

I ran into this error :

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
Caused by: org.biojava.bio.seq.io.ParseException: Bad date type found: 86
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:278)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

The EMBL file is :

ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
XX
AC   DQ158013;
XX
SV   DQ158013.1
XX
DT   19-JAN-2006 (Rel. 86, Created)
DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
XX
DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.

Removing the two lines that comprise the date information resolves the 
problem.

Thanks,

Morgane.

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium


From richard.holland at ebi.ac.uk  Fri Apr  7 09:56:57 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 10:56:57 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <44362EE1.5060804@vub.ac.be>
References: <44362EE1.5060804@vub.ac.be>
Message-ID: <1144403817.3958.30.camel@texas.ebi.ac.uk>

That was indeed a bug. I have made a change to the date parsing in
EMBLFormat and committed it to CVS. Could you test it for me please?

cheers,
Richard

On Fri, 2006-04-07 at 11:20 +0200, Morgane THOMAS-CHOLLIER wrote:
> Hello,
> 
> I am currently using biojavax that I checked out today from CVS to parse 
> an EMBL file, exported from EBI SRS server.
> 
> I ran into this error :
> 
> Exception in thread "main" org.biojava.bio.BioException: Could not read 
> sequence
>     at 
> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>     at 
> org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
> Caused by: org.biojava.bio.seq.io.ParseException: Bad date type found: 86
>     at 
> org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:278)
>     at 
> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>     ... 1 more
> 
> The EMBL file is :
> 
> ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
> XX
> AC   DQ158013;
> XX
> SV   DQ158013.1
> XX
> DT   19-JAN-2006 (Rel. 86, Created)
> DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
> XX
> DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
> 
> Removing the two lines that comprise the date information resolves the 
> problem.
> 
> Thanks,
> 
> Morgane.
> 
-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From mthomasc at vub.ac.be  Fri Apr  7 12:18:36 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Fri, 07 Apr 2006 14:18:36 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <1144403817.3958.30.camel@texas.ebi.ac.uk>
References: <44362EE1.5060804@vub.ac.be>
	<1144403817.3958.30.camel@texas.ebi.ac.uk>
Message-ID: <4436589C.8010501@vub.ac.be>

I now get another error message with the same file :

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
Caused by: java.lang.IndexOutOfBoundsException: No group 5
    at java.util.regex.Matcher.group(Matcher.java:355)
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:271)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

Here is the complete file, for info:

ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
XX
AC   DQ158013;
XX
SV   DQ158013.1
XX
DT   19-JAN-2006 (Rel. 86, Created)
DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
XX
DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
XX
KW   .
XX
OS   Triturus helveticus (palmate newt)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia;
OC   Batrachia; Caudata; Salamandroidea; Salamandridae; Triturus.
XX
RN   [1]
RP   1-118
RX   DOI; 10.1016/j.ympev.2005.08.012.
RX   PUBMED; 16198128.
RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT   "A PCR survey for posterior Hox genes in amphibians";
RL   Mol. Phylogenet. Evol. 38(2):449-458(2006).
XX
RN   [2]
RP   1-118
RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT   ;
RL   Submitted (09-AUG-2005) to the EMBL/GenBank/DDBJ databases.
RL   Biology Department, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050,
RL   Belgium
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..118
FT                   /organism="Triturus helveticus"
FT                   /mol_type="genomic DNA"
FT                   /clone="Thel.b9"
FT                   /db_xref="taxon:256425"
FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"
FT   mRNA            <1..>118
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT   CDS             <1..>118
FT                   /codon_start=2
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
FT                   /protein_id="ABA39736.1"
FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
XX
SQ   Sequence 118 BP; 28 A; 35 C; 37 G; 18 T; 0 other;
     caaataccag acgctggagc tggagaagga gttcctgttc aacatgtacc tcacccggga        60
     ccgcaggcac gaggtggccc ggctgctgaa cctcagcgag cgccaggtca agatctgg         118
//

Thanks for helping,

Morgane.

Richard Holland wrote:

>That was indeed a bug. I have made a change to the date parsing in
>EMBLFormat and committed it to CVS. Could you test it for me please?
>
>cheers,
>Richard
>
>On Fri, 2006-04-07 at 11:20 +0200, Morgane THOMAS-CHOLLIER wrote:
>  
>
>>Hello,
>>
>>I am currently using biojavax that I checked out today from CVS to parse 
>>an EMBL file, exported from EBI SRS server.
>>
>>I ran into this error :
>>
>>Exception in thread "main" org.biojava.bio.BioException: Could not read 
>>sequence
>>    at 
>>org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>>    at 
>>org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
>>Caused by: org.biojava.bio.seq.io.ParseException: Bad date type found: 86
>>    at 
>>org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:278)
>>    at 
>>org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>>    ... 1 more
>>
>>The EMBL file is :
>>
>>ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
>>XX
>>AC   DQ158013;
>>XX
>>SV   DQ158013.1
>>XX
>>DT   19-JAN-2006 (Rel. 86, Created)
>>DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
>>XX
>>DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
>>
>>Removing the two lines that comprise the date information resolves the 
>>problem.
>>
>>Thanks,
>>
>>Morgane.
>>
>>    
>>

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium


From richard.holland at ebi.ac.uk  Fri Apr  7 12:48:46 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 13:48:46 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <4436589C.8010501@vub.ac.be>
References: <44362EE1.5060804@vub.ac.be>
	<1144403817.3958.30.camel@texas.ebi.ac.uk> <4436589C.8010501@vub.ac.be>
Message-ID: <1144414126.3958.32.camel@texas.ebi.ac.uk>

Sorry, my bad. An off-by-one error... 

Check it out again and see if it works now.
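
For anyone following along, here is a rough sketch of how the two DT lines
in the quoted file can be matched with a regular expression. It is not the
pattern used in EMBLFormat, which is not shown in this thread; the class
name DtLineSketch, the pattern and the parse method are invented for the
example. The only thing taken from the thread is the line format itself.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DtLineSketch {

    // Hypothetical pattern with four capturing groups:
    //   1 = date, 2 = release number, 3 = date type, 4 = version (optional)
    static final Pattern DT = Pattern.compile(
        "^DT\\s+(\\d{2}-[A-Z]{3}-\\d{4})\\s+\\(Rel\\.\\s+(\\d+),\\s+"
        + "(Created|Last updated)(?:,\\s+Version\\s+(\\d+))?\\)\\s*$");

    // Returns {date, release, type, version}; version is null when the
    // line carries no "Version n" part, as on "Created" lines.
    static String[] parse(String line) {
        Matcher m = DT.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a DT line: " + line);
        }
        // Valid group indexes run from 1 to m.groupCount() (4 here).
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            parse("DT   19-JAN-2006 (Rel. 86, Created)")));
        System.out.println(Arrays.toString(
            parse("DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)")));
    }
}
```

Asking a Matcher for a group index larger than groupCount() throws
IndexOutOfBoundsException, which is the kind of failure reported in the
quoted stack trace ("No group 5").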

cheers,
Richard

PS. I don't have any EMBL files to test with at the moment otherwise I'd
check it myself... :)


On Fri, 2006-04-07 at 14:18 +0200, Morgane THOMAS-CHOLLIER wrote:
> I now get another error message with the same file :
> 
> Exception in thread "main" org.biojava.bio.BioException: Could not read 
> sequence
>     at 
> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>     at 
> org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
> Caused by: java.lang.IndexOutOfBoundsException: No group 5
>     at java.util.regex.Matcher.group(Matcher.java:355)
>     at 
> org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:271)
>     at 
> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>     ... 1 more
> 
> Here is the complete file, for info:
> 
> ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
> XX
> AC   DQ158013;
> XX
> SV   DQ158013.1
> XX
> DT   19-JAN-2006 (Rel. 86, Created)
> DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
> XX
> DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
> XX
> KW   .
> XX
> OS   Triturus helveticus (palmate newt)
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
> Amphibia;
> OC   Batrachia; Caudata; Salamandroidea; Salamandridae; Triturus.
> XX
> RN   [1]
> RP   1-118
> RX   DOI; 10.1016/j.ympev.2005.08.012.
> RX   PUBMED; 16198128.
> RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
> RT   "A PCR survey for posterior Hox genes in amphibians";
> RL   Mol. Phylogenet. Evol. 38(2):449-458(2006).
> XX
> RN   [2]
> RP   1-118
> RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
> RT   ;
> RL   Submitted (09-AUG-2005) to the EMBL/GenBank/DDBJ databases.
> RL   Biology Department, Vrije Universiteit Brussel, Pleinlaan 2, 
> Brussels 1050,
> RL   Belgium
> XX
> FH   Key             Location/Qualifiers
> FH
> FT   source          1..118
> FT                   /organism="Triturus helveticus"
> FT                   /mol_type="genomic DNA"
> FT                   /clone="Thel.b9"
> FT                   /db_xref="taxon:256425"
> FT   gene            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /note="Hoxb-9"
> FT   mRNA            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /product="HOXB9"
> FT   CDS             <1..>118
> FT                   /codon_start=2
> FT                   /gene="Hoxb9"
> FT                   /product="HOXB9"
> FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
> FT                   /protein_id="ABA39736.1"
> FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
> XX
> SQ   Sequence 118 BP; 28 A; 35 C; 37 G; 18 T; 0 other;
>      caaataccag acgctggagc tggagaagga gttcctgttc aacatgtacc 
> tcacccggga        60
>      ccgcaggcac gaggtggccc ggctgctgaa cctcagcgag cgccaggtca 
> agatctgg         118
> //
> 
> Thanks for helping,
> 
> Morgane.
> 
> 
> -- 
> **********************************************************
> Morgane THOMAS-CHOLLIER, PHD Student
> 
> Vrije Universiteit Brussels (VUB)
> Laboratory of Cell Genetics
> Pleinlaan 2
> 1050 Brussels
> Belgium
> 
-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From richard.holland at ebi.ac.uk  Fri Apr  7 13:42:10 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 14:42:10 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <44366419.4050505@dbm.ulb.ac.be>
References: <44362EE1.5060804@vub.ac.be>
	<1144403817.3958.30.camel@texas.ebi.ac.uk> <4436589C.8010501@vub.ac.be>
	<1144414126.3958.32.camel@texas.ebi.ac.uk>
	<44366419.4050505@dbm.ulb.ac.be>
Message-ID: <1144417330.3958.34.camel@texas.ebi.ac.uk>

Hi. Someone else had checked in a change to a different class, but that
change was incorrect and didn't compile. It should compile now.

cheers,
Richard

PS. Note to all those who commit changes - PLEASE check your code
compiles first before committing it! 

On Fri, 2006-04-07 at 15:07 +0200, Morgane THOMAS-CHOLLIER wrote:
> I tried to checkout biojava-live but it seems I cannot build it anymore. 
> I get the following error :
> 
> compile-biojava:
>     [javac] Compiling 1321 source files to 
> /Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/ant-build/classes/biojava
>     [javac] 
> /Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/src/org/biojavax/utils/StringTools.java:97: 
> exception java.io.IOException is never thrown in body of corresponding 
> try statement
>     [javac]           } catch (IOException e) {
>     [javac]             ^
>     [javac] Note: Some input files use or override a deprecated API.
>     [javac] Note: Recompile with -deprecation for details.
>     [javac] 1 error
> 
> I use Mac OS X 10.3.9, java 1.4.2.
> 
> Hope you could help,
> 
> Cheers,
> 
> Morgane.
> 
> 
> Richard Holland wrote:
> 
> >Sorry, my bad. An off-by-one error... 
> >
> >Check it out again and see if it works now.
> >
> >cheers,
> >Richard
> >
> >PS. I don't have any EMBL files to test with at the moment otherwise I'd
> >check it myself... :)
> >
> >
> >On Fri, 2006-04-07 at 14:18 +0200, Morgane THOMAS-CHOLLIER wrote:
> >  
> >
> >>I now get another error message with the same file :
> >>
> >>Exception in thread "main" org.biojava.bio.BioException: Could not read 
> >>sequence
> >>    at 
> >>org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
> >>    at 
> >>org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
> >>Caused by: java.lang.IndexOutOfBoundsException: No group 5
> >>    at java.util.regex.Matcher.group(Matcher.java:355)
> >>    at 
> >>org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:271)
> >>    at 
> >>org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
> >>    ... 1 more
> >>
> >>Here is the complete file, for info:
> >>
> >>ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
> >>XX
> >>AC   DQ158013;
> >>XX
> >>SV   DQ158013.1
> >>XX
> >>DT   19-JAN-2006 (Rel. 86, Created)
> >>DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
> >>XX
> >>DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
> >>XX
> >>KW   .
> >>XX
> >>OS   Triturus helveticus (palmate newt)
> >>OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
> >>Amphibia;
> >>OC   Batrachia; Caudata; Salamandroidea; Salamandridae; Triturus.
> >>XX
> >>RN   [1]
> >>RP   1-118
> >>RX   DOI; 10.1016/j.ympev.2005.08.012.
> >>RX   PUBMED; 16198128.
> >>RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
> >>RT   "A PCR survey for posterior Hox genes in amphibians";
> >>RL   Mol. Phylogenet. Evol. 38(2):449-458(2006).
> >>XX
> >>RN   [2]
> >>RP   1-118
> >>RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
> >>RT   ;
> >>RL   Submitted (09-AUG-2005) to the EMBL/GenBank/DDBJ databases.
> >>RL   Biology Department, Vrije Universiteit Brussel, Pleinlaan 2, 
> >>Brussels 1050,
> >>RL   Belgium
> >>XX
> >>FH   Key             Location/Qualifiers
> >>FH
> >>FT   source          1..118
> >>FT                   /organism="Triturus helveticus"
> >>FT                   /mol_type="genomic DNA"
> >>FT                   /clone="Thel.b9"
> >>FT                   /db_xref="taxon:256425"
> >>FT   gene            <1..>118
> >>FT                   /gene="Hoxb9"
> >>FT                   /note="Hoxb-9"
> >>FT   mRNA            <1..>118
> >>FT                   /gene="Hoxb9"
> >>FT                   /product="HOXB9"
> >>FT   CDS             <1..>118
> >>FT                   /codon_start=2
> >>FT                   /gene="Hoxb9"
> >>FT                   /product="HOXB9"
> >>FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
> >>FT                   /protein_id="ABA39736.1"
> >>FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
> >>XX
> >>SQ   Sequence 118 BP; 28 A; 35 C; 37 G; 18 T; 0 other;
> >>     caaataccag acgctggagc tggagaagga gttcctgttc aacatgtacc 
> >>tcacccggga        60
> >>     ccgcaggcac gaggtggccc ggctgctgaa cctcagcgag cgccaggtca 
> >>agatctgg         118
> >>//
> >>
> >>Thanks for helping,
> >>
> >>Morgane.
> >>
> >>Richard Holland wrote:
> >>
> >>    
> >>
> >>>That was indeed a bug. I have made a change to the date parsing in
> >>>EMBLFormat and committed it to CVS. Could you test it for me please?
> >>>
> >>>cheers,
> >>>Richard
> >>>
> >>>On Fri, 2006-04-07 at 11:20 +0200, Morgane THOMAS-CHOLLIER wrote:
> >>> 
> >>>
> >>>      
> >>>
> >>>>Hello,
> >>>>
> >>>>I am currently using biojavax that I checked out today from CVS to parse 
> >>>>an EMBL file, exported from EBI SRS server.
> >>>>
> >>>>I ran into this error :
> >>>>
> >>>>Exception in thread "main" org.biojava.bio.BioException: Could not read 
> >>>>sequence
> >>>>   at 
> >>>>org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
> >>>>   at 
> >>>>org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
> >>>>Caused by: org.biojava.bio.seq.io.ParseException: Bad date type found: 86
> >>>>   at 
> >>>>org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:278)
> >>>>   at 
> >>>>org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
> >>>>   ... 1 more
> >>>>
> >>>>The EMBL file is :
> >>>>
> >>>>ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
> >>>>XX
> >>>>AC   DQ158013;
> >>>>XX
> >>>>SV   DQ158013.1
> >>>>XX
> >>>>DT   19-JAN-2006 (Rel. 86, Created)
> >>>>DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
> >>>>XX
> >>>>DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
> >>>>
> >>>>Removing the two lines that comprise the date information resolves the 
> >>>>problem.
> >>>>
> >>>>Thanks,
> >>>>
> >>>>Morgane.
> >>>>
> >>>>   
> >>>>
> >>>>        
> >>>>
> >>-- 
> >>**********************************************************
> >>Morgane THOMAS-CHOLLIER, PHD Student
> >>
> >>Vrije Universiteit Brussels (VUB)
> >>Laboratory of Cell Genetics
> >>Pleinlaan 2
> >>1050 Brussels
> >>Belgium
> >>
> >>    
> >>
> 
> 
-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From andreas.draeger at clever-telefonieren.de  Fri Apr  7 15:43:35 2006
From: andreas.draeger at clever-telefonieren.de (=?ISO-8859-1?Q?Andreas_Dr=E4ger?=)
Date: Fri, 07 Apr 2006 17:43:35 +0200
Subject: [Biojava-l] Senseless assignment
Message-ID: <443688A7.1000203@clever-telefonieren.de>

Hi,

This assignment has no effect in class 
org.biojavax.ontology.SimpleComparableTriple:

    // Hibernate requirement - not for public use.
    private void setOntology(ComparableOntology descriptors) { this.ontology = ontology; }

I do not know why this is necessary.

Andreas

-- 
==================================
Andreas Dräger
PhD student
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Phone: +49-7071-29-70436
==================================


From richard.holland at ebi.ac.uk  Mon Apr 10 09:26:51 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Mon, 10 Apr 2006 10:26:51 +0100
Subject: [Biojava-l] Senseless assignment
In-Reply-To: <443688A7.1000203@clever-telefonieren.de>
References: <443688A7.1000203@clever-telefonieren.de>
Message-ID: <1144661211.3951.9.camel@texas.ebi.ac.uk>

It's a typo. The method declaration should read:

	// Hibernate requirement - not for public use.
	private void setOntology(ComparableOntology ontology) {
		this.ontology = ontology;
	}

I have fixed it in CVS.
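
To illustrate why the reported version had no effect, here is a minimal
sketch. The class SetterDemo and the use of String in place of
ComparableOntology are assumptions made for illustration only; this is not
the SimpleComparableTriple source. When the parameter is named
"descriptors", the name "ontology" on the right-hand side resolves to the
field, so the field is assigned to itself.

```java
// Hypothetical stand-in for SimpleComparableTriple; String replaces ComparableOntology.
class SetterDemo {
    private String ontology;

    // Reported form: the parameter is called 'descriptors', so 'ontology' on the
    // right-hand side is the field itself and the argument is silently ignored.
    void setOntologyBuggy(String descriptors) {
        this.ontology = ontology;
    }

    // Corrected form: the parameter shadows the field, so the argument is stored.
    void setOntologyFixed(String ontology) {
        this.ontology = ontology;
    }

    String getOntology() {
        return ontology;
    }

    public static void main(String[] args) {
        SetterDemo t = new SetterDemo();
        t.setOntologyBuggy("GO");
        System.out.println(t.getOntology()); // prints "null"
        t.setOntologyFixed("GO");
        System.out.println(t.getOntology()); // prints "GO"
    }
}
```

If Hibernate populates the object through this setter, as the comment
suggests, the reported form would presumably leave the field null after
loading.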

cheers,
Richard

On Fri, 2006-04-07 at 17:43 +0200, Andreas Dräger wrote:
> Hi,
> 
> This assignment has no effect in class 
> org.biojavax.ontology.SimpleComparableTriple:
> 
>     // Hibernate requirement - not for public use.
>     private void setOntology(ComparableOntology descriptors) { this.ontology = ontology; }
> 
> I do not know why this is necessary.
> 
> Andreas
> 
-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From mthomasc at vub.ac.be  Sat Apr  8 08:20:47 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Sat, 08 Apr 2006 10:20:47 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <1144417330.3958.34.camel@texas.ebi.ac.uk>
References: <44362EE1.5060804@vub.ac.be>	
	<1144403817.3958.30.camel@texas.ebi.ac.uk>
	<4436589C.8010501@vub.ac.be>	
	<1144414126.3958.32.camel@texas.ebi.ac.uk>
	<44366419.4050505@dbm.ulb.ac.be>
	<1144417330.3958.34.camel@texas.ebi.ac.uk>
Message-ID: <4437725F.9000503@vub.ac.be>

It works fine now !

Thanks for your help,

cheers,

Morgane.


Richard Holland wrote:

>Hi. Someone else had checked in a change to a different class, but that
>change was incorrect and didn't compile. It should compile now.
>
>cheers,
>Richard
>
>PS. Note to all those who commit changes - PLEASE check your code
>compiles first before committing it! 
>
>On Fri, 2006-04-07 at 15:07 +0200, Morgane THOMAS-CHOLLIER wrote:
>  
>
>>I tried to checkout biojava-live but it seems I cannot build it anymore. 
>>I get the following error :
>>
>>compile-biojava:
>>    [javac] Compiling 1321 source files to 
>>/Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/ant-build/classes/biojava
>>    [javac] 
>>/Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/src/org/biojavax/utils/StringTools.java:97: 
>>exception java.io.IOException is never thrown in body of corresponding 
>>try statement
>>    [javac]           } catch (IOException e) {
>>    [javac]             ^
>>    [javac] Note: Some input files use or override a deprecated API.
>>    [javac] Note: Recompile with -deprecation for details.
>>    [javac] 1 error
>>
>>I use Mac OS X 10.3.9, java 1.4.2.
>>
>>Hope you could help,
>>
>>Cheers,
>>
>>Morgane.
>>
>>
>>Richard Holland wrote:
>>
>>    
>>
>>>Sorry, my bad. An off-by-one error... 
>>>
>>>Check it out again and see if it works now.
>>>
>>>cheers,
>>>Richard
>>>
>>>PS. I don't have any EMBL files to test with at the moment otherwise I'd
>>>check it myself... :)
>>>
>>>
>>>On Fri, 2006-04-07 at 14:18 +0200, Morgane THOMAS-CHOLLIER wrote:
>>> 
>>>
>>>      
>>>
>>>>I now get another error message with the same file :
>>>>
>>>>Exception in thread "main" org.biojava.bio.BioException: Could not read 
>>>>sequence
>>>>   at 
>>>>org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>>>>   at 
>>>>org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
>>>>Caused by: java.lang.IndexOutOfBoundsException: No group 5
>>>>   at java.util.regex.Matcher.group(Matcher.java:355)
>>>>   at 
>>>>org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:271)
>>>>   at 
>>>>org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>>>>   ... 1 more
>>>>
>>>>Here is the complete file, for info:
>>>>
>>>>ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
>>>>XX
>>>>AC   DQ158013;
>>>>XX
>>>>SV   DQ158013.1
>>>>XX
>>>>DT   19-JAN-2006 (Rel. 86, Created)
>>>>DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
>>>>XX
>>>>DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
>>>>XX
>>>>KW   .
>>>>XX
>>>>OS   Triturus helveticus (palmate newt)
>>>>OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
>>>>Amphibia;
>>>>OC   Batrachia; Caudata; Salamandroidea; Salamandridae; Triturus.
>>>>XX
>>>>RN   [1]
>>>>RP   1-118
>>>>RX   DOI; 10.1016/j.ympev.2005.08.012.
>>>>RX   PUBMED; 16198128.
>>>>RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
>>>>RT   "A PCR survey for posterior Hox genes in amphibians";
>>>>RL   Mol. Phylogenet. Evol. 38(2):449-458(2006).
>>>>XX
>>>>RN   [2]
>>>>RP   1-118
>>>>RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
>>>>RT   ;
>>>>RL   Submitted (09-AUG-2005) to the EMBL/GenBank/DDBJ databases.
>>>>RL   Biology Department, Vrije Universiteit Brussel, Pleinlaan 2, 
>>>>Brussels 1050,
>>>>RL   Belgium
>>>>XX
>>>>FH   Key             Location/Qualifiers
>>>>FH
>>>>FT   source          1..118
>>>>FT                   /organism="Triturus helveticus"
>>>>FT                   /mol_type="genomic DNA"
>>>>FT                   /clone="Thel.b9"
>>>>FT                   /db_xref="taxon:256425"
>>>>FT   gene            <1..>118
>>>>FT                   /gene="Hoxb9"
>>>>FT                   /note="Hoxb-9"
>>>>FT   mRNA            <1..>118
>>>>FT                   /gene="Hoxb9"
>>>>FT                   /product="HOXB9"
>>>>FT   CDS             <1..>118
>>>>FT                   /codon_start=2
>>>>FT                   /gene="Hoxb9"
>>>>FT                   /product="HOXB9"
>>>>FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
>>>>FT                   /protein_id="ABA39736.1"
>>>>FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
>>>>XX
>>>>SQ   Sequence 118 BP; 28 A; 35 C; 37 G; 18 T; 0 other;
>>>>    caaataccag acgctggagc tggagaagga gttcctgttc aacatgtacc 
>>>>tcacccggga        60
>>>>    ccgcaggcac gaggtggccc ggctgctgaa cctcagcgag cgccaggtca 
>>>>agatctgg         118
>>>>//
>>>>
>>>>Thanks for helping,
>>>>
>>>>Morgane.
>>>>
>>>>Richard Holland wrote:
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>>>That was indeed a bug. I have made a change to the date parsing in
>>>>>EMBLFormat and committed it to CVS. Could you test it for me please?
>>>>>
>>>>>cheers,
>>>>>Richard
>>>>>
>>>>>On Fri, 2006-04-07 at 11:20 +0200, Morgane THOMAS-CHOLLIER wrote:
>>>>>
>>>>>
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>>>Hello,
>>>>>>
>>>>>>I am currently using biojavax that I checked out today from CVS to parse 
>>>>>>an EMBL file, exported from EBI SRS server.
>>>>>>
>>>>>>I ran into this error :
>>>>>>
>>>>>>Exception in thread "main" org.biojava.bio.BioException: Could not read 
>>>>>>sequence
>>>>>>  at 
>>>>>>org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>>>>>>  at 
>>>>>>org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
>>>>>>Caused by: org.biojava.bio.seq.io.ParseException: Bad date type found: 86
>>>>>>  at 
>>>>>>org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:278)
>>>>>>  at 
>>>>>>org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>>>>>>  ... 1 more
>>>>>>
>>>>>>The EMBL file is :
>>>>>>
>>>>>>ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
>>>>>>XX
>>>>>>AC   DQ158013;
>>>>>>XX
>>>>>>SV   DQ158013.1
>>>>>>XX
>>>>>>DT   19-JAN-2006 (Rel. 86, Created)
>>>>>>DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
>>>>>>XX
>>>>>>DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
>>>>>>
>>>>>>Removing the two lines that comprise the date information resolves the 
>>>>>>problem.
>>>>>>
>>>>>>Thanks,
>>>>>>
>>>>>>Morgane.
>>>>>>
>>>>>>  
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>-- 
>>>>**********************************************************
>>>>Morgane THOMAS-CHOLLIER, PHD Student
>>>>
>>>>Vrije Universiteit Brussels (VUB)
>>>>Laboratory of Cell Genetics
>>>>Pleinlaan 2
>>>>1050 Brussels
>>>>Belgium
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>    
>>


-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium


From mthomasc at vub.ac.be  Wed Apr 12 08:34:43 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Wed, 12 Apr 2006 10:34:43 +0200
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing
Message-ID: <443CBBA3.9070101@vub.ac.be>

Hello again,

I am currently using biojavax to parse EMBL files exported from Ensembl 
website.

Compared to the EBI files I have, they show a difference in the feature 
lines :

sometimes only one "/word" qualifier is present, e.g. :

EBI file :

FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"

Ensembl file :

FT   gene         complement(1..3218)
FT                   /gene="ENSMUSG00000038227"

The problem I encounter is that the parser correctly converts the "/word" 
into a Note, but the Note is then attached to the immediately 
following feature (i.e. mRNA).
The current gene feature thus has no annotation.

This behavior is reproducible when removing one "/word" of an EBI file.

Apart from this issue, I noted that Ensembl EMBL files use "=" inside a 
qualifier value (e.g. /note="transcript_id=ENSMUST00000048680"), which ends 
up as an incomplete Note, as the parser seems to split on "=" to separate 
the Key and the Value.
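
A minimal sketch of the suspected split problem, assuming the parser
separates key and value with a plain split on "=" (I have not checked the
EMBLFormat source; the class and method names below are made up for
illustration). Splitting only at the first "=" keeps the rest of the value
intact.

```java
// Hypothetical helper, not part of BioJava.
class QualifierSplit {
    // Naive approach: every '=' acts as a separator, so a value that itself
    // contains '=' is cut off at that point.
    static String[] splitNaive(String qualifier) {
        String[] parts = qualifier.split("=");
        return new String[] { parts[0], parts.length > 1 ? parts[1] : "" };
    }

    // Split at the first '=' only, so any later '=' stays inside the value.
    static String[] splitFirst(String qualifier) {
        int eq = qualifier.indexOf('=');
        if (eq < 0) {
            return new String[] { qualifier, "" }; // qualifier without a value
        }
        return new String[] { qualifier.substring(0, eq), qualifier.substring(eq + 1) };
    }

    public static void main(String[] args) {
        String q = "/note=\"transcript_id=ENSMUST00000048680\"";
        System.out.println(splitNaive(q)[1]); // value truncated after transcript_id
        System.out.println(splitFirst(q)[1]); // full quoted value
    }
}
```

String.split(regex, 2) would be an equivalent way to limit the split to the
first "=".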

Thanks for your help,

Morgane.

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium


From jolyon.holdstock at ogt.co.uk  Thu Apr 13 16:42:36 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu, 13 Apr 2006 17:42:36 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>

Hi Morgane,

I have amended the EMBLFormat readSection method as below and the
parsing seems to work; please test it.

I think that the last bit of annotation is carried over into the next
feature so before adding the new feature I dump the annotation and reset
currentTag and currentVal.

if (!line.startsWith(" ")) {
//--------- new code starts ---------------------------
  if (currentTag!=null) {
    section.add(new String[]{currentTag,currentVal.toString()});
    currentTag = null;
    currentVal = null;
  }
//--------- new code ends -----------------------------
// case 1 : word value - splits into key-value on its own
  section.add(line.split("\\s+"));
}
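
As a self-contained illustration of the idea (not the actual EMBLFormat
code), the sketch below uses a simplified, hypothetical readSection that
takes feature-table lines with the leading "FT   " already stripped. The
class name, the boolean switch and the input handling are assumptions; only
currentTag, currentVal and section mirror the names in the fragment above.
With the flush switched off, the last qualifier of a feature is emitted after
the next feature line, which matches the behaviour Morgane described.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the feature-table part of readSection.
class FeatureSectionSketch {
    // Feature lines start in column 0; qualifier lines are indented and start with '/'.
    static List<String[]> readSection(String[] lines, boolean flushBeforeFeature) {
        List<String[]> section = new ArrayList<>();
        String currentTag = null;
        StringBuilder currentVal = null;
        for (String line : lines) {
            if (!line.startsWith(" ")) {
                // New feature: emit the pending qualifier first (the proposed fix).
                if (flushBeforeFeature && currentTag != null) {
                    section.add(new String[] { currentTag, currentVal.toString() });
                    currentTag = null;
                    currentVal = null;
                }
                section.add(line.split("\\s+"));
            } else {
                String trimmed = line.trim();
                if (trimmed.startsWith("/")) {
                    // New qualifier: emit the previous one, then start collecting.
                    if (currentTag != null) {
                        section.add(new String[] { currentTag, currentVal.toString() });
                    }
                    int eq = trimmed.indexOf('=');
                    currentTag = eq < 0 ? trimmed.substring(1) : trimmed.substring(1, eq);
                    currentVal = new StringBuilder(eq < 0 ? "" : trimmed.substring(eq + 1));
                } else if (currentVal != null) {
                    currentVal.append(' ').append(trimmed); // continuation of a value
                }
            }
        }
        if (currentTag != null) {
            section.add(new String[] { currentTag, currentVal.toString() }); // last qualifier
        }
        return section;
    }

    public static void main(String[] args) {
        String[] lines = {
            "gene            <1..>118",
            "                /gene=\"Hoxb9\"",
            "                /note=\"Hoxb-9\"",
            "mRNA            <1..>118",
            "                /gene=\"Hoxb9\""
        };
        for (String[] entry : readSection(lines, true)) {
            System.out.println(entry[0] + " -> " + entry[1]);
        }
    }
}
```

Called with flushBeforeFeature set to true, the note entry comes out before
the mRNA entry; with false it comes out after it.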

Cheers,

Jolyon


-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org
[mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane
THOMAS-CHOLLIER
Sent: 12 April 2006 09:35
To: biojava-l at open-bio.org
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]

Hello again,

I am currently using biojavax to parse EMBL files exported from Ensembl 
website.

Compared to the EBI files I have, they show a difference in the Features

lines :

sometimes, only one "/word" is present. ie:

EBI file :

FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"

Ensembl file;

FT   gene         complement(1..3218)
FT                   /gene="ENSMUSG00000038227"

The problem I encounter is that the parser correctly convert the "/word"

into a Note, but the Note is then in relation with the immediate 
following feature (ie: mRNA).
The current gene feature thus has no annotation.

This behavior is reproducible when removing one "/word" of an EBI file.

Apart from this issue, I noted that Ensembl EMBL files uses "=" inside a

feature (ie: /note="transcript_id=ENSMUST00000048680") which ends up 
with an incomplete Note, as the parser seems to split on "=" to separate

the Key and the Value.

Thanks for your help,

Morgane.

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium


_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l




From david at autohandle.com  Fri Apr 14 21:29:51 2006
From: david at autohandle.com (David Scott)
Date: Fri, 14 Apr 2006 14:29:51 -0700
Subject: [Biojava-l] BioJavaX.html
Message-ID: <4440144F.7010603@autohandle.com>

is BioJavaX.html posted somewhere - i am getting an 
ArrayIndexOutOfBoundsException on the build.

thanks


From david at autohandle.com  Fri Apr 14 21:20:47 2006
From: david at autohandle.com (David Scott)
Date: Fri, 14 Apr 2006 14:20:47 -0700
Subject: [Biojava-l] BioJavaX.html
Message-ID: <4440122F.2080809@autohandle.com>

is it possible to post the BioJavaX.html somewhere - i am getting an 
ArrayIndexOutOfBoundsException on the docbook build. i used google - 
but could not locate it.

thanks-


From mark.schreiber at novartis.com  Sat Apr 15 23:19:13 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Sun, 16 Apr 2006 07:19:13 +0800
Subject: [Biojava-l] BioJavaX.html
Message-ID: <OF7DC239B8.70535343-ON48257151.007FC85C-48257151.00801C4C@EU.novartis.net>

Could someone post the text to the wiki site temporarily? Actually it may 
be more sensible for this document to be hosted as a wiki page. The wiki 
was not available at the time that Richard wrote it so moving it may be a 
good idea. Any objections?

Additionally some platforms have trouble building docbook html from ant 
(especially platforms developed in Redmond WA which we don't speak of).

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


David Scott <david at autohandle.com>
Sent by: biojava-l-bounces at lists.open-bio.org
04/15/2006 05:20 AM

 
        To:     biojava-l at biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] BioJavaX.html


is it possible to post the BioJavaX.html somewhere - i am getting an 
ArrayIndexOutOfBoundsException on the build  docbook. i used google - 
but could not locate it.

thanks-



From richard.holland at ebi.ac.uk  Tue Apr 18 09:21:49 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 18 Apr 2006 10:21:49 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>
Message-ID: <1145352109.4188.3.camel@texas.ebi.ac.uk>

I have committed an UNTESTED patch based on Jolyon's suggestion, and
also attempted to fix the split-on-equals problem Morgane observed. 

Please let me know if there are any problems with it.

As this problem affected the UniProt parser in a similar manner (much of
the code is identical), the same fixes were applied there too.

cheers,
Richard

On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote:
> Hi Morgane,
> 
> I have amended the EmblFormat readSection method as below and the
> parsing seems to work; please test it.
> 
> I think that the last bit of annotation is carried over into the next
> feature so before adding the new feature I dump the annotation and reset
> currentTag and currentVal.
> 
> if (!line.startsWith(" ")) {
> //--------- new code starts ---------------------------
>   if (currentTag!=null) {
>     section.add(new String[]{currentTag,currentVal.toString()});
>     currentTag = null;
>     currentVal = null;
>   }
> //--------- new code ends -----------------------------
> // case 1 : word value - splits into key-value on its own
>   section.add(line.split("\\s+"));
> }
> 
> Cheers,
> 
> Jolyon
> 
> 
> 
> -----Original Message-----
> From: biojava-l-bounces at lists.open-bio.org
> [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane
> THOMAS-CHOLLIER
> Sent: 12 April 2006 09:35
> To: biojava-l at open-bio.org
> Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
> 
> Hello again,
> 
> I am currently using biojavax to parse EMBL files exported from Ensembl 
> website.
> 
> Compared to the EBI files I have, they show a difference in the Features
> 
> lines :
> 
> sometimes, only one "/word" is present. ie:
> 
> EBI file :
> 
> FT   gene            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /note="Hoxb-9"
> 
> Ensembl file;
> 
> FT   gene         complement(1..3218)
> FT                   /gene="ENSMUSG00000038227"
> 
> The problem I encounter is that the parser correctly convert the "/word"
> 
> into a Note, but the Note is then in relation with the immediate 
> following feature (ie: mRNA).
> The current gene feature thus has no annotation.
> 
> This behavior is reproducible when removing one "/word" of an EBI file.
> 
> Apart from this issue, I noted that Ensembl EMBL files uses "=" inside a
> 
> feature (ie: /note="transcript_id=ENSMUST00000048680") which ends up 
> with an incomplete Note, as the parser seems to split on "=" to separate
> 
> the Key and the Value.
> 
> Thanks for your help,
> 
> Morgane.
> 
-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From richard.holland at ebi.ac.uk  Tue Apr 18 08:20:44 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 18 Apr 2006 09:20:44 +0100
Subject: [Biojava-l] BioJavaX.html
In-Reply-To: <OF7DC239B8.70535343-ON48257151.007FC85C-48257151.00801C4C@EU.novartis.net>
References: <OF7DC239B8.70535343-ON48257151.007FC85C-48257151.00801C4C@EU.novartis.net>
Message-ID: <1145348444.4188.0.camel@texas.ebi.ac.uk>

HTML version attached. I've created a placeholder on the BioJava website
- could someone convert it who has the time? :)

cheers,
Richard


On Sun, 2006-04-16 at 07:19 +0800, mark.schreiber at novartis.com wrote:
> Could someone post the text to the wiki site temporarily. Actually it may 
> be more sensible for this document to be hosted as a wiki page. The wiki 
> was not available at the time that Richard wrote it so moving it may be a 
> good idea. Any objections?
> 
> Additionally some platforms have trouble building docbook html from ant 
> (especially platforms developed in Redmond WA which we don't speak of).
> 
> - Mark
> 
> Mark Schreiber
> Research Investigator (Bioinformatics)
> 
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
> 
> phone +65 6722 2973
> fax  +65 6722 2910
> 
> 
> 
> 
> 
> David Scott <david at autohandle.com>
> Sent by: biojava-l-bounces at lists.open-bio.org
> 04/15/2006 05:20 AM
> 
>  
>         To:     biojava-l at biojava.org
>         cc:     (bcc: Mark Schreiber/GP/Novartis)
>         Subject:        [Biojava-l] BioJavaX.html
> 
> 
> is it possible to post the BioJavaX.html somewhere - i am getting an 
> ArrayIndexOutOfBoundsException on the build  docbook. i used google - 
> but could not locate it.
> 
> thanks-
> 
-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20060418/f6e5bb6b/attachment-0002.html>

From J.L.Sharman at sms.ed.ac.uk  Wed Apr 19 09:35:14 2006
From: J.L.Sharman at sms.ed.ac.uk (Joanna Sharman)
Date: Wed, 19 Apr 2006 10:35:14 +0100
Subject: [Biojava-l] Pairwise Alignment
Message-ID: <20060419103514.rwtqmzy00k0ogog8@www.sms.ed.ac.uk>

Hello,

I'm new to BioJava so I'm sorry if this question has been asked several
times before.

This is actually sort of in reply to this message from last month:

http://lists.open-bio.org/pipermail/biojava-l/2006-March/005365.html

I'd like to perform a simple pairwise alignment using the
Smith-Waterman class I saw described here:

http://www.biojava.org/wiki/BioJava:CookBook:DP:PairWise2

but I can't find the classes it mentions anywhere on the cvs.  Can you
point me to where they are?

Also, I'm just wondering why the HMM method is preferred to the
Smith-Waterman (or others)?  It seems quite complicated to me, and like
it might require more memory, or am I wrong? :)

Cheers,
Joanna


From mthomasc at vub.ac.be  Thu Apr 20 09:35:54 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Thu, 20 Apr 2006 11:35:54 +0200
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing
In-Reply-To: <1145352109.4188.3.camel@texas.ebi.ac.uk>
References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>
	<1145352109.4188.3.camel@texas.ebi.ac.uk>
Message-ID: <444755FA.7030009@vub.ac.be>

Hi,

I have tested today's version from CVS.

Both EBI and Ensembl files now behave the same way.
However, the last annotation of a feature is still attached to the 
immediately following feature.
e.g. :

FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"
FT   mRNA            <1..>118
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT   CDS             <1..>118

/note="Hoxb-9" is related to mRNA
/product="HOXB9" is related to CDS

Concerning the split-on-equals problem, it still occurs :

 [(#2) biojavax:note: transcript_i]

for this annotation :  /note="transcript_id=ENSMUST00000048680"

Thanks for helping,

Cheers,

Morgane.

Richard Holland wrote:
> I have committed an UNTESTED patch based on Jolyon's suggestion, and
> also attempted to fix the split-on-equals problem Morgane observed. 
>
> Please let me know if there are any problems with it.
>
> As this problem affected the UniProt parser in a similar manner (much of
> the code is identical), the same fixes were applied there too.
>
> cheers,
> Richard
>
> On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote:
>   
>> Hi Morgane,
>>
>> I have amended the EmblFormat readSection method as below and the
>> parsing seems to work; please test it.
>>
>> I think that the last bit of annotation is carried over into the next
>> feature so before adding the new feature I dump the annotation and reset
>> currentTag and currentVal.
>>
>> if (!line.startsWith(" ")) {
>> //--------- new code starts ---------------------------
>>   if (currentTag!=null) {
>>     section.add(new String[]{currentTag,currentVal.toString()});
>>     currentTag = null;
>>     currentVal = null;
>>   }
>> //--------- new code ends -----------------------------
>> // case 1 : word value - splits into key-value on its own
>>   section.add(line.split("\\s+"));
>> }
>>
>> Cheers,
>>
>> Jolyon
>>
>>
>>
>> -----Original Message-----
>> From: biojava-l-bounces at lists.open-bio.org
>> [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane
>> THOMAS-CHOLLIER
>> Sent: 12 April 2006 09:35
>> To: biojava-l at open-bio.org
>> Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
>>
>> Hello again,
>>
>> I am currently using biojavax to parse EMBL files exported from Ensembl 
>> website.
>>
>> Compared to the EBI files I have, they show a difference in the Features
>>
>> lines :
>>
>> sometimes, only one "/word" is present. ie:
>>
>> EBI file :
>>
>> FT   gene            <1..>118
>> FT                   /gene="Hoxb9"
>> FT                   /note="Hoxb-9"
>>
>> Ensembl file;
>>
>> FT   gene         complement(1..3218)
>> FT                   /gene="ENSMUSG00000038227"
>>
>> The problem I encounter is that the parser correctly convert the "/word"
>>
>> into a Note, but the Note is then in relation with the immediate 
>> following feature (ie: mRNA).
>> The current gene feature thus has no annotation.
>>
>> This behavior is reproducible when removing one "/word" of an EBI file.
>>
>> Apart from this issue, I noted that Ensembl EMBL files uses "=" inside a
>>
>> feature (ie: /note="transcript_id=ENSMUST00000048680") which ends up 
>> with an incomplete Note, as the parser seems to split on "=" to separate
>>
>> the Key and the Value.
>>
>> Thanks for your help,
>>
>> Morgane.
>>
>>     

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student (mthomasc at vub.ac.be)

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

Tel : +32 2 629 15 22
**********************************************************
Stop Using Internet Explorer, choose FIREFOX !


From richard.holland at ebi.ac.uk  Thu Apr 20 12:05:00 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 20 Apr 2006 13:05:00 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing
In-Reply-To: <444755FA.7030009@vub.ac.be>
References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>
	<1145352109.4188.3.camel@texas.ebi.ac.uk> <444755FA.7030009@vub.ac.be>
Message-ID: <1145534700.4188.28.camel@texas.ebi.ac.uk>

Hi.

I made some small changes to the code, although nothing that would fix
this kind of problem, committed it back to CVS, checked it out again,
compiled, and ran a test program that read in an EMBL file with the
feature table you describe below, and output it in EMBL format to
another file. I then compared the two files... and found no differences!
The split-on-equals problem didn't occur, and all notes appeared
alongside their correct features.
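
For readers following the thread, the two parsing behaviours under
discussion can be sketched in plain Java, without biojavax. This is a
minimal illustration under stated assumptions, not the actual EmblFormat
code: the class FeatureTableSketch, its Feature type and the flush helper
are hypothetical names, and joining continuation lines with a single space
is an assumption. The sketch flushes a pending qualifier to the feature it
belongs to before a new feature starts (the idea in Jolyon's patch), and it
splits tag from value at the first "=" only, so a value such as
transcript_id=ENSMUST00000048680 is kept whole.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a feature-table reader; not biojavax code.
final class FeatureTableSketch {

    /** One feature plus its qualifiers, stored as {tag, value} pairs. */
    static final class Feature {
        final String key;
        final String location;
        final List<String[]> qualifiers = new ArrayList<>();

        Feature(String key, String location) {
            this.key = key;
            this.location = location;
        }
    }

    /** Parses EMBL "FT" lines into features with their qualifiers. */
    static List<Feature> parse(List<String> lines) {
        List<Feature> features = new ArrayList<>();
        Feature current = null;
        String pendingTag = null;
        StringBuilder pendingVal = null;

        for (String line : lines) {
            if (!line.startsWith("FT") || line.length() <= 5) continue;
            String body = line.substring(5);   // drop "FT" and its padding
            String text = body.trim();
            if (text.isEmpty()) continue;

            if (!body.startsWith(" ")) {
                // New feature: hand the pending qualifier to the PREVIOUS
                // feature first, then reset, so nothing is carried over.
                flush(current, pendingTag, pendingVal);
                pendingTag = null;
                pendingVal = null;
                String[] parts = text.split("\\s+", 2);
                current = new Feature(parts[0], parts.length > 1 ? parts[1] : "");
                features.add(current);
            } else if (text.startsWith("/")) {
                // New qualifier: store the previous one, then split on the
                // FIRST '=' only; later '=' characters stay in the value.
                flush(current, pendingTag, pendingVal);
                int eq = text.indexOf('=');
                pendingTag = eq < 0 ? text.substring(1) : text.substring(1, eq);
                pendingVal = new StringBuilder(eq < 0 ? "" : text.substring(eq + 1));
            } else if (pendingVal != null) {
                // Continuation of a multi-line value (assumed space-joined).
                pendingVal.append(' ').append(text);
            }
        }
        flush(current, pendingTag, pendingVal);   // last qualifier of the table
        return features;
    }

    /** Adds the pending qualifier to the feature, removing enclosing quotes. */
    static void flush(Feature feature, String tag, StringBuilder val) {
        if (feature == null || tag == null) return;
        String v = val.toString();
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            v = v.substring(1, v.length() - 1);
        }
        feature.qualifiers.add(new String[] {tag, v});
    }

    public static void main(String[] args) {
        List<Feature> features = parse(Arrays.asList(
            "FT   gene            complement(1..3218)",
            "FT                   /gene=\"ENSMUSG00000038227\"",
            "FT   mRNA            complement(1..3218)",
            "FT                   /note=\"transcript_id=ENSMUST00000048680\""));
        for (Feature f : features) {
            System.out.println("Feature: " + f.key + " " + f.location);
            for (String[] q : f.qualifiers) {
                System.out.println("  " + q[0] + ": " + q[1]);
            }
        }
    }
}
```

Running main on the Ensembl-style lines should list the gene qualifier
under gene and the full note under mRNA. Resetting the pending tag and
value when a feature line is seen is what keeps a trailing qualifier from
sliding onto the next feature.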

Could there be a problem maybe with the script you are using?

I've really no idea what the problem is as I can't reproduce it based on
the current CVS contents!

cheers,
Richard

On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote:
> Hi,
> 
> I have tested today's version from CVS.
> 
> Both EBI and Ensembl files now react the same way.
> The last annotation of a feature is nevertheless related to its 
> immediate following feature.
> e.g. :
> 
> FT   gene            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /note="Hoxb-9"
> FT   mRNA            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /product="HOXB9"
> FT   CDS             <1..>118
> 
> /note="Hoxb-9" is related to mRNA
> /product="HOXB9" is related to CDS
> 
> Concerning the split-on-equals problem, I still observe the problem :
> 
>  [(#2) biojavax:note: transcript_i]
> 
> for this annotation :  /note="transcript_id=ENSMUST00000048680"
> 
> Thanks for helping,
> 
> Cheers,
> 
> Morgane.
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From jolyon.holdstock at ogt.co.uk  Thu Apr 20 12:08:40 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu, 20 Apr 2006 13:08:40 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com>

I've run the sequence through the parser and it seems to work OK. I
iterate through the features and, for each feature, iterate through its
annotations.

Based on the input....

FT   source          1..118
FT                   /organism="Triturus helveticus"
FT                   /mol_type="genomic DNA"
FT                   /clone="Thel.b9"
FT                   /db_xref="taxon:256425"
FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"
FT   mRNA            <1..>118
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT   CDS             <1..>118
FT                   /codon_start=2
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
FT                   /protein_id="ABA39736.1"
FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"

The output is....

========================================
Feature: (#0) lcl:DQ158013/DQ158013.1:source,EMBL(1..118)
Note: (#0) biojavax:mol_type: genomic DNA
Note: (#1) biojavax:clone: Thel.b9
========================================
Feature: (#1) lcl:DQ158013/DQ158013.1:gene,EMBL(<1..118>)
Note: (#2) biojavax:gene: Hoxb9
Note: (#3) biojavax:note: Hoxb-9
========================================
Feature: (#2) lcl:DQ158013/DQ158013.1:mRNA,EMBL(<1..118>)
Note: (#4) biojavax:gene: Hoxb9
Note: (#5) biojavax:product: HOXB9
========================================
Feature: (#3) lcl:DQ158013/DQ158013.1:CDS,EMBL(<1..118>)
Note: (#6) biojavax:codon_start: 2
Note: (#7) biojavax:gene: Hoxb9
Note: (#8) biojavax:product: HOXB9
Note: (#9) biojavax:protein_id: ABA39736.1
Note: (#10) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW
Note: (#11) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW
=============================================

This looks OK; the one thing I've just noticed is that the last piece of
annotation of the last feature is assigned twice.
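
The feature-then-annotation loop described above, together with a check
for a note that appears twice on the same feature, can be sketched in plain
Java. This is only an illustration under stated assumptions: the class
NoteDumpSketch and the map of feature labels to {tag, value} pairs are
hypothetical stand-ins for the biojavax feature and note objects, and the
"DUPLICATE:" line is an invented marker, not parser output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; features are modelled as label -> list of {tag, value}.
final class NoteDumpSketch {

    /** Returns the "tag: value" strings seen more than once on one feature. */
    static List<String> duplicateNotes(List<String[]> notes) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String[] note : notes) {
            String key = note[0] + ": " + note[1];
            if (!seen.add(key)) {          // add() is false if already present
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    /** Lists every feature, then its notes, then any repeated notes. */
    static String dump(Map<String, List<String[]>> features) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String[]>> feature : features.entrySet()) {
            out.append("Feature: ").append(feature.getKey()).append('\n');
            for (String[] note : feature.getValue()) {
                out.append("Note: ").append(note[0]).append(": ")
                   .append(note[1]).append('\n');
            }
            for (String dup : duplicateNotes(feature.getValue())) {
                out.append("DUPLICATE: ").append(dup).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> features = new LinkedHashMap<>();
        features.put("mRNA", Arrays.asList(
            new String[] {"gene", "Hoxb9"},
            new String[] {"product", "HOXB9"}));
        features.put("CDS", Arrays.asList(
            new String[] {"protein_id", "ABA39736.1"},
            new String[] {"translation", "KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"},
            new String[] {"translation", "KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"}));
        System.out.print(dump(features));
    }
}
```

Only an identical tag and value pair is flagged, so a feature carrying
several different /db_xref qualifiers would not be reported.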

Jolyon



From richard.holland at ebi.ac.uk  Thu Apr 20 12:16:00 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 20 Apr 2006 13:16:00 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com>
Message-ID: <1145535361.4188.33.camel@texas.ebi.ac.uk>

Did you use the latest CVS version? (I committed a change that I think
should have fixed that about 1 minute before my previous email).


On Thu, 2006-04-20 at 13:08 +0100, Jolyon Holdstock wrote:
> This looks OK, the one thing I've just noticed is that the last piece of
> annotation of the last feature is assigned twice.
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From mthomasc at vub.ac.be  Thu Apr 20 12:30:10 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Thu, 20 Apr 2006 14:30:10 +0200
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Resolved]
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com>
Message-ID: <44477ED2.2010200@vub.ac.be>

I've just updated my sources a few minutes ago and everything works fine
now (both the annotations and the split-on-equals problem).

I've tested both the EBI file and Ensembl file.

Thanks for fixing the problems !!

Cheers,

Morgane

Jolyon Holdstock wrote:
> No, I'll update my source.
>
> Thanks,
>
> Jolyon
>


-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student (mthomasc at vub.ac.be)

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium


From jolyon.holdstock at ogt.co.uk  Thu Apr 20 12:18:21 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu, 20 Apr 2006 13:18:21 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com>

No, I'll update my source.

Thanks,

Jolyon


-----Original Message-----
From: Richard Holland [mailto:richard.holland at ebi.ac.uk] 
Sent: 20 April 2006 13:16
To: Jolyon Holdstock
Cc: mthomas at dbm.ulb.ac.be; biojava-l at open-bio.org
Subject: RE: [Biojava-l] [biojavax] EMBL parser : features
parsing[Scanned]

Did you use the latest CVS version? (I committed a change that I think
should have fixed that about 1 minute before my previous email).


On Thu, 2006-04-20 at 13:08 +0100, Jolyon Holdstock wrote:
> I've run the sequence through the parser and it seems to work OK. I
> iterate through the features and then iterate through the annotations
of
> that feature
> 
> Based on the input....
> 
> FT   source          1..118
> FT                   /organism="Triturus helveticus"
> FT                   /mol_type="genomic DNA"
> FT                   /clone="Thel.b9"
> FT                   /db_xref="taxon:256425"
> FT   gene            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /note="Hoxb-9"
> FT   mRNA            <1..>118
> FT                   /gene="Hoxb9"
> FT                   /product="HOXB9"
> FT   CDS             <1..>118
> FT                   /codon_start=2
> FT                   /gene="Hoxb9"
> FT                   /product="HOXB9"
> FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
> FT                   /protein_id="ABA39736.1"
> FT
> /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
> 
> The output is....
> 
> ========================================
> Feature: (#0) lcl:DQ158013/DQ158013.1:source,EMBL(1..118)
> Note: (#0) biojavax:mol_type: genomic DNA
> Note: (#1) biojavax:clone: Thel.b9
> ========================================
> Feature: (#1) lcl:DQ158013/DQ158013.1:gene,EMBL(<1..118>)
> Note: (#2) biojavax:gene: Hoxb9
> Note: (#3) biojavax:note: Hoxb-9
> ========================================
> Feature: (#2) lcl:DQ158013/DQ158013.1:mRNA,EMBL(<1..118>)
> Note: (#4) biojavax:gene: Hoxb9
> Note: (#5) biojavax:product: HOXB9
> ========================================
> Feature: (#3) lcl:DQ158013/DQ158013.1:CDS,EMBL(<1..118>)
> Note: (#6) biojavax:codon_start: 2
> Note: (#7) biojavax:gene: Hoxb9
> Note: (#8) biojavax:product: HOXB9
> Note: (#9) biojavax:protein_id: ABA39736.1
> Note: (#10) biojavax:translation:
> KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW
> Note: (#11) biojavax:translation:
> KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW
> =============================================
> 
> This looks OK, the one thing I've just noticed is that the last piece
of
> annotation of the last feature is assigned twice.
> 
> Jolyon
> 
> 
> -----Original Message-----
> From: Richard Holland [mailto:richard.holland at ebi.ac.uk] 
> Sent: 20 April 2006 13:05
> To: mthomas at dbm.ulb.ac.be
> Cc: Jolyon Holdstock; biojava-l at open-bio.org
> Subject: Re: [Biojava-l] [biojavax] EMBL parser : features
> parsing[Scanned]
> 
> Hi.
> 
> I made some small changes to the code, although nothing that would fix
> this kind of problem, committed it back to CVS, checked it out again,
> compiled, and ran a test program that read in an EMBL file with the
> feature table you describe below, and output it in EMBL format to
> another file. I then compared the two files... and found no differences!
> The split-on-equals problem didn't occur, and all notes appeared
> alongside their correct features.
> 
> Could there be a problem maybe with the script you are using?
> 
> I've really no idea what the problem is as I can't reproduce it based on
> the current CVS contents!
> 
> cheers,
> Richard
> 
> On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote:
> > Hi,
> > 
> > I have tested today's version from CVS.
> > 
> > Both EBI and Ensembl files now react the same way.
> > The last annotation of a feature is nevertheless related to its 
> > immediate following feature.
> > e.g. :
> > 
> > FT   gene            <1..>118
> > FT                   /gene="Hoxb9"
> > FT                   /note="Hoxb-9"
> > FT   mRNA            <1..>118
> > FT                   /gene="Hoxb9"
> > FT                   /product="HOXB9"
> > FT   CDS             <1..>118
> > 
> > /note="Hoxb-9" is related to mRNA
> > /product="HOXB9" is related to CDS
> > 
> > Concerning the split-on-equals problem, I still observe the problem:
> > 
> >  [(#2) biojavax:note: transcript_i]
> > 
> > for this annotation :  /note="transcript_id=ENSMUST00000048680"
> > 
> > Thanks for helping,
> > 
> > Cheers,
> > 
> > Morgane.
> > 
> > Richard Holland wrote:
> > > I have committed an UNTESTED patch based on Jolyon's suggestion, and
> > > also attempted to fix the split-on-equals problem Morgane observed.
> > >
> > > Please let me know if there are any problems with it.
> > >
> > > As this problem affected the UniProt parser in a similar manner (much of
> > > the code is identical), the same fixes were applied there too.
> > >
> > > cheers,
> > > Richard
> > >
> > > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote:
> > >   
> > >> Hi Morgane,
> > >>
> > >> I have amended the EmblFormat readSection method as below and the
> > >> parsing seems to work; please test it.
> > >>
> > >> I think that the last bit of annotation is carried over into the next
> > >> feature, so before adding the new feature I dump the annotation and
> > >> reset currentTag and currentVal.
> > >>
> > >> if (!line.startsWith(" ")) {
> > >> //--------- new code starts ---------------------------
> > >>   if (currentTag!=null) {
> > >>     section.add(new String[]{currentTag,currentVal.toString()});
> > >>     currentTag = null;
> > >>     currentVal = null;
> > >>   }
> > >> //--------- new code ends -----------------------------
> > >> // case 1 : word value - splits into key-value on its own
> > >>   section.add(line.split("\\s+"));
> > >> }
> > >>
> > >> Cheers,
> > >>
> > >> Jolyon
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: biojava-l-bounces at lists.open-bio.org
> > >> [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane
> > >> THOMAS-CHOLLIER
> > >> Sent: 12 April 2006 09:35
> > >> To: biojava-l at open-bio.org
> > >> Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
> > >>
> > >> Hello again,
> > >>
> > >> I am currently using biojavax to parse EMBL files exported from the
> > >> Ensembl website.
> > >>
> > >> Compared to the EBI files I have, they show a difference in the
> > >> feature lines: sometimes only one "/word" is present, i.e.:
> > >>
> > >> EBI file:
> > >>
> > >> FT   gene            <1..>118
> > >> FT                   /gene="Hoxb9"
> > >> FT                   /note="Hoxb-9"
> > >>
> > >> Ensembl file:
> > >>
> > >> FT   gene         complement(1..3218)
> > >> FT                   /gene="ENSMUSG00000038227"
> > >>
> > >> The problem I encounter is that the parser correctly converts the
> > >> "/word" into a Note, but the Note is then attached to the immediately
> > >> following feature (i.e. mRNA).
> > >> The current gene feature thus has no annotation.
> > >>
> > >> This behavior is reproducible when removing one "/word" from an EBI
> > >> file.
> > >>
> > >> Apart from this issue, I noted that Ensembl EMBL files use "=" inside
> > >> a feature (i.e. /note="transcript_id=ENSMUST00000048680"), which ends
> > >> up as an incomplete Note, as the parser seems to split on "=" to
> > >> separate the Key and the Value.
> > >>
> > >> Thanks for your help,
> > >>
> > >> Morgane.
> > >>
> > >>     
> > 
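
The split-on-equals problem quoted above can be illustrated outside
BioJava. What follows is only a minimal sketch in plain Java, not the
actual EmblFormat code: the class QualifierSplit and the method
splitQualifier are hypothetical names invented for this example. It
assumes the qualifier arrives as a single string and cuts it at the
first "=" only, so any "=" inside the quoted value is preserved.

```java
class QualifierSplit {
    // Hypothetical helper (not part of BioJava): splits a feature
    // qualifier such as /note="transcript_id=ENSMUST00000048680"
    // into { tag, value }, cutting at the FIRST '=' only.
    static String[] splitQualifier(String qualifier) {
        // Drop the leading '/' if there is one.
        String body = qualifier.startsWith("/") ? qualifier.substring(1) : qualifier;
        // A limit of 2 means at most one split, so later '=' stay in the value.
        String[] parts = body.split("=", 2);
        String tag = parts[0];
        String value = parts.length > 1 ? parts[1] : "";
        // Strip one pair of surrounding double quotes, if present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new String[] { tag, value };
    }

    public static void main(String[] args) {
        String[] kv = splitQualifier("/note=\"transcript_id=ENSMUST00000048680\"");
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```

Passing a limit to String.split is one simple way to keep the value
whole; indexOf('=') followed by substring would serve equally well.
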
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


This email has been scanned by Oxford Gene Technology Group of Companies
Security Systems.


From mark.schreiber at novartis.com  Tue Apr 25 06:07:59 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 25 Apr 2006 14:07:59 +0800
Subject: [Biojava-l] Pairwise Alignment
Message-ID: <OF646C368F.AE5F0486-ON4825715B.00212D15-4825715B.0021B0D4@EU.novartis.net>

Hi -

The appropriate classes for Smith-Waterman (SW) and Needleman-Wunsch (NW)
pairwise alignment are in the org.biojava.bio.alignment package in CVS (see
http://code.open-bio.org/cgi/viewcvs.cgi/biojava-live/src/org/biojava/bio/alignment/?cvsroot=biojava).

While SW and NW are simple, they are not as flexible as the pairwise
architectures that can be built with HMMs. For a standard pairwise
alignment I would think that the SW and NW algorithms are fine.
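
For readers unfamiliar with the algorithm, here is a rough sketch of
Smith-Waterman local alignment scoring in plain Java. It is not the
code from org.biojava.bio.alignment: the class SmithWatermanSketch, the
method score, and the linear gap penalty are all assumptions made for
illustration, and it returns only the best local score, with no
traceback.

```java
class SmithWatermanSketch {
    // Returns the best local alignment score of a and b using a simple
    // match/mismatch scheme and a linear gap penalty (gap should be negative).
    static int score(String a, String b, int match, int mismatch, int gap) {
        // h[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
        int[][] h = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = (a.charAt(i - 1) == b.charAt(j - 1)) ? match : mismatch;
                int diag = h[i - 1][j - 1] + sub; // align the two characters
                int up = h[i - 1][j] + gap;       // gap in b
                int left = h[i][j - 1] + gap;     // gap in a
                // Local alignment: a cell never drops below zero.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(score("XXACGTXX", "YYACGTYY", 2, -1, -2));
    }
}
```

The full matrix needs memory proportional to the product of the two
sequence lengths, which is worth keeping in mind for long sequences.
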

I'm not sure about comparative speed or memory requirements.

- Mark


Joanna Sharman <J.L.Sharman at sms.ed.ac.uk>
Sent by: biojava-l-bounces at lists.open-bio.org
04/19/2006 05:35 PM

 
        To:     biojava-l at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] Pairwise Alignment


Hello,

I'm new to BioJava so I'm sorry if this question has been asked several
times before.

This is actually sort of in reply to this message from last month:

http://lists.open-bio.org/pipermail/biojava-l/2006-March/005365.html

I'd like to perform a simple pairwise alignment using the
Smith-Waterman class I saw described here:

http://www.biojava.org/wiki/BioJava:CookBook:DP:PairWise2

but I can't find the classes it mentions anywhere on the cvs.  Can you
point me to where they are?

Also, I'm just wondering why the HMM method is preferred to the
Smith-Waterman (or others)?  It seems quite complicated to me, and like
it might require more memory, or am I wrong? :)

Cheers,
Joanna

_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l


From e.willighagen at science.ru.nl  Wed Apr 26 16:03:47 2006
From: e.willighagen at science.ru.nl (Egon Willighagen)
Date: Wed, 26 Apr 2006 18:03:47 +0200
Subject: [Biojava-l] org.biojava.bio.gui.glyph classes?
Message-ID: <200604261803.47333.e.willighagen@science.ru.nl>


Hi all, 

in the wiki I saw mention of the org.biojava.bio.gui.glyph package, which does 
not seem to be part of BioJava 1.4.

Where can I download the code classes in that package?

Egon

-- 
Radboud University Nijmegen
http://www.cac.science.ru.nl/
blog: http://chem-bla-ics.blogspot.com/


From mark.schreiber at novartis.com  Thu Apr 27 01:14:38 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 27 Apr 2006 09:14:38 +0800
Subject: [Biojava-l] org.biojava.bio.gui.glyph classes?
Message-ID: <OFEC3139B0.EDAB2C44-ON4825715D.0006BCE4-4825715D.0006D569@EU.novartis.net>

Hi -

They are in biojava-live, which is the development version available for 
download via cvs. Take a look at the instructions on www.biojava.org.

- Mark


Egon Willighagen <e.willighagen at science.ru.nl>
Sent by: biojava-l-bounces at lists.open-bio.org
04/27/2006 12:03 AM

 
        To:     biojava-l at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] org.biojava.bio.gui.glyph classes?


Hi all, 

in the wiki I saw mention of the org.biojava.bio.gui.glyph package, which 
does 
not seem to be part of BioJava 1.4.

Where can I download the code classes in that package?

Egon

-- 
Radboud University Nijmegen
http://www.cac.science.ru.nl/
blog: http://chem-bla-ics.blogspot.com/


From heatkent at gmail.com  Wed Apr 26 23:22:46 2006
From: heatkent at gmail.com (Heather Kent)
Date: Wed, 26 Apr 2006 18:22:46 -0500
Subject: [Biojava-l] chromatogram viewer
Message-ID: <de8b3c810604261622s6af56c5aoc6fa4456607d4a9@mail.gmail.com>

I'm wondering if anyone can help me locate some source code for Swing
components for viewing chromatograms. I read a 2003 forum post from
biojava where Rhett Sutphin mentioned he would make some source code for a
chromatogram viewer (using the ChromatogramGraphic class) available, but I
can't seem to find it anywhere. I'm trying to build scroll bars for
my chromatogram viewer that scroll through the image, as well
as scale the chromatogram vertically and horizontally. I have some code
from an old viewer that performs all these functions but doesn't use any
of the BioJava classes or Swing components.

Thanks,
Heather


From russ at kepler-eng.com  Thu Apr 27 04:24:19 2006
From: russ at kepler-eng.com (Russ Kepler)
Date: Wed, 26 Apr 2006 22:24:19 -0600
Subject: [Biojava-l] chromatogram viewer
In-Reply-To: <de8b3c810604261622s6af56c5aoc6fa4456607d4a9@mail.gmail.com>
References: <de8b3c810604261622s6af56c5aoc6fa4456607d4a9@mail.gmail.com>
Message-ID: <200604262224.19525.russ@kepler-eng.com>

On Wednesday 26 April 2006 05:22 pm, Heather Kent wrote:
> I'm wondering if anyone can help me locate some source code for swing
> components involved in viewing chromatograms, i read a 2003 forum from
> biojava where Rhett Sutphin mentioned he would make some source code for a
> chromatogram viewer (using the chromatogramgraphic class) available but i
> cant seem to find it anywhere....im trying to fashion some scroll bars for
> my chromatogram viewer that function to scroll through the image,  as well
> as vertically and horizontally scale the chromatgram....i have some code
> from an old viewer that will perform all these functions but doesnt use any
> of the biojava classes or swing components....

There's org.biojava.bio.gui.sequence.ABITraceRenderer with demo code in 
seqviewer.TraceViewer  

It should give you a start.


From n.haigh at sheffield.ac.uk  Thu Apr 27 13:48:59 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 14:48:59 +0100
Subject: [Biojava-l] Sun One Studio+Biojava
Message-ID: <002301c66a01$5637d910$9f5ea78f@bmbpc196>

I'm totally new to Java and BioJava as I'm trying to defect from BioPerl!
I'm trying to use Sun One Studio for editing my Java files, at least
initially. I don't know how to set up Sun One Studio to find my
biojava-1.4.jar file, and I'm not even sure how to test whether it can find
it correctly. Any help on these issues would be gratefully received. As I
said, I'm a newbie, so bear with me!

Cheers
Nathan

----------------------------------------------------------------------------
Dr. Nathan S. Haigh
Bioinformatics PostDoctoral Research Associate

Room B2 211                                Tel: +44 (0)114 22 20112
Department of Animal and Plant Sciences    Mob: +44 (0)7742 533 569
University of Sheffield                    Fax: +44 (0)114 22 20002
Western Bank                               Web: www.bioinf.shef.ac.uk
Sheffield                                       www.petraea.shef.ac.uk
S10 2TN
----------------------------------------------------------------------------

---
avast! Antivirus: Outbound message clean.
Virus Database (VPS): 0615-2, 12/04/2006
Tested on: 27/04/2006 14:48:56
avast! - copyright (c) 1988-2006 ALWIL Software.
http://www.avast.com


From richard.holland at ebi.ac.uk  Thu Apr 27 14:51:23 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 27 Apr 2006 15:51:23 +0100
Subject: [Biojava-l] Sun One Studio+Biojava
In-Reply-To: <002301c66a01$5637d910$9f5ea78f@bmbpc196>
References: <002301c66a01$5637d910$9f5ea78f@bmbpc196>
Message-ID: <1146149483.3955.7.camel@texas.ebi.ac.uk>

Sun One Studio is built on NetBeans, which is what I use to develop bits
of BioJava with, so I think what works for me should work for you. Here
goes...:

If you are working with BioJava in apps you are developing yourself, you
need to set up BioJava as a library in NetBeans. Do this by going to the
Library Manager (Tools menu), creating a new library called BioJava,
then using the buttons provided to locate and add the biojava-1.4.jar
file to the library. You can then associate this library with any
project you are working on by right-clicking on that project, choosing
Properties, then click on Libraries in the tree on the left of the
window that appears and use this to add the BioJava library.
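
One way to test whether the jar is really being found is to ask the JVM
to load a class from it. The snippet below is only a sketch: the class
name ClasspathCheck and the method isOnClasspath are made up for this
example, and it assumes that org.biojava.bio.seq.DNATools is present in
your biojava-1.4.jar (any other BioJava class name would do as the
probe).

```java
class ClasspathCheck {
    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed probe class; swap in any class you expect the jar to contain.
        String probe = "org.biojava.bio.seq.DNATools";
        System.out.println(probe + (isOnClasspath(probe) ? " found" : " NOT found"));
    }
}
```

Run it from inside the project you attached the library to; if it
reports "NOT found", the library is not associated with that project.
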

If you are intending to develop BioJava itself, you need to check out
the entire biojava-live project from CVS. You can then set up
development in NetBeans by creating a "new project from existing Ant
script", and telling it where the build.xml file can be found within the
BioJava project. It'll do the rest for you. 

Hope this helps.

cheers,
Richard

On Thu, 2006-04-27 at 14:48 +0100, Nathan S. Haigh wrote:
> I'm totally new to Java and Biojava as I'm trying to defect from Bioperl!
> I'm trying to use Sun One Studio for editing my java files - at least
> initially. I don't know how to setup Sun One Studio to find my
> biojava-1.4.jar file, I'm not even sure how to test if it can find it
> correctly. Any help on these issues would be gratefully received. As I said
> I'm a newbie - bear with me!
> 
> Cheers
> Nathan
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From n.haigh at sheffield.ac.uk  Thu Apr 27 15:01:56 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 16:01:56 +0100
Subject: [Biojava-l] Sun One Studio+Biojava
In-Reply-To: <1146149483.3955.7.camel@texas.ebi.ac.uk>
Message-ID: <003601c66a0b$86b289f0$9f5ea78f@bmbpc196>

Thanks for the info - the fog is starting to lift! :o)

I think I'll leave actual BioJava development for now and see how I go with
actually learning Java first :o) I have a steep learning curve, as I have an
application written in Perl that uses BioPerl modules, with Perl/Tk for the
GUI. So I'm trying to rewrite this application in Java while trying to think
about OO programming. I'm sure I'll send some really simple questions to
the list over the coming weeks/months, but hopefully there won't be too many
nightmares along the way!

Thanks
Nathan

> -----Original Message-----
> From: Richard Holland [mailto:richard.holland at ebi.ac.uk]
> Sent: 27 April 2006 15:51
> To: n.haigh at sheffield.ac.uk
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] Sun One Studio+Biojava
> 
> Sun One Studio is built on NetBeans, which is what I use to develop bits
> of BioJava with, so I think what works for me should work for you. Here
> goes...:
> 
> If you are working with BioJava in apps you are developing yourself, you
> need to set up BioJava as a library in NetBeans. Do this by going to the
> Library Manager (Tools menu), creating a new library called BioJava,
> then using the buttons provided to locate and add the biojava-1.4.jar
> file to the library. You can then associate this library with any
> project you are working on by right-clicking on that project, choosing
> Properties, then click on Libraries in the tree on the left of the
> window that appears and use this to add the BioJava library.
> 
> If you are intending to develop BioJava itself, you need to check out
> the entire biojava-live project from CVS. You can then set up
> development in NetBeans by creating a "new project from existing Ant
> script", and telling it where the build.xml file can be found within the
> BioJava project. It'll do the rest for you.
> 
> Hope this helps.
> 
> cheers,
> Richard
> 
> On Thu, 2006-04-27 at 14:48 +0100, Nathan S. Haigh wrote:
> > I'm totally new to Java and Biojava as I'm trying to defect from
> Bioperl!
> > I'm trying to use Sun One Studio for editing my java files - at least
> > initially. I don't know how to setup Sun One Studio to find my
> > biojava-1.4.jar file, I'm not even sure how to test if it can find it
> > correctly. Any help on these issues would be gratefully received. As I
> said
> > I'm a newbie - bear with me!
> >
> > Cheers
> > Nathan
> >


From n.haigh at sheffield.ac.uk  Thu Apr 27 15:12:34 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 16:12:34 +0100
Subject: [Biojava-l] Creating my own classes
Message-ID: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196>

I'm trying to learn and think about OO programming as I learn Java and port
a Perl app to Java. Could you tell me whether this sounds like a reasonable
way to write some of my own classes?

My application essentially defines sets of positions from an alignment. I
call them CHARSETs as they are analogous to CHARSETs in the Nexus file
format. I believe in BioJava the Locations object/interface (sorry, I'm not
familiar enough with the correct terminology yet) is essentially the same
sort of thing. In my app, the user can use several approaches to define a
CHARSET, e.g. a CHARSET containing just invariable sites, or a CHARSET
containing sites above a given % identity.

My question is this: if I were to create a class called Charset, and then
create several subclasses called e.g. Invariable etc., is this reasonable? Or
should the class Charset contain many methods, one for creating each type
of CHARSET?

In my app, a CHARSET needs to be associated with a particular alignment and
with the settings used to define it, so my Charset class would have variables
such as an Alignment object, Locations objects etc. I'd like to write a method
that returns a subalignment based on the CHARSET's associated Alignment
object and Locations object, but I'm not sure how to do this.

Thanks for any help/comments/corrections/critiques
Nathan


----------------------------------------------------------------------------
Dr. Nathan S. Haigh
Bioinformatics PostDoctoral Research Associate

Room B2 211                                Tel: +44 (0)114 22 20112
Department of Animal and Plant Sciences    Mob: +44 (0)7742 533 569
University of Sheffield                    Fax: +44 (0)114 22 20002
Western Bank                               Web: www.bioinf.shef.ac.uk
Sheffield                                       www.petraea.shef.ac.uk
S10 2TN
----------------------------------------------------------------------------



From richard.holland at ebi.ac.uk  Thu Apr 27 15:36:51 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 27 Apr 2006 16:36:51 +0100
Subject: [Biojava-l] Creating my own classes
In-Reply-To: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196>
References: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196>
Message-ID: <1146152212.3955.24.camel@texas.ebi.ac.uk>

On Thu, 2006-04-27 at 16:12 +0100, Nathan S. Haigh wrote:
> My application essentially defines sets of positions from an alignment - I
> call them CHARSETs as they are analogous to CHARSETs in the Nexus file
> format. I believe in Biojava the Locations object/interface (sorry, not
> familiar enough with correct terminology yet) is essentially the same sort
> of thing. In my app, the user can use several approaches to define a CHARSET
> e.g. a CHARSET containing just invariable sites, or a CHARSET containing
> sites above a given % identity.

You'd be right there. A Location in BioJava represents a range of
positions.

> My question is this, if I were to create a class called Charset, and I
> create several subclasses called e.g. Invariable etc is this reasonable? Or
> should the class Charset contain many methods for creating a different type
> of CHARSET?

My suggestion would be to create an interface called Charset, which defines
the behaviour you expect all types of Charset to exhibit. Then
write a number of classes which implement this interface, one for
each type of Charset you have, each adding its own methods or
special behaviour. If a lot of the behaviour is common, you can define
an abstract class called something like AbstractCharset which provides
this common behaviour, and have the others extend it.
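
As a rough illustration of that layout, here is a sketch in plain Java.
Everything in it is an assumption made for the example rather than
BioJava API: the alignment is represented as an array of equal-length
strings instead of a BioJava Alignment, and the names getPositions,
accepts and InvariableCharset are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Behaviour every kind of Charset is expected to offer.
interface Charset {
    String getName();
    List<Integer> getPositions(); // 1-based alignment columns in the set
}

// Common behaviour: walk the columns and keep those a subclass accepts.
abstract class AbstractCharset implements Charset {
    protected final String[] rows; // assumed: one equal-length string per sequence

    AbstractCharset(String[] rows) {
        this.rows = rows;
    }

    // Subclasses decide whether a (0-based) column belongs to the set.
    protected abstract boolean accepts(int column);

    public List<Integer> getPositions() {
        List<Integer> result = new ArrayList<Integer>();
        int width = (rows.length == 0) ? 0 : rows[0].length();
        for (int c = 0; c < width; c++) {
            if (accepts(c)) {
                result.add(c + 1);
            }
        }
        return result;
    }
}

// One concrete type: columns where every sequence has the same character.
class InvariableCharset extends AbstractCharset {
    InvariableCharset(String[] rows) {
        super(rows);
    }

    public String getName() {
        return "invariable";
    }

    protected boolean accepts(int column) {
        char first = rows[0].charAt(column);
        for (String row : rows) {
            if (row.charAt(column) != first) {
                return false;
            }
        }
        return true;
    }
}
```

A charset based on a % identity threshold would be another subclass
that overrides only accepts(); the calling code can treat both through
the Charset interface.
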

> In my app, a CHARSET needs to be associated with a particular alignment, and
> settings used to define the CHARSET, so my Charset class have variables such
> as an Alignment object, Locations objects etc. I'd like to write a method
> that returns a subalignment based on the CHARSETs associated alignment
> object and Locations object but I'm not sure how to do this.

BioJava Alignment objects implement the SymbolList interface, which
means you can use all the methods from SymbolList to work with the
Alignment, including the subList() method.

cheers,
Richard

-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From n.haigh at sheffield.ac.uk  Thu Apr 27 15:44:05 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 16:44:05 +0100
Subject: [Biojava-l] Creating my own classes
In-Reply-To: <1146152212.3955.24.camel@texas.ebi.ac.uk>
Message-ID: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196>

Thanks Richard,

I'll think about this and try to do some deciphering. The only thing I'm in
need of help for is possibly some actual code that would take an Alignment
object and return a subalignment based on the positions specified in a
Locations object - it's difficult to make sense of a new language until you
start to pick up some of the basics.

Thanks
Nathan

> -----Original Message-----
> From: Richard Holland [mailto:richard.holland at ebi.ac.uk]
> Sent: 27 April 2006 16:37
> To: n.haigh at sheffield.ac.uk
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] Creating my own classes
> 
> On Thu, 2006-04-27 at 16:12 +0100, Nathan S. Haigh wrote:
> > My application essentially defines sets of positions from an alignment -
> I
> > call them CHARSETs as they are analogous to CHARSETs in the Nexus file
> > format. I believe in Biojava the Locations object/interface (sorry, not
> > familiar enough with correct terminology yet) is essentially the same
> sort
> > of thing. In my app, the user can use several approaches to define a
> CHARSET
> > e.g. a CHARSET containing just invariable sites, or a CHARSET containing
> > sites above a given % identity.
> 
> You'd be right there. A Location in BioJava represents a range of
> positions.
> 
> > My question is this, if I were to create a class called Charset, and I
> > create several subclasses called e.g. Invariable etc is this reasonable?
> Or
> > should the class Charset contain many methods for creating a different
> type
> > of CHARSET?
> 
> My suggestion would be create an interface called Charset, which defines
> behaviour which you expect all types of Charset to exhibit. Then,
> implement a number of classes which implement this interface, one for
> each type of Charset you have, which each add their own methods or
> special behaviour. If a lot of the behaviour is common, you can define
> an abstract class called something like AbstractCharset which defines
> this common behaviour, and have the others extend it.
> 
> > In my app, a CHARSET needs to be associated with a particular alignment,
> and
> > settings used to define the CHARSET, so my Charset class have variables
> such
> > as an Alignment object, Locations objects etc. I'd like to write a
> method
> > that returns a subalignment based on the CHARSETs associated alignment
> > object and Locations object but I'm not sure how to do this.
> 
> BioJava Alignment objects implement the SymbolList interface, which
> means you can use all the methods from SymbolList to work with the
> Alignment, including the subList() method.
> 
> cheers,
> Richard
> 
> --
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416



From richard.holland at ebi.ac.uk  Thu Apr 27 15:55:39 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 27 Apr 2006 16:55:39 +0100
Subject: [Biojava-l] Creating my own classes
In-Reply-To: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196>
References: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196>
Message-ID: <1146153339.3955.30.camel@texas.ebi.ac.uk>

Given some existing Location object (let's call it 'loc'), and an
existing Alignment (hypothetically called 'algn'), you can do this:

	// Needs java.util.Set and java.util.HashSet imported.
	// Obtain the labels of all the sequences in the alignment.
	Set labels = new HashSet();
	labels.addAll(algn.getLabels());
	// Obtain a sub-alignment covering 'loc' that includes all the
	// sequences in the original alignment.
	Alignment subAlignment = algn.subAlignment(labels, loc);

cheers,
Richard


On Thu, 2006-04-27 at 16:44 +0100, Nathan S. Haigh wrote:
> Thanks Richard,
> 
> I'll think about this and try to do some deciphering. The only thing I'm in
> need of help for is possibly some actual code that would take an Alignment
> object and return a subalignment based on the positions specified in a
> Locations object - it's difficult to make sense of a new language until you
> start to pick up some of the basics.
> 
> Thanks
> Nathan
> 
> > -----Original Message-----
> > From: Richard Holland [mailto:richard.holland at ebi.ac.uk]
> > Sent: 27 April 2006 16:37
> > To: n.haigh at sheffield.ac.uk
> > Cc: biojava-l at lists.open-bio.org
> > Subject: Re: [Biojava-l] Creating my own classes
> > 
> > On Thu, 2006-04-27 at 16:12 +0100, Nathan S. Haigh wrote:
> > > My application essentially defines sets of positions from an alignment -
> > I
> > > call them CHARSETs as they are analogous to CHARSETs in the Nexus file
> > > format. I believe in Biojava the Locations object/interface (sorry, not
> > > familiar enough with correct terminology yet) is essentially the same
> > sort
> > > of thing. In my app, the user can use several approaches to define a
> > CHARSET
> > > e.g. a CHARSET containing just invariable sites, or a CHARSET containing
> > > sites above a given % identity.
> > 
> > You'd be right there. A Location in BioJava represents a range of
> > positions.
> > 
> > > My question is this, if I were to create a class called Charset, and I
> > > create several subclasses called e.g. Invariable etc is this reasonable?
> > Or
> > > should the class Charset contain many methods for creating a different
> > type
> > > of CHARSET?
> > 
> > My suggestion would be create an interface called Charset, which defines
> > behaviour which you expect all types of Charset to exhibit. Then,
> > implement a number of classes which implement this interface, one for
> > each type of Charset you have, which each add their own methods or
> > special behaviour. If a lot of the behaviour is common, you can define
> > an abstract class called something like AbstractCharset which defines
> > this common behaviour, and have the others extend it.
> > 
> > > In my app, a CHARSET needs to be associated with a particular alignment,
> > and
> > > settings used to define the CHARSET, so my Charset class have variables
> > such
> > > as an Alignment object, Locations objects etc. I'd like to write a
> > method
> > > that returns a subalignment based on the CHARSETs associated alignment
> > > object and Locations object but I'm not sure how to do this.
> > 
> > BioJava Alignment objects implement the SymbolList interface, which
> > means you can use all the methods from SymbolList to work with the
> > Alignment, including the subList() method.
> > 
> > cheers,
> > Richard
> > 
> > --
> > Richard Holland (BioMart Team)
> > EMBL-EBI
> > Wellcome Trust Genome Campus
> > Hinxton
> > Cambridge CB10 1SD
> > UNITED KINGDOM
> > Tel: +44-(0)1223-494416
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From n.haigh at sheffield.ac.uk  Thu Apr 27 16:00:09 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 17:00:09 +0100
Subject: [Biojava-l] Creating my own classes
In-Reply-To: <1146153339.3955.30.camel@texas.ebi.ac.uk>
Message-ID: <000d01c66a13$a8b51380$9f5ea78f@bmbpc196>

Fantastic stuff - again, I'll look into this over the coming weeks (I
actually have annual leave for a week, so my flurry of e-mail will have to
stop for now).

Thanks again!
Nathan

> -----Original Message-----
> From: Richard Holland [mailto:richard.holland at ebi.ac.uk]
> Sent: 27 April 2006 16:56
> To: n.haigh at sheffield.ac.uk
> Cc: biojava-l at lists.open-bio.org
> Subject: RE: [Biojava-l] Creating my own classes
> 
> Given some existing Location object (let's call it 'loc'), and an
> existing Alignment (hypothetically called 'algn'), you can do this:
> 
>     import java.util.HashSet;
>     import java.util.Set;
>     import org.biojava.bio.symbol.Alignment;
>
>     // Obtain the labels of all the sequences in the alignment.
>     Set labels = new HashSet();
>     labels.addAll(algn.getLabels());
>     // Obtain a sub-alignment that keeps every sequence of the original
>     // alignment but only the columns covered by loc.
>     Alignment subAlignment = algn.subAlignment(labels, loc);
> 
> cheers,
> Richard
> 
> 
> On Thu, 2006-04-27 at 16:44 +0100, Nathan S. Haigh wrote:
> > Thanks Richard,
> >
> > I'll think about this and try to do some deciphering. The only thing I
> > need help with is possibly some actual code that would take an Alignment
> > object and return a subalignment based on the positions specified in a
> > Locations object - it's difficult to make sense of a new language until
> > you start to pick up some of the basics.
> >
> > Thanks
> > Nathan
> >
> > > -----Original Message-----
> > > From: Richard Holland [mailto:richard.holland at ebi.ac.uk]
> > > Sent: 27 April 2006 16:37
> > > To: n.haigh at sheffield.ac.uk
> > > Cc: biojava-l at lists.open-bio.org
> > > Subject: Re: [Biojava-l] Creating my own classes
> > >
> > > On Thu, 2006-04-27 at 16:12 +0100, Nathan S. Haigh wrote:
> > > > My application essentially defines sets of positions from an
> > > > alignment - I call them CHARSETs as they are analogous to CHARSETs
> > > > in the Nexus file format. I believe in BioJava the Locations
> > > > object/interface (sorry, not familiar enough with correct
> > > > terminology yet) is essentially the same sort of thing. In my app,
> > > > the user can use several approaches to define a CHARSET, e.g. a
> > > > CHARSET containing just invariable sites, or a CHARSET containing
> > > > sites above a given % identity.
> > >
> > > You'd be right there. A Location in BioJava represents a range of
> > > positions.
> > >
> > > > My question is this: if I were to create a class called Charset,
> > > > and create several subclasses called e.g. Invariable etc., is this
> > > > reasonable? Or should the class Charset contain many methods for
> > > > creating a different type of CHARSET?
> > >
> > > My suggestion would be to create an interface called Charset, which
> > > defines behaviour which you expect all types of Charset to exhibit.
> > > Then, implement a number of classes which implement this interface,
> > > one for each type of Charset you have, which each add their own
> > > methods or special behaviour. If a lot of the behaviour is common,
> > > you can define an abstract class called something like
> > > AbstractCharset which defines this common behaviour, and have the
> > > others extend it.
> > >
> > > > In my app, a CHARSET needs to be associated with a particular
> > > > alignment and the settings used to define the CHARSET, so my
> > > > Charset class has variables such as an Alignment object, Locations
> > > > objects etc. I'd like to write a method that returns a subalignment
> > > > based on the CHARSET's associated alignment object and Locations
> > > > object, but I'm not sure how to do this.
> > >
> > > BioJava Alignment objects implement the SymbolList interface, which
> > > means you can use all the methods from SymbolList to work with the
> > > Alignment, including the subList() method.
> > >
> > > cheers,
> > > Richard
> > >
> > > --
> > > Richard Holland (BioMart Team)
> > > EMBL-EBI
> > > Wellcome Trust Genome Campus
> > > Hinxton
> > > Cambridge CB10 1SD
> > > UNITED KINGDOM
> > > Tel: +44-(0)1223-494416
> >
> --
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416

---
avast! Antivirus: Outbound message clean.
Virus Database (VPS): 0615-2, 12/04/2006
Tested on: 27/04/2006 17:00:06
avast! - copyright (c) 1988-2006 ALWIL Software.
http://www.avast.com
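
For readers new to Java, the effect of the quoted
subAlignment(labels, loc) call, which keeps the chosen sequences and only
the columns inside a location, can be mimicked without BioJava. In the
sketch below the class SubAlignmentSketch, its subAlignment method, the
label-to-row Map and the min/max arguments are all hypothetical
stand-ins; the 1-based, inclusive coordinates follow the convention
BioJava uses for locations.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class SubAlignmentSketch {
    /**
     * Returns a new label -> row map holding only the requested labels and
     * only columns min..max (1-based, inclusive) of each kept row.
     */
    static Map<String, String> subAlignment(Map<String, String> alignment,
                                            Set<String> labels, int min, int max) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : alignment.entrySet()) {
            if (labels.contains(entry.getKey())) {
                // substring() is 0-based and end-exclusive, hence min - 1 and max.
                result.put(entry.getKey(), entry.getValue().substring(min - 1, max));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> algn = new LinkedHashMap<>();
        algn.put("seq1", "ACGTAC");
        algn.put("seq2", "ACGAAC");
        // Keep every sequence, as in the quoted snippet, but only columns 2..4.
        System.out.println(subAlignment(algn, algn.keySet(), 2, 4));
        // prints {seq1=CGT, seq2=CGA}
    }
}
```

A Charset implementation could hold its alignment and its positions as
fields and expose a method of this shape, delegating to the real
Alignment.subAlignment call once BioJava objects are in play.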


From david at autohandle.com  Thu Apr 27 17:10:08 2006
From: david at autohandle.com (David Scott)
Date: Thu, 27 Apr 2006 10:10:08 -0700
Subject: [Biojava-l] hibernate-xml mapping
Message-ID: <4450FAF0.9070206@autohandle.com>

What is the XML mapping in the Hibernate files based on?


From mark.schreiber at novartis.com  Fri Apr 28 02:05:44 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 28 Apr 2006 10:05:44 +0800
Subject: [Biojava-l] Creating my own classes
Message-ID: <OF27AEDEFF.D5D8985D-ON4825715E.000B5C4C-4825715E.000B8305@EU.novartis.net>

An excellent book on OO and Java is Thinking in Java by Bruce Eckel. If
you come from a C or Perl background it will change the way you think
about programming.

You can get online versions for free, and most good bookstores have
hardcopies as well.

- Mark


_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l


From mark.schreiber at novartis.com  Fri Apr 28 02:06:31 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 28 Apr 2006 10:06:31 +0800
Subject: [Biojava-l] hibernate-xml mapping
Message-ID: <OF76803BFB.0A810F33-ON4825715E.000B8871-4825715E.000B9557@EU.novartis.net>

The XML mapping in the Hibernate files is based on the BioSQL schema.

- Mark




From ilhami.visne at gmail.com  Fri Apr 28 09:09:56 2006
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Fri, 28 Apr 2006 11:09:56 +0200
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
Message-ID: <ce6b4d120604280209y5d82c954s417116f8c1c93a29@mail.gmail.com>

I have a file in FASTA format which is not encoded in ANSI, but it
otherwise seems OK. It can be downloaded here:
http://stud3.tuwien.ac.at/~e0125935/try3.fasta
I tried to read it with SeqIOTools.readFastaDNA and this exception was
thrown:

org.biojava.bio.BioException: Could not read sequence
    at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:104)
..............
..............
Caused by: java.io.IOException: Stream does not appear to contain FASTA formatted data: ??>
    at org.biojava.bio.seq.io.FastaFormat.readSequence(FastaFormat.java:112)
    at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:101)

The file has no visible line reading "??>", so those characters seem to
be hidden.

How should I handle such files?

Thanks in advance.


From richard.holland at ebi.ac.uk  Fri Apr 28 10:37:35 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 28 Apr 2006 11:37:35 +0100
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
In-Reply-To: <ce6b4d120604280209y5d82c954s417116f8c1c93a29@mail.gmail.com>
References: <ce6b4d120604280209y5d82c954s417116f8c1c93a29@mail.gmail.com>
Message-ID: <1146220656.3955.46.camel@texas.ebi.ac.uk>

I've no idea what binary format that file is in - it contains some very
strange characters. It appears to contain _some_ ANSI data but with
extra binary bits added to the start and end. I think you need to check
the program that generated the file as it is obviously not doing what it
is supposed to.

Your best bet is to convert the file to ANSI or some other format
understood out-of-the-box by Java.

cheers,
Richard

-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416
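
One way to find out what the unexpected leading bytes are is to dump the
start of the file and compare it with the well-known Unicode byte order
marks (BOMs). The sketch below does that. The class and method names are
hypothetical, and whether the file in question actually starts with a
BOM is an assumption that this thread does not establish.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class LeadingBytes {
    /** Names the byte order mark at the start of head, or reports none. */
    static String describe(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8 BOM";
        // Note: UTF-32LE also begins FF FE; it is not distinguished here.
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE BOM";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE BOM";
        return "no BOM";
    }

    /** True if data begins with the given unsigned byte values. */
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Formats bytes as two-digit hex values separated by spaces. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the suspicious FASTA file.
        byte[] head = new byte[8];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.read(head);
        }
        byte[] seen = Arrays.copyOf(head, Math.max(n, 0));
        System.out.println(hex(seen) + " -> " + describe(seen));
    }
}
```

A plain ASCII FASTA file should report "no BOM" with 3E (the '>'
character) as its first byte; anything else in front of the 3E is what
the parser is complaining about.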


From ilhami.visne at gmail.com  Fri Apr 28 09:29:07 2006
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Fri, 28 Apr 2006 11:29:07 +0200
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
Message-ID: <ce6b4d120604280229r53717f8dhfc809deb7d2cee21@mail.gmail.com>

I have a file in FASTA format which is not encoded in ANSI, but it
otherwise seems OK. It can be downloaded here:
http://stud3.tuwien.ac.at/~e0125935/try3.fasta
I tried to read it with SeqIOTools.readFastaDNA and this exception was
thrown:

org.biojava.bio.BioException: Could not read sequence
    at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:104)
..............
..............
Caused by: java.io.IOException: Stream does not appear to contain FASTA formatted data: ??>
    at org.biojava.bio.seq.io.FastaFormat.readSequence(FastaFormat.java:112)
    at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:101)

The file has no visible line reading "??>", so those characters seem to
be hidden.

How should I handle such files?

Thanks in advance.


From richard.holland at ebi.ac.uk  Fri Apr 28 13:19:30 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 28 Apr 2006 14:19:30 +0100
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
In-Reply-To: <ce6b4d120604280600q2be638f8mfac84220420f25d5@mail.gmail.com>
References: <ce6b4d120604280209y5d82c954s417116f8c1c93a29@mail.gmail.com>
	<1146220656.3955.46.camel@texas.ebi.ac.uk>
	<ce6b4d120604280600q2be638f8mfac84220420f25d5@mail.gmail.com>
Message-ID: <1146230371.3955.59.camel@texas.ebi.ac.uk>

Thinking about this a bit more, I think you meant ASCII when you said
ANSI?

FASTA format is very strictly defined. It is a file containing a number
of sequences, each with its own header, which starts with a '>' symbol.
You can indeed use any character you like within the header, which ends
at the first newline after the '>' (newline is ASCII 10 or 13, or both,
depending on your OS). No whitespace is allowed at the start or end of
the file or between or within sequences.

The problem with your file is that the unusual characters are appearing
at the start of the file before the first header, and maybe also during
the sequence itself although I didn't look that closely. Hence it breaks
the FASTA format specification.

The problem here lies with the program that is generating your FASTA
file. BioJava is behaving correctly.

cheers,
Richard

On Fri, 2006-04-28 at 15:00 +0200, Ilhami Visne wrote:
> I already thought about converting the file to ANSI. The sequence part
> must contain only ANSI characters, but the header or other annotation
> need not contain only ANSI characters. If I convert the file to ANSI,
> might that not cause some data to be lost?
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416
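
If the stray characters before the first '>' turn out to be a Unicode
byte order mark (an assumption; the thread does not establish the file's
encoding), the file can be decoded with the matching charset and the
mark skipped before the text is parsed. The following is a minimal
sketch with hypothetical names (BomAwareReader, open, firstLine), using
only the Java standard library. ISO-8859-1 is an arbitrary fallback
chosen here because it maps every byte to one character.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomAwareReader {
    /** Opens a reader that skips a leading BOM and decodes accordingly. */
    static BufferedReader open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() may return fewer.
        while (n < 3) {
            int r = in.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        Charset charset = StandardCharsets.ISO_8859_1; // fallback when no BOM is found
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        }
        // Push back everything that was read beyond the BOM itself.
        in.unread(head, bomLength, n - bomLength);
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    /** Convenience for in-memory data: returns the first decoded line. */
    static String firstLine(byte[] data) {
        try (BufferedReader reader = open(new ByteArrayInputStream(data))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the FASTA file; prints its first line, BOM removed.
        try (BufferedReader reader = open(new FileInputStream(args[0]))) {
            System.out.println(reader.readLine());
        }
    }
}
```

The BufferedReader returned by open(...) could then be handed to a FASTA
reader such as the SeqIOTools.readFastaDNA call from the original
question, assuming that method accepts a BufferedReader as it does in
BioJava 1.x. Fixing the program that writes the file remains the cleaner
solution.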