From anderson.moura at telemar-rj.com.br Mon Apr 3 10:09:23 2006
From: anderson.moura at telemar-rj.com.br (Anderson Moura da Silva)
Date: Mon, 3 Apr 2006 11:09:23 -0300
Subject: [Biojava-l] Get a sequence from internet
Message-ID: <3C39C09ED334F243838953854BE43FB6025A1F01@MAILBX02.telemar.corp.net>

Is it possible to get a sequence from one of the sources on the internet,
like Swissprot, using BioJava?

Can anybody help?

Thanks,

From anderson.moura at telemar-rj.com.br Mon Apr 3 11:54:01 2006
From: anderson.moura at telemar-rj.com.br (Anderson Moura da Silva)
Date: Mon, 3 Apr 2006 12:54:01 -0300
Subject: [Biojava-l] RES: Get a sequence from internet
Message-ID: <3C39C09ED334F243838953854BE43FB6025C7F40@MAILBX02.telemar.corp.net>

Nice!! Does it work only with the sequence ID? Can I search by the name of
the sequence?

Thanks a lot!

-----Original message-----
From: Dickson S. Guedes [mailto:guedes at unisul.br]
Sent: Monday, 3 April 2006 12:10
To: Anderson Moura da Silva
Cc: biojava-l at lists.open-bio.org
Subject: Re: [Biojava-l] Get a sequence from internet

Hi Anderson,

Yes, you can use NCBISequenceDB:

(...)
NCBISequenceDB ncbiDB = new NCBISequenceDB();
Sequence sequenceFromGenbank = ncbiDB.getSequence("sequence_id");
System.out.println(sequenceFromGenbank.getName());
(...)

Replace "sequence_id" with an ID from GenBank. :)

From guedes at unisul.br Mon Apr 3 11:09:43 2006
From: guedes at unisul.br (Dickson S. Guedes)
Date: Mon, 03 Apr 2006 12:09:43 -0300
Subject: [Biojava-l] Get a sequence from internet
In-Reply-To: <3C39C09ED334F243838953854BE43FB6025A1F01@MAILBX02.telemar.corp.net>
References: <3C39C09ED334F243838953854BE43FB6025A1F01@MAILBX02.telemar.corp.net>
Message-ID: <44313AB7.7080309@unisul.br>

Hi Anderson,

Yes, you can use NCBISequenceDB:

(...)
NCBISequenceDB ncbiDB = new NCBISequenceDB();
Sequence sequenceFromGenbank = ncbiDB.getSequence("sequence_id");
System.out.println(sequenceFromGenbank.getName());
(...)

Replace "sequence_id" with an ID from GenBank. :)

Anderson Moura da Silva wrote:
> Is possible to get a sequence from one of the sources on the internet,
> like from the Swissprot, using biojava?
>
> Can anybody help?
>
> Thanks,

--
Dickson S. Guedes
/*
 * UNISUL - Universidade do Sul de Santa Catarina
 * ATI - Assessoria de Tecnologia da Informação
 * (0xx48) 621-3200 - http://www.unisul.br
 *
 * "Quis custodiet ipsos custodes?"
 */

From wendy.wong at gmail.com Tue Apr 4 14:22:00 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Tue, 4 Apr 2006 19:22:00 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To: <200603311805.25861.matthew.pocock@ncl.ac.uk>
References: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk> <200603311805.25861.matthew.pocock@ncl.ac.uk>
Message-ID:

Thanks for your advice! I am able to train a subset of transition
probabilities now!

I found something strange: first I changed my emission distributions to
UntrainableDistribution and the trainer didn't seem to be doing anything;
all cycles had the same score. I then changed them back to
SimpleDistribution (still keeping my getWeightImpl in my custom
distribution). This time it works, and it doesn't seem to be modifying my
emission probabilities.

So it works for me - I am just curious whether it is a bug or whether I was
doing something wrong?

Thanks again!

wendy

On 3/31/06, Matthew Pocock wrote:
> > The DP code does some caching of probabilities, I don't think there's
> > any way to turn this off without modifying the DP implementations.
> >
> > Thomas.
>
> My recollection is that if you did turn this off, the algorithm would run
> very, very much more slowly. Internally to the DP objects, the distribution
> probabilities (in fact, they aren't even probabilities by this stage) are
> stored in a data-structure optimized for the type of lookups performed during
> the dynamic programming recursions.
>
> Matthew

From mthomasc at vub.ac.be Fri Apr 7 05:20:33 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Fri, 07 Apr 2006 11:20:33 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
Message-ID: <44362EE1.5060804@vub.ac.be>

Hello,

I am currently using biojavax that I checked out today from CVS to parse
an EMBL file, exported from EBI SRS server.

I ran into this error :

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
Caused by: org.biojava.bio.seq.io.ParseException: Bad date type found: 86
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:278)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

The EMBL file is :

ID   DQ158013 standard; genomic DNA; VRT; 118 BP.
XX
AC   DQ158013;
XX
SV   DQ158013.1
XX
DT   19-JAN-2006 (Rel. 86, Created)
DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
XX
DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.

Removing the two lines that comprise the date information resolves the
problem.

Thanks,

Morgane.

--
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

From richard.holland at ebi.ac.uk Fri Apr 7 05:56:57 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 10:56:57 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <44362EE1.5060804@vub.ac.be>
References: <44362EE1.5060804@vub.ac.be>
Message-ID: <1144403817.3958.30.camel@texas.ebi.ac.uk>

That was indeed a bug. I have made a change to the date parsing in
EMBLFormat and committed it to CVS. Could you test it for me please?

cheers,
Richard

On Fri, 2006-04-07 at 11:20 +0200, Morgane THOMAS-CHOLLIER wrote:
> I am currently using biojavax that I checked out today from CVS to parse
> an EMBL file, exported from EBI SRS server.
> [...]

--
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416

From mthomasc at vub.ac.be Fri Apr 7 08:18:36 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Fri, 07 Apr 2006 14:18:36 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <1144403817.3958.30.camel@texas.ebi.ac.uk>
References: <44362EE1.5060804@vub.ac.be> <1144403817.3958.30.camel@texas.ebi.ac.uk>
Message-ID: <4436589C.8010501@vub.ac.be>

I now get another error message with the same file :

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
Caused by: java.lang.IndexOutOfBoundsException: No group 5
    at java.util.regex.Matcher.group(Matcher.java:355)
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:271)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

Here is the complete file, for info:

ID   DQ158013 standard; genomic DNA; VRT; 118 BP.
XX
AC   DQ158013;
XX
SV   DQ158013.1
XX
DT   19-JAN-2006 (Rel. 86, Created)
DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
XX
DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
XX
KW   .
XX
OS   Triturus helveticus (palmate newt)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia;
OC   Batrachia; Caudata; Salamandroidea; Salamandridae; Triturus.
XX
RN   [1]
RP   1-118
RX   DOI; 10.1016/j.ympev.2005.08.012.
RX   PUBMED; 16198128.
RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT   "A PCR survey for posterior Hox genes in amphibians";
RL   Mol. Phylogenet. Evol. 38(2):449-458(2006).
XX
RN   [2]
RP   1-118
RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT   ;
RL   Submitted (09-AUG-2005) to the EMBL/GenBank/DDBJ databases.
RL   Biology Department, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050,
RL   Belgium
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..118
FT                   /organism="Triturus helveticus"
FT                   /mol_type="genomic DNA"
FT                   /clone="Thel.b9"
FT                   /db_xref="taxon:256425"
FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"
FT   mRNA            <1..>118
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT   CDS             <1..>118
FT                   /codon_start=2
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
FT                   /protein_id="ABA39736.1"
FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
XX
SQ   Sequence 118 BP; 28 A; 35 C; 37 G; 18 T; 0 other;
     caaataccag acgctggagc tggagaagga gttcctgttc aacatgtacc tcacccggga        60
     ccgcaggcac gaggtggccc ggctgctgaa cctcagcgag cgccaggtca agatctgg         118
//

Thanks for helping,

Morgane.

Richard Holland wrote:
>That was indeed a bug. I have made a change to the date parsing in
>EMBLFormat and committed it to CVS. Could you test it for me please?
>
>cheers,
>Richard

From richard.holland at ebi.ac.uk Fri Apr 7 08:48:46 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 13:48:46 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <4436589C.8010501@vub.ac.be>
References: <44362EE1.5060804@vub.ac.be> <1144403817.3958.30.camel@texas.ebi.ac.uk> <4436589C.8010501@vub.ac.be>
Message-ID: <1144414126.3958.32.camel@texas.ebi.ac.uk>

Sorry, my bad. An off-by-one error...

Check it out again and see if it works now.

cheers,
Richard

PS. I don't have any EMBL files to test with at the moment otherwise I'd
check it myself...
:)

On Fri, 2006-04-07 at 14:18 +0200, Morgane THOMAS-CHOLLIER wrote:
> I now get another error message with the same file :
> [...]

From richard.holland at ebi.ac.uk Fri Apr 7 09:42:10 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 14:42:10 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <44366419.4050505@dbm.ulb.ac.be>
References: <44362EE1.5060804@vub.ac.be> <1144403817.3958.30.camel@texas.ebi.ac.uk> <4436589C.8010501@vub.ac.be> <1144414126.3958.32.camel@texas.ebi.ac.uk> <44366419.4050505@dbm.ulb.ac.be>
Message-ID: <1144417330.3958.34.camel@texas.ebi.ac.uk>

Hi. Someone else had checked in a change to a different class, but that
change was incorrect and didn't compile. It should compile now.

cheers,
Richard

PS. Note to all those who commit changes - PLEASE check that your code
compiles before committing it!

On Fri, 2006-04-07 at 15:07 +0200, Morgane THOMAS-CHOLLIER wrote:
> I tried to checkout biojava-live but it seems I cannot build it anymore.
> I get the following error :
>
> compile-biojava:
>     [javac] Compiling 1321 source files to /Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/ant-build/classes/biojava
>     [javac] /Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/src/org/biojavax/utils/StringTools.java:97: exception java.io.IOException is never thrown in body of corresponding try statement
>     [javac]     } catch (IOException e) {
>     [javac]       ^
>     [javac] Note: Some input files use or override a deprecated API.
>     [javac] Note: Recompile with -deprecation for details.
>     [javac] 1 error
>
> I use Mac OS X 10.3.9, java 1.4.2.
>
> Hope you could help,
>
> Cheers,
>
> Morgane.

From andreas.draeger at clever-telefonieren.de Fri Apr 7 11:43:35 2006
From: andreas.draeger at clever-telefonieren.de (Andreas Dräger)
Date: Fri, 07 Apr 2006 17:43:35 +0200
Subject: [Biojava-l] Senseless assignment
Message-ID: <443688A7.1000203@clever-telefonieren.de>

Hi,

This assignment has no effect in class
org.biojavax.ontology.SimpleComparableTriple:

// Hibernate requirement - not for public use.
private void setOntology(ComparableOntology descriptors) {
    this.ontology = ontology; }

I do not know why this is necessary.

Andreas

--
==================================
Andreas Dräger
PhD student
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Phone: +49-7071-29-70436
==================================

From richard.holland at ebi.ac.uk Mon Apr 10 05:26:51 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Mon, 10 Apr 2006 10:26:51 +0100
Subject: [Biojava-l] Senseless assignment
In-Reply-To: <443688A7.1000203@clever-telefonieren.de>
References: <443688A7.1000203@clever-telefonieren.de>
Message-ID: <1144661211.3951.9.camel@texas.ebi.ac.uk>

It's a typo. The method declaration should read:

// Hibernate requirement - not for public use.
private void setOntology(ComparableOntology ontology) {
    this.ontology = ontology; }

I have fixed it in CVS.

cheers,
Richard

On Fri, 2006-04-07 at 17:43 +0200, Andreas Dräger wrote:
> Hi,
>
> This assignment has no effect in class
> org.biojavax.ontology.SimpleComparableTriple:
>
> // Hibernate requirement - not for public use.
> private void setOntology(ComparableOntology descriptors) {
>     this.ontology = ontology; }
>
> I do not know why this is necessary.
>
> Andreas

From mthomasc at vub.ac.be Sat Apr 8 04:20:47 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Sat, 08 Apr 2006 10:20:47 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <1144417330.3958.34.camel@texas.ebi.ac.uk>
References: <44362EE1.5060804@vub.ac.be> <1144403817.3958.30.camel@texas.ebi.ac.uk> <4436589C.8010501@vub.ac.be> <1144414126.3958.32.camel@texas.ebi.ac.uk> <44366419.4050505@dbm.ulb.ac.be> <1144417330.3958.34.camel@texas.ebi.ac.uk>
Message-ID: <4437725F.9000503@vub.ac.be>

It works fine now !

Thanks for your help,

cheers,

Morgane.

Richard Holland wrote:
>Hi. Someone else had checked in a change to a different class, but that
>change was incorrect and didn't compile. It should compile now.
>
>cheers,
>Richard

From mthomasc at vub.ac.be Wed Apr 12 04:34:43 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Wed, 12 Apr 2006 10:34:43 +0200
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing
Message-ID: <443CBBA3.9070101@vub.ac.be>

Hello again,

I am currently using biojavax to parse EMBL files exported from the Ensembl
website.
Compared to the EBI files I have, they show a difference in the feature lines: sometimes only one "/word" is present, e.g.:

EBI file:

FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"

Ensembl file:

FT   gene            complement(1..3218)
FT                   /gene="ENSMUSG00000038227"

The problem I encounter is that the parser correctly converts the "/word" into a Note, but the Note is then attached to the immediately following feature (e.g. mRNA). The current gene feature thus has no annotation.

This behavior is reproducible by removing one "/word" from an EBI file.

Apart from this issue, I noted that Ensembl EMBL files use "=" inside a feature (e.g. /note="transcript_id=ENSMUST00000048680"), which ends up as an incomplete Note, as the parser seems to split on "=" to separate the key and the value.

Thanks for your help,

Morgane.

--
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

From jolyon.holdstock at ogt.co.uk Thu Apr 13 12:42:36 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu, 13 Apr 2006 17:42:36 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>

Hi Morgane,

I have amended the EmblFormat readSection method as below and the parsing seems to work; please test it.

I think that the last bit of annotation is carried over into the next feature, so before adding the new feature I dump the annotation and reset currentTag and currentVal.
if (!line.startsWith(" ")) {
    //--------- new code starts ---------------------------
    if (currentTag != null) {
        section.add(new String[]{currentTag, currentVal.toString()});
        currentTag = null;
        currentVal = null;
    }
    //--------- new code ends -----------------------------
    // case 1 : word value - splits into key-value on its own
    section.add(line.split("\\s+"));
}

Cheers,

Jolyon

-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane THOMAS-CHOLLIER
Sent: 12 April 2006 09:35
To: biojava-l at open-bio.org
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]

Hello again,

I am currently using biojavax to parse EMBL files exported from the Ensembl website.

Compared to the EBI files I have, they show a difference in the feature lines: sometimes only one "/word" is present, e.g.:

EBI file:

FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"

Ensembl file:

FT   gene            complement(1..3218)
FT                   /gene="ENSMUSG00000038227"

The problem I encounter is that the parser correctly converts the "/word" into a Note, but the Note is then attached to the immediately following feature (e.g. mRNA). The current gene feature thus has no annotation.

This behavior is reproducible by removing one "/word" from an EBI file.

Apart from this issue, I noted that Ensembl EMBL files use "=" inside a feature (e.g. /note="transcript_id=ENSMUST00000048680"), which ends up as an incomplete Note, as the parser seems to split on "=" to separate the key and the value.

Thanks for your help,

Morgane.
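
The two issues discussed in this thread (a qualifier carried over into the next feature, and a value truncated at an embedded "=") can be illustrated with a small standalone sketch. This is not the EMBLFormat code: the names FeatureTableSketch, Feature, parse and the indentation threshold are hypothetical, and the sample lines in main mix the EBI and Ensembl snippets quoted above purely for illustration. It assumes that qualifier lines start with "/" once trimmed and that feature-key lines are indented only a few columns after "FT".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FeatureTableSketch {

    /** Hypothetical holder for one feature: key, location, and {tag, value} qualifiers. */
    public static final class Feature {
        public final String key;
        public final String location;
        public final List<String[]> qualifiers = new ArrayList<String[]>();

        Feature(String key, String location) {
            this.key = key;
            this.location = location;
        }
    }

    // Assumed threshold: feature keys sit a few columns after "FT",
    // qualifier continuation lines are indented much further.
    private static final int CONTINUATION_INDENT = 10;

    public static List<Feature> parse(List<String> ftLines) {
        List<Feature> features = new ArrayList<Feature>();
        Feature current = null;
        String pendingTag = null;
        StringBuilder pendingVal = null;

        for (String raw : ftLines) {
            String line = raw.startsWith("FT") ? raw.substring(2) : raw;
            String text = line.trim();
            if (text.isEmpty()) {
                continue;
            }
            if (text.startsWith("/")) {
                // New qualifier: store the previous one, then split on the FIRST "=" only,
                // so a value such as "transcript_id=..." stays intact.
                flush(current, pendingTag, pendingVal);
                String[] kv = text.substring(1).split("=", 2);
                pendingTag = kv[0];
                pendingVal = new StringBuilder(kv.length > 1 ? kv[1] : "");
            } else if (leadingSpaces(line) < CONTINUATION_INDENT) {
                // New feature: the pending qualifier belongs to the PREVIOUS feature,
                // so it is flushed and reset before the new feature is created.
                flush(current, pendingTag, pendingVal);
                pendingTag = null;
                pendingVal = null;
                String[] parts = text.split("\\s+", 2);
                current = new Feature(parts[0], parts.length > 1 ? parts[1] : "");
                features.add(current);
            } else if (pendingVal != null) {
                // Continuation of a multi-line value (joined with one space, a simplification).
                pendingVal.append(' ').append(text);
            }
        }
        // Flush exactly once at the end so the last qualifier is not stored twice.
        flush(current, pendingTag, pendingVal);
        return features;
    }

    private static void flush(Feature target, String tag, StringBuilder val) {
        if (target != null && tag != null) {
            target.qualifiers.add(new String[] {tag, unquote(val.toString())});
        }
    }

    private static String unquote(String v) {
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            return v.substring(1, v.length() - 1);
        }
        return v;
    }

    private static int leadingSpaces(String s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) == ' ') {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "FT   gene            <1..>118",
            "FT                   /gene=\"Hoxb9\"",
            "FT   mRNA            <1..>118",
            "FT                   /note=\"transcript_id=ENSMUST00000048680\"",
            "FT                   /product=\"HOXB9\"");
        for (Feature f : parse(lines)) {
            System.out.println("Feature: " + f.key + " " + f.location);
            for (String[] q : f.qualifiers) {
                System.out.println("  Note: " + q[0] + ": " + q[1]);
            }
        }
    }
}
```

Passing a limit of 2 to String.split keeps everything after the first "=" in the value, and resetting the pending tag when a new feature key is seen keeps a lone qualifier with the feature it was written under.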
-- ********************************************************** Morgane THOMAS-CHOLLIER, PHD Student Vrije Universiteit Brussels (VUB) Laboratory of Cell Genetics Pleinlaan 2 1050 Brussels Belgium _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l This email has been scanned by Oxford Gene Technology Group of Companies Security Systems. From david at autohandle.com Fri Apr 14 17:29:51 2006 From: david at autohandle.com (David Scott) Date: Fri, 14 Apr 2006 14:29:51 -0700 Subject: [Biojava-l] BioJavaX.html Message-ID: <4440144F.7010603@autohandle.com> is BioJavaX.html posted somewhere - i am getting an ArrayIndexOutofBoundException on the build. thanks From david at autohandle.com Fri Apr 14 17:20:47 2006 From: david at autohandle.com (David Scott) Date: Fri, 14 Apr 2006 14:20:47 -0700 Subject: [Biojava-l] BioJavaX.html Message-ID: <4440122F.2080809@autohandle.com> is it possible to post the BioJavaX.html somewhere - i am getting an ArrayIndexOutOfBoundsException on the build docbook. i used google - but could not locate it. thanks- From mark.schreiber at novartis.com Sat Apr 15 19:19:13 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Sun, 16 Apr 2006 07:19:13 +0800 Subject: [Biojava-l] BioJavaX.html Message-ID: Could someone post the text to the wiki site temporarily. Actually it may be more sensible for this document to be hosted as a wiki page. The wiki was not available at the time that Richard wrote it so moving it may be a good idea. Any objections? Additionally some platforms have trouble building docbook html from ant (especially platforms developed in Redmond WA which we don't speak of). 
- Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 David Scott Sent by: biojava-l-bounces at lists.open-bio.org 04/15/2006 05:20 AM To: biojava-l at biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] BioJavaX.html is it possible to post the BioJavaX.html somewhere - i am getting an ArrayIndexOutOfBoundsException on the build docbook. i used google - but could not locate it. thanks- _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From richard.holland at ebi.ac.uk Tue Apr 18 05:21:49 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Tue, 18 Apr 2006 10:21:49 +0100 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com> References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com> Message-ID: <1145352109.4188.3.camel@texas.ebi.ac.uk> I have committed an UNTESTED patch based on Jolyon's suggestion, and also attempted to fix the split-on-equals problem Morgane observed. Please let me know if there are any problems with it. As this problem affected the UniProt parser in a similar manner (much of the code is identical), the same fixes were applied there too. cheers, Richard On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: > Hi Morgane, > > I have amended the EmblFormat readSection method as below and the > parsing seems to work; please test it. > > I think that the last bit of annotation is carried over into the next > feature so before adding the new feature I dump the annotation and reset > currentTag and currentVal. 
> -- Richard Holland European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK Tel: +44-(0)1223-494416 --------------- From richard.holland at ebi.ac.uk Tue Apr 18 04:20:44 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Tue, 18 Apr 2006 09:20:44 +0100 Subject: [Biojava-l] BioJavaX.html In-Reply-To: References: Message-ID: <1145348444.4188.0.camel@texas.ebi.ac.uk> HTML version attached. I've created a placeholder on the BioJava website - could someone convert it who has the time? :) cheers, Richard On Sun, 2006-04-16 at 07:19 +0800, mark.schreiber at novartis.com wrote: > Could someone post the text to the wiki site temporarily. Actually it may > be more sensible for this document to be hosted as a wiki page. The wiki > was not available at the time that Richard wrote it so moving it may be a > good idea. Any objections? > > Additionally some platforms have trouble building docbook html from ant > (especially platforms developed in Redmond WA which we don't speak of). > > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > > > > > David Scott > Sent by: biojava-l-bounces at lists.open-bio.org > 04/15/2006 05:20 AM > > > To: biojava-l at biojava.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] BioJavaX.html > > > is it possible to post the BioJavaX.html somewhere - i am getting an > ArrayIndexOutOfBoundsException on the build docbook. i used google - > but could not locate it. 
> > thanks- > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK Tel: +44-(0)1223-494416 --------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/biojava-l/attachments/20060418/f6e5bb6b/attachment-0001.html From J.L.Sharman at sms.ed.ac.uk Wed Apr 19 05:35:14 2006 From: J.L.Sharman at sms.ed.ac.uk (Joanna Sharman) Date: Wed, 19 Apr 2006 10:35:14 +0100 Subject: [Biojava-l] Pairwise Alignment Message-ID: <20060419103514.rwtqmzy00k0ogog8@www.sms.ed.ac.uk> Hello, I'm new to BioJava so I'm sorry if this question has been asked several times before. This is actually sort of in reply to this message from last month: http://lists.open-bio.org/pipermail/biojava-l/2006-March/005365.html I'd like to perform a simple pairwise alignment using the Smith-Waterman class I saw described here: http://www.biojava.org/wiki/BioJava:CookBook:DP:PairWise2 but I can't find the classes it mentions anywhere on the cvs. Can you point me to where they are? Also, I'm just wondering why the HMM method is preferred to the Smith-Waterman (or others)? It seems quite complicated to me, and like it might require more memory, or am I wrong? 
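
For orientation, here is a minimal sketch of what a Smith-Waterman local alignment computes. It is not the class from the cookbook page mentioned above; the class name, method name and scoring parameters are assumptions chosen for illustration. It uses a linear gap penalty and keeps only two rows of the dynamic-programming matrix, so it returns the best local score without reconstructing the alignment itself.

```java
public class SmithWatermanSketch {

    /**
     * Best local alignment score of a and b.
     * match is added for equal characters, mismatch and gap are
     * expected to be negative (e.g. match=2, mismatch=-1, gap=-2).
     */
    public static int localScore(String a, String b, int match, int mismatch, int gap) {
        int[] prev = new int[b.length() + 1]; // row i-1, all zeros initially
        int[] curr = new int[b.length() + 1]; // row i
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = 0;
            for (int j = 1; j <= b.length(); j++) {
                int diag = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = prev[j] + gap;       // gap in b
                int left = curr[j - 1] + gap; // gap in a
                // A local alignment may restart anywhere, hence the floor at 0.
                int score = Math.max(0, Math.max(diag, Math.max(up, left)));
                curr[j] = score;
                if (score > best) {
                    best = score;
                }
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(localScore("GGACGTCC", "TTACGTAA", 2, -1, -2));
    }
}
```

Keeping two rows needs memory proportional to the shorter sequence if it is passed as b; recovering the aligned residues as well requires the full matrix (or a traceback structure), which is proportional to the product of the two lengths.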
:) Cheers, Joanna From mthomasc at vub.ac.be Thu Apr 20 05:35:54 2006 From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER) Date: Thu, 20 Apr 2006 11:35:54 +0200 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing In-Reply-To: <1145352109.4188.3.camel@texas.ebi.ac.uk> References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com> <1145352109.4188.3.camel@texas.ebi.ac.uk> Message-ID: <444755FA.7030009@vub.ac.be> Hi, I have tested today's version from CVS. Both EBI and Ensembl files now react the same way. The last annotation of a feature is nevertheless related to its immediate following feature. e.g. : FT gene <1..>118 FT /gene="Hoxb9" FT /note="Hoxb-9" FT mRNA <1..>118 FT /gene="Hoxb9" FT /product="HOXB9" FT CDS <1..>118 /note="Hoxb-9" is related to mRNA /product="HOXB9" is related to CDS Concerning the split-on-equals problem, I still observe the problem : [(#2) biojavax:note: transcript_i] for this annotation : /note="transcript_id=ENSMUST00000048680" Thanks for helping, Cheers, Morgane. Richard Holland wrote: > I have committed an UNTESTED patch based on Jolyon's suggestion, and > also attempted to fix the split-on-equals problem Morgane observed. > > Please let me know if there are any problems with it. > > As this problem affected the UniProt parser in a similar manner (much of > the code is identical), the same fixes were applied there too. > > cheers, > Richard > > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: > >> Hi Morgane, >> >> I have amended the EmblFormat readSection method as below and the >> parsing seems to work; please test it. >> >> I think that the last bit of annotation is carried over into the next >> feature so before adding the new feature I dump the annotation and reset >> currentTag and currentVal. 
>> >> -- ********************************************************** Morgane THOMAS-CHOLLIER, PHD Student (mthomasc at vub.ac.be) Vrije Universiteit Brussels (VUB) Laboratory of Cell Genetics Pleinlaan 2 1050 Brussels Belgium Tel : +32 2 629 15 22 ********************************************************** Stop Using Internet Explorer, choose FIREFOX ! From richard.holland at ebi.ac.uk Thu Apr 20 08:05:00 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Thu, 20 Apr 2006 13:05:00 +0100 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing In-Reply-To: <444755FA.7030009@vub.ac.be> References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com> <1145352109.4188.3.camel@texas.ebi.ac.uk> <444755FA.7030009@vub.ac.be> Message-ID: <1145534700.4188.28.camel@texas.ebi.ac.uk> Hi. I made some small changes to the code, although nothing that would fix this kind of problem, committed it back to CVS, checked it out again, compiled, and ran a test program that read in an EMBL file with the feature table you describe below, and output it in EMBL format to another file. I then compared the two files... and found no differences! The split-on-equals problem didn't occur, and all notes appeared alongside their correct features. Could there be a problem maybe with the script you are using? I've really no idea what the problem is as I can't reproduce it based on the current CVS contents! cheers, Richard On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote: > Hi, > > I have tested today's version from CVS. > > Both EBI and Ensembl files now react the same way. > The last annotation of a feature is nevertheless related to its > immediate following feature. > e.g. 
: > > FT gene <1..>118 > FT /gene="Hoxb9" > FT /note="Hoxb-9" > FT mRNA <1..>118 > FT /gene="Hoxb9" > FT /product="HOXB9" > FT CDS <1..>118 > > /note="Hoxb-9" is related to mRNA > /product="HOXB9" is related to CDS > > Concerning the split-on-equals problem, I still observe the problem : > > [(#2) biojavax:note: transcript_i] > > for this annotation : /note="transcript_id=ENSMUST00000048680" > > Thanks for helping, > > Cheers, > > Morgane. > > Richard Holland wrote: > > I have committed an UNTESTED patch based on Jolyon's suggestion, and > > also attempted to fix the split-on-equals problem Morgane observed. > > > > Please let me know if there are any problems with it. > > > > As this problem affected the UniProt parser in a similar manner (much of > > the code is identical), the same fixes were applied there too. > > > > cheers, > > Richard > > > > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: > > > >> Hi Morgane, > >> > >> I have amended the EmblFormat readSection method as below and the > >> parsing seems to work; please test it. > >> > >> I think that the last bit of annotation is carried over into the next > >> feature so before adding the new feature I dump the annotation and reset > >> currentTag and currentVal. 
> >> > >> > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From jolyon.holdstock at ogt.co.uk Thu Apr 20 08:08:40 2006 From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock) Date: Thu, 20 Apr 2006 13:08:40 +0100 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned] Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com> I've run the sequence through the parser and it seems to work OK. I iterate through the features and then iterate through the annotations of that feature Based on the input.... FT source 1..118 FT /organism="Triturus helveticus" FT /mol_type="genomic DNA" FT /clone="Thel.b9" FT /db_xref="taxon:256425" FT gene <1..>118 FT /gene="Hoxb9" FT /note="Hoxb-9" FT mRNA <1..>118 FT /gene="Hoxb9" FT /product="HOXB9" FT CDS <1..>118 FT /codon_start=2 FT /gene="Hoxb9" FT /product="HOXB9" FT /db_xref="UniProtKB/TrEMBL:Q2LK47" FT /protein_id="ABA39736.1" FT /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW" The output is.... 
======================================== Feature: (#0) lcl:DQ158013/DQ158013.1:source,EMBL(1..118) Note: (#0) biojavax:mol_type: genomic DNA Note: (#1) biojavax:clone: Thel.b9 ======================================== Feature: (#1) lcl:DQ158013/DQ158013.1:gene,EMBL(<1..118>) Note: (#2) biojavax:gene: Hoxb9 Note: (#3) biojavax:note: Hoxb-9 ======================================== Feature: (#2) lcl:DQ158013/DQ158013.1:mRNA,EMBL(<1..118>) Note: (#4) biojavax:gene: Hoxb9 Note: (#5) biojavax:product: HOXB9 ======================================== Feature: (#3) lcl:DQ158013/DQ158013.1:CDS,EMBL(<1..118>) Note: (#6) biojavax:codon_start: 2 Note: (#7) biojavax:gene: Hoxb9 Note: (#8) biojavax:product: HOXB9 Note: (#9) biojavax:protein_id: ABA39736.1 Note: (#10) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW Note: (#11) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW ============================================= This looks OK, the one thing I've just noticed is that the last piece of annotation of the last feature is assigned twice. Jolyon -----Original Message----- From: Richard Holland [mailto:richard.holland at ebi.ac.uk] Sent: 20 April 2006 13:05 To: mthomas at dbm.ulb.ac.be Cc: Jolyon Holdstock; biojava-l at open-bio.org Subject: Re: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned] Hi. I made some small changes to the code, although nothing that would fix this kind of problem, committed it back to CVS, checked it out again, compiled, and ran a test program that read in an EMBL file with the feature table you describe below, and output it in EMBL format to another file. I then compared the two files... and found no differences! The split-on-equals problem didn't occur, and all notes appeared alongside their correct features. Could there be a problem maybe with the script you are using? I've really no idea what the problem is as I can't reproduce it based on the current CVS contents! 
cheers, Richard On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote: > Hi, > > I have tested today's version from CVS. > > Both EBI and Ensembl files now react the same way. > The last annotation of a feature is nevertheless related to its > immediate following feature. > e.g. : > > FT gene <1..>118 > FT /gene="Hoxb9" > FT /note="Hoxb-9" > FT mRNA <1..>118 > FT /gene="Hoxb9" > FT /product="HOXB9" > FT CDS <1..>118 > > /note="Hoxb-9" is related to mRNA > /product="HOXB9" is related to CDS > > Concerning the split-on-equals problem, I still observe the problem : > > [(#2) biojavax:note: transcript_i] > > for this annotation : /note="transcript_id=ENSMUST00000048680" > > Thanks for helping, > > Cheers, > > Morgane. > > Richard Holland wrote: > > I have committed an UNTESTED patch based on Jolyon's suggestion, and > > also attempted to fix the split-on-equals problem Morgane observed. > > > > Please let me know if there are any problems with it. > > > > As this problem affected the UniProt parser in a similar manner (much of > > the code is identical), the same fixes were applied there too. > > > > cheers, > > Richard > > > > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: > > > >> Hi Morgane, > >> > >> I have amended the EmblFormat readSection method as below and the > >> parsing seems to work; please test it. > >> > >> I think that the last bit of annotation is carried over into the next > >> feature so before adding the new feature I dump the annotation and reset > >> currentTag and currentVal. 
> >> > >> > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 This email has been scanned by Oxford Gene Technology Group of Companies Security Systems. From richard.holland at ebi.ac.uk Thu Apr 20 08:16:00 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Thu, 20 Apr 2006 13:16:00 +0100 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned] In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com> References: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com> Message-ID: <1145535361.4188.33.camel@texas.ebi.ac.uk> Did you use the latest CVS version? (I committed a change that I think should have fixed that about 1 minute before my previous email). On Thu, 2006-04-20 at 13:08 +0100, Jolyon Holdstock wrote: > I've run the sequence through the parser and it seems to work OK. I > iterate through the features and then iterate through the annotations of > that feature > > Based on the input.... > > FT source 1..118 > FT /organism="Triturus helveticus" > FT /mol_type="genomic DNA" > FT /clone="Thel.b9" > FT /db_xref="taxon:256425" > FT gene <1..>118 > FT /gene="Hoxb9" > FT /note="Hoxb-9" > FT mRNA <1..>118 > FT /gene="Hoxb9" > FT /product="HOXB9" > FT CDS <1..>118 > FT /codon_start=2 > FT /gene="Hoxb9" > FT /product="HOXB9" > FT /db_xref="UniProtKB/TrEMBL:Q2LK47" > FT /protein_id="ABA39736.1" > FT > /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW" > > The output is.... 
> > ======================================== > Feature: (#0) lcl:DQ158013/DQ158013.1:source,EMBL(1..118) > Note: (#0) biojavax:mol_type: genomic DNA > Note: (#1) biojavax:clone: Thel.b9 > ======================================== > Feature: (#1) lcl:DQ158013/DQ158013.1:gene,EMBL(<1..118>) > Note: (#2) biojavax:gene: Hoxb9 > Note: (#3) biojavax:note: Hoxb-9 > ======================================== > Feature: (#2) lcl:DQ158013/DQ158013.1:mRNA,EMBL(<1..118>) > Note: (#4) biojavax:gene: Hoxb9 > Note: (#5) biojavax:product: HOXB9 > ======================================== > Feature: (#3) lcl:DQ158013/DQ158013.1:CDS,EMBL(<1..118>) > Note: (#6) biojavax:codon_start: 2 > Note: (#7) biojavax:gene: Hoxb9 > Note: (#8) biojavax:product: HOXB9 > Note: (#9) biojavax:protein_id: ABA39736.1 > Note: (#10) biojavax:translation: > KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW > Note: (#11) biojavax:translation: > KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW > ============================================= > > This looks OK, the one thing I've just noticed is that the last piece of > annotation of the last feature is assigned twice. > > Jolyon > > > -----Original Message----- > From: Richard Holland [mailto:richard.holland at ebi.ac.uk] > Sent: 20 April 2006 13:05 > To: mthomas at dbm.ulb.ac.be > Cc: Jolyon Holdstock; biojava-l at open-bio.org > Subject: Re: [Biojava-l] [biojavax] EMBL parser : features > parsing[Scanned] > > Hi. > > I made some small changes to the code, although nothing that would fix > this kind of problem, committed it back to CVS, checked it out again, > compiled, and ran a test program that read in an EMBL file with the > feature table you describe below, and output it in EMBL format to > another file. I then compared the two files... and found no differences! > The split-on-equals problem didn't occur, and all notes appeared > alongside their correct features. > > Could there be a problem maybe with the script you are using? 
> > I've really no idea what the problem is as I can't reproduce it based on > the current CVS contents! > > cheers, > Richard > > On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote: > > Hi, > > > > I have tested today's version from CVS. > > > > Both EBI and Ensembl files now react the same way. > > The last annotation of a feature is nevertheless related to its > > immediate following feature. > > e.g. : > > > > FT gene <1..>118 > > FT /gene="Hoxb9" > > FT /note="Hoxb-9" > > FT mRNA <1..>118 > > FT /gene="Hoxb9" > > FT /product="HOXB9" > > FT CDS <1..>118 > > > > /note="Hoxb-9" is related to mRNA > > /product="HOXB9" is related to CDS > > > > Concerning the split-on-equals problem, I still observe the problem : > > > > [(#2) biojavax:note: transcript_i] > > > > for this annotation : /note="transcript_id=ENSMUST00000048680" > > > > Thanks for helping, > > > > Cheers, > > > > Morgane. > > > > Richard Holland wrote: > > > I have committed an UNTESTED patch based on Jolyon's suggestion, and > > > also attempted to fix the split-on-equals problem Morgane observed. > > > > > > Please let me know if there are any problems with it. > > > > > > As this problem affected the UniProt parser in a similar manner > (much of > > > the code is identical), the same fixes were applied there too. > > > > > > cheers, > > > Richard > > > > > > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: > > > > > >> Hi Morgane, > > >> > > >> I have amended the EmblFormat readSection method as below and the > > >> parsing seems to work; please test it. > > >> > > >> I think that the last bit of annotation is carried over into the > next > > >> feature so before adding the new feature I dump the annotation and > reset > > >> currentTag and currentVal. 
> > >> > > >> if (!line.startsWith(" ")) { > > >> //--------- new code starts --------------------------- > > >> if (currentTag!=null) { > > >> section.add(new String[]{currentTag,currentVal.toString()}); > > >> currentTag = null; > > >> currentVal = null; > > >> } > > >> //--------- new code ends ----------------------------- > > >> // case 1 : word value - splits into key-value on its own > > >> section.add(line.split("\\s+")); > > >> } > > >> > > >> Cheers, > > >> > > >> Jolyon > > >> > > >> > > >> > > >> -----Original Message----- > > >> From: biojava-l-bounces at lists.open-bio.org > > >> [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane > > >> THOMAS-CHOLLIER > > >> Sent: 12 April 2006 09:35 > > >> To: biojava-l at open-bio.org > > >> Subject: [Biojava-l] [biojavax] EMBL parser : features > parsing[Scanned] > > >> > > >> Hello again, > > >> > > >> I am currently using biojavax to parse EMBL files exported from > Ensembl > > >> website. > > >> > > >> Compared to the EBI files I have, they show a difference in the > Features > > >> > > >> lines : > > >> > > >> sometimes, only one "/word" is present. ie: > > >> > > >> EBI file : > > >> > > >> FT gene <1..>118 > > >> FT /gene="Hoxb9" > > >> FT /note="Hoxb-9" > > >> > > >> Ensembl file; > > >> > > >> FT gene complement(1..3218) > > >> FT /gene="ENSMUSG00000038227" > > >> > > >> The problem I encounter is that the parser correctly convert the > "/word" > > >> > > >> into a Note, but the Note is then in relation with the immediate > > >> following feature (ie: mRNA). > > >> The current gene feature thus has no annotation. > > >> > > >> This behavior is reproducible when removing one "/word" of an EBI > file. > > >> > > >> Apart from this issue, I noted that Ensembl EMBL files uses "=" > inside a > > >> > > >> feature (ie: /note="transcript_id=ENSMUST00000048680") which ends > up > > >> with an incomplete Note, as the parser seems to split on "=" to > separate > > >> > > >> the Key and the Value. 
> > >> > > >> Thanks for your help, > > >> > > >> Morgane. > > >> > > >> > > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416
From mthomasc at vub.ac.be Thu Apr 20 08:30:10 2006 From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER) Date: Thu, 20 Apr 2006 14:30:10 +0200 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Resolved] In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com> References: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com> Message-ID: <44477ED2.2010200@vub.ac.be>
I've just updated my sources few minutes ago and everything works fine now (both annotations and split-on-equals problem). I've tested both the EBI file and Ensembl file. Thanks for fixing the problems !! Cheers, Morgane
Jolyon Holdstock wrote: > No, I'll update my source. > > Thanks, > > Jolyon > > > -----Original Message----- > From: Richard Holland [mailto:richard.holland at ebi.ac.uk] > Sent: 20 April 2006 13:16 > To: Jolyon Holdstock > Cc: mthomas at dbm.ulb.ac.be; biojava-l at open-bio.org > Subject: RE: [Biojava-l] [biojavax] EMBL parser : features > parsing[Scanned] > > Did you use the latest CVS version? (I committed a change that I think > should have fixed that about 1 minute before my previous email).
-- ********************************************************** Morgane THOMAS-CHOLLIER, PHD Student (mthomasc at vub.ac.be) Vrije Universiteit Brussels (VUB) Laboratory of Cell Genetics Pleinlaan 2 1050 Brussels Belgium
From jolyon.holdstock at ogt.co.uk Thu Apr 20 08:18:21 2006 From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock) Date: Thu, 20 Apr 2006 13:18:21 +0100 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned] Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com>
No, I'll update my source. Thanks, Jolyon -----Original Message----- From: Richard Holland [mailto:richard.holland at ebi.ac.uk] Sent: 20 April 2006 13:16 To: Jolyon Holdstock Cc: mthomas at dbm.ulb.ac.be; biojava-l at open-bio.org Subject: RE: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned] Did you use the latest CVS version? (I committed a change that I think should have fixed that about 1 minute before my previous email).
-- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416
From mark.schreiber at novartis.com Tue Apr 25 02:07:59 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Tue, 25 Apr 2006 14:07:59 +0800 Subject: [Biojava-l] Pairwise Alignment Message-ID:
Hi - The appropriate classes for SW and NW pairwise alignment are in the org.biojava.bio.alignment package in the CVS (see http://code.open-bio.org/cgi/viewcvs.cgi/biojava-live/src/org/biojava/bio/alignment/?cvsroot=biojava). While SW and NW are simple they are not as flexible as the pairwise architectures that can be made with HMMs. For a standard pairwise alignment I would think that the SW and NW algorithms are fine. I'm not sure about comparative speed or memory requirements.
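For readers unfamiliar with the abbreviations: SW is Smith-Waterman (local alignment) and NW is Needleman-Wunsch (global alignment). As a rough illustration of what the two algorithms compute, here is a minimal plain-Java sketch. It is not the org.biojava.bio.alignment code: the class name PairwiseScoreSketch, the method names, and the flat match/mismatch/gap scores are all assumptions made for the example, and it returns only the score, not the alignment itself.

```java
// Plain-Java sketch; class and method names are hypothetical, not BioJava's API.
class PairwiseScoreSketch {

    /** Needleman-Wunsch: best score for aligning a and b end to end. */
    static int globalScore(String a, String b, int match, int mismatch, int gap) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        // Aligning a prefix against nothing costs one gap per residue.
        for (int i = 1; i <= a.length(); i++) dp[i][0] = i * gap;
        for (int j = 1; j <= b.length(); j++) dp[0][j] = j * gap;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = dp[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int gapped = Math.max(dp[i - 1][j], dp[i][j - 1]) + gap;
                dp[i][j] = Math.max(diag, gapped);
            }
        }
        return dp[a.length()][b.length()];
    }

    /** Smith-Waterman: best score over all pairs of substrings of a and b. */
    static int localScore(String a, String b, int match, int mismatch, int gap) {
        int[][] dp = new int[a.length() + 1][b.length() + 1]; // first row and column stay 0
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = dp[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int gapped = Math.max(dp[i - 1][j], dp[i][j - 1]) + gap;
                // Clamping at zero lets a local alignment restart anywhere.
                dp[i][j] = Math.max(0, Math.max(diag, gapped));
                best = Math.max(best, dp[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(globalScore("GATTACA", "GCATGCU", 1, -1, -1));
        System.out.println(localScore("TTACGTT", "GGACGGG", 1, -1, -1));
    }
}
```

Both methods in this sketch fill a full (n+1) x (m+1) table, so their time and memory grow with the product of the two sequence lengths.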
- Mark Joanna Sharman Sent by: biojava-l-bounces at lists.open-bio.org 04/19/2006 05:35 PM To: biojava-l at lists.open-bio.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] Pairwise Alignment Hello, I'm new to BioJava so I'm sorry if this question has been asked several times before. This is actually sort of in reply to this message from last month: http://lists.open-bio.org/pipermail/biojava-l/2006-March/005365.html I'd like to perform a simple pairwise alignment using the Smith-Waterman class I saw described here: http://www.biojava.org/wiki/BioJava:CookBook:DP:PairWise2 but I can't find the classes it mentions anywhere on the cvs. Can you point me to where they are? Also, I'm just wondering why the HMM method is preferred to the Smith-Waterman (or others)? It seems quite complicated to me, and like it might require more memory, or am I wrong? :) Cheers, Joanna _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From e.willighagen at science.ru.nl Wed Apr 26 12:03:47 2006 From: e.willighagen at science.ru.nl (Egon Willighagen) Date: Wed, 26 Apr 2006 18:03:47 +0200 Subject: [Biojava-l] org.biojava.bio.gui.glyph classes? Message-ID: <200604261803.47333.e.willighagen@science.ru.nl> Hi all, in the wiki I saw mention of the org.biojava.bio.gui.glyph package, which does not seem to be part of BioJava 1.4. Where can I download the code classes in that package? Egon -- Radboud University Nijmegen http://www.cac.science.ru.nl/ blog: http://chem-bla-ics.blogspot.com/ From mark.schreiber at novartis.com Wed Apr 26 21:14:38 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Thu, 27 Apr 2006 09:14:38 +0800 Subject: [Biojava-l] org.biojava.bio.gui.glyph classes? Message-ID: Hi - They are in biojava-live, which is the development version available for download via cvs. Take a look at the instructions on www.biojava.org. 
- Mark Egon Willighagen Sent by: biojava-l-bounces at lists.open-bio.org 04/27/2006 12:03 AM To: biojava-l at lists.open-bio.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] org.biojava.bio.gui.glyph classes? Hi all, in the wiki I saw mention of the org.biojava.bio.gui.glyph package, which does not seem to be part of BioJava 1.4. Where can I download the code classes in that package? Egon -- Radboud University Nijmegen http://www.cac.science.ru.nl/ blog: http://chem-bla-ics.blogspot.com/ _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From heatkent at gmail.com Wed Apr 26 19:22:46 2006 From: heatkent at gmail.com (Heather Kent) Date: Wed, 26 Apr 2006 18:22:46 -0500 Subject: [Biojava-l] chromatogram viewer Message-ID: I'm wondering if anyone can help me locate some source code for swing components involved in viewing chromatograms, i read a 2003 forum from biojava where Rhett Sutphin mentioned he would make some source code for a chromatogram viewer (using the chromatogramgraphic class) available but i cant seem to find it anywhere....im trying to fashion some scroll bars for my chromatogram viewer that function to scroll through the image, as well as vertically and horizontally scale the chromatgram....i have some code from an old viewer that will perform all these functions but doesnt use any of the biojava classes or swing components.... 
thanx heather
From russ at kepler-eng.com Thu Apr 27 00:24:19 2006 From: russ at kepler-eng.com (Russ Kepler) Date: Wed, 26 Apr 2006 22:24:19 -0600 Subject: [Biojava-l] chromatogram viewer In-Reply-To: References: Message-ID: <200604262224.19525.russ@kepler-eng.com>
On Wednesday 26 April 2006 05:22 pm, Heather Kent wrote:
> I'm wondering if anyone can help me locate some source code for swing
> components involved in viewing chromatograms, i read a 2003 forum from
> biojava where Rhett Sutphin mentioned he would make some source code for a
> chromatogram viewer (using the chromatogramgraphic class) available but i
> cant seem to find it anywhere....im trying to fashion some scroll bars for
> my chromatogram viewer that function to scroll through the image, as well
> as vertically and horizontally scale the chromatgram....i have some code
> from an old viewer that will perform all these functions but doesnt use any
> of the biojava classes or swing components....
There's org.biojava.bio.gui.sequence.ABITraceRenderer with demo code in seqviewer.TraceViewer
It should give you a start.
From n.haigh at sheffield.ac.uk Thu Apr 27 09:48:59 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 27 Apr 2006 14:48:59 +0100 Subject: [Biojava-l] Sun One Studio+Biojava Message-ID: <002301c66a01$5637d910$9f5ea78f@bmbpc196>
I'm totally new to Java and Biojava as I'm trying to defect from Bioperl! I'm trying to use Sun One Studio for editing my java files - at least initially. I don't know how to setup Sun One Studio to find my biojava-1.4.jar file, I'm not even sure how to test if it can find it correctly. Any help on these issues would be gratefully received. As I said I'm a newbie - bear with me!
Cheers
Nathan
----------------------------------------------------------------------------------
Dr. Nathan S. Haigh
Bioinformatics PostDoctoral Research Associate
Room B2 211, Department of Animal and Plant Sciences, University of Sheffield, Western Bank, Sheffield S10 2TN
Tel: +44 (0)114 22 20112 Mob: +44 (0)7742 533 569 Fax: +44 (0)114 22 20002
Web: www.bioinf.shef.ac.uk www.petraea.shef.ac.uk
----------------------------------------------------------------------------------
--- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0615-2, 12/04/2006 Tested on: 27/04/2006 14:48:56 avast! - copyright (c) 1988-2006 ALWIL Software. http://www.avast.com
From richard.holland at ebi.ac.uk Thu Apr 27 10:51:23 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Thu, 27 Apr 2006 15:51:23 +0100 Subject: [Biojava-l] Sun One Studio+Biojava In-Reply-To: <002301c66a01$5637d910$9f5ea78f@bmbpc196> References: <002301c66a01$5637d910$9f5ea78f@bmbpc196> Message-ID: <1146149483.3955.7.camel@texas.ebi.ac.uk>
Sun One Studio is built on NetBeans, which is what I use to develop bits of BioJava with, so I think what works for me should work for you. Here goes...:
If you are working with BioJava in apps you are developing yourself, you need to set up BioJava as a library in NetBeans. Do this by going to the Library Manager (Tools menu), creating a new library called BioJava, then using the buttons provided to locate and add the biojava-1.4.jar file to the library. You can then associate this library with any project you are working on by right-clicking on that project, choosing Properties, then click on Libraries in the tree on the left of the window that appears and use this to add the BioJava library.
If you are intending to develop BioJava itself, you need to check out the entire biojava-live project from CVS.
You can then set up development in NetBeans by creating a "new project from existing Ant script", and telling it where the build.xml file can be found within the BioJava project. It'll do the rest for you. Hope this helps. cheers, Richard -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From n.haigh at sheffield.ac.uk Thu Apr 27 11:01:56 2006 From: n.haigh at sheffield.ac.uk (Nathan S.
Haigh) Date: Thu, 27 Apr 2006 16:01:56 +0100 Subject: [Biojava-l] Sun One Studio+Biojava In-Reply-To: <1146149483.3955.7.camel@texas.ebi.ac.uk> Message-ID: <003601c66a0b$86b289f0$9f5ea78f@bmbpc196>
Thanks for the info - the fog is starting to lift! :o)
I think I'll leave actual Biojava development for now - see how I go with actually learning Java first :o)
I have a steep learning curve, as I have an application written in Perl that uses Bioperl modules, with Perl/Tk for the GUI. So I'm trying to rewrite this application in Java while trying to think about OO programming. I'm sure I'll send some really simple questions to the list over the coming weeks/months, but hopefully there won't be too many nightmares along the way!
Thanks
Nathan
From n.haigh at sheffield.ac.uk Thu Apr 27 11:12:34 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 27 Apr 2006 16:12:34 +0100 Subject: [Biojava-l] Creating my own classes Message-ID: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196>
I'm trying to learn/think about OO programming as I learn Java and port a Perl app to Java - could you tell me if this sounds reasonable for writing some of my own classes?
My application essentially defines sets of positions from an alignment - I call them CHARSETs as they are analogous to CHARSETs in the Nexus file format. I believe in Biojava the Locations object/interface (sorry, not familiar enough with correct terminology yet) is essentially the same sort of thing. In my app, the user can use several approaches to define a CHARSET e.g. a CHARSET containing just invariable sites, or a CHARSET containing sites above a given % identity.
My question is this, if I were to create a class called Charset, and I create several subclasses called e.g. Invariable etc is this reasonable? Or should the class Charset contain many methods for creating a different type of CHARSET?
In my app, a CHARSET needs to be associated with a particular alignment and with the settings used to define it, so my Charset class would have variables such as an Alignment object, Locations objects etc. I'd like to write a method that returns a subalignment based on the CHARSET's associated Alignment and Locations objects, but I'm not sure how to do this.
Thanks for any help/comments/corrections/critiques
Nathan
From richard.holland at ebi.ac.uk Thu Apr 27 11:36:51 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Thu, 27 Apr 2006 16:36:51 +0100 Subject: [Biojava-l] Creating my own classes In-Reply-To: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196> References: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196> Message-ID: <1146152212.3955.24.camel@texas.ebi.ac.uk>
On Thu, 2006-04-27 at 16:12 +0100, Nathan S. Haigh wrote:
> My application essentially defines sets of positions from an alignment - I
> call them CHARSETs as they are analogous to CHARSETs in the Nexus file
> format. I believe in Biojava the Locations object/interface (sorry, not
> familiar enough with correct terminology yet) is essentially the same sort
> of thing. In my app, the user can use several approaches to define a CHARSET
> e.g. a CHARSET containing just invariable sites, or a CHARSET containing
> sites above a given % identity.
You'd be right there. A Location in BioJava represents a range of positions.
> My question is this, if I were to create a class called Charset, and I
> create several subclasses called e.g. Invariable etc is this reasonable? Or
> should the class Charset contain many methods for creating a different type
> of CHARSET?
My suggestion would be to create an interface called Charset, which defines the behaviour you expect all types of Charset to exhibit.
Then, implement a number of classes which implement this interface, one for each type of Charset you have, which each add their own methods or special behaviour. If a lot of the behaviour is common, you can define an abstract class called something like AbstractCharset which defines this common behaviour, and have the others extend it. > In my app, a CHARSET needs to be associated with a particular alignment, and > settings used to define the CHARSET, so my Charset class have variables such > as an Alignment object, Locations objects etc. I?d like to write a method > that returns a subalignment based on the CHARSETs associated alignment > object and Locations object but I?m not sure how to do this. BioJava Alignment objects implement the SymbolList interface, which means you can use all the methods from SymbolList to work with the Alignment, including the subList() method. cheers, Richard -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From n.haigh at sheffield.ac.uk Thu Apr 27 11:44:05 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 27 Apr 2006 16:44:05 +0100 Subject: [Biojava-l] Creating my own classes In-Reply-To: <1146152212.3955.24.camel@texas.ebi.ac.uk> Message-ID: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196> Thanks Richard, I'll think about this and try to do some deciphering. The only thing I'm in need of help for is possibly some actual code that would take an Alignment object and return a subalignment based on the positions specified in a Locations object - it's difficult to make sense of a new language until you start to pick up some of the basics. Thanks Nathan > -----Original Message----- > From: Richard Holland [mailto:richard.holland at ebi.ac.uk] > Sent: 27 April 2006 16:37 > To: n.haigh at sheffield.ac.uk > Cc: biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] Creating my own classes > > On Thu, 2006-04-27 at 16:12 +0100, Nathan S. 
Haigh wrote: > > My application essentially defines sets of positions from an alignment - > I > > call them CHARSETs as they are analogous to CHARSETs in the Nexus file > > format. I believe in Biojava the Locations object/interface (sorry, not > > familiar enough with correct terminology yet) is essentially the same > sort > > of thing. In my app, the user can use several approaches to define a > CHARSET > > e.g. a CHARSET containing just invariable sites, or a CHARSET containing > > sites above a given % identity. > > You'd be right there. A Location in BioJava represents a range of > positions. > > > My question is this, if I were to create a class called Charset, and I > > create several subclasses called e.g. Invariable etc is this reasonable? > Or > > should the class Charset contain many methods for creating a different > type > > of CHARSET? > > My suggestion would be create an interface called Charset, which defines > behaviour which you expect all types of Charset to exhibit. Then, > implement a number of classes which implement this interface, one for > each type of Charset you have, which each add their own methods or > special behaviour. If a lot of the behaviour is common, you can define > an abstract class called something like AbstractCharset which defines > this common behaviour, and have the others extend it. > > > In my app, a CHARSET needs to be associated with a particular alignment, > and > > settings used to define the CHARSET, so my Charset class have variables > such > > as an Alignment object, Locations objects etc. I'd like to write a > method > > that returns a subalignment based on the CHARSETs associated alignment > > object and Locations object but I'm not sure how to do this. > > BioJava Alignment objects implement the SymbolList interface, which > means you can use all the methods from SymbolList to work with the > Alignment, including the subList() method. 
> > cheers, > Richard > > -- > Richard Holland (BioMart Team) > EMBL-EBI > Wellcome Trust Genome Campus > Hinxton > Cambridge CB10 1SD > UNITED KINGDOM > Tel: +44-(0)1223-494416 --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0615-2, 12/04/2006 Tested on: 27/04/2006 16:44:04 avast! - copyright (c) 1988-2006 ALWIL Software. http://www.avast.com From richard.holland at ebi.ac.uk Thu Apr 27 11:55:39 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Thu, 27 Apr 2006 16:55:39 +0100 Subject: [Biojava-l] Creating my own classes In-Reply-To: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196> References: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196> Message-ID: <1146153339.3955.30.camel@texas.ebi.ac.uk> Given some existing Location object (let's called it 'loc'), and an existing Alignment (hypothetically called 'algn'), you can do this: // Obtain the labels of all the sequences in the alignment. Set labels = new HashSet(); labels.addAll(algn.getLabels()); // Obtain a sub-alignment including all the sequences in the // original alignment. Alignment subAlignment = algn.subAlignment(labels, loc); cheers, Richard On Thu, 2006-04-27 at 16:44 +0100, Nathan S. Haigh wrote: > Thanks Richard, > > I'll think about this and try to do some deciphering. The only thing I'm in > need of help for is possibly some actual code that would take an Alignment > object and return a subalignment based on the positions specified in a > Locations object - it's difficult to make sense of a new language until you > start to pick up some of the basics. > > Thanks > Nathan > > > -----Original Message----- > > From: Richard Holland [mailto:richard.holland at ebi.ac.uk] > > Sent: 27 April 2006 16:37 > > To: n.haigh at sheffield.ac.uk > > Cc: biojava-l at lists.open-bio.org > > Subject: Re: [Biojava-l] Creating my own classes > > > > On Thu, 2006-04-27 at 16:12 +0100, Nathan S. 
Haigh wrote: > > > My application essentially defines sets of positions from an alignment - > > I > > > call them CHARSETs as they are analogous to CHARSETs in the Nexus file > > > format. I believe in Biojava the Locations object/interface (sorry, not > > > familiar enough with correct terminology yet) is essentially the same > > sort > > > of thing. In my app, the user can use several approaches to define a > > CHARSET > > > e.g. a CHARSET containing just invariable sites, or a CHARSET containing > > > sites above a given % identity. > > > > You'd be right there. A Location in BioJava represents a range of > > positions. > > > > > My question is this, if I were to create a class called Charset, and I > > > create several subclasses called e.g. Invariable etc is this reasonable? > > Or > > > should the class Charset contain many methods for creating a different > > type > > > of CHARSET? > > > > My suggestion would be create an interface called Charset, which defines > > behaviour which you expect all types of Charset to exhibit. Then, > > implement a number of classes which implement this interface, one for > > each type of Charset you have, which each add their own methods or > > special behaviour. If a lot of the behaviour is common, you can define > > an abstract class called something like AbstractCharset which defines > > this common behaviour, and have the others extend it. > > > > > In my app, a CHARSET needs to be associated with a particular alignment, > > and > > > settings used to define the CHARSET, so my Charset class have variables > > such > > > as an Alignment object, Locations objects etc. I'd like to write a > > method > > > that returns a subalignment based on the CHARSETs associated alignment > > > object and Locations object but I'm not sure how to do this. 
> > > > BioJava Alignment objects implement the SymbolList interface, which > > means you can use all the methods from SymbolList to work with the > > Alignment, including the subList() method. > > > > cheers, > > Richard > > > > -- > > Richard Holland (BioMart Team) > > EMBL-EBI > > Wellcome Trust Genome Campus > > Hinxton > > Cambridge CB10 1SD > > UNITED KINGDOM > > Tel: +44-(0)1223-494416 > > --- > avast! Antivirus: Outbound message clean. > Virus Database (VPS): 0615-2, 12/04/2006 > Tested on: 27/04/2006 16:44:04 > avast! - copyright (c) 1988-2006 ALWIL Software. > http://www.avast.com > > > > > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From n.haigh at sheffield.ac.uk Thu Apr 27 12:00:09 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 27 Apr 2006 17:00:09 +0100 Subject: [Biojava-l] Creating my own classes In-Reply-To: <1146153339.3955.30.camel@texas.ebi.ac.uk> Message-ID: <000d01c66a13$a8b51380$9f5ea78f@bmbpc196> Fantastic stuff - again, I'll look into this over the coming weeks (I actually have annual leave for a week, so my flurry of e-mail will have to stop for now. Thanks again! Nathan > -----Original Message----- > From: Richard Holland [mailto:richard.holland at ebi.ac.uk] > Sent: 27 April 2006 16:56 > To: n.haigh at sheffield.ac.uk > Cc: biojava-l at lists.open-bio.org > Subject: RE: [Biojava-l] Creating my own classes > > Given some existing Location object (let's called it 'loc'), and an > existing Alignment (hypothetically called 'algn'), you can do this: > > // Obtain the labels of all the sequences in the alignment. > Set labels = new HashSet(); > labels.addAll(algn.getLabels()); > // Obtain a sub-alignment including all the sequences in the > // original alignment. > Alignment subAlignment = algn.subAlignment(labels, loc); > > cheers, > Richard > > > On Thu, 2006-04-27 at 16:44 +0100, Nathan S. 
Haigh wrote: > > Thanks Richard, > > > > I'll think about this and try to do some deciphering. The only thing I'm > in > > need of help for is possibly some actual code that would take an > Alignment > > object and return a subalignment based on the positions specified in a > > Locations object - it's difficult to make sense of a new language until > you > > start to pick up some of the basics. > > > > Thanks > > Nathan > > > > > -----Original Message----- > > > From: Richard Holland [mailto:richard.holland at ebi.ac.uk] > > > Sent: 27 April 2006 16:37 > > > To: n.haigh at sheffield.ac.uk > > > Cc: biojava-l at lists.open-bio.org > > > Subject: Re: [Biojava-l] Creating my own classes > > > > > > On Thu, 2006-04-27 at 16:12 +0100, Nathan S. Haigh wrote: > > > > My application essentially defines sets of positions from an > alignment - > > > I > > > > call them CHARSETs as they are analogous to CHARSETs in the Nexus > file > > > > format. I believe in Biojava the Locations object/interface (sorry, > not > > > > familiar enough with correct terminology yet) is essentially the > same > > > sort > > > > of thing. In my app, the user can use several approaches to define a > > > CHARSET > > > > e.g. a CHARSET containing just invariable sites, or a CHARSET > containing > > > > sites above a given % identity. > > > > > > You'd be right there. A Location in BioJava represents a range of > > > positions. > > > > > > > My question is this, if I were to create a class called Charset, and > I > > > > create several subclasses called e.g. Invariable etc is this > reasonable? > > > Or > > > > should the class Charset contain many methods for creating a > different > > > type > > > > of CHARSET? > > > > > > My suggestion would be create an interface called Charset, which > defines > > > behaviour which you expect all types of Charset to exhibit. 
Then, > > > implement a number of classes which implement this interface, one for > > > each type of Charset you have, which each add their own methods or > > > special behaviour. If a lot of the behaviour is common, you can define > > > an abstract class called something like AbstractCharset which defines > > > this common behaviour, and have the others extend it. > > > > > > > In my app, a CHARSET needs to be associated with a particular > alignment, > > > and > > > > settings used to define the CHARSET, so my Charset class have > variables > > > such > > > > as an Alignment object, Locations objects etc. I'd like to write a > > > method > > > > that returns a subalignment based on the CHARSETs associated > alignment > > > > object and Locations object but I'm not sure how to do this. > > > > > > BioJava Alignment objects implement the SymbolList interface, which > > > means you can use all the methods from SymbolList to work with the > > > Alignment, including the subList() method. > > > > > > cheers, > > > Richard > > > > > > -- > > > Richard Holland (BioMart Team) > > > EMBL-EBI > > > Wellcome Trust Genome Campus > > > Hinxton > > > Cambridge CB10 1SD > > > UNITED KINGDOM > > > Tel: +44-(0)1223-494416 > > > > --- > > avast! Antivirus: Outbound message clean. > > Virus Database (VPS): 0615-2, 12/04/2006 > > Tested on: 27/04/2006 16:44:04 > > avast! - copyright (c) 1988-2006 ALWIL Software. > > http://www.avast.com > > > > > > > > > > > -- > Richard Holland (BioMart Team) > EMBL-EBI > Wellcome Trust Genome Campus > Hinxton > Cambridge CB10 1SD > UNITED KINGDOM > Tel: +44-(0)1223-494416 --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0615-2, 12/04/2006 Tested on: 27/04/2006 17:00:06 avast! - copyright (c) 1988-2006 ALWIL Software. 
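Editorial sketch of the interface / abstract class layout suggested in the "Creating my own classes" thread above. This is only an illustration under stated assumptions, not code from any poster and not BioJava API: plain Strings stand in for a BioJava Alignment, columns are 1-based, all rows are assumed to have equal length, and every name here (Charset, AbstractCharset, InvariableCharset, IdentityCharset, CharsetDemo) is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: the behaviour every kind of CHARSET shares.
interface Charset {
    /** 1-based alignment columns that belong to this set. */
    List<Integer> columns(List<String> rows);

    /** The alignment rows restricted to the selected columns. */
    List<String> subAlignment(List<String> rows);
}

// Common behaviour lives here; subclasses only decide column membership.
abstract class AbstractCharset implements Charset {
    protected abstract boolean accepts(List<String> rows, int column);

    @Override
    public List<Integer> columns(List<String> rows) {
        List<Integer> selected = new ArrayList<>();
        int length = rows.isEmpty() ? 0 : rows.get(0).length();
        for (int col = 1; col <= length; col++) {
            if (accepts(rows, col)) {
                selected.add(col);
            }
        }
        return selected;
    }

    @Override
    public List<String> subAlignment(List<String> rows) {
        List<Integer> cols = columns(rows);
        List<String> result = new ArrayList<>();
        for (String row : rows) {
            StringBuilder sb = new StringBuilder();
            for (int col : cols) {
                sb.append(row.charAt(col - 1));
            }
            result.add(sb.toString());
        }
        return result;
    }
}

// Columns in which every row carries the same character.
class InvariableCharset extends AbstractCharset {
    @Override
    protected boolean accepts(List<String> rows, int column) {
        char first = rows.get(0).charAt(column - 1);
        for (String row : rows) {
            if (row.charAt(column - 1) != first) {
                return false;
            }
        }
        return true;
    }
}

// Columns whose most common character reaches a minimum percentage of rows.
class IdentityCharset extends AbstractCharset {
    private final double minPercent;

    IdentityCharset(double minPercent) {
        this.minPercent = minPercent;
    }

    @Override
    protected boolean accepts(List<String> rows, int column) {
        int best = 0;
        for (String a : rows) {
            int count = 0;
            for (String b : rows) {
                if (a.charAt(column - 1) == b.charAt(column - 1)) {
                    count++;
                }
            }
            best = Math.max(best, count);
        }
        return 100.0 * best / rows.size() >= minPercent;
    }
}

class CharsetDemo {
    public static void main(String[] args) {
        List<String> rows = Arrays.asList("ACGT", "ACGA", "ACTA");
        System.out.println(new InvariableCharset().subAlignment(rows)); // [AC, AC, AC]
        System.out.println(new IdentityCharset(60.0).columns(rows));    // [1, 2, 3, 4]
    }
}
```

In a real BioJava 1.x program the subAlignment step would presumably delegate to Alignment.subAlignment(labels, loc) as in Richard's snippet, with each concrete class only responsible for producing the Location.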
From david at autohandle.com Thu Apr 27 13:10:08 2006
From: david at autohandle.com (David Scott)
Date: Thu, 27 Apr 2006 10:10:08 -0700
Subject: [Biojava-l] hibernate-xml mapping
Message-ID: <4450FAF0.9070206@autohandle.com>

What is the XML mapping in the Hibernate files based on?

From mark.schreiber at novartis.com Thu Apr 27 22:05:44 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 28 Apr 2006 10:05:44 +0800
Subject: [Biojava-l] Creating my own classes
Message-ID:

An excellent book on OO and Java is Thinking in Java by Bruce Eckel. If you come from a C or Perl background it will change the way you think about programming.

You can get online versions for free; most good bookstores have hard copies as well.

- Mark

From mark.schreiber at novartis.com Thu Apr 27 22:06:31 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 28 Apr 2006 10:06:31 +0800
Subject: [Biojava-l] hibernate-xml mapping
Message-ID:

It is based on the BioSQL schema.

- Mark

From ilhami.visne at gmail.com Fri Apr 28 05:09:56 2006
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Fri, 28 Apr 2006 11:09:56 +0200
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
Message-ID:

I have a file in FASTA format which is not encoded in ANSI, but it looks fine otherwise. It can be downloaded here: http://stud3.tuwien.ac.at/~e0125935/try3.fasta

I tried to read it with SeqIOTools.readFastaDNA and this exception was thrown:

org.biojava.bio.BioException: Could not read sequence
    at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:104)
    ..............
    ..............
Caused by: java.io.IOException: Stream does not appear to contain FASTA formatted data: ??>
    at org.biojava.bio.seq.io.FastaFormat.readSequence(FastaFormat.java:112)
    at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:101)

There is no row like "??>" in the file, but it seems to be hidden.

How should I handle such files?

Thanks in advance.
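Editorial sketch for the question above. One possible reading of the hidden "??>" is that the file begins with a Unicode byte-order mark (BOM) ahead of the first '>' header; that is an assumption, since nobody in the thread confirms the file's encoding. Under that assumption, the bytes can be inspected and the BOM skipped before the data reaches the FASTA parser. The class and method names (BomStripper, open, firstLine) are hypothetical, and falling back to ISO-8859-1 when no BOM is found is also an assumption.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomStripper {
    /**
     * Wraps the stream in a reader, skipping a leading UTF-8 or UTF-16
     * byte-order mark if one is present and decoding accordingly.
     */
    static BufferedReader open(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        Charset cs = StandardCharsets.ISO_8859_1; // assumed fallback when no BOM is found
        int skip = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            cs = StandardCharsets.UTF_8;
            skip = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            skip = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            skip = 2;
        }
        // Push back everything that was read beyond the BOM.
        if (n > skip) {
            pin.unread(head, skip, n - skip);
        }
        return new BufferedReader(new InputStreamReader(pin, cs));
    }

    /** Convenience helper for demos: first decoded line of the data. */
    static String firstLine(byte[] data) {
        try (BufferedReader reader = open(new ByteArrayInputStream(data))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '>', 's', '1', '\n', 'A', 'C'};
        System.out.println(firstLine(withBom)); // >s1
    }
}
```

The resulting BufferedReader could then presumably be handed to SeqIOTools.readFastaDNA, the call used in the original post. If the stray bytes turn out not to be a BOM, this sketch will not help and the program that produced the file needs checking instead.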
From richard.holland at ebi.ac.uk Fri Apr 28 06:37:35 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 28 Apr 2006 11:37:35 +0100
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
In-Reply-To:
References:
Message-ID: <1146220656.3955.46.camel@texas.ebi.ac.uk>

I've no idea what binary format that file is in - it contains some very strange characters. It appears to contain _some_ ANSI data but with extra binary bits added to the start and end. I think you need to check the program that generated the file as it is obviously not doing what it is supposed to.

Your best bet is to convert the file to ANSI or some other format understood out-of-the-box by Java.

cheers,
Richard

--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From richard.holland at ebi.ac.uk Fri Apr 28 09:19:30 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 28 Apr 2006 14:19:30 +0100
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
In-Reply-To:
References: <1146220656.3955.46.camel@texas.ebi.ac.uk>
Message-ID: <1146230371.3955.59.camel@texas.ebi.ac.uk>

Thinking about this a bit more, I think you meant ASCII when you said ANSI?

FASTA format is very strictly defined. It is a file containing a number of sequences, each with its own header, which starts with a '>' symbol. You can indeed use any character you like within the header, which ends at the first new-line after the '>' (newline is ASCII 10 or 13, or both, depending on your OS). No whitespace is allowed at the start or end of the file or between or within sequences.

The problem with your file is that the unusual characters are appearing at the start of the file before the first header, and maybe also during the sequence itself, although I didn't look that closely. Hence it breaks the FASTA format specification.

The problem here lies with the program that is generating your FASTA file. BioJava is behaving correctly.

cheers,
Richard

On Fri, 2006-04-28 at 15:00 +0200, Ilhami Visne wrote:
> I had already thought of converting the file to ANSI. The sequence part must
> contain only ANSI characters, but the header or other annotation need not
> contain only ANSI characters. If I convert it to ANSI, might that lose
> some data?

From wendy.wong at gmail.com Tue Apr 4 18:22:00 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Tue, 4 Apr 2006 19:22:00 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To: <200603311805.25861.matthew.pocock@ncl.ac.uk>
References: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk> <200603311805.25861.matthew.pocock@ncl.ac.uk>
Message-ID:

Thanks for your advice! I am able to train a subset of transition probabilities now!
I found something strange: first I changed my emission distributions to UntrainableDistribution and the trainer didn't seem to be doing anything; all cycles had the same score. I then changed them back to SimpleDistribution (still keeping my getWeightImpl in my custom distribution). This time it works, and it doesn't seem to be modifying my emission probabilities. So it works for me; I am just curious whether it is a bug or whether I was doing something wrong.

Thanks again!

wendy

On 3/31/06, Matthew Pocock wrote:
> > The DP code does some caching of probabilities, I don't think there's
> > any way to turn this off without modifying the DP implementations.
> >
> > Thomas.
>
> My recollection is that if you did turn this off, the algorithm would run
> very, very much more slowly. Internally to the DP objects, the distribution
> probabilities (in fact, they aren't even probabilities by this stage) are
> stored in a data-structure optimized for the type of lookups performed during
> the dynamic programming recursions.
>
> Matthew
>

From mthomasc at vub.ac.be Fri Apr 7 09:20:33 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Fri, 07 Apr 2006 11:20:33 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
Message-ID: <44362EE1.5060804@vub.ac.be>

Hello,

I am currently using biojavax, which I checked out today from CVS, to parse an EMBL file exported from the EBI SRS server.

I ran into this error:

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
Caused by: org.biojava.bio.seq.io.ParseException: Bad date type found: 86
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:278)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

The EMBL file is:

ID   DQ158013 standard; genomic DNA; VRT; 118 BP.
XX
AC   DQ158013;
XX
SV   DQ158013.1
XX
DT   19-JAN-2006 (Rel. 86, Created)
DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
XX
DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.

Removing the two lines that comprise the date information resolves the problem.

Thanks,

Morgane.

--
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

From richard.holland at ebi.ac.uk Fri Apr 7 09:56:57 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 10:56:57 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <44362EE1.5060804@vub.ac.be>
References: <44362EE1.5060804@vub.ac.be>
Message-ID: <1144403817.3958.30.camel@texas.ebi.ac.uk>

That was indeed a bug. I have made a change to the date parsing in EMBLFormat and committed it to CVS. Could you test it for me please?

cheers,
Richard

--
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416

---------------

From mthomasc at vub.ac.be Fri Apr 7 12:18:36 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Fri, 07 Apr 2006 14:18:36 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <1144403817.3958.30.camel@texas.ebi.ac.uk>
References: <44362EE1.5060804@vub.ac.be> <1144403817.3958.30.camel@texas.ebi.ac.uk>
Message-ID: <4436589C.8010501@vub.ac.be>

I now get another error message with the same file:

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
Caused by: java.lang.IndexOutOfBoundsException: No group 5
    at java.util.regex.Matcher.group(Matcher.java:355)
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:271)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

Here is the complete file, for info:

ID   DQ158013 standard; genomic DNA; VRT; 118 BP.
XX
AC   DQ158013;
XX
SV   DQ158013.1
XX
DT   19-JAN-2006 (Rel. 86, Created)
DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
XX
DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
XX
KW   .
XX
OS   Triturus helveticus (palmate newt)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia;
OC   Batrachia; Caudata; Salamandroidea; Salamandridae; Triturus.
XX
RN   [1]
RP   1-118
RX   DOI; 10.1016/j.ympev.2005.08.012.
RX   PUBMED; 16198128.
RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT   "A PCR survey for posterior Hox genes in amphibians";
RL   Mol. Phylogenet. Evol. 38(2):449-458(2006).
XX
RN   [2]
RP   1-118
RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT   ;
RL   Submitted (09-AUG-2005) to the EMBL/GenBank/DDBJ databases.
RL   Biology Department, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050,
RL   Belgium
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..118
FT                   /organism="Triturus helveticus"
FT                   /mol_type="genomic DNA"
FT                   /clone="Thel.b9"
FT                   /db_xref="taxon:256425"
FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"
FT   mRNA            <1..>118
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT   CDS             <1..>118
FT                   /codon_start=2
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
FT                   /protein_id="ABA39736.1"
FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
XX
SQ   Sequence 118 BP; 28 A; 35 C; 37 G; 18 T; 0 other;
     caaataccag acgctggagc tggagaagga gttcctgttc aacatgtacc tcacccggga        60
     ccgcaggcac gaggtggccc ggctgctgaa cctcagcgag cgccaggtca agatctgg         118
//

Thanks for helping,

Morgane.

--
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

From richard.holland at ebi.ac.uk Fri Apr 7 12:48:46 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 13:48:46 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <4436589C.8010501@vub.ac.be>
References: <44362EE1.5060804@vub.ac.be> <1144403817.3958.30.camel@texas.ebi.ac.uk> <4436589C.8010501@vub.ac.be>
Message-ID: <1144414126.3958.32.camel@texas.ebi.ac.uk>

Sorry, my bad. An off-by-one error...

Check it out again and see if it works now.

cheers,
Richard

PS. I don't have any EMBL files to test with at the moment otherwise I'd check it myself... :)

--
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416

---------------

From richard.holland at ebi.ac.uk Fri Apr 7 13:42:10 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Apr 2006 14:42:10 +0100
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <44366419.4050505@dbm.ulb.ac.be>
References: <44362EE1.5060804@vub.ac.be> <1144403817.3958.30.camel@texas.ebi.ac.uk> <4436589C.8010501@vub.ac.be> <1144414126.3958.32.camel@texas.ebi.ac.uk> <44366419.4050505@dbm.ulb.ac.be>
Message-ID: <1144417330.3958.34.camel@texas.ebi.ac.uk>

Hi. Someone else had checked in a change to a different class, but that change was incorrect and didn't compile. It should compile now.

cheers,
Richard

PS. Note to all those who commit changes - PLEASE check your code compiles first before committing it!

On Fri, 2006-04-07 at 15:07 +0200, Morgane THOMAS-CHOLLIER wrote:
> I tried to checkout biojava-live but it seems I cannot build it anymore.
> I get the following error :
>
> compile-biojava:
>     [javac] Compiling 1321 source files to /Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/ant-build/classes/biojava
>     [javac] /Users/morgane/Documents/PHD/KermitDB/Kermit_workspace/biojava-live3/src/org/biojavax/utils/StringTools.java:97: exception java.io.IOException is never thrown in body of corresponding try statement
>     [javac] } catch (IOException e) {
>     [javac] ^
>     [javac] Note: Some input files use or override a deprecated API.
>     [javac] Note: Recompile with -deprecation for details.
>     [javac] 1 error
>
> I use Mac OS X 10.3.9, java 1.4.2.
>
> Hope you could help,
>
> Cheers,
>
> Morgane.

--
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416

---------------

From andreas.draeger at clever-telefonieren.de Fri Apr 7 15:43:35 2006
From: andreas.draeger at clever-telefonieren.de (=?ISO-8859-1?Q?Andreas_Dr=E4ger?=)
Date: Fri, 07 Apr 2006 17:43:35 +0200
Subject: [Biojava-l] Senseless assignment
Message-ID: <443688A7.1000203@clever-telefonieren.de>

Hi,

This assignment has no effect in class org.biojavax.ontology.SimpleComparableTriple:

    // Hibernate requirement - not for public use.
    private void setOntology(ComparableOntology descriptors) {
        this.ontology = ontology;
    }

I do not know why this is necessary.

Andreas

--
==================================
Andreas Dräger
PhD student
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Phone: +49-7071-29-70436
==================================

From richard.holland at ebi.ac.uk Mon Apr 10 09:26:51 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Mon, 10 Apr 2006 10:26:51 +0100
Subject: [Biojava-l] Senseless assignment
In-Reply-To: <443688A7.1000203@clever-telefonieren.de>
References: <443688A7.1000203@clever-telefonieren.de>
Message-ID: <1144661211.3951.9.camel@texas.ebi.ac.uk>

It's a typo. The parameter was misnamed, so the field was being assigned to itself. The method declaration should read:

    // Hibernate requirement - not for public use.
    private void setOntology(ComparableOntology ontology) {
        this.ontology = ontology;
    }

I have fixed it in CVS.

cheers,
Richard

--
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416

---------------

From mthomasc at vub.ac.be Sat Apr 8 08:20:47 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Sat, 08 Apr 2006 10:20:47 +0200
Subject: [Biojava-l] [biojavax] EMBL parser error
In-Reply-To: <1144417330.3958.34.camel@texas.ebi.ac.uk>
References: <44362EE1.5060804@vub.ac.be> <1144403817.3958.30.camel@texas.ebi.ac.uk> <4436589C.8010501@vub.ac.be> <1144414126.3958.32.camel@texas.ebi.ac.uk> <44366419.4050505@dbm.ulb.ac.be> <1144417330.3958.34.camel@texas.ebi.ac.uk>
Message-ID: <4437725F.9000503@vub.ac.be>

It works fine now!

Thanks for your help,

cheers,

Morgane.
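
For readers following the EMBL parser error thread: both exceptions came from the two DT lines, which differ in shape (the "Created" line has no version field). The sketch below is not the EMBLFormat code from CVS. It is a minimal illustration, assuming a hypothetical class name EmblDateLine and a pattern written only from the two DT lines quoted in the thread, of how an optional regex group can cover both shapes without asking the Matcher for a group that does not exist.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava: parses the two DT line shapes
// shown in the thread.
final class EmblDateLine {

    // Assumed pattern, derived only from the quoted example lines.
    // Group 1: date, group 2: release, group 3: type, group 4: optional version.
    private static final Pattern DT = Pattern.compile(
        "^DT\\s+(\\d{2}-[A-Z]{3}-\\d{4}) \\(Rel\\. (\\d+), "
        + "(Created|Last updated)(?:, Version (\\d+))?\\)$");

    /** Returns {date, release, type, version}; version is null when absent. */
    static String[] parse(String line) {
        Matcher m = DT.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a recognised DT line: " + line);
        }
        // The pattern declares exactly four groups, so only groups 1..4 are read.
        // A group that did not take part in the match comes back as null.
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "DT   19-JAN-2006 (Rel. 86, Created)",
            "DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)"
        };
        for (String line : lines) {
            String[] f = parse(line);
            System.out.println(f[0] + " | release " + f[1] + " | " + f[2]
                + (f[3] == null ? "" : " | version " + f[3]));
        }
    }
}
```

Reading a group index higher than the number of groups the pattern declares is what makes java.util.regex.Matcher throw IndexOutOfBoundsException, so keeping the group count and the indices side by side, as above, makes an off-by-one easier to spot.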

--
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

From mthomasc at vub.ac.be Wed Apr 12 08:34:43 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Wed, 12 Apr 2006 10:34:43 +0200
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing
Message-ID: <443CBBA3.9070101@vub.ac.be>

Hello again,

I am currently using biojavax to parse EMBL files exported from Ensembl website.
Compared to the EBI files I have, they show a difference in the feature lines: sometimes only one "/word" qualifier is present, e.g.:

EBI file:

FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"

Ensembl file:

FT   gene            complement(1..3218)
FT                   /gene="ENSMUSG00000038227"

The problem I encounter is that the parser correctly converts the "/word" into a Note, but the Note is then attached to the immediately following feature (e.g. mRNA). The current gene feature thus has no annotation.

This behavior is reproducible by removing one "/word" from an EBI file.

Apart from this issue, I noticed that Ensembl EMBL files use "=" inside a qualifier value (e.g. /note="transcript_id=ENSMUST00000048680"), which ends up as an incomplete Note, as the parser seems to split on "=" to separate the key and the value.

Thanks for your help,

Morgane.

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

From jolyon.holdstock at ogt.co.uk Thu Apr 13 16:42:36 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu, 13 Apr 2006 17:42:36 +0100
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com>

Hi Morgane,

I have amended the EMBLFormat readSection method as below and the parsing seems to work; please test it.

I think that the last bit of annotation is carried over into the next feature, so before adding the new feature I dump the annotation and reset currentTag and currentVal.

    if (!line.startsWith(" ")) {
        //--------- new code starts ---------------------------
        if (currentTag!=null) {
            section.add(new String[]{currentTag,currentVal.toString()});
            currentTag = null;
            currentVal = null;
        }
        //--------- new code ends -----------------------------
        // case 1 : word value - splits into key-value on its own
        section.add(line.split("\\s+"));
    }

Cheers,

Jolyon

-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane THOMAS-CHOLLIER
Sent: 12 April 2006 09:35
To: biojava-l at open-bio.org
Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned]

Hello again,

I am currently using biojavax to parse EMBL files exported from Ensembl website.

Compared to the EBI files I have, they show a difference in the Features lines :

sometimes, only one "/word" is present. ie:

EBI file :

FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"

Ensembl file;

FT   gene            complement(1..3218)
FT                   /gene="ENSMUSG00000038227"

The problem I encounter is that the parser correctly convert the "/word" into a Note, but the Note is then in relation with the immediate following feature (ie: mRNA).
The current gene feature thus has no annotation.

This behavior is reproducible when removing one "/word" of an EBI file.

Apart from this issue, I noted that Ensembl EMBL files uses "=" inside a feature (ie: /note="transcript_id=ENSMUST00000048680") which ends up with an incomplete Note, as the parser seems to split on "=" to separate the Key and the Value.

Thanks for your help,

Morgane.
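
The change described in this message (emit the pending qualifier before a new feature key is recorded) can be illustrated with a small, self-contained sketch. It is not the actual EMBLFormat code: the class name FeatureSectionSketch, the flushBeforeFeature switch and the keys helper are hypothetical, and the input is assumed to be feature-table lines with the leading "FT" already removed. With the switch off, the last qualifier of each feature is emitted after the next feature key; with it on, each qualifier stays with its own feature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the real org.biojavax EMBLFormat code.
final class FeatureSectionSketch {

    /**
     * Turns feature-table lines into {key, value} pairs. Feature key lines do
     * not start with a space; qualifier lines do and begin with "/tag=value".
     * A qualifier is held as "pending" until the next qualifier starts.
     */
    static List<String[]> readSection(List<String> lines, boolean flushBeforeFeature) {
        List<String[]> section = new ArrayList<>();
        String currentTag = null;
        StringBuilder currentVal = null;
        for (String line : lines) {
            if (!line.startsWith(" ")) {
                // New feature key: emit the pending qualifier first, so it
                // stays with the feature it was read under.
                if (flushBeforeFeature && currentTag != null) {
                    section.add(new String[] {currentTag, currentVal.toString()});
                    currentTag = null;
                    currentVal = null;
                }
                section.add(line.trim().split("\\s+", 2));
            } else {
                String text = line.trim();
                if (text.startsWith("/")) {
                    // A new qualifier starts: emit the previous one, if any.
                    if (currentTag != null) {
                        section.add(new String[] {currentTag, currentVal.toString()});
                    }
                    int eq = text.indexOf('=');
                    currentTag = eq < 0 ? text : text.substring(0, eq);
                    currentVal = new StringBuilder(eq < 0 ? "" : text.substring(eq + 1));
                } else if (currentTag != null) {
                    // Continuation line of a multi-line qualifier value.
                    currentVal.append(' ').append(text);
                }
            }
        }
        // Emit whatever is still pending at the end, exactly once.
        if (currentTag != null) {
            section.add(new String[] {currentTag, currentVal.toString()});
        }
        return section;
    }

    /** Joins the keys of all pairs, in order, so the ordering is easy to inspect. */
    static String keys(List<String[]> section) {
        List<String> out = new ArrayList<>();
        for (String[] pair : section) {
            out.add(pair[0]);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "gene <1..>118", " /gene=\"Hoxb9\"", " /note=\"Hoxb-9\"",
            "mRNA <1..>118", " /gene=\"Hoxb9\"", " /product=\"HOXB9\"",
            "CDS <1..>118");
        System.out.println(keys(readSection(lines, false)));
        System.out.println(keys(readSection(lines, true)));
    }
}
```

For the gene/mRNA/CDS example from the thread, the variant without the flush gives the key order gene, /gene, mRNA, /note, /gene, CDS, /product, while the variant with the flush gives gene, /gene, /note, mRNA, /gene, /product, CDS.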

From david at autohandle.com Fri Apr 14 21:29:51 2006
From: david at autohandle.com (David Scott)
Date: Fri, 14 Apr 2006 14:29:51 -0700
Subject: [Biojava-l] BioJavaX.html
Message-ID: <4440144F.7010603@autohandle.com>

Is BioJavaX.html posted somewhere? I am getting an ArrayIndexOutOfBoundsException on the build.

Thanks

From david at autohandle.com Fri Apr 14 21:20:47 2006
From: david at autohandle.com (David Scott)
Date: Fri, 14 Apr 2006 14:20:47 -0700
Subject: [Biojava-l] BioJavaX.html
Message-ID: <4440122F.2080809@autohandle.com>

Is it possible to post BioJavaX.html somewhere? I am getting an ArrayIndexOutOfBoundsException when building the docbook. I searched with Google but could not locate it.

Thanks-

From mark.schreiber at novartis.com Sat Apr 15 23:19:13 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Sun, 16 Apr 2006 07:19:13 +0800
Subject: [Biojava-l] BioJavaX.html
Message-ID: 

Could someone post the text to the wiki site temporarily? Actually, it may be more sensible for this document to be hosted as a wiki page. The wiki was not available at the time that Richard wrote it, so moving it may be a good idea. Any objections?

Additionally, some platforms have trouble building docbook HTML from ant (especially platforms developed in Redmond WA which we don't speak of).
- Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 David Scott Sent by: biojava-l-bounces at lists.open-bio.org 04/15/2006 05:20 AM To: biojava-l at biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] BioJavaX.html is it possible to post the BioJavaX.html somewhere - i am getting an ArrayIndexOutOfBoundsException on the build docbook. i used google - but could not locate it. thanks- _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From richard.holland at ebi.ac.uk Tue Apr 18 09:21:49 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Tue, 18 Apr 2006 10:21:49 +0100 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com> References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com> Message-ID: <1145352109.4188.3.camel@texas.ebi.ac.uk> I have committed an UNTESTED patch based on Jolyon's suggestion, and also attempted to fix the split-on-equals problem Morgane observed. Please let me know if there are any problems with it. As this problem affected the UniProt parser in a similar manner (much of the code is identical), the same fixes were applied there too. cheers, Richard On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: > Hi Morgane, > > I have amended the EmblFormat readSection method as below and the > parsing seems to work; please test it. > > I think that the last bit of annotation is carried over into the next > feature so before adding the new feature I dump the annotation and reset > currentTag and currentVal. 
> -- Richard Holland European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK Tel: +44-(0)1223-494416 --------------- From richard.holland at ebi.ac.uk Tue Apr 18 08:20:44 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Tue, 18 Apr 2006 09:20:44 +0100 Subject: [Biojava-l] BioJavaX.html In-Reply-To: References: Message-ID: <1145348444.4188.0.camel@texas.ebi.ac.uk> HTML version attached. I've created a placeholder on the BioJava website - could someone convert it who has the time? :) cheers, Richard On Sun, 2006-04-16 at 07:19 +0800, mark.schreiber at novartis.com wrote: > Could someone post the text to the wiki site temporarily. Actually it may > be more sensible for this document to be hosted as a wiki page. The wiki > was not available at the time that Richard wrote it so moving it may be a > good idea. Any objections? > > Additionally some platforms have trouble building docbook html from ant > (especially platforms developed in Redmond WA which we don't speak of). > > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > > > > > David Scott > Sent by: biojava-l-bounces at lists.open-bio.org > 04/15/2006 05:20 AM > > > To: biojava-l at biojava.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] BioJavaX.html > > > is it possible to post the BioJavaX.html somewhere - i am getting an > ArrayIndexOutOfBoundsException on the build docbook. i used google - > but could not locate it. 
> > thanks- > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK Tel: +44-(0)1223-494416 --------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.L.Sharman at sms.ed.ac.uk Wed Apr 19 09:35:14 2006 From: J.L.Sharman at sms.ed.ac.uk (Joanna Sharman) Date: Wed, 19 Apr 2006 10:35:14 +0100 Subject: [Biojava-l] Pairwise Alignment Message-ID: <20060419103514.rwtqmzy00k0ogog8@www.sms.ed.ac.uk> Hello, I'm new to BioJava so I'm sorry if this question has been asked several times before. This is actually sort of in reply to this message from last month: http://lists.open-bio.org/pipermail/biojava-l/2006-March/005365.html I'd like to perform a simple pairwise alignment using the Smith-Waterman class I saw described here: http://www.biojava.org/wiki/BioJava:CookBook:DP:PairWise2 but I can't find the classes it mentions anywhere on the cvs. Can you point me to where they are? Also, I'm just wondering why the HMM method is preferred to the Smith-Waterman (or others)? It seems quite complicated to me, and like it might require more memory, or am I wrong? :) Cheers, Joanna From mthomasc at vub.ac.be Thu Apr 20 09:35:54 2006 From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER) Date: Thu, 20 Apr 2006 11:35:54 +0200 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing In-Reply-To: <1145352109.4188.3.camel@texas.ebi.ac.uk> References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com> <1145352109.4188.3.camel@texas.ebi.ac.uk> Message-ID: <444755FA.7030009@vub.ac.be> Hi, I have tested today's version from CVS. 
Both EBI and Ensembl files now react the same way. The last annotation of a feature is nevertheless related to its immediate following feature. e.g. : FT gene <1..>118 FT /gene="Hoxb9" FT /note="Hoxb-9" FT mRNA <1..>118 FT /gene="Hoxb9" FT /product="HOXB9" FT CDS <1..>118 /note="Hoxb-9" is related to mRNA /product="HOXB9" is related to CDS Concerning the split-on-equals problem, I still observe the problem : [(#2) biojavax:note: transcript_i] for this annotation : /note="transcript_id=ENSMUST00000048680" Thanks for helping, Cheers, Morgane. Richard Holland wrote: > I have committed an UNTESTED patch based on Jolyon's suggestion, and > also attempted to fix the split-on-equals problem Morgane observed. > > Please let me know if there are any problems with it. > > As this problem affected the UniProt parser in a similar manner (much of > the code is identical), the same fixes were applied there too. > > cheers, > Richard > > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: > >> Hi Morgane, >> >> I have amended the EmblFormat readSection method as below and the >> parsing seems to work; please test it. >> >> I think that the last bit of annotation is carried over into the next >> feature so before adding the new feature I dump the annotation and reset >> currentTag and currentVal. 
>> >> if (!line.startsWith(" ")) { >> //--------- new code starts --------------------------- >> if (currentTag!=null) { >> section.add(new String[]{currentTag,currentVal.toString()}); >> currentTag = null; >> currentVal = null; >> } >> //--------- new code ends ----------------------------- >> // case 1 : word value - splits into key-value on its own >> section.add(line.split("\\s+")); >> } >> >> Cheers, >> >> Jolyon >> >> >> >> -----Original Message----- >> From: biojava-l-bounces at lists.open-bio.org >> [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane >> THOMAS-CHOLLIER >> Sent: 12 April 2006 09:35 >> To: biojava-l at open-bio.org >> Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned] >> >> Hello again, >> >> I am currently using biojavax to parse EMBL files exported from Ensembl >> website. >> >> Compared to the EBI files I have, they show a difference in the Features >> >> lines : >> >> sometimes, only one "/word" is present. ie: >> >> EBI file : >> >> FT gene <1..>118 >> FT /gene="Hoxb9" >> FT /note="Hoxb-9" >> >> Ensembl file; >> >> FT gene complement(1..3218) >> FT /gene="ENSMUSG00000038227" >> >> The problem I encounter is that the parser correctly convert the "/word" >> >> into a Note, but the Note is then in relation with the immediate >> following feature (ie: mRNA). >> The current gene feature thus has no annotation. >> >> This behavior is reproducible when removing one "/word" of an EBI file. >> >> Apart from this issue, I noted that Ensembl EMBL files uses "=" inside a >> >> feature (ie: /note="transcript_id=ENSMUST00000048680") which ends up >> with an incomplete Note, as the parser seems to split on "=" to separate >> >> the Key and the Value. >> >> Thanks for your help, >> >> Morgane. 
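
The truncated note reported in this message (transcript_i instead of transcript_id=ENSMUST00000048680) is consistent with splitting a qualifier on every "=" and then trimming one character from each end of the second piece. The sketch below is only a guess at that mechanism, not the actual parser code; the class QualifierSplitSketch and both method names are hypothetical. It contrasts the naive split with a split on the first "=" only, using the standard String.split(regex, limit).

```java
// Hypothetical illustration only; not the real org.biojavax EMBLFormat code.
final class QualifierSplitSketch {

    /**
     * Naive variant: splits on every '=' and keeps only the second piece,
     * then drops the first and last character as if they were the quotes.
     * Assumes a quoted value; shown only to reproduce the truncation.
     */
    static String naiveValue(String qualifier) {
        String[] parts = qualifier.split("=");
        String value = parts[1];
        return value.substring(1, value.length() - 1);
    }

    /**
     * Splits on the first '=' only, so any '=' inside the value survives.
     * Returns {key, value}; the leading '/' and surrounding quotes are removed.
     */
    static String[] splitOnFirstEquals(String qualifier) {
        String[] parts = qualifier.split("=", 2);
        String key = parts[0].startsWith("/") ? parts[0].substring(1) : parts[0];
        String value = parts.length > 1 ? parts[1] : "";
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new String[] {key, value};
    }

    public static void main(String[] args) {
        String q = "/note=\"transcript_id=ENSMUST00000048680\"";
        System.out.println(naiveValue(q));
        String[] kv = splitOnFirstEquals(q);
        System.out.println(kv[0] + ": " + kv[1]);
    }
}
```

With the qualifier from the thread, naiveValue returns transcript_i, matching the reported symptom, while splitOnFirstEquals returns the key note and the full value transcript_id=ENSMUST00000048680. Unquoted values such as /codon_start=2 pass through splitOnFirstEquals unchanged.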
>> >> -- ********************************************************** Morgane THOMAS-CHOLLIER, PHD Student (mthomasc at vub.ac.be) Vrije Universiteit Brussels (VUB) Laboratory of Cell Genetics Pleinlaan 2 1050 Brussels Belgium Tel : +32 2 629 15 22 ********************************************************** Stop Using Internet Explorer, choose FIREFOX ! From richard.holland at ebi.ac.uk Thu Apr 20 12:05:00 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Thu, 20 Apr 2006 13:05:00 +0100 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing In-Reply-To: <444755FA.7030009@vub.ac.be> References: <588D0DD225D05746B5D8CAE1BE971F3FB8470B@EUCLID.internal.ogtip.com> <1145352109.4188.3.camel@texas.ebi.ac.uk> <444755FA.7030009@vub.ac.be> Message-ID: <1145534700.4188.28.camel@texas.ebi.ac.uk> Hi. I made some small changes to the code, although nothing that would fix this kind of problem, committed it back to CVS, checked it out again, compiled, and ran a test program that read in an EMBL file with the feature table you describe below, and output it in EMBL format to another file. I then compared the two files... and found no differences! The split-on-equals problem didn't occur, and all notes appeared alongside their correct features. Could there be a problem maybe with the script you are using? I've really no idea what the problem is as I can't reproduce it based on the current CVS contents! cheers, Richard On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote: > Hi, > > I have tested today's version from CVS. > > Both EBI and Ensembl files now react the same way. > The last annotation of a feature is nevertheless related to its > immediate following feature. > e.g. 
: > > FT gene <1..>118 > FT /gene="Hoxb9" > FT /note="Hoxb-9" > FT mRNA <1..>118 > FT /gene="Hoxb9" > FT /product="HOXB9" > FT CDS <1..>118 > > /note="Hoxb-9" is related to mRNA > /product="HOXB9" is related to CDS > > Concerning the split-on-equals problem, I still observe the problem : > > [(#2) biojavax:note: transcript_i] > > for this annotation : /note="transcript_id=ENSMUST00000048680" > > Thanks for helping, > > Cheers, > > Morgane. > > Richard Holland wrote: > > I have committed an UNTESTED patch based on Jolyon's suggestion, and > > also attempted to fix the split-on-equals problem Morgane observed. > > > > Please let me know if there are any problems with it. > > > > As this problem affected the UniProt parser in a similar manner (much of > > the code is identical), the same fixes were applied there too. > > > > cheers, > > Richard > > > > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: > > > >> Hi Morgane, > >> > >> I have amended the EmblFormat readSection method as below and the > >> parsing seems to work; please test it. > >> > >> I think that the last bit of annotation is carried over into the next > >> feature so before adding the new feature I dump the annotation and reset > >> currentTag and currentVal. 
> >> > >> > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From jolyon.holdstock at ogt.co.uk Thu Apr 20 12:08:40 2006 From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock) Date: Thu, 20 Apr 2006 13:08:40 +0100 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned] Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com> I've run the sequence through the parser and it seems to work OK. I iterate through the features and then iterate through the annotations of that feature Based on the input.... FT source 1..118 FT /organism="Triturus helveticus" FT /mol_type="genomic DNA" FT /clone="Thel.b9" FT /db_xref="taxon:256425" FT gene <1..>118 FT /gene="Hoxb9" FT /note="Hoxb-9" FT mRNA <1..>118 FT /gene="Hoxb9" FT /product="HOXB9" FT CDS <1..>118 FT /codon_start=2 FT /gene="Hoxb9" FT /product="HOXB9" FT /db_xref="UniProtKB/TrEMBL:Q2LK47" FT /protein_id="ABA39736.1" FT /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW" The output is.... 
======================================== Feature: (#0) lcl:DQ158013/DQ158013.1:source,EMBL(1..118) Note: (#0) biojavax:mol_type: genomic DNA Note: (#1) biojavax:clone: Thel.b9 ======================================== Feature: (#1) lcl:DQ158013/DQ158013.1:gene,EMBL(<1..118>) Note: (#2) biojavax:gene: Hoxb9 Note: (#3) biojavax:note: Hoxb-9 ======================================== Feature: (#2) lcl:DQ158013/DQ158013.1:mRNA,EMBL(<1..118>) Note: (#4) biojavax:gene: Hoxb9 Note: (#5) biojavax:product: HOXB9 ======================================== Feature: (#3) lcl:DQ158013/DQ158013.1:CDS,EMBL(<1..118>) Note: (#6) biojavax:codon_start: 2 Note: (#7) biojavax:gene: Hoxb9 Note: (#8) biojavax:product: HOXB9 Note: (#9) biojavax:protein_id: ABA39736.1 Note: (#10) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW Note: (#11) biojavax:translation: KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW ============================================= This looks OK, the one thing I've just noticed is that the last piece of annotation of the last feature is assigned twice. Jolyon -----Original Message----- From: Richard Holland [mailto:richard.holland at ebi.ac.uk] Sent: 20 April 2006 13:05 To: mthomas at dbm.ulb.ac.be Cc: Jolyon Holdstock; biojava-l at open-bio.org Subject: Re: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned] Hi. I made some small changes to the code, although nothing that would fix this kind of problem, committed it back to CVS, checked it out again, compiled, and ran a test program that read in an EMBL file with the feature table you describe below, and output it in EMBL format to another file. I then compared the two files... and found no differences! The split-on-equals problem didn't occur, and all notes appeared alongside their correct features. Could there be a problem maybe with the script you are using? I've really no idea what the problem is as I can't reproduce it based on the current CVS contents! 
cheers, Richard On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote: > Hi, > > I have tested today's version from CVS. > > Both EBI and Ensembl files now react the same way. > The last annotation of a feature is nevertheless related to its > immediate following feature. > e.g. : > > FT gene <1..>118 > FT /gene="Hoxb9" > FT /note="Hoxb-9" > FT mRNA <1..>118 > FT /gene="Hoxb9" > FT /product="HOXB9" > FT CDS <1..>118 > > /note="Hoxb-9" is related to mRNA > /product="HOXB9" is related to CDS > > Concerning the split-on-equals problem, I still observe the problem : > > [(#2) biojavax:note: transcript_i] > > for this annotation : /note="transcript_id=ENSMUST00000048680" > > Thanks for helping, > > Cheers, > > Morgane. > > Richard Holland wrote: > > I have committed an UNTESTED patch based on Jolyon's suggestion, and > > also attempted to fix the split-on-equals problem Morgane observed. > > > > Please let me know if there are any problems with it. > > > > As this problem affected the UniProt parser in a similar manner (much of > > the code is identical), the same fixes were applied there too. > > > > cheers, > > Richard > > > > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: > > > >> Hi Morgane, > >> > >> I have amended the EmblFormat readSection method as below and the > >> parsing seems to work; please test it. > >> > >> I think that the last bit of annotation is carried over into the next > >> feature so before adding the new feature I dump the annotation and reset > >> currentTag and currentVal. 
> >> > >> > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 This email has been scanned by Oxford Gene Technology Group of Companies Security Systems. From richard.holland at ebi.ac.uk Thu Apr 20 12:16:00 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Thu, 20 Apr 2006 13:16:00 +0100 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned] In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com> References: <588D0DD225D05746B5D8CAE1BE971F3FB84AA0@EUCLID.internal.ogtip.com> Message-ID: <1145535361.4188.33.camel@texas.ebi.ac.uk> Did you use the latest CVS version? (I committed a change that I think should have fixed that about 1 minute before my previous email). On Thu, 2006-04-20 at 13:08 +0100, Jolyon Holdstock wrote: > I've run the sequence through the parser and it seems to work OK. I > iterate through the features and then iterate through the annotations of > that feature > > Based on the input.... > > FT source 1..118 > FT /organism="Triturus helveticus" > FT /mol_type="genomic DNA" > FT /clone="Thel.b9" > FT /db_xref="taxon:256425" > FT gene <1..>118 > FT /gene="Hoxb9" > FT /note="Hoxb-9" > FT mRNA <1..>118 > FT /gene="Hoxb9" > FT /product="HOXB9" > FT CDS <1..>118 > FT /codon_start=2 > FT /gene="Hoxb9" > FT /product="HOXB9" > FT /db_xref="UniProtKB/TrEMBL:Q2LK47" > FT /protein_id="ABA39736.1" > FT > /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW" > > The output is.... 
> > ======================================== > Feature: (#0) lcl:DQ158013/DQ158013.1:source,EMBL(1..118) > Note: (#0) biojavax:mol_type: genomic DNA > Note: (#1) biojavax:clone: Thel.b9 > ======================================== > Feature: (#1) lcl:DQ158013/DQ158013.1:gene,EMBL(<1..118>) > Note: (#2) biojavax:gene: Hoxb9 > Note: (#3) biojavax:note: Hoxb-9 > ======================================== > Feature: (#2) lcl:DQ158013/DQ158013.1:mRNA,EMBL(<1..118>) > Note: (#4) biojavax:gene: Hoxb9 > Note: (#5) biojavax:product: HOXB9 > ======================================== > Feature: (#3) lcl:DQ158013/DQ158013.1:CDS,EMBL(<1..118>) > Note: (#6) biojavax:codon_start: 2 > Note: (#7) biojavax:gene: Hoxb9 > Note: (#8) biojavax:product: HOXB9 > Note: (#9) biojavax:protein_id: ABA39736.1 > Note: (#10) biojavax:translation: > KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW > Note: (#11) biojavax:translation: > KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW > ============================================= > > This looks OK, the one thing I've just noticed is that the last piece of > annotation of the last feature is assigned twice. > > Jolyon > > > -----Original Message----- > From: Richard Holland [mailto:richard.holland at ebi.ac.uk] > Sent: 20 April 2006 13:05 > To: mthomas at dbm.ulb.ac.be > Cc: Jolyon Holdstock; biojava-l at open-bio.org > Subject: Re: [Biojava-l] [biojavax] EMBL parser : features > parsing[Scanned] > > Hi. > > I made some small changes to the code, although nothing that would fix > this kind of problem, committed it back to CVS, checked it out again, > compiled, and ran a test program that read in an EMBL file with the > feature table you describe below, and output it in EMBL format to > another file. I then compared the two files... and found no differences! > The split-on-equals problem didn't occur, and all notes appeared > alongside their correct features. > > Could there be a problem maybe with the script you are using? 
> > I've really no idea what the problem is as I can't reproduce it based on > the current CVS contents! > > cheers, > Richard > > On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote: > > Hi, > > > > I have tested today's version from CVS. > > > > Both EBI and Ensembl files now react the same way. > > The last annotation of a feature is nevertheless related to its > > immediate following feature. > > e.g. : > > > > FT gene <1..>118 > > FT /gene="Hoxb9" > > FT /note="Hoxb-9" > > FT mRNA <1..>118 > > FT /gene="Hoxb9" > > FT /product="HOXB9" > > FT CDS <1..>118 > > > > /note="Hoxb-9" is related to mRNA > > /product="HOXB9" is related to CDS > > > > Concerning the split-on-equals problem, I still observe the problem : > > > > [(#2) biojavax:note: transcript_i] > > > > for this annotation : /note="transcript_id=ENSMUST00000048680" > > > > Thanks for helping, > > > > Cheers, > > > > Morgane. > > > > Richard Holland wrote: > > > I have committed an UNTESTED patch based on Jolyon's suggestion, and > > > also attempted to fix the split-on-equals problem Morgane observed. > > > > > > Please let me know if there are any problems with it. > > > > > > As this problem affected the UniProt parser in a similar manner > (much of > > > the code is identical), the same fixes were applied there too. > > > > > > cheers, > > > Richard > > > > > > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: > > > > > >> Hi Morgane, > > >> > > >> I have amended the EmblFormat readSection method as below and the > > >> parsing seems to work; please test it. > > >> > > >> I think that the last bit of annotation is carried over into the > next > > >> feature so before adding the new feature I dump the annotation and > reset > > >> currentTag and currentVal. 
> > >> > > >> Thanks for your help, > > >> > > >> Morgane. > > >> > > >> > > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From mthomasc at vub.ac.be Thu Apr 20 12:30:10 2006 From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER) Date: Thu, 20 Apr 2006 14:30:10 +0200 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Resolved] In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com> References: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com> Message-ID: <44477ED2.2010200@vub.ac.be> I've just updated my sources few minutes ago and everything works fine now (both annotations and split-on-equals problem). I've tested both the EBI file and Ensembl file. Thanks for fixing the problems !! Cheers, Morgane Jolyon Holdstock wrote: > No, I'll update my source. > > Thanks, > > Jolyon > > > -----Original Message----- > From: Richard Holland [mailto:richard.holland at ebi.ac.uk] > Sent: 20 April 2006 13:16 > To: Jolyon Holdstock > Cc: mthomas at dbm.ulb.ac.be; biojava-l at open-bio.org > Subject: RE: [Biojava-l] [biojavax] EMBL parser : features > parsing[Scanned] > > Did you use the latest CVS version? (I committed a change that I think > should have fixed that about 1 minute before my previous email). > > > On Thu, 2006-04-20 at 13:08 +0100, Jolyon Holdstock wrote: > >> I've run the sequence through the parser and it seems to work OK. I >> iterate through the features and then iterate through the annotations >> > of > >> that feature >> >> Based on the input.... 
>> >> FT source 1..118 >> FT /organism="Triturus helveticus" >> FT /mol_type="genomic DNA" >> FT /clone="Thel.b9" >> FT /db_xref="taxon:256425" >> FT gene <1..>118 >> FT /gene="Hoxb9" >> FT /note="Hoxb-9" >> FT mRNA <1..>118 >> FT /gene="Hoxb9" >> FT /product="HOXB9" >> FT CDS <1..>118 >> FT /codon_start=2 >> FT /gene="Hoxb9" >> FT /product="HOXB9" >> FT /db_xref="UniProtKB/TrEMBL:Q2LK47" >> FT /protein_id="ABA39736.1" >> FT >> /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW" >> >> The output is.... >> >> ======================================== >> Feature: (#0) lcl:DQ158013/DQ158013.1:source,EMBL(1..118) >> Note: (#0) biojavax:mol_type: genomic DNA >> Note: (#1) biojavax:clone: Thel.b9 >> ======================================== >> Feature: (#1) lcl:DQ158013/DQ158013.1:gene,EMBL(<1..118>) >> Note: (#2) biojavax:gene: Hoxb9 >> Note: (#3) biojavax:note: Hoxb-9 >> ======================================== >> Feature: (#2) lcl:DQ158013/DQ158013.1:mRNA,EMBL(<1..118>) >> Note: (#4) biojavax:gene: Hoxb9 >> Note: (#5) biojavax:product: HOXB9 >> ======================================== >> Feature: (#3) lcl:DQ158013/DQ158013.1:CDS,EMBL(<1..118>) >> Note: (#6) biojavax:codon_start: 2 >> Note: (#7) biojavax:gene: Hoxb9 >> Note: (#8) biojavax:product: HOXB9 >> Note: (#9) biojavax:protein_id: ABA39736.1 >> Note: (#10) biojavax:translation: >> KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW >> Note: (#11) biojavax:translation: >> KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW >> ============================================= >> >> This looks OK, the one thing I've just noticed is that the last piece >> > of > >> annotation of the last feature is assigned twice. >> >> Jolyon >> >> >> -----Original Message----- >> From: Richard Holland [mailto:richard.holland at ebi.ac.uk] >> Sent: 20 April 2006 13:05 >> To: mthomas at dbm.ulb.ac.be >> Cc: Jolyon Holdstock; biojava-l at open-bio.org >> Subject: Re: [Biojava-l] [biojavax] EMBL parser : features >> parsing[Scanned] >> >> Hi. 
>> >> I made some small changes to the code, although nothing that would fix >> this kind of problem, committed it back to CVS, checked it out again, >> compiled, and ran a test program that read in an EMBL file with the >> feature table you describe below, and output it in EMBL format to >> another file. I then compared the two files... and found no >> > differences! > >> The split-on-equals problem didn't occur, and all notes appeared >> alongside their correct features. >> >> Could there be a problem maybe with the script you are using? >> >> I've really no idea what the problem is as I can't reproduce it based >> > on > >> the current CVS contents! >> >> cheers, >> Richard >> >> On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote: >> >>> Hi, >>> >>> I have tested today's version from CVS. >>> >>> Both EBI and Ensembl files now react the same way. >>> The last annotation of a feature is nevertheless related to its >>> immediate following feature. >>> e.g. : >>> >>> FT gene <1..>118 >>> FT /gene="Hoxb9" >>> FT /note="Hoxb-9" >>> FT mRNA <1..>118 >>> FT /gene="Hoxb9" >>> FT /product="HOXB9" >>> FT CDS <1..>118 >>> >>> /note="Hoxb-9" is related to mRNA >>> /product="HOXB9" is related to CDS >>> >>> Concerning the split-on-equals problem, I still observe the problem >>> > : > >>> [(#2) biojavax:note: transcript_i] >>> >>> for this annotation : /note="transcript_id=ENSMUST00000048680" >>> >>> Thanks for helping, >>> >>> Cheers, >>> >>> Morgane. >>> >>> Richard Holland wrote: >>> >>>> I have committed an UNTESTED patch based on Jolyon's suggestion, >>>> > and > >>>> also attempted to fix the split-on-equals problem Morgane >>>> > observed. > >>>> Please let me know if there are any problems with it. >>>> >>>> As this problem affected the UniProt parser in a similar manner >>>> >> (much of >> >>>> the code is identical), the same fixes were applied there too. 
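The two fixes discussed in this thread (flushing the pending annotation before a new feature starts, and not cutting a value at an inner "=") can be illustrated with a small stand-alone sketch. This is plain Java and is not the actual EmblFormat code: the class FtSketch, its parse method, and the simplified layout rule (a feature key starts three spaces after "FT", anything more indented is a qualifier or a continuation) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of BioJava or BioJavaX.
class FtSketch {

    /** One feature with its qualifiers stored as {tag, value} pairs. */
    static class Feature {
        final String key;
        final String location;
        final List<String[]> notes = new ArrayList<>();

        Feature(String key, String location) {
            this.key = key;
            this.location = location;
        }
    }

    /** Removes one pair of surrounding double quotes, if present. */
    static String stripQuotes(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    static List<Feature> parse(List<String> lines) {
        List<Feature> features = new ArrayList<>();
        Feature current = null;
        String tag = null;                       // qualifier being collected
        StringBuilder val = new StringBuilder(); // its raw value so far

        for (String line : lines) {
            if (!line.startsWith("FT") || line.length() < 6) continue;
            String body = line.substring(2);
            String text = body.trim();
            if (text.isEmpty()) continue;

            // Assumed layout: "FT" + three spaces + feature key.
            boolean featureLine = body.startsWith("   ") && body.charAt(3) != ' ';
            boolean qualifierLine = !featureLine && text.startsWith("/");

            if (featureLine || qualifierLine) {
                // Flush the pending qualifier to the feature it belongs to
                // BEFORE switching feature or starting a new qualifier.
                if (tag != null && current != null) {
                    current.notes.add(new String[] {tag, stripQuotes(val.toString())});
                }
                tag = null;
                val.setLength(0);
            }

            if (featureLine) {
                String[] parts = text.split("\\s+", 2);
                current = new Feature(parts[0], parts.length > 1 ? parts[1] : "");
                features.add(current);
            } else if (qualifierLine) {
                // Limit of 2: split on the first "=" only, so that a value
                // such as "transcript_id=ENSMUST..." stays in one piece.
                String[] kv = text.substring(1).split("=", 2);
                tag = kv[0];
                if (kv.length > 1) val.append(kv[1]);
            } else if (tag != null) {
                // Continuation line: joined with a space (a simplification).
                if (val.length() > 0) val.append(' ');
                val.append(text);
            }
        }
        // Flush the last qualifier exactly once at the end of the table.
        if (tag != null && current != null) {
            current.notes.add(new String[] {tag, stripQuotes(val.toString())});
        }
        return features;
    }

    public static void main(String[] args) {
        List<Feature> fs = parse(Arrays.asList(
            "FT   gene            complement(1..3218)",
            "FT                   /gene=\"ENSMUSG00000038227\"",
            "FT                   /note=\"transcript_id=ENSMUST00000048680\"",
            "FT   mRNA            <1..>118",
            "FT                   /product=\"HOXB9\""));
        for (Feature f : fs) {
            System.out.println(f.key + " " + f.location);
            for (String[] n : f.notes) System.out.println("  " + n[0] + ": " + n[1]);
        }
    }
}
```

The single flush after the loop is also the place where a double assignment of the last annotation, as reported for the CDS feature, would be avoided in a sketch like this.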
>>>> >>>> cheers, >>>> Richard >>>> >>>> On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: >>>> >>>> >>>>> Hi Morgane, >>>>> >>>>> I have amended the EmblFormat readSection method as below and the >>>>> parsing seems to work; please test it. >>>>> >>>>> I think that the last bit of annotation is carried over into the >>>>> >> next >> >>>>> feature so before adding the new feature I dump the annotation >>>>> > and > >> reset >> >>>>> currentTag and currentVal. >>>>> >>>>> if (!line.startsWith(" ")) { >>>>> //--------- new code starts --------------------------- >>>>> if (currentTag!=null) { >>>>> section.add(new String[]{currentTag,currentVal.toString()}); >>>>> currentTag = null; >>>>> currentVal = null; >>>>> } >>>>> //--------- new code ends ----------------------------- >>>>> // case 1 : word value - splits into key-value on its own >>>>> section.add(line.split("\\s+")); >>>>> } >>>>> >>>>> Cheers, >>>>> >>>>> Jolyon >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: biojava-l-bounces at lists.open-bio.org >>>>> [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of >>>>> > Morgane > >>>>> THOMAS-CHOLLIER >>>>> Sent: 12 April 2006 09:35 >>>>> To: biojava-l at open-bio.org >>>>> Subject: [Biojava-l] [biojavax] EMBL parser : features >>>>> >> parsing[Scanned] >> >>>>> Hello again, >>>>> >>>>> I am currently using biojavax to parse EMBL files exported from >>>>> >> Ensembl >> >>>>> website. >>>>> >>>>> Compared to the EBI files I have, they show a difference in the >>>>> >> Features >> >>>>> lines : >>>>> >>>>> sometimes, only one "/word" is present. 
ie: >>>>> >>>>> EBI file : >>>>> >>>>> FT gene <1..>118 >>>>> FT /gene="Hoxb9" >>>>> FT /note="Hoxb-9" >>>>> >>>>> Ensembl file; >>>>> >>>>> FT gene complement(1..3218) >>>>> FT /gene="ENSMUSG00000038227" >>>>> >>>>> The problem I encounter is that the parser correctly convert the >>>>> >> "/word" >> >>>>> into a Note, but the Note is then in relation with the immediate >>>>> following feature (ie: mRNA). >>>>> The current gene feature thus has no annotation. >>>>> >>>>> This behavior is reproducible when removing one "/word" of an EBI >>>>> >> file. >> >>>>> Apart from this issue, I noted that Ensembl EMBL files uses "=" >>>>> >> inside a >> >>>>> feature (ie: /note="transcript_id=ENSMUST00000048680") which ends >>>>> >> up >> >>>>> with an incomplete Note, as the parser seems to split on "=" to >>>>> >> separate >> >>>>> the Key and the Value. >>>>> >>>>> Thanks for your help, >>>>> >>>>> Morgane. >>>>> >>>>> >>>>> -- ********************************************************** Morgane THOMAS-CHOLLIER, PHD Student (mthomasc at vub.ac.be) Vrije Universiteit Brussels (VUB) Laboratory of Cell Genetics Pleinlaan 2 1050 Brussels Belgium From jolyon.holdstock at ogt.co.uk Thu Apr 20 12:18:21 2006 From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock) Date: Thu, 20 Apr 2006 13:18:21 +0100 Subject: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned] Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FB84AA8@EUCLID.internal.ogtip.com> No, I'll update my source. Thanks, Jolyon -----Original Message----- From: Richard Holland [mailto:richard.holland at ebi.ac.uk] Sent: 20 April 2006 13:16 To: Jolyon Holdstock Cc: mthomas at dbm.ulb.ac.be; biojava-l at open-bio.org Subject: RE: [Biojava-l] [biojavax] EMBL parser : features parsing[Scanned] Did you use the latest CVS version? (I committed a change that I think should have fixed that about 1 minute before my previous email). 
On Thu, 2006-04-20 at 13:08 +0100, Jolyon Holdstock wrote: > I've run the sequence through the parser and it seems to work OK. I > iterate through the features and then iterate through the annotations of > that feature > > Based on the input.... > > FT source 1..118 > FT /organism="Triturus helveticus" > FT /mol_type="genomic DNA" > FT /clone="Thel.b9" > FT /db_xref="taxon:256425" > FT gene <1..>118 > FT /gene="Hoxb9" > FT /note="Hoxb-9" > FT mRNA <1..>118 > FT /gene="Hoxb9" > FT /product="HOXB9" > FT CDS <1..>118 > FT /codon_start=2 > FT /gene="Hoxb9" > FT /product="HOXB9" > FT /db_xref="UniProtKB/TrEMBL:Q2LK47" > FT /protein_id="ABA39736.1" > FT > /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW" > > The output is.... > > ======================================== > Feature: (#0) lcl:DQ158013/DQ158013.1:source,EMBL(1..118) > Note: (#0) biojavax:mol_type: genomic DNA > Note: (#1) biojavax:clone: Thel.b9 > ======================================== > Feature: (#1) lcl:DQ158013/DQ158013.1:gene,EMBL(<1..118>) > Note: (#2) biojavax:gene: Hoxb9 > Note: (#3) biojavax:note: Hoxb-9 > ======================================== > Feature: (#2) lcl:DQ158013/DQ158013.1:mRNA,EMBL(<1..118>) > Note: (#4) biojavax:gene: Hoxb9 > Note: (#5) biojavax:product: HOXB9 > ======================================== > Feature: (#3) lcl:DQ158013/DQ158013.1:CDS,EMBL(<1..118>) > Note: (#6) biojavax:codon_start: 2 > Note: (#7) biojavax:gene: Hoxb9 > Note: (#8) biojavax:product: HOXB9 > Note: (#9) biojavax:protein_id: ABA39736.1 > Note: (#10) biojavax:translation: > KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW > Note: (#11) biojavax:translation: > KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW > ============================================= > > This looks OK, the one thing I've just noticed is that the last piece of > annotation of the last feature is assigned twice. 
> > Jolyon > > > -----Original Message----- > From: Richard Holland [mailto:richard.holland at ebi.ac.uk] > Sent: 20 April 2006 13:05 > To: mthomas at dbm.ulb.ac.be > Cc: Jolyon Holdstock; biojava-l at open-bio.org > Subject: Re: [Biojava-l] [biojavax] EMBL parser : features > parsing[Scanned] > > Hi. > > I made some small changes to the code, although nothing that would fix > this kind of problem, committed it back to CVS, checked it out again, > compiled, and ran a test program that read in an EMBL file with the > feature table you describe below, and output it in EMBL format to > another file. I then compared the two files... and found no differences! > The split-on-equals problem didn't occur, and all notes appeared > alongside their correct features. > > Could there be a problem maybe with the script you are using? > > I've really no idea what the problem is as I can't reproduce it based on > the current CVS contents! > > cheers, > Richard > > On Thu, 2006-04-20 at 11:35 +0200, Morgane THOMAS-CHOLLIER wrote: > > Hi, > > > > I have tested today's version from CVS. > > > > Both EBI and Ensembl files now react the same way. > > The last annotation of a feature is nevertheless related to its > > immediate following feature. > > e.g. : > > > > FT gene <1..>118 > > FT /gene="Hoxb9" > > FT /note="Hoxb-9" > > FT mRNA <1..>118 > > FT /gene="Hoxb9" > > FT /product="HOXB9" > > FT CDS <1..>118 > > > > /note="Hoxb-9" is related to mRNA > > /product="HOXB9" is related to CDS > > > > Concerning the split-on-equals problem, I still observe the problem : > > > > [(#2) biojavax:note: transcript_i] > > > > for this annotation : /note="transcript_id=ENSMUST00000048680" > > > > Thanks for helping, > > > > Cheers, > > > > Morgane. > > > > Richard Holland wrote: > > > I have committed an UNTESTED patch based on Jolyon's suggestion, and > > > also attempted to fix the split-on-equals problem Morgane observed. > > > > > > Please let me know if there are any problems with it. 
> > > > > > As this problem affected the UniProt parser in a similar manner > (much of > > > the code is identical), the same fixes were applied there too. > > > > > > cheers, > > > Richard > > > > > > On Thu, 2006-04-13 at 17:42 +0100, Jolyon Holdstock wrote: > > > > > >> Hi Morgane, > > >> > > >> I have amended the EmblFormat readSection method as below and the > > >> parsing seems to work; please test it. > > >> > > >> I think that the last bit of annotation is carried over into the > next > > >> feature so before adding the new feature I dump the annotation and > reset > > >> currentTag and currentVal. > > >> > > >> if (!line.startsWith(" ")) { > > >> //--------- new code starts --------------------------- > > >> if (currentTag!=null) { > > >> section.add(new String[]{currentTag,currentVal.toString()}); > > >> currentTag = null; > > >> currentVal = null; > > >> } > > >> //--------- new code ends ----------------------------- > > >> // case 1 : word value - splits into key-value on its own > > >> section.add(line.split("\\s+")); > > >> } > > >> > > >> Cheers, > > >> > > >> Jolyon > > >> > > >> > > >> > > >> -----Original Message----- > > >> From: biojava-l-bounces at lists.open-bio.org > > >> [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Morgane > > >> THOMAS-CHOLLIER > > >> Sent: 12 April 2006 09:35 > > >> To: biojava-l at open-bio.org > > >> Subject: [Biojava-l] [biojavax] EMBL parser : features > parsing[Scanned] > > >> > > >> Hello again, > > >> > > >> I am currently using biojavax to parse EMBL files exported from > Ensembl > > >> website. > > >> > > >> Compared to the EBI files I have, they show a difference in the > Features > > >> > > >> lines : > > >> > > >> sometimes, only one "/word" is present. 
ie: > > >> > > >> EBI file : > > >> > > >> FT gene <1..>118 > > >> FT /gene="Hoxb9" > > >> FT /note="Hoxb-9" > > >> > > >> Ensembl file; > > >> > > >> FT gene complement(1..3218) > > >> FT /gene="ENSMUSG00000038227" > > >> > > >> The problem I encounter is that the parser correctly convert the > "/word" > > >> > > >> into a Note, but the Note is then in relation with the immediate > > >> following feature (ie: mRNA). > > >> The current gene feature thus has no annotation. > > >> > > >> This behavior is reproducible when removing one "/word" of an EBI > file. > > >> > > >> Apart from this issue, I noted that Ensembl EMBL files uses "=" > inside a > > >> > > >> feature (ie: /note="transcript_id=ENSMUST00000048680") which ends > up > > >> with an incomplete Note, as the parser seems to split on "=" to > separate > > >> > > >> the Key and the Value. > > >> > > >> Thanks for your help, > > >> > > >> Morgane. > > >> > > >> > > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 This email has been scanned by Oxford Gene Technology Group of Companies Security Systems. From mark.schreiber at novartis.com Tue Apr 25 06:07:59 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Tue, 25 Apr 2006 14:07:59 +0800 Subject: [Biojava-l] Pairwise Alignment Message-ID: Hi - The appropriate classes for SW and NW pairwise alignment are in the org.biojava.bio.alignment package in the CVS (see http://code.open-bio.org/cgi/viewcvs.cgi/biojava-live/src/org/biojava/bio/alignment/?cvsroot=biojava). While SW and NW are simple they are not as flexible as the pairwise architectures that can be made with HMMs. For a standard pairwise alignment I would think that the SW and NW algorithms are fine. I'm not sure about comparative speed or memory requirements. 
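For readers who want a feel for what a Needleman-Wunsch (NW) global alignment computes before opening the org.biojava.bio.alignment sources, here is a minimal stand-alone sketch of the scoring recurrence in plain Java. It does not use or mirror the BioJava classes; the class name NwSketch, the method score, and the match/mismatch/gap values are assumptions chosen for illustration.

```java
// Hypothetical illustration only; not the BioJava implementation.
class NwSketch {

    /**
     * Global alignment score of a and b with a linear gap penalty.
     * match and mismatch are added per aligned pair, gap per gap position.
     */
    static int score(String a, String b, int match, int mismatch, int gap) {
        int n = a.length();
        int m = b.length();
        int[][] dp = new int[n + 1][m + 1];

        // Aligning a prefix against the empty string costs one gap per symbol.
        for (int i = 1; i <= n; i++) dp[i][0] = i * gap;
        for (int j = 1; j <= m; j++) dp[0][j] = j * gap;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                int diag = dp[i - 1][j - 1] + sub; // align a[i] with b[j]
                int up = dp[i - 1][j] + gap;       // gap in b
                int left = dp[i][j - 1] + gap;     // gap in a
                dp[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        return dp[n][m];
    }

    public static void main(String[] args) {
        System.out.println(score("ACGT", "AGT", 1, -1, -1));
    }
}
```

On the memory question: a table like this holds (n+1) x (m+1) integers, so its size grows with the product of the two sequence lengths. Smith-Waterman (local alignment) uses the same table shape, with cell values floored at zero and the best cell taken as the result.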
- Mark Joanna Sharman Sent by: biojava-l-bounces at lists.open-bio.org 04/19/2006 05:35 PM To: biojava-l at lists.open-bio.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] Pairwise Alignment Hello, I'm new to BioJava so I'm sorry if this question has been asked several times before. This is actually sort of in reply to this message from last month: http://lists.open-bio.org/pipermail/biojava-l/2006-March/005365.html I'd like to perform a simple pairwise alignment using the Smith-Waterman class I saw described here: http://www.biojava.org/wiki/BioJava:CookBook:DP:PairWise2 but I can't find the classes it mentions anywhere on the cvs. Can you point me to where they are? Also, I'm just wondering why the HMM method is preferred to the Smith-Waterman (or others)? It seems quite complicated to me, and like it might require more memory, or am I wrong? :) Cheers, Joanna _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From e.willighagen at science.ru.nl Wed Apr 26 16:03:47 2006 From: e.willighagen at science.ru.nl (Egon Willighagen) Date: Wed, 26 Apr 2006 18:03:47 +0200 Subject: [Biojava-l] org.biojava.bio.gui.glyph classes? Message-ID: <200604261803.47333.e.willighagen@science.ru.nl> Hi all, in the wiki I saw mention of the org.biojava.bio.gui.glyph package, which does not seem to be part of BioJava 1.4. Where can I download the code classes in that package? Egon -- Radboud University Nijmegen http://www.cac.science.ru.nl/ blog: http://chem-bla-ics.blogspot.com/ From mark.schreiber at novartis.com Thu Apr 27 01:14:38 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Thu, 27 Apr 2006 09:14:38 +0800 Subject: [Biojava-l] org.biojava.bio.gui.glyph classes? Message-ID: Hi - They are in biojava-live, which is the development version available for download via cvs. Take a look at the instructions on www.biojava.org. 
- Mark Egon Willighagen Sent by: biojava-l-bounces at lists.open-bio.org 04/27/2006 12:03 AM To: biojava-l at lists.open-bio.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] org.biojava.bio.gui.glyph classes? Hi all, in the wiki I saw mention of the org.biojava.bio.gui.glyph package, which does not seem to be part of BioJava 1.4. Where can I download the code classes in that package? Egon -- Radboud University Nijmegen http://www.cac.science.ru.nl/ blog: http://chem-bla-ics.blogspot.com/ From heatkent at gmail.com Wed Apr 26 23:22:46 2006 From: heatkent at gmail.com (Heather Kent) Date: Wed, 26 Apr 2006 18:22:46 -0500 Subject: [Biojava-l] chromatogram viewer Message-ID: I'm wondering if anyone can help me locate some source code for Swing components involved in viewing chromatograms. I read a 2003 BioJava forum thread where Rhett Sutphin mentioned he would make some source code for a chromatogram viewer (using the ChromatogramGraphic class) available, but I can't seem to find it anywhere. I'm trying to fashion some scroll bars for my chromatogram viewer that scroll through the image and also scale the chromatogram vertically and horizontally. I have some code from an old viewer that performs all these functions but doesn't use any of the BioJava classes or Swing components.
Thanks, Heather From russ at kepler-eng.com Thu Apr 27 04:24:19 2006 From: russ at kepler-eng.com (Russ Kepler) Date: Wed, 26 Apr 2006 22:24:19 -0600 Subject: [Biojava-l] chromatogram viewer In-Reply-To: References: Message-ID: <200604262224.19525.russ@kepler-eng.com> On Wednesday 26 April 2006 05:22 pm, Heather Kent wrote: > I'm wondering if anyone can help me locate some source code for swing > components involved in viewing chromatograms, i read a 2003 forum from > biojava where Rhett Sutphin mentioned he would make some source code for a > chromatogram viewer (using the chromatogramgraphic class) available but i > cant seem to find it anywhere....im trying to fashion some scroll bars for > my chromatogram viewer that function to scroll through the image, as well > as vertically and horizontally scale the chromatgram....i have some code > from an old viewer that will perform all these functions but doesnt use any > of the biojava classes or swing components.... There's org.biojava.bio.gui.sequence.ABITraceRenderer, with demo code in seqviewer.TraceViewer. It should give you a start. From n.haigh at sheffield.ac.uk Thu Apr 27 13:48:59 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 27 Apr 2006 14:48:59 +0100 Subject: [Biojava-l] Sun One Studio+Biojava Message-ID: <002301c66a01$5637d910$9f5ea78f@bmbpc196> I'm totally new to Java and BioJava as I'm trying to defect from BioPerl! I'm trying to use Sun One Studio for editing my Java files, at least initially. I don't know how to set up Sun One Studio to find my biojava-1.4.jar file, and I'm not even sure how to test whether it can find it correctly. Any help on these issues would be gratefully received. As I said, I'm a newbie - bear with me! Cheers Nathan -- Dr. Nathan S. Haigh, Bioinformatics PostDoctoral Research Associate, Room B2 211,
Department of Animal and Plant Sciences, University of Sheffield, Western Bank, Sheffield S10 2TN. Tel: +44 (0)114 22 20112 Mob: +44 (0)7742 533 569 Fax: +44 (0)114 22 20002 Web: www.bioinf.shef.ac.uk www.petraea.shef.ac.uk From richard.holland at ebi.ac.uk Thu Apr 27 14:51:23 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Thu, 27 Apr 2006 15:51:23 +0100 Subject: [Biojava-l] Sun One Studio+Biojava In-Reply-To: <002301c66a01$5637d910$9f5ea78f@bmbpc196> References: <002301c66a01$5637d910$9f5ea78f@bmbpc196> Message-ID: <1146149483.3955.7.camel@texas.ebi.ac.uk> Sun One Studio is built on NetBeans, which is what I use to develop bits of BioJava with, so I think what works for me should work for you. Here goes...: If you are working with BioJava in apps you are developing yourself, you need to set up BioJava as a library in NetBeans. Do this by going to the Library Manager (Tools menu), creating a new library called BioJava, then using the buttons provided to locate and add the biojava-1.4.jar file to the library. You can then associate this library with any project you are working on by right-clicking on that project, choosing Properties, then clicking on Libraries in the tree on the left of the window that appears and using this to add the BioJava library. If you are intending to develop BioJava itself, you need to check out the entire biojava-live project from CVS.
You can then set up development in NetBeans by creating a "new project from existing Ant script", and telling it where the build.xml file can be found within the BioJava project. It'll do the rest for you. Hope this helps. cheers, Richard On Thu, 2006-04-27 at 14:48 +0100, Nathan S. Haigh wrote: > I?m totally new to Java and Biojava as I'm trying to defect from Bioperl! > I'm trying to use Sun One Studio for editing my java files - at least > initially. I don't know how to setup Sun One Studio to find my > biojava-1.4.jar file, I'm not even sure how to test if it can find it > correctly. Any help on these issues would be gratefully received. As I said > I'm a newbie - bear with me! > > Cheers > Nathan > > ---------------------------------------------------------------------------- > ------ > Dr. Nathan S. Haigh > Bioinformatics PostDoctoral Research Associate > > Room B2 211 Tel: +44 (0)114 22 > 20112 > Department of Animal and Plant Sciences Mob: +44 (0)7742 533 > 569 > University of Sheffield Fax: +44 (0)114 22 > 20002 > Western Bank Web: > www.bioinf.shef.ac.uk > Sheffield > www.petraea.shef.ac.uk > S10 2TN > ---------------------------------------------------------------------------- > ------ > > --- > avast! Antivirus: Outbound message clean. > Virus Database (VPS): 0615-2, 12/04/2006 > Tested on: 27/04/2006 14:48:56 > avast! - copyright (c) 1988-2006 ALWIL Software. > http://www.avast.com > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From n.haigh at sheffield.ac.uk Thu Apr 27 15:01:56 2006 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 27 Apr 2006 16:01:56 +0100 Subject: [Biojava-l] Sun One Studio+Biojava In-Reply-To: <1146149483.3955.7.camel@texas.ebi.ac.uk> Message-ID: <003601c66a0b$86b289f0$9f5ea78f@bmbpc196> Thanks for the info - the fog is starting to lift! :o) I think I'll leave actual Biojava development for now - see how I go with actually learning Java first :o) I have a steep learning curve, as I have an application written in Perl which I use Bioperl modules and Perl/Tk for the GUI. So I'm trying to rewrite this application in Java while trying to think about OO programming.....i'm sure I'll send some really simple questions to the list over the coming weeks/months, but hopefully there won't be too many nightmares along the way! Thanks Nathan > -----Original Message----- > From: Richard Holland [mailto:richard.holland at ebi.ac.uk] > Sent: 27 April 2006 15:51 > To: n.haigh at sheffield.ac.uk > Cc: biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] Sun One Studio+Biojava > > Sun One Studio is built on NetBeans, which is what I use to develop bits > of BioJava with, so I think what works for me should work for you. Here > goes...: > > If you are working with BioJava in apps you are developing yourself, you > need to set up BioJava as a library in NetBeans. Do this by going to the > Library Manager (Tools menu), creating a new library called BioJava, > then using the buttons provided to locate and add the biojava-1.4.jar > file to the library. You can then associate this library with any > project you are working on by right-clicking on that project, choosing > Properties, then click on Libraries in the tree on the left of the > window that appears and use this to add the BioJava library. > > If you are intending to develop BioJava itself, you need to check out > the entire biojava-live project from CVS. 
You can then set up > development in NetBeans by creating a "new project from existing Ant > script", and telling it where the build.xml file can be found within the > BioJava project. It'll do the rest for you. > > Hope this helps. > > cheers, > Richard > > On Thu, 2006-04-27 at 14:48 +0100, Nathan S. Haigh wrote: > > I'm totally new to Java and Biojava as I'm trying to defect from > > Bioperl! I'm trying to use Sun One Studio for editing my java files - at least > > initially. I don't know how to setup Sun One Studio to find my > > biojava-1.4.jar file, I'm not even sure how to test if it can find it > > correctly. Any help on these issues would be gratefully received. As I > > said I'm a newbie - bear with me! > > > > Cheers > > Nathan > > -- > Richard Holland (BioMart Team) > EMBL-EBI > Wellcome Trust Genome Campus > Hinxton > Cambridge CB10 1SD > UNITED KINGDOM > Tel: +44-(0)1223-494416
From n.haigh at sheffield.ac.uk Thu Apr 27 15:12:34 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 27 Apr 2006 16:12:34 +0100 Subject: [Biojava-l] Creating my own classes Message-ID: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196> I'm trying to learn/think about OO programming as I'm learning Java and porting a Perl app to Java - could you tell me if this sounds reasonable for writing some of my own classes? My application essentially defines sets of positions from an alignment - I call them CHARSETs as they are analogous to CHARSETs in the Nexus file format. I believe in BioJava the Locations object/interface (sorry, not familiar enough with correct terminology yet) is essentially the same sort of thing. In my app, the user can use several approaches to define a CHARSET, e.g. a CHARSET containing just invariable sites, or a CHARSET containing sites above a given % identity. My question is this: if I were to create a class called Charset, and several subclasses called e.g. Invariable etc., is this reasonable? Or should the class Charset contain many methods for creating different types of CHARSET? In my app, a CHARSET needs to be associated with a particular alignment and with the settings used to define the CHARSET, so my Charset class would have variables such as an Alignment object, Locations objects etc. I'd like to write a method that returns a subalignment based on the CHARSET's associated alignment object and Locations object, but I'm not sure how to do this. Thanks for any help/comments/corrections/critiques Nathan -- Dr. Nathan S. Haigh, Bioinformatics PostDoctoral Research Associate, Room B2 211, Department of Animal and Plant Sciences, University of Sheffield,
Western Bank, Sheffield S10 2TN. Tel: +44 (0)114 22 20112 Mob: +44 (0)7742 533 569 Fax: +44 (0)114 22 20002 Web: www.bioinf.shef.ac.uk www.petraea.shef.ac.uk From richard.holland at ebi.ac.uk Thu Apr 27 15:36:51 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Thu, 27 Apr 2006 16:36:51 +0100 Subject: [Biojava-l] Creating my own classes In-Reply-To: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196> References: <003701c66a0d$037cafa0$9f5ea78f@bmbpc196> Message-ID: <1146152212.3955.24.camel@texas.ebi.ac.uk> On Thu, 2006-04-27 at 16:12 +0100, Nathan S. Haigh wrote: > My application essentially defines sets of positions from an alignment - I > call them CHARSETs as they are analogous to CHARSETs in the Nexus file > format. I believe in Biojava the Locations object/interface (sorry, not > familiar enough with correct terminology yet) is essentially the same sort > of thing. In my app, the user can use several approaches to define a CHARSET > e.g. a CHARSET containing just invariable sites, or a CHARSET containing > sites above a given % identity. You'd be right there. A Location in BioJava represents a range of positions. > My question is this, if I were to create a class called Charset, and I > create several subclasses called e.g. Invariable etc is this reasonable? Or > should the class Charset contain many methods for creating a different type > of CHARSET? My suggestion would be to create an interface called Charset, which defines the behaviour you expect all types of Charset to exhibit.
Then, implement a number of classes which implement this interface, one
for each type of Charset you have, each adding its own methods or
special behaviour. If a lot of the behaviour is common, you can define
an abstract class called something like AbstractCharset which defines
this common behaviour, and have the others extend it.

> In my app, a CHARSET needs to be associated with a particular alignment, and
> settings used to define the CHARSET, so my Charset class have variables such
> as an Alignment object, Locations objects etc. I'd like to write a method
> that returns a subalignment based on the CHARSETs associated alignment
> object and Locations object but I'm not sure how to do this.

BioJava Alignment objects implement the SymbolList interface, which
means you can use all the methods from SymbolList to work with the
Alignment, including the subList() method.

cheers,
Richard

-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From n.haigh at sheffield.ac.uk  Thu Apr 27 15:44:05 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 16:44:05 +0100
Subject: [Biojava-l] Creating my own classes
In-Reply-To: <1146152212.3955.24.camel@texas.ebi.ac.uk>
Message-ID: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196>

Thanks Richard,

I'll think about this and try to do some deciphering. The only thing I
still need help with is some actual code that takes an Alignment object
and returns a subalignment based on the positions specified in a
Locations object. It's difficult to make sense of a new language until
you start to pick up some of the basics.

Thanks
Nathan

From richard.holland at ebi.ac.uk  Thu Apr 27 15:55:39 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 27 Apr 2006 16:55:39 +0100
Subject: [Biojava-l] Creating my own classes
In-Reply-To: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196>
References: <000201c66a11$6a4b35e0$9f5ea78f@bmbpc196>
Message-ID: <1146153339.3955.30.camel@texas.ebi.ac.uk>

Given some existing Location object (let's call it 'loc'), and an
existing Alignment (hypothetically called 'algn'), you can do this:

  import java.util.HashSet;
  import java.util.Set;
  import org.biojava.bio.symbol.Alignment;
  import org.biojava.bio.symbol.Location;

  // Obtain the labels of all the sequences in the alignment.
  Set labels = new HashSet();
  labels.addAll(algn.getLabels());
  // Obtain a sub-alignment including all the sequences in the
  // original alignment, restricted to the positions in loc.
  Alignment subAlignment = algn.subAlignment(labels, loc);

cheers,
Richard

From n.haigh at sheffield.ac.uk  Thu Apr 27 16:00:09 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 27 Apr 2006 17:00:09 +0100
Subject: [Biojava-l] Creating my own classes
In-Reply-To: <1146153339.3955.30.camel@texas.ebi.ac.uk>
Message-ID: <000d01c66a13$a8b51380$9f5ea78f@bmbpc196>

Fantastic stuff - again, I'll look into this over the coming weeks (I
actually have annual leave for a week, so my flurry of e-mail will have
to stop for now).

Thanks again!
Nathan

From david at autohandle.com  Thu Apr 27 17:10:08 2006
From: david at autohandle.com (David Scott)
Date: Thu, 27 Apr 2006 10:10:08 -0700
Subject: [Biojava-l] hibernate-xml mapping
Message-ID: <4450FAF0.9070206@autohandle.com>

What is the XML mapping in the Hibernate files based on?

From mark.schreiber at novartis.com  Fri Apr 28 02:05:44 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 28 Apr 2006 10:05:44 +0800
Subject: [Biojava-l] Creating my own classes
Message-ID:

An excellent book on OO and Java is Thinking in Java by Bruce Eckel. If
you come from a C or Perl background it will change the way you think
about programming. You can get online versions for free, and most good
bookstores have hard copies as well.

- Mark

From mark.schreiber at novartis.com  Fri Apr 28 02:06:31 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 28 Apr 2006 10:06:31 +0800
Subject: [Biojava-l] hibernate-xml mapping
Message-ID:

It is based on the BioSQL schema.

- Mark

From ilhami.visne at gmail.com  Fri Apr 28 09:09:56 2006
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Fri, 28 Apr 2006 11:09:56 +0200
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
Message-ID:

I got a file in FASTA format which is not encoded in ANSI, but it
otherwise seems OK. It can be downloaded here:
http://stud3.tuwien.ac.at/~e0125935/try3.fasta

I tried to read it with SeqIOTools.readFastaDNA and this exception was
thrown:

org.biojava.bio.BioException: Could not read sequence
    at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:104)
..............
..............
Caused by: java.io.IOException: Stream does not appear to contain FASTA formatted data: ??>
    org.biojava.bio.seq.io.FastaFormat.readSequence(FastaFormat.java:112)
    at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:101)

The file has no line that looks like "??>", so those characters seem to
be hidden.

How should I handle such files?

Thanks in advance.
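When an exception message shows characters such as "??>" that are not visible in a text editor, one way to see what is really at the start of the file is to print its first few bytes in hexadecimal. The following is a minimal sketch using only the JDK; the class name LeadingBytes, its method names, and the local file name try3.fasta are assumptions made for illustration and are not part of BioJava.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LeadingBytes {

    /** Formats up to n leading bytes of data as space-separated hex pairs. */
    static String toHex(byte[] data, int n) {
        StringBuilder sb = new StringBuilder();
        int limit = Math.min(n, data.length);
        for (int i = 0; i < limit; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so that negative bytes print as two hex digits.
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    /** Reads up to n bytes from the start of the named file. */
    static byte[] readLeading(String path, int n) throws IOException {
        InputStream in = new FileInputStream(path);
        try {
            byte[] buf = new byte[n];
            int total = 0;
            int read;
            while (total < n && (read = in.read(buf, total, n - total)) != -1) {
                total += read;
            }
            byte[] out = new byte[total];
            System.arraycopy(buf, 0, out, 0, total);
            return out;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // "try3.fasta" is a hypothetical local copy of the problem file.
        String path = args.length > 0 ? args[0] : "try3.fasta";
        System.out.println(toHex(readLeading(path, 8), 8));
    }
}
```

A plain ASCII FASTA file would be expected to start with 3E, the code for '>'. Any bytes printed before the first 3E are the hidden characters the parser is rejecting.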
From richard.holland at ebi.ac.uk  Fri Apr 28 10:37:35 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 28 Apr 2006 11:37:35 +0100
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
In-Reply-To:
References:
Message-ID: <1146220656.3955.46.camel@texas.ebi.ac.uk>

I've no idea what binary format that file is in - it contains some very
strange characters. It appears to contain _some_ ANSI data but with
extra binary bits added to the start and end. I think you need to check
the program that generated the file, as it is obviously not doing what
it is supposed to.

Your best bet is to convert the file to ANSI or some other format
understood out-of-the-box by Java.

cheers,
Richard

From richard.holland at ebi.ac.uk  Fri Apr 28 13:19:30 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 28 Apr 2006 14:19:30 +0100
Subject: [Biojava-l] Reading a fasta file which is not encoded in ansi
In-Reply-To:
References: <1146220656.3955.46.camel@texas.ebi.ac.uk>
Message-ID: <1146230371.3955.59.camel@texas.ebi.ac.uk>

Thinking about this a bit more, I think you meant ASCII when you said
ANSI?

FASTA format is very strictly defined. It is a file containing a number
of sequences, each with its own header, which starts with a '>' symbol.
You can indeed use any character you like within the header, which ends
at the first new-line after the '>' (newline is ASCII 10 or 13, or both,
depending on your OS).
No whitespace is allowed at the start or end of the file or between or
within sequences.

The problem with your file is that the unusual characters appear at the
start of the file before the first header, and maybe also within the
sequence itself, although I didn't look that closely. Hence it breaks
the FASTA format specification.

The problem here lies with the program that is generating your FASTA
file. BioJava is behaving correctly.

cheers,
Richard

On Fri, 2006-04-28 at 15:00 +0200, Ilhami Visne wrote:
> I had already thought about converting the file to ANSI. The sequence
> part must contain only ANSI characters, but the header and other
> annotation need not be limited to ANSI characters. If I convert the
> file to ANSI, might that not cause some data to be lost?

-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416
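One possible explanation for hidden characters in front of the first '>' is a byte-order mark (BOM) written by a tool that saved the file as UTF-8 or UTF-16. This is only a guess, since the thread never establishes the file's actual encoding. Under that assumption, a minimal JDK-only sketch could skip the BOM and decode the bytes into a Reader before parsing, rather than converting the file and risking the loss of non-ASCII header text. The class name BomAwareReader and the method open are hypothetical, and the ISO-8859-1 fallback is an arbitrary choice.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;

public class BomAwareReader {

    /**
     * Wraps a byte stream in a reader, consuming a leading UTF-8 or UTF-16
     * byte-order mark if one is present. Streams without a BOM are decoded
     * as ISO-8859-1 (an assumption; choose the default that suits the data).
     */
    static BufferedReader open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        int r;
        while (n < 3 && (r = in.read(head, n, 3 - n)) != -1) {
            n += r;
        }

        String charset = "ISO-8859-1";
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            charset = "UTF-8";
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            charset = "UTF-16LE";
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            charset = "UTF-16BE";
            bomLength = 2;
        }

        // Push back everything read beyond the BOM so that no data is lost.
        if (n > bomLength) {
            in.unread(head, bomLength, n - bomLength);
        }
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    public static void main(String[] args) throws IOException {
        // A tiny in-memory FASTA record preceded by a UTF-8 BOM.
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                       '>', 's', 'e', 'q', '1', '\n', 'A', 'C', 'G', 'T', '\n'};
        BufferedReader reader = open(new ByteArrayInputStream(data));
        System.out.println(reader.readLine()); // prints >seq1
    }
}
```

The resulting BufferedReader could then be handed to a FASTA parser that accepts one; SeqIOTools.readFastaDNA in BioJava 1.x takes a BufferedReader, though that is worth checking against the version in use. If the stray bytes turn out not to be a BOM, Richard's advice stands and the generating program needs fixing.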