From jolyon.holdstock at ogt.co.uk Tue Jan 3 09:21:46 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Tue Jan 3 11:17:52 2006
Subject: [Biojava-l] Embl parser problem
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F83C2C3@EUCLID.internal.ogtip.com>

Hi,

I have an application using BioJava1.4pre1.4 that loads an EMBL or GenBank file. If I load an EMBL file via the GenBank option, a BioException is thrown. But if I load a GenBank file via the EMBL option, no BioException is thrown and the sequence is created, although it is not correct; e.g. sequence.length() returns 0.

An example of code using the sequence file from the BioJava demos:

    String fileName = "C:/Downloads/Java/BioJava/BioJava-1.4pre1/biojava-1.4pre1/demos/seq/AL121903.genbank";
    try {
        seq = SeqIOTools.readEmbl(new BufferedReader(new FileReader(fileName))).nextSequence();
        System.out.println("URN: " + seq.getURN());
        System.out.println("Length: " + seq.length());
    } catch (BioException BIOE) {
        System.out.println("BioException " + BIOE);
    }

The output is:

    URN: sequence/embl:SION
    Length: 0

If I use the matching EMBL sequence from the demos, the output is:

    URN: sequence/embl:AL121903
    Length: 80600

I've used BioJava1.4 with the same outcome. Should I be parsing the file an alternative way?

Thanks,
Jolyon

From mark.schreiber at novartis.com Tue Jan 3 20:09:49 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jan 3 20:13:15 2006
Subject: [Biojava-l] Embl parser problem
Message-ID:

Hi -

A BioException would be expected when parsing an EMBL file via the GenBank option. It is surprising you don't get one when parsing a GenBank file via the EMBL option, although it clearly has not worked properly. You should only ever parse a file with the appropriate read method.

Please note that if you have access to CVS, you could download the development version of the new parsers (biojavax), which do a much better job.
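Since the parser cannot be relied on to reject a file in the wrong format, a program that accepts both formats could check the format itself before choosing a read method. A minimal sketch, assuming only that an EMBL entry starts with an "ID" line and a GenBank entry with a "LOCUS" line (the FlatFileSniffer class is hypothetical, not part of BioJava):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper (not BioJava API): peek at the first non-blank line of a
// flat file to decide which parser to call, then rewind the reader.
class FlatFileSniffer {

    enum Format { EMBL, GENBANK, UNKNOWN }

    // Marks the reader, inspects the first non-blank line, then resets it so the
    // same stream can be handed to the chosen parser afterwards.
    static Format detect(BufferedReader in) throws IOException {
        in.mark(8192); // assumption: the first record line appears within 8 KB
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().length() == 0) {
                    continue; // skip leading blank lines
                }
                if (line.startsWith("ID ")) {
                    return Format.EMBL;    // EMBL entries begin with an ID line
                }
                if (line.startsWith("LOCUS")) {
                    return Format.GENBANK; // GenBank entries begin with a LOCUS line
                }
                return Format.UNKNOWN;
            }
            return Format.UNKNOWN;
        } finally {
            in.reset(); // rewind to the marked position
        }
    }

    // Convenience overload for in-memory text.
    static Format detect(String text) {
        try {
            return detect(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for a StringReader
        }
    }

    public static void main(String[] args) {
        // Illustrative first lines only, not taken from the real demo files.
        System.out.println(detect("ID   AL121903; SV 1; linear; genomic DNA\n")); // EMBL
        System.out.println(detect("LOCUS       AL121903   80600 bp    DNA\n"));   // GENBANK
    }
}
```

Because the reader is reset after sniffing, the same BufferedReader could then be passed to SeqIOTools.readEmbl(...) or SeqIOTools.readGenbank(...) as appropriate. Checking seq.length() == 0 after parsing is a cheap extra guard against the silent failure described in the original question.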
- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From jolyon.holdstock at ogt.co.uk Wed Jan 4 05:54:25 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Wed Jan 4 05:51:07 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F8A1F08@EUCLID.internal.ogtip.com>

Thanks for the help.

I have downloaded the dev version and tried to build it. I have no experience with Ant (I'm running v1.6.1), and the build fails.
The output from this is:

Buildfile: build.xml

init:
     [echo] Building biojava-live
     [echo] Java Home: c:\j2sdk1.4.2_04\jre
     [echo] JUnit present: ${junit.present}
     [echo] JUnit supported by Ant: true
     [echo] HSQLDB driver present: ${sqlDriver.hsqldb}

prepare:

prepare-biojava:

compile-biojava:
    [javac] Compiling 1279 source files to C:\Downloads\Java\BioJava\biojava-live\ant-build\classes\biojava
    [javac] C:\Downloads\Java\BioJava\biojava-live\src\org\biojava\bio\seq\impl\RevCompSequence.java:47: reference to ProjectedFeatureHolder is ambiguous, both class org.biojava.bio.seq.projection.ProjectedFeatureHolder in org.biojava.bio.seq.projection and class org.biojava.bio.seq.ProjectedFeatureHolder in org.biojava.bio.seq match
    [javac]     private ProjectedFeatureHolder pfh;
    [javac]             ^
    [javac] C:\Downloads\Java\BioJava\biojava-live\src\org\biojava\bio\seq\impl\RevCompSequence.java:65: reference to ProjectedFeatureHolder is ambiguous, both class org.biojava.bio.seq.projection.ProjectedFeatureHolder in org.biojava.bio.seq.projection and class org.biojava.bio.seq.ProjectedFeatureHolder in org.biojava.bio.seq match
    [javac]         pfh = new ProjectedFeatureHolder(new TranslateFlipContext(this,seq,seq.length()+1,true));
    [javac]                   ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -deprecation for details.
    [javac] 2 errors

This email has been scanned by Oxford Gene Technology Group of Companies Security Systems.

From alexg at compugen.co.il Wed Jan 4 13:06:41 2006
From: alexg at compugen.co.il (Alex Golubev)
Date: Wed Jan 4 13:14:58 2006
Subject: [Biojava-l] amino acid to nucleic acid alignment
Message-ID: <5F0B7D17FC20E7489AAEE5043D6390EC03288A@cmail.il.cgen.biz>

Hi,

I'm trying to align amino acids to nucleic acids. I'm using gapped sequences both for the protein and for the DNA.
I have several problems and would really appreciate it if someone could help.

1. How can I parse DNA and get codons? I would like to start with DNA that looks like this, "ATGTAT", and get a protein that looks like this, "MY". I'm using "Alphabet alpha = DNATools.getCodonAlphabet();" but I can't find a tokenization to parse the DNA string (does this make any sense?).

2. My other problem is that there are frame shifts, and my gapped DNA actually looks like this: "AT-G-TAT". Is there any way to get/translate locations from the codon symbol list to/from the DNA symbol list?

I would appreciate any clue as to whether all of this makes any sense.

Thanks,
Alex Golubev.

From smh1008 at cam.ac.uk Wed Jan 4 15:52:19 2006
From: smh1008 at cam.ac.uk (David Huen)
Date: Wed Jan 4 16:07:53 2006
Subject: [Biojava-l] amino acid to nucleic acid alignment
In-Reply-To: <5F0B7D17FC20E7489AAEE5043D6390EC03288A@cmail.il.cgen.biz>
References: <5F0B7D17FC20E7489AAEE5043D6390EC03288A@cmail.il.cgen.biz>
Message-ID:

On Jan 4 2006, Alex Golubev wrote:

> 1. How can I parse DNA and get codons? I would like to start with DNA
> that looks like this, "ATGTAT", and get a protein that looks like this,
> "MY". I'm using "Alphabet alpha = DNATools.getCodonAlphabet();" but I
> can't find a tokenization to parse the DNA string (does this make any
> sense?).

You can convert a SymbolList in the DNA alphabet into the equivalent symbol list in the codon alphabet (DNAxDNAxDNA) by using SymbolListViews.orderNSymbolList(...).

> 2. My other problem is that there are frame shifts, and my gapped DNA
> actually looks like this: "AT-G-TAT". Is there any way to get/translate
> locations from the codon symbol list to/from the DNA symbol list?

Ouch. What do you really want to do here?
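David's pointer covers the BioJava route for viewing DNA as codons. To make the underlying bookkeeping concrete, here is a plain-Java sketch (CodonSketch and its methods are hypothetical, not BioJava API) that translates ungapped DNA with the standard genetic code and maps each codon back to the columns it occupies in a gapped row such as "AT-G-TAT". It assumes that '-' marks alignment gaps, that the reading frame starts at the first base, and that only complete codons are translated; genuine frame-shift handling would need more than this.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (not the BioJava API): translate DNA codon by codon and map
// codon positions back onto columns of a gapped alignment row.
class CodonSketch {

    // Standard genetic code (NCBI table 1), with bases ordered T, C, A, G.
    private static final String AMINO =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    private static final String BASES = "TCAG";

    // Translate an ungapped DNA string; a trailing partial codon is ignored.
    static String translate(String dna) {
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            int index = 0;
            for (int j = 0; j < 3; j++) {
                int b = BASES.indexOf(Character.toUpperCase(dna.charAt(i + j)));
                if (b < 0) {
                    throw new IllegalArgumentException("Not a DNA base at " + (i + j));
                }
                index = index * 4 + b; // base-4 number: first base is most significant
            }
            protein.append(AMINO.charAt(index));
        }
        return protein.toString();
    }

    // For each non-gap base, record its 0-based column in the gapped row.
    static int[] baseToColumn(String gapped) {
        List<Integer> cols = new ArrayList<Integer>();
        for (int c = 0; c < gapped.length(); c++) {
            if (gapped.charAt(c) != '-') {
                cols.add(c);
            }
        }
        int[] result = new int[cols.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = cols.get(i);
        }
        return result;
    }

    // First and last gapped column spanned by codon k (0-based).
    static int[] codonColumns(String gapped, int k) {
        int[] map = baseToColumn(gapped);
        return new int[] { map[3 * k], map[3 * k + 2] };
    }

    public static void main(String[] args) {
        String gapped = "AT-G-TAT";
        String ungapped = gapped.replace("-", "");
        System.out.println(translate(ungapped));      // MY
        int[] span = codonColumns(gapped, 0);
        System.out.println(span[0] + ".." + span[1]); // 0..3
    }
}
```

Going the other way (gapped column to codon) is the inverse lookup: find the base index i whose column matches, and the codon is i / 3.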
Regards,
David Huen

From mark.schreiber at novartis.com Wed Jan 4 20:22:21 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jan 4 20:19:06 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
Message-ID:

Hi -

When you do the CVS update or checkout, make sure you use the -Pd options. The -P option prunes empty directories (old stuff not included in biojava-live anymore); it seems that you have got both an old copy and a new copy of the projected feature holder. The -d option pulls new directories (new packages since your last update). Use both.

Try doing a cvs update -Pd and then running ant.

- Mark

From jolyon.holdstock at ogt.co.uk Thu Jan 5 05:07:45 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu Jan 5 05:41:12 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F8A1FEC@EUCLID.internal.ogtip.com>

Hi,

I ran cvs update -Pd and then repeated the Ant command. I can see it has updated, as the build now compiles an extra source file:

    [javac] Compiling 1280 source files

But the build fails with the same error. Is there a workaround I could use?

Thanks,
Jolyon
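The compiler error in this thread follows a standard Java rule: when two on-demand imports (pkg.*) both supply a type with the same simple name, any use of that name is ambiguous, while a single-type import takes precedence over both. A self-contained illustration, using the JDK's own clash between java.util.List and java.awt.List as a stand-in for the two ProjectedFeatureHolder classes (the ImportShadowing class is hypothetical):

```java
// Both java.awt and java.util declare a type named List. With only the two
// on-demand imports, the simple name "List" would be ambiguous
// (javac: "reference to List is ambiguous").
import java.awt.*;
import java.util.*;
// A single-type import shadows both on-demand imports and resolves the clash.
import java.util.List;

class ImportShadowing {

    static List<String> names() {
        List<String> names = new ArrayList<String>(); // java.util.ArrayList
        names.add("ProjectedFeatureHolder");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(names()); // [ProjectedFeatureHolder]
        // The AWT class is still reachable through its fully qualified name.
        System.out.println(java.awt.List.class.getName()); // java.awt.List
    }
}
```

This only silences the symptom: javac still found a second ProjectedFeatureHolder in org.biojava.bio.seq somewhere, so it is worth tracking down where that copy comes from.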
From mark.schreiber at novartis.com Thu Jan 5 20:10:44 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Jan 5 20:07:35 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
Message-ID:

There should only be one copy of the ProjectedFeatureHolder (org.biojava.bio.seq.projection.ProjectedFeatureHolder). Try deleting your biojava-live directory and doing a fresh checkout; make sure you use the -Pd options during the checkout.

- Mark
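If a stale copy of a class is suspected, one way to look for it is to ask the class loader for every class-path location that provides the class file. A diagnostic sketch (FindClassCopies is hypothetical), assuming it is launched with the same CLASSPATH and JRE as the build; it reports what the JVM sees (on Java 1.4 to 8 that includes the JRE's lib/ext directory through parent delegation), which is only a proxy for what javac sees:

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Diagnostic sketch: list every class-path location from which a given class
// file could be loaded. More than one hit suggests a stale duplicate, e.g. an
// old jar on CLASSPATH or in an extensions directory.
class FindClassCopies {

    // "a.b.C" -> "a/b/C.class", the resource name of the compiled class.
    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Collects all URLs the system class loader (and its parents) can find.
    static List<URL> findCopies(String className) throws IOException {
        List<URL> hits = new ArrayList<URL>();
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        Enumeration<URL> urls = loader.getResources(resourceName(className));
        while (urls.hasMoreElements()) {
            hits.add(urls.nextElement());
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0]
                : "org.biojava.bio.seq.ProjectedFeatureHolder"; // the old location
        List<URL> hits = findCopies(name);
        System.out.println(hits.size() + " copies of " + name);
        for (URL u : hits) {
            System.out.println("  " + u);
        }
    }
}
```

Any hit for the old org.biojava.bio.seq.ProjectedFeatureHolder would point at the jar or directory that still contains the pre-move class.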
From jolyon.holdstock at ogt.co.uk Fri Jan 6 04:56:27 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Fri Jan 6 04:53:01 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F8A20FF@EUCLID.internal.ogtip.com>

Hi Mark,

Thanks for your help. I have deleted the original download and repeated the cvs checkout with the command

    cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava checkout -P biojava-live

I couldn't use -Pd with the checkout command (I'm using cvs 1.11.17).

I repeated the build and got the same error. I checked the download and there is only one copy of the ProjectedFeatureHolder, in org.biojava.bio.seq.projection where it should be, so I'm not sure why Ant believes there is a second one in org.biojava.bio.seq.

Jolyon

From td2 at sanger.ac.uk Fri Jan 6 04:34:05 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Fri Jan 6 05:07:49 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3F8A1FEC@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3F8A1FEC@EUCLID.internal.ogtip.com>
Message-ID: <88BF1DBE-FB06-46F0-84A9-5751FF12307D@sanger.ac.uk>

On 5 Jan 2006, at 10:07, Jolyon Holdstock wrote:

> Hi
>
> I ran cvs update -Pd and then repeated the Ant command.
>
> I can see it has updated as I'm trying to compile an extra source file
>
> [javac] Compiling 1280 source files
>
> But the build fails with the same error.
>
> Is there a work around I could use?

I'm wondering if you might have an old version of BioJava lying around on your CLASSPATH or in a JDK extensions directory? There's only one copy of ProjectedFeatureHolder in the source tree, but long ago in a galaxy far, far away it used to live in bio.seq rather than bio.seq.projection. I suspect you have a copy that pre-dates this move.

Alternatively, you could just update the import statements to import individual classes:

    import org.biojava.bio.seq.projection.ProjectedFeatureHolder;

instead of

    import org.biojava.bio.seq.projection.*;

Thomas.

From christoph.gille at charite.de Fri Jan 6 15:37:46 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Fri Jan 6 15:41:43 2006
Subject: [Biojava-l] tiny problem with converting java 1.5 to 1.4
Message-ID: <51334.192.168.220.203.1136579866.squirrel@webmail.charite.de>

Recently I suggested that Biojava could be moved to Java 1.5 without breaking compatibility, since the novel tool Retroweaver allows Java 1.5 programs to run on older JREs.
I started to use enums in my program and did not encounter any problems related to retroweaving. However, there is one nasty problem which shows up only at runtime. The method StringBuffer#insert(int, CharSequence) exists in Java 1.5 but not in 1.4, whereas StringBuffer#insert(int, Object) exists in both. When a StringBuffer is passed as the argument, javac 1.5 picks the more specific #insert(int, CharSequence). After compiling with javac 1.5 and retroweaving, one therefore gets a NoSuchMethodError at runtime, because #insert(int, CharSequence) does not exist in the 1.4 runtime library. The workaround is simple: cast the StringBuffer argument to Object, so that #insert(int, Object) is chosen instead of #insert(int, CharSequence). I have already told the author of Retroweaver. Otherwise Retroweaver works very well.

From wendy.wong at gmail.com Tue Jan 10 16:00:30 2006 From: wendy.wong at gmail.com (wendy wong) Date: Tue Jan 10 18:48:25 2006 Subject: [Biojava-l] Generalized HMM in biojava? Message-ID:

Hi, I was wondering if it is possible to use the biojava library to construct a generalized HMM? thanks, Wendy

From mark.schreiber at novartis.com Tue Jan 10 22:39:48 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Tue Jan 10 22:36:37 2006 Subject: [Biojava-l] Generalized HMM in biojava? Message-ID:

Depending on what you mean by generalized... You can create lots of custom HMM architectures using the DP packages of biojava. - Mark

From wendy.wong at gmail.com Wed Jan 11 04:37:34 2006 From: wendy.wong at gmail.com (wendy wong) Date: Wed Jan 11 04:58:59 2006 Subject: [Biojava-l] Generalized HMM in biojava? In-Reply-To: References: Message-ID:

What I mean by a generalized HMM is that each state emits a sequence of symbols (of fixed length, though), which doesn't seem very straightforward in biojava? thanks, wendy

From koeberle at mpiib-berlin.mpg.de Wed Jan 11 05:45:17 2006 From: koeberle at mpiib-berlin.mpg.de (Christian Köberle) Date: Wed Jan 11 05:48:28 2006 Subject: [Biojava-l] Sort Features Message-ID: <43C4E1BD.3060602@mpiib-berlin.mpg.de>

Hi, is there a way to get Features from a FeatureHolder sorted by Location? thanks, Christian -- Christian Köberle Max Planck Institute for Infection Biology Department: Immunology Schumannstr.
21/22 10117 Berlin Tel: +49 30 28 460 562 e-mail: koeberle@mpiib-berlin.mpg.de

From td2 at sanger.ac.uk Wed Jan 11 06:08:11 2006 From: td2 at sanger.ac.uk (Thomas Down) Date: Wed Jan 11 06:04:36 2006 Subject: [Biojava-l] Sort Features In-Reply-To: <43C4E1BD.3060602@mpiib-berlin.mpg.de> References: <43C4E1BD.3060602@mpiib-berlin.mpg.de> Message-ID: <7EF216E9-51B0-4E9A-89C6-6291736C8193@sanger.ac.uk>

On 11 Jan 2006, at 10:45, Christian Köberle wrote:
> Hi,
>
> is there a way to get Features from a FeatureHolder sorted by Location?

You can't guarantee a specific iteration order from a FeatureHolder (unless you write your own implementation). You can, however, dump the features into a List and sort them there:

FeatureHolder fh = ...;
List l = new ArrayList();
for (Iterator i = fh.features(); i.hasNext(); ) {
    l.add(i.next());
}
Collections.sort(l, Feature.byLocationOrder);

Thomas.

From matthew.pocock at ncl.ac.uk Wed Jan 11 06:27:00 2006 From: matthew.pocock at ncl.ac.uk (Matthew Pocock) Date: Wed Jan 11 06:39:44 2006 Subject: [Biojava-l] Generalized HMM in biojava? In-Reply-To: References: Message-ID: <200601111127.00973.matthew.pocock@ncl.ac.uk>

If each state emits a fixed number of symbols then you can just do an HMM where the emissions are over alpha^length. If you want the symbols to overlap then use an order-n distribution. Matthew

From srouane at hotmail.com Wed Jan 11 08:36:40 2006 From: srouane at hotmail.com (Simon Rouane) Date: Wed Jan 11 08:50:17 2006 Subject: [Biojava-l] getting involved In-Reply-To: <43C4E1BD.3060602@mpiib-berlin.mpg.de> Message-ID:

I'm a commercial Java developer who's worked on a fair few systems integration, LIMS and Datamart implementations in the past and I'd love to get involved in this project. Can anyone give me some hints as to what the first steps are? Thanks, Simon Rouane.

From wendy.wong at gmail.com Wed Jan 11 11:03:11 2006 From: wendy.wong at gmail.com (wendy wong) Date: Wed Jan 11 11:06:59 2006 Subject: [Biojava-l] Generalized HMM in biojava? In-Reply-To: <200601111127.00973.matthew.pocock@ncl.ac.uk> References: <200601111127.00973.matthew.pocock@ncl.ac.uk> Message-ID:

Thanks! Now I have two questions about the SimpleEmissionState class:
1. advance: I am not entirely sure what it does. So if my state emits 4 symbols at a time, do I set it to {4}?
2. Each of my sites can emit up to more than 100 alphabets and if each state emits 4 symbols at a time the number of alphabet for each state is 100^4. I am a bit concerned about setting up the distributions (too much memory consumption?).
Is there a function that I can overload so that the probability of each emission alphabet can be calculated on the run? Thanks for your help! wendy

From mark.schreiber at novartis.com Wed Jan 11 21:46:42 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Jan 11 21:43:14 2006 Subject: [Biojava-l] getting involved Message-ID:

It really comes down to what you want to do. Right now we need people to stress test the new biojavax packages available in CVS.
Some more unit tests for biojavax would also be great, especially ones that test for cases identified in stress testing. If you have other ideas, that would also be cool. - Mark

From srouane at hotmail.com Thu Jan 12 05:05:17 2006 From: srouane at hotmail.com (Simon Rouane) Date: Thu Jan 12 05:18:49 2006 Subject: [Biojava-l] getting involved In-Reply-To: Message-ID:

Thanks for everyone's comments. I'll do a bit more reading and then get back to you... Is your testing done using JUnit? Simon.

From hotafin at gmail.com Thu Jan 12 07:52:05 2006 From: hotafin at gmail.com (Tamas Horvath) Date: Thu Jan 12 07:48:31 2006 Subject: [Biojava-l] Re: strange pdb In-Reply-To: References: Message-ID:

(wow... stupid linewrap...) It seems to me that we need a variance tag for the Group or Atom object, as a beginning. The altLoc supposedly means a variation of the atom's position, but it seems to me that it makes more sense to treat the alternates as alternative groups: in the cases I've seen so far, these altLocs really refer to alternative side-chain conformations. In this case there would be a TYR A and a TYR B conformation.

From matthew.pocock at ncl.ac.uk Thu Jan 12 08:27:10 2006 From: matthew.pocock at ncl.ac.uk (Matthew Pocock) Date: Thu Jan 12 08:42:45 2006 Subject: [Biojava-l] Generalized HMM in biojava?
In-Reply-To: References: <200601111127.00973.matthew.pocock@ncl.ac.uk> Message-ID: <200601121327.11083.matthew.pocock@ncl.ac.uk>

On Wednesday 11 January 2006 16:03, wendy wong wrote:
> Thanks!
> Now I have two questions about the SimpleEmissionState class:
>
> 1. advance: I am not entirely sure what it does. So if my state emits
> 4 symbols at a time do I set it to {4}?

If you are emitting 4 symbols at a time, then you should probably think of the sequence as being a string of 4-tuples. In this case, the advance would be {1}, as you emit a single 4-tuple each time.

>
> 2. Each of my sites can emit up to more than 100 alphabets

I think we are using different words here. Do you mean 100 alphabets, or alphabets containing 100 symbols?

> and if
> each state emits 4 symbols at a time the number of alphabet for each
> state is 100^4. I am a bit concerned about setting up the
> distributions (too much memory consumption?).

Well, there's no way around this. If you really want to estimate a full discrete distribution over 4-tuples of 100 symbols, then you will have 100^4 parameters to estimate. The alternative is to estimate a much smaller number of variables which, when combined (e.g. by multiplying them), give the full set of parameters. With a little thinking, you can rig the distribution trainer to route the counts back from the 100^4 possible outcomes to the underlying parameters. It would probably help to have a better idea of what it is you are attempting to model.

> Is there a function that
> I can overload so that the probability of each emission alphabet can
> be calculated on the run?

It's not the alphabet that will kill you, but the number of parameters you are estimating. Indeed, BioJava should be able to handle alphabets with more than 2^32 symbols quite happily. There's an implementation of cross-product alphabet designed especially for this case.
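Matthew's two options can be made concrete with a small calculation. The sketch below is plain Java, not BioJava's DP API, and its class and method names are hypothetical. It does three things:

- encodes a k-tuple over an n-symbol alphabet as a single index in the cross-product alphabet;
- counts the entries a full distribution over that alphabet needs;
- shows one factorized alternative, in which each tuple position has its own distribution and the tuple probability is their product.

The factorization assumes that the positions within a tuple are independent, which may not suit Wendy's model.

```java
// Hypothetical sketch (not BioJava API): emissions over k-tuples of symbols
// from an alphabet of n symbols, with symbols encoded as ints 0..n-1.
class TupleEmission {
    // Index of a tuple in the cross-product alphabet alpha^k, read as a
    // base-n number: s[0]*n^(k-1) + s[1]*n^(k-2) + ... + s[k-1].
    static long tupleIndex(int[] tuple, int n) {
        long idx = 0;
        for (int s : tuple) {
            idx = idx * n + s;
        }
        return idx;
    }

    // Entries in a full discrete distribution over alpha^k: n^k.
    static long fullTableSize(int n, int k) {
        long size = 1;
        for (int i = 0; i < k; i++) {
            size *= n;
        }
        return size;
    }

    // Entries stored by the factorized model: one n-entry table per position.
    static long factorizedTableSize(int n, int k) {
        return (long) n * k;
    }

    // Factorized emission probability, computed on demand:
    // P(tuple) = p[0][s0] * p[1][s1] * ... * p[k-1][s(k-1)].
    // Assumes the positions within a tuple are independent.
    static double factorizedProbability(double[][] p, int[] tuple) {
        double prob = 1.0;
        for (int pos = 0; pos < tuple.length; pos++) {
            prob *= p[pos][tuple[pos]];
        }
        return prob;
    }
}
```

For Wendy's numbers (n = 100, k = 4), fullTableSize gives 100,000,000 entries, while the factorized form stores 400. Training would then accumulate counts per position, in the spirit of Matthew's suggestion to route the counts back to the underlying parameters.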
From ap3 at sanger.ac.uk Thu Jan 12 08:41:31 2006 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Thu Jan 12 09:05:45 2006 Subject: [Biojava-l] Re: strange pdb In-Reply-To: References: Message-ID: <334130934e4489534c81036f79f2d6d3@sanger.ac.uk>

Hi Tamas, the Atoms have an altLoc field. - only the pdb parser did not capture that information... I committed a fix to cvs.
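Andreas's fix concerns BioJava's own PDB parser. Some code reads raw ATOM records directly. For such code, Tamas's suggestion of letting the user pick one variant can be sketched as follows: read the altLoc indicator from column 17 of the fixed-width record, then keep only records whose indicator is blank or matches the chosen variant. This is a plain-Java illustration with hypothetical names, not the BioJava structure API. It only handles ATOM and HETATM records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not BioJava API): selects one alternate conformation
// from fixed-width PDB ATOM/HETATM records.
class AltLocFilter {
    // altLoc is column 17 of the record (0-based index 16); ' ' means none.
    static char altLoc(String record) {
        return record.length() > 16 ? record.charAt(16) : ' ';
    }

    // Keeps atoms without an altLoc plus those whose altLoc equals 'choice',
    // so each side-chain atom appears once, from the chosen conformation.
    static List<String> keepVariant(List<String> records, char choice) {
        List<String> kept = new ArrayList<String>();
        for (String r : records) {
            if (!r.startsWith("ATOM") && !r.startsWith("HETATM")) {
                continue; // this sketch only handles ATOM and HETATM records
            }
            char alt = altLoc(r);
            if (alt == ' ' || alt == choice) {
                kept.add(r);
            }
        }
        return kept;
    }
}
```

Suppose 'A' is chosen for the records in Tamas's example. The filter then keeps atoms 1 to 13 and 30 to 40, which have no altLoc, plus the ATYR side-chain atoms 14, 16, 18, 20, 22, 24, 26 and 28.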
Cheers, Andreas

----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891

From hotafin at gmail.com Thu Jan 12 07:17:31 2006 From: hotafin at gmail.com (Tamas Horvath) Date: Thu Jan 12 13:50:03 2006 Subject: [Biojava-l] strange pdb Message-ID:

Hi! I've just stumbled upon a strange PDB parsing phenomenon. Look at the following PDB file:

ATOM      1  N   GLU   326      14.783  14.947 -11.793  1.00 46.17           N
ATOM      2  CA  GLU   326      15.471  16.220 -11.447  1.00 39.29           C
ATOM      3  C   GLU   326      14.978  16.646 -10.075  1.00 37.04           C
ATOM      4  O   GLU   326      13.774  16.707  -9.841  1.00 37.72           O
ATOM      5  CB  GLU   326      15.133  17.290 -12.489  1.00 45.78           C
ATOM      6  CG  GLU   326      16.102  18.482 -12.553  1.00 71.24           C
ATOM      7  CD  GLU   326      15.940  19.327 -13.826  1.00 93.39           C
ATOM      8  OE1 GLU   326      14.901  19.198 -14.512  1.00101.02           O
ATOM      9  OE2 GLU   326      16.857  20.119 -14.144  1.00 84.50           O
ATOM     10  N   TYR   327      15.913  16.885  -9.163  1.00 33.93           N
ATOM     11  CA  TYR   327      15.604  17.298  -7.797  1.00 23.92           C
ATOM     12  C   TYR   327      15.865  18.786  -7.632  1.00 24.48           C
ATOM     13  O   TYR   327      16.797  19.328  -8.230  1.00 31.71           O
ATOM     14  CB ATYR   327      16.402  16.443  -6.818  0.50 29.56           C
ATOM     15  CB BTYR   327      16.528  16.583  -6.799  0.50 30.30           C
ATOM     16  CG ATYR   327      16.280  14.990  -7.206  0.50 45.39           C
ATOM     17  CG BTYR   327      15.997  15.310  -6.184  0.50 31.62           C
ATOM     18  CD1ATYR   327      16.886  14.518  -8.371  0.50 44.19           C
ATOM     19  CD1BTYR   327      14.840  15.316  -5.413  0.50 41.31           C
ATOM     20  CD2ATYR   327      15.466  14.119  -6.496  0.50 38.02           C
ATOM     21  CD2BTYR   327      16.667  14.101  -6.351  0.50 54.42           C
ATOM     22  CE1ATYR   327      16.676  13.240  -8.828  0.50 27.11           C
ATOM     23  CE1BTYR   327      14.361  14.153  -4.823  0.50 24.22           C
ATOM     24  CE2ATYR   327      15.256  12.830  -6.944  0.50 27.50           C
ATOM     25  CE2BTYR   327      16.196  12.934  -5.764  0.50 45.82           C
ATOM     26  CZ ATYR   327      15.866  12.400  -8.119  0.50 24.52           C
ATOM     27  CZ BTYR   327      15.041  12.970  -5.001  0.50 38.12           C
ATOM     28  OH ATYR   327      15.666  11.127  -8.607  0.50 51.23           O
ATOM     29  OH BTYR   327      14.567  11.824  -4.411  0.50 40.14           O
ATOM     30  N   PHE   328      15.050  19.446  -6.825  1.00 20.97           N
ATOM     31  CA  PHE   328      15.212  20.876  -6.587  1.00 20.04           C
ATOM     32  C   PHE   328      15.213  21.072  -5.098  1.00 28.28           C
ATOM     33  O   PHE   328      14.775  20.197  -4.363  1.00 24.43           O
ATOM     34  CB  PHE   328      14.061  21.656  -7.209  1.00 22.08           C
ATOM     35  CG  PHE   328      13.906  21.406  -8.670  1.00 31.12           C
ATOM     36  CD1 PHE   328      13.164  20.320  -9.124  1.00 23.58           C
ATOM     37  CD2 PHE   328      14.547  22.217  -9.594  1.00 47.00           C
ATOM     38  CE1 PHE   328      13.064  20.044 -10.465  1.00 30.40           C
ATOM     39  CE2 PHE   328      14.452  21.948 -10.954  1.00 44.64           C
ATOM     40  CZ  PHE   328      13.706  20.852 -11.386  1.00 33.12           C

As the PDB parser goes through these, it simply cuts off the A/B variants of that TYR and just parses them as similarly named atoms of the same amino acid. This is really not a desirable thing to do. In the PDB format description, this field is:

17   Character   altLoc   Alternate location indicator.

Maybe the simplest way to deal with it is to let the user choose which variant should be used...

From franckv at ebi.ac.uk Mon Jan 16 10:59:42 2006 From: franckv at ebi.ac.uk (Franck) Date: Mon Jan 16 16:24:52 2006 Subject: [Biojava-l] Re: Multiple questions (mark.schreiber@novartis.com) Message-ID: <43CBC2EE.9090504@ebi.ac.uk>

Hi, sorry for the late response! As for point 2 (is there a wrapper for SequenceIO.fileToBiojava(..)?): for one of my projects I've written a factory class which returns a Sequence object from a URI or a string. The supported formats are EMBL, GenBank and SwissProt. This project is still in progress and not fully tested, but so far the code works with my sequences. If it can help someone... Franck p.s. You can find the java file attached.
-------------- next part --------------
package uk.ac.ebi.ftv;

import java.io.*;
import java.net.URL;
import java.net.MalformedURLException;
import java.util.regex.Pattern;

import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.io.SeqIOTools;
import org.biojava.bio.seq.io.SequenceBuilder;
import org.biojava.bio.BioException;

/**
 * Project FTV : Feature Table Viewer
 * F. Valentin - Jul 2005
 * Copyright (c) European Bioinformatics Institute 2005
 *
 * $Header$
 * Version : $Name$
 *
 *
 * $Log$
 */
public abstract class SequenceFactory {

    /* ----------------------- Class variables --------------------------- */

    // According to the documentation, the first line of EMBL and SwissProt files is
    // defined as follows:
    // EMBL := ID \s+ \s+ ; \s+ [circular] \s+ ; \s+
    // ; \s+ \s+ BP.
    // := \p{Alpha> \w+
    // := standard
    // := .+ (should be the same as the value in the mol_type qualifier).
    // < division> := (PHG)|(CON)|... (see EMBL documentation)
    // := \d+
    // ------------------------------------------------------------------------------
    // SwissProt := ID \s+ \s+ ; \s+ ; \s+ AA.
    // := \w{1,12}
    // := PRT
    // := \d+
    // ------------------------------------------------------------------------------
    // GenBank := LOCUS \s{7} \s \s bp \s
    // \s{2} \s \s
    // := \w ( (\w(?<=\w)) | (\s(?=\s)) ){11}
    // := \s ( (\s(?<=\s)) | (\d (?=\d) ){4} \d
    // := \s{3} ([sdm]s-)
    // := (NA\s) | ( (DNA) | (tRNA) | (rRNA) | (mRNA) | (uRNA) | (snRNA) | (snoRNA)
    // := (circular) | (linear \s \s)
    // := \w{3}
    // := // date format dd-MMM-yyyy
    // ------------------------------------------------------------------------------
    // DDBJ := the format seems to be the same as Genbank.
    // TODO need to be confirmed.
    //
    // We don't strictly follow these definitions. The important point here is to
    // be able to distinguish the different formats. However, if new formats are
    // added it's important to adapt the tests to keep the choice deterministic!
    private static Pattern EMBL_PATTERN = Pattern.compile("\\AID.+BP\\.\\s*$", Pattern.MULTILINE);
    private static Pattern GENBANK_PATTERN = Pattern.compile("\\ALOCUS.+\\d{4}\\s*$", Pattern.MULTILINE);
    private static Pattern SWISSPROT_PATTERN = Pattern.compile("\\AID.+AA\\.\\s*$", Pattern.MULTILINE);

    /* ------------------------- Class methods --------------------------- */

    /**
     * Create the biojava object Sequence according to the first line of the string.
     * @param st A string representing the sequence.
     * @return the sequence object.
     */
    private static Sequence createSequenceFromString(String st) throws FtvUserException {
        SequenceIterator iterator;
        BufferedReader br = new BufferedReader(new StringReader(st));

        // If EMBL format
        if (EMBL_PATTERN.matcher(st).find()) {
            iterator = SeqIOTools.readEmbl(br);
        }
        // Genbank/DDBJ format
        else if (GENBANK_PATTERN.matcher(st).find()) {
            iterator = SeqIOTools.readGenbank(br);
        }
        // SwissProt format
        else if (SWISSPROT_PATTERN.matcher(st).find()) {
            iterator = SeqIOTools.readSwissprot(br);
        }
        else {
            throw new FtvUserException(FtvUtil.MSG_SEQ_FORMAT_UNKNOWN);
        }

        // We read only the first sequence from the iterator (we use an iterator here because
        // it's simpler than creating the Sequence object directly; see StreamReader's
        // implementation to see what has to be done).
        try {
            return iterator.nextSequence();
        }
        catch (BioException e) {
            System.out.println("-------------------------");
            e.printStackTrace(); // getStackTrace() only returned the array and printed nothing
            System.out.println("-------------------------");
            throw new FtvUserException("BioException : " + e.getMessage());
        }
    }

    /**
     * Create a Sequence object according to the sort of string given as a parameter.
     * The string can be:
     * - the sequence itself.
     * - an URI to the sequence.
     * eg. http://www.ebi.ac.uk/cgibin/dbfetch?db=EMBL&id=j00021&forma=embl&style=raw
     * ftp://www.asite.fr/sequence.embl
     * @param st string that represents a sequence.
     * @return the sequence object.
     */
    public static Sequence createSequence(String st) throws FtvUserException, IOException {
        StringBuffer sb_sequence = new StringBuffer();
        String st_sequence;
        BufferedReader in = null;
        URL url = null;
        String seq_line = null;

        // If the URL has no protocol defined, this is the sequence itself.
        // (See http://www.ietf.org/rfc/rfc2396.txt chap 3.1)
        if (! st.matches("\\A\\w*(\\w|\\d|\\+|-|\\.):.+$")) {
            st_sequence = new String(st);
        }
        else {
            try {
                url = new URL(st);
                in = new BufferedReader(new InputStreamReader(url.openStream()));
                while ((seq_line = in.readLine()) != null) {
                    sb_sequence.append(seq_line).append("\n");
                }
                in.close();
                st_sequence = new String(sb_sequence);
            }
            catch (MalformedURLException e) {
                throw new FtvUserException(FtvUtil.MSG_PROTOCOL_UNKNOWN);
            }
            catch (FileNotFoundException e) {
                throw new FtvUserException(FtvUtil.MSG_FILE_NOT_FOUND);
            }
            // Any other IOException propagates to the caller (declared in the signature).
        }
        return createSequenceFromString(st_sequence);
    }
}

From koeberle at mpiib-berlin.mpg.de Thu Jan 19 12:16:46 2006 From: koeberle at mpiib-berlin.mpg.de (Christian Köberle) Date: Thu Jan 19 12:19:40 2006 Subject: [Biojava-l] Parse XML BLAST Message-ID: <43CFC97E.9070308@mpiib-berlin.mpg.de>

Hi, is it possible to get the information from the BLAST-XML Tag with BioJava? I use the example from BioJava In Anger to parse a BLAST report, with BlastXMLParserFacade as the parser. To get the definition of the target gene, I take the SeqSimilaritySearchHit object, parse the result of getSubjectID() and download the sequence from NCBI. But this is very slow.
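The slow step in Christian's approach is the per-hit download from NCBI. NCBI BLAST XML output already carries a description for each hit in its Hit_def element, next to Hit_id and Hit_accession. One alternative is therefore to read those values straight from the report. The sketch below uses only the JDK's DOM parser, not BioJava's BlastXMLParserFacade, and the class name is hypothetical. When a report contains several iterations, it does not tie each description back to its query.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch (hypothetical class, not BioJava API): collect the text of every
// <Hit_def> element in a BLAST XML report, in document order.
class BlastHitDefs {
    static List<String> hitDefinitions(byte[] blastXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // BLAST XML declares an external DTD; skip fetching it over the network.
        f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder b = f.newDocumentBuilder();
        Document doc = b.parse(new ByteArrayInputStream(blastXml));

        List<String> defs = new ArrayList<String>();
        NodeList hits = doc.getElementsByTagName("Hit");
        for (int i = 0; i < hits.getLength(); i++) {
            Element hit = (Element) hits.item(i);
            NodeList def = hit.getElementsByTagName("Hit_def");
            if (def.getLength() > 0) {
                defs.add(def.item(0).getTextContent().trim());
            }
        }
        return defs;
    }
}
```

Mark's suggestion of the BlastEcho example may be the more BioJava-native route. This sketch only shows that the descriptions are already present in the XML.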
for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
    SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit) k.next();
    String name = hit.getSubjectID().split("\\|")[3];
    Sequence seq = db.getSequence(name);
    System.out.print("\t" + seq.getAnnotation().getProperty("DEFINITION"));
}

Is there a better way to get the information? thanks, Christian -- Christian Köberle Max Planck Institute for Infection Biology Department: Immunology Schumannstr. 21/22 10117 Berlin Tel: +49 30 28 460 562 e-mail: koeberle@mpiib-berlin.mpg.de

From mark.schreiber at novartis.com Thu Jan 19 22:37:23 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Thu Jan 19 22:41:01 2006 Subject: [Biojava-l] Parse XML BLAST Message-ID:

Which example are you using? The BlastEcho might be faster. - Mark

From dreher at mpiib-berlin.mpg.de Fri Jan 20 09:45:27 2006 From: dreher at mpiib-berlin.mpg.de (Felix Dreher) Date: Fri Jan 20 09:49:44 2006 Subject: [Biojava-l] BioSQL cvs versions Message-ID: <43D0F787.5020705@mpiib-berlin.mpg.de>

Hello, when I try to add a sequence to a BioSQL DB, the following exception is thrown:

Exception Details: org.postgresql.util.PSQLException
ERROR: column "seqfeature_key_id" of relation "seqfeature" does not exist

org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)
org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeature(FeaturesSQL.java:804)
org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:760)
org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:729)
org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:481)
org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:374)
. . .

Apparently the BioJava and BioSQL versions don't really match.
I use the following CVS version of the corresponding class: /BioSQLSequenceDB.java/1.70/Fri Jun 10 07:48:11 2005// I also use the latest CVS version of the BioSQL script 'biosqldb-pg.sql' (it's from June 2005). Are there any suggestions on how this could be solved? Thank you, Felix -- Felix Dreher Max-Planck-Institute for Infection Biology Campus Charité Mitte Department of Immunology Mailing address: Schumannstraße 21/22 Visitors: Virchowweg 12 10117 Berlin Germany Tel.: +49 (0)30 28460-254 / -494 Mobile: +49 (0)163 7542426

From jdiminic at gmail.com Fri Jan 20 14:17:46 2006 From: jdiminic at gmail.com (Janko Diminic) Date: Fri Jan 20 15:51:23 2006 Subject: [Biojava-l] BioSQL cvs versions In-Reply-To: <43D0F787.5020705@mpiib-berlin.mpg.de> References: <43D0F787.5020705@mpiib-berlin.mpg.de> Message-ID: <43cbb78e0601201117r41222872u@mail.gmail.com>

Did you create the database schema with the create script? Check whether the column seqfeature_key_id exists. -- Janko Diminic

From wendy.wong at gmail.com Fri Jan 20 17:11:58 2006 From: wendy.wong at gmail.com (wendy wong) Date: Sat Jan 21 07:19:38 2006 Subject: [Biojava-l] Generalized HMM in biojava? In-Reply-To: <200601121327.11083.matthew.pocock@ncl.ac.uk> References: <200601111127.00973.matthew.pocock@ncl.ac.uk> <200601121327.11083.matthew.pocock@ncl.ac.uk> Message-ID:

Thanks for your help!

> It's not the alphabet that will kill you, but the number of parameters you are
> estimating. Indeed, BioJava should be able to handle alphabets with more than
> 2^32 symbols quite happily. There's an implementation of cross-product
> alphabet designed especially for this case.

What I am trying to do is develop a phylogenetic HMM.
So, say there are 3 sequences in the alignment; that means each site consists of 3 symbols, and, as it is a generalized HMM, each state covers several sites, say 7. I wrote a test program to see if it works. When the number of sites per state was 5, it worked (I just want to see if I can factorize a symbol in the state alphabet), but when the number of sites per state is 7, I get java.lang.ArrayIndexOutOfBoundsException (code attached). Is it because I was not using the alphabet efficiently? Again, thanks very much for helping! Wendy

public static void main(String[] args) throws MarshalException, ValidationException, IOException {
    Alphabet sequenceAlphabet = DNATools.getDNA();
    Set alphabetSet = AlphabetManager.getAllSymbols((FiniteAlphabet) sequenceAlphabet);

    // A site is one alignment column: no_sequences DNA symbols.
    int no_sequences = 3;
    List siteAlphabetList = Collections.nCopies(no_sequences, sequenceAlphabet);
    Alphabet siteAlphabet = AlphabetManager.getCrossProductAlphabet(siteAlphabetList);

    // A state symbol is `length` consecutive sites.
    int length = 7;
    List stateAlphabetList = Collections.nCopies(length, siteAlphabet);
    Alphabet stateAlphabet = AlphabetManager.getCrossProductAlphabet(stateAlphabetList);

    // With length = 7 the ArrayIndexOutOfBoundsException is thrown from this call.
    AlphabetIndex alphabetIndex = AlphabetManager.getAlphabetIndex((FiniteAlphabet) stateAlphabet);
    AtomicSymbol sym = (AtomicSymbol) alphabetIndex.symbolForIndex(3);
    List symList = sym.getSymbols();
    log.info("sym (index=3) is " + sym);
    log.info("sym is composed of:");
    Iterator symIter = symList.iterator();
    while (symIter.hasNext()) {
        log.info(symIter.next());
    }
}

From mark.schreiber at novartis.com Sun Jan 22 20:17:16 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Sun Jan 22 20:13:42 2006 Subject: [Biojava-l] BioSQL cvs versions Message-ID:

Dear Felix, We have found a number of deficiencies in BioJava's support of BioSQL, so we have moved to a new model using Hibernate to overcome several problems. This will be officially released in BioJava 1.5. In the meantime you can download the development version from CVS.
Having said that, the best-supported databases in BioJava 1.4 are Oracle and MySQL. These have received the most testing and support. If you have a chance (and cannot use Hibernate), I would suggest using one of those. Although someone may offer a bug fix for this problem, we do not plan to support the old BioJava/BioSQL mappings after 1.5 is released. They have been deprecated in CVS. The official way to interact with BioSQL will be via Hibernate. - Mark

Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910

From matthew.pocock at ncl.ac.uk Mon Jan 23 06:32:21 2006 From: matthew.pocock at ncl.ac.uk (Matthew Pocock) Date: Mon Jan 23 06:46:20 2006 Subject: [Biojava-l] Generalized HMM in biojava? In-Reply-To: References: <200601121327.11083.matthew.pocock@ncl.ac.uk> Message-ID: <200601231132.21336.matthew.pocock@ncl.ac.uk>

On Friday 20 January 2006 22:11, wendy wong wrote: > what I am trying to do is to develop a phylogenetic HMM. so say there > are 3 sequences, in the alignment, that means each site consists of 3 > symbols, and if it is a generalized HMM, each state has several sites, > say 7. OK - so you have a single HMM that emits whole columns of an alignment? Usually, to align three sequences, you would use a 3-head HMM where each head emits one of the sequences. > I wrote a testing program to see if it works. when the length > of sites in the state = 5 it worked. (I just want to see if I can > factorize a symbol in the state alphabet.
but when number of sites in > the state = 7, I get java.lang.ArrayIndexOutOfBoundsException. (code > attached) > > Is it because i was not using the alphabet efficiently? You shouldn't be getting exceptions. This is almost certainly a bug. Could you send the stack trace? Matthew

From wendy.wong at gmail.com Mon Jan 23 06:43:43 2006 From: wendy.wong at gmail.com (wendy wong) Date: Mon Jan 23 06:46:28 2006 Subject: [Biojava-l] Generalized HMM in biojava? In-Reply-To: <200601231132.21336.matthew.pocock@ncl.ac.uk> References: <200601121327.11083.matthew.pocock@ncl.ac.uk> <200601231132.21336.matthew.pocock@ncl.ac.uk> Message-ID:

> OK - so you have a single HMM that emits whole columns of an alignment? > Usually to a lign three sequences, you would use a 3-head HMM where each head > emits one of the sequences. I am not sure whether that would work with a 3-head HMM, since here the sequences are related to each other by the phylogenetic tree.
So, as long as the order of the sequences stays the same, the column ACC would have a different likelihood than CCA. > You shouldn't be getting exceptions. This is almost certainly a bug. Could you > send the stack-trace? Sure, here it is:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at org.biojava.bio.symbol.LinearAlphabetIndex.buildIndex(LinearAlphabetIndex.java:108)
    at org.biojava.bio.symbol.LinearAlphabetIndex.<init>(LinearAlphabetIndex.java:66)
    at org.biojava.bio.symbol.AlphabetManager.getAlphabetIndex(AlphabetManager.java:1796)
    at edu.cornell.bscb.evopromoter.TestingFunctions.main(TestingFunctions.java:61)

I think I don't need the full alphabet of getDNA(), which has 16 symbols. I reduced it to 5 (A, T, C, G, N), so I can have a state that contains more sites... Thanks, wendy

From matthew.pocock at ncl.ac.uk Mon Jan 23 06:58:41 2006 From: matthew.pocock at ncl.ac.uk
(Matthew Pocock) Date: Mon Jan 23 07:13:38 2006 Subject: [Biojava-l] Generalized HMM in biojava? In-Reply-To: References: <200601231132.21336.matthew.pocock@ncl.ac.uk> Message-ID: <200601231158.42259.matthew.pocock@ncl.ac.uk>

On Monday 23 January 2006 11:43, wendy wong wrote: > > OK - so you have a single HMM that emits whole columns of an alignment? > > Usually to a lign three sequences, you would use a 3-head HMM where each > > head emits one of the sequences. > > I am not sure if it would work with a 3 head HMM, as in here the > sequences are related to each other by the phylogenetic tree. so if > the sequences order is the same, the column ACC would have a different > likelihood than CCA. So you already have the alignment from a phylogenetic program and you are using BioJava to compute some other statistic over it? > > > You shouldn't be getting exceptions. This is almost certainly a bug. > > Could you send the stack-trace? > > sure, here it is: Thanks. I am not around until the end of the week. Could somebody take a look at this?

> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
> at org.biojava.bio.symbol.LinearAlphabetIndex.buildIndex(LinearAlphabetIndex.java:108)
> at org.biojava.bio.symbol.LinearAlphabetIndex.<init>(LinearAlphabetIndex.java:66)
> at org.biojava.bio.symbol.AlphabetManager.getAlphabetIndex(AlphabetManager.java:1796)
> at edu.cornell.bscb.evopromoter.TestingFunctions.main(TestingFunctions.java:61)

> > I think I don't need the full alphabet of getDNA(), which has 16 > symbols. I reduced it to 5 (A,T, C, G, N), so I can have a state that > contains more sites... While this is a good idea, it would actually be counter-productive in BioJava. The DNA alphabet has only 4 'real' symbols: the nucleotides. The other symbols (N included) are 'virtual' symbols constructed from sets of the 'real' symbols.
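To put numbers on this: each site in Wendy's setup is the cross product of 3 DNA alphabets (4^3 = 64 symbols), and each state symbol is the cross product of 7 sites, so the state alphabet has 64^7 = 4^21 = 2^42 atomic symbols, against 4^15 = 2^30 with 5 sites; making N a fifth real symbol raises this to 5^21. One plausible reading of the exception, an assumption that nobody in the thread confirmed, is that building a linear index over the whole alphabet (LinearAlphabetIndex appears in the stack trace) needs one int-indexed array slot per symbol: 2^30 is still below Integer.MAX_VALUE, 2^42 is not, and 2^42 truncated to a 32-bit int is exactly 0, which would be consistent with "ArrayIndexOutOfBoundsException: 0". The helper below is hypothetical (not BioJava API) and only does the arithmetic with java.math.BigInteger.

```java
import java.math.BigInteger;

/** Hypothetical helper (not part of BioJava): sizes of nested cross-product alphabets. */
final class CrossProductSize {

    /** Atomic symbols in the cross product of `copies` alphabets with `symbols` atoms each. */
    static BigInteger size(int symbols, int copies) {
        return BigInteger.valueOf(symbols).pow(copies);
    }

    /** True if one array slot per symbol could fit in a Java (int-indexed) array. */
    static boolean fitsInArray(BigInteger size) {
        return size.compareTo(BigInteger.valueOf(Integer.MAX_VALUE)) <= 0;
    }

    public static void main(String[] args) {
        int sequences = 3; // symbols per site: one per aligned sequence
        for (int sites = 5; sites <= 7; sites += 2) {
            for (int atoms = 4; atoms <= 5; atoms++) { // 4 = A,C,G,T; 5 = N as a real symbol
                BigInteger n = size(atoms, sequences * sites);
                System.out.println(atoms + "^" + (sequences * sites) + " = " + n
                        + "  fits in an int-indexed array: " + fitsInArray(n)
                        // intValue() keeps only the low-order 32 bits
                        + "  low 32 bits as int: " + n.intValue());
            }
        }
    }
}
```

Running main shows that the 5-site DNA case is the only one that fits, and that size(4, 21).intValue() is 0; with N as a real symbol, even 5 sites (5^15) no longer fits.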
By introducing 'N' as a first-class symbol, you have actually grown the problem from 4^n to 5^n, which is probably not what you wanted :-) Matthew

From mitchellw at gis.a-star.edu.sg Tue Jan 24 07:12:59 2006 From: mitchellw at gis.a-star.edu.sg (Wayne Mitchell) Date: Tue Jan 24 07:17:02 2006 Subject: [Biojava-l] Bioinformatics Programmer Position in Singapore Message-ID:

The Research Computing Group at the Genome Institute of Singapore is recruiting a bioinformatics programmer to work closely with institute scientists to architect and implement informatics solutions to genomic biology problems. Current projects include sequence, proteomics, SNP and microarray analysis pipelines, database implementations, and user interface design. Candidates should have:
-- Demonstrated ability to translate real-world problems into actionable software solutions
-- An outgoing, client-centric personality, able to manage relationships with scientist clients. Gifted introverts will not thrive in this position.
-- Experience in a complex, networked UNIX environment
-- A strong programming skill set, ideally in a bioscience setting
-- A track record in team software development, preferably enterprise Java
-- Database and data warehouse design skills
-- Bioinformatics/biology domain expertise (academic degree or work experience) strongly preferred.
Minimum education/experience: a bachelor's in Computer Science with 2+ years of general programming experience or 1+ years of bioinformatics programming experience; or a BS in bioscience, chemistry, physics, math or engineering with 4+ years of programming experience or 2+ years of bioinformatics programming experience. CV to mitchellw@gis.a-star.edu.sg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dr. Wayne Mitchell, Ph.D. Senior Scientist, Genome Institute of Singapore +65 6478 8177 (vox) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . All rivers flow into the sea because it is lower than they are. Humility gives it its power.
Dao De Jing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The ocean of learning is unbounded Zhuang Zhou . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately; please do not copy or use it for any purpose, or disclose its contents to other persons. Thank you.

From guedes at unisul.br Thu Jan 26 14:51:36 2006 From: guedes at unisul.br (Dickson S. Guedes) Date: Thu Jan 26 15:00:56 2006 Subject: [Biojava-l] Alignment with GAPs Message-ID: <43D92848.3060408@unisul.br>

Hello all, is it possible to align two sequences when one of them already contains gaps? I'm doing some tests with DP and the Viterbi algorithm, but without success. Where can I learn about this? Thank you all. Best regards, -- :: Dickson S. Guedes (guedes at unisul dot br) :: :: UNISUL - Universidade do Sul de Santa Catarina :: ATI - Assessoria de Tecnologia da Informação :: (0xx48) 621-3200 - http://www.unisul.br -- "There are 10 types of people in the world: those who understand binary, and those who don't."

From matthew.pocock at ncl.ac.uk Thu Jan 26 15:33:45 2006 From: matthew.pocock at ncl.ac.uk (Matthew Pocock) Date: Thu Jan 26 15:40:22 2006 Subject: [Biojava-l] Alignment with GAPs In-Reply-To: <43D92848.3060408@unisul.br> References: <43D92848.3060408@unisul.br> Message-ID: <200601262033.45779.matthew.pocock@ncl.ac.uk>

If I understand you correctly, one of the two sequences you are aligning contains a gap before you align them? Or do you want to produce a pairwise alignment from two unaligned sequences and introduce gaps? If it is the former, you want a state that emits a gap in one sequence and a symbol in the other, and also advances {1,1}. I think that is easy enough to set up, but I can't remember the exact code.
If the worst comes to the worst, you can construct the distribution over {gap,Protein} using the classes in the .dist package (org.biojava.bio.dist) and then set up a SimpleState, providing the advance and alphabet in the constructor. Matthew

From guedes at unisul.br Thu Jan 26 15:49:14 2006 From: guedes at unisul.br (Dickson S. Guedes) Date: Thu Jan 26 15:47:29 2006 Subject: [Biojava-l] Alignment with GAPs In-Reply-To: <200601262033.45779.matthew.pocock@ncl.ac.uk> References: <43D92848.3060408@unisul.br> <200601262033.45779.matthew.pocock@ncl.ac.uk> Message-ID: <43D935CA.2030708@unisul.br>

Thanks Matthew. To produce a pairwise alignment from two unaligned sequences and introduce gaps, I have used the example from "BioJava In Anger" and it runs successfully. Now I need to align two sequences where one of them already has gaps before alignment. Am I right that the DP class doesn't accept a GappedSequence? Best regards, Guedes -- :: Dickson S. Guedes (guedes at unisul dot br) :: :: UNISUL - Universidade do Sul de Santa Catarina :: ATI - Assessoria de Tecnologia da Informação :: (0xx48) 621-3200 - http://www.unisul.br -- "There are 10 types of people in the world: those who understand binary, and those who don't."

From mark.schreiber at novartis.com Thu Jan 26 21:05:20 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Thu Jan 26 21:01:40 2006 Subject: [Biojava-l] Alignment with GAPs Message-ID:

Hi - I think the DP class should accept a GappedSequence. To get the result you want, you will probably need at least one match state that can emit gaps. I'm curious why you would want to do that kind of alignment, though. - Mark

From guedes at unisul.br Fri Jan 27 11:09:07 2006 From: guedes at unisul.br (Dickson S. Guedes) Date: Fri Jan 27 11:07:26 2006 Subject: [Biojava-l] Alignment with GAPs In-Reply-To: References: Message-ID: <43DA45A3.4070407@unisul.br>

Hi Mark, OK, I'll test it, thanks. Curious? :) ... I'm testing some ideas about progressive alignment, because I couldn't find how to do multiple sequence alignments (MSA) using only BioJava. Am I wrong? I did some tests with STRAP, but it's not what I need. Do you have any suggestions about MSA with BioJava? Thanks all!

From toddri at eden.rutgers.edu Tue Jan 31 16:33:13 2006 From: toddri at eden.rutgers.edu (Todd Riley) Date: Tue Jan 31 16:53:44 2006 Subject: [Biojava-l] Help needed tp add "Number of Bits" vertical and column number labeling to DistributionLogos In-Reply-To: <43D935CA.2030708@unisul.br> References: <43D92848.3060408@unisul.br> <200601262033.45779.matthew.pocock@ncl.ac.uk> <43D935CA.2030708@unisul.br> Message-ID: <43DFD799.7050803@eden.rutgers.edu>

Hello, I would like to add a "2 Bits" vertical label (with a bracket) and column numbering to my DistributionLogos. I have seen both in some graphics, but haven't been able to find the code in the demos or on the web. Thanks, Todd

From mark.schreiber at novartis.com Tue Jan 31 20:08:41 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Tue Jan 31 20:04:56 2006 Subject: [Biojava-l] Alignment with GAPs Message-ID:

There is no MSA in BioJava. ClustalW, T-Coffee, etc. are probably much better.

From mark.schreiber at novartis.com Tue Jan 31 20:12:36 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Tue Jan 31 20:08:45 2006 Subject: [Biojava-l] Help needed tp add "Number of Bits" vertical and column number labeling to DistributionLogos Message-ID:

Hi Todd, The DistributionLogo class is not the best way to draw large logos with additional features such as labels. One good way to do this is to make a custom component, copy the drawing code from DistributionLogo and add your own code for labels etc. This way you can also draw several positions of the logo into one component. An even better option is to make the code draw directly to a Graphics2D object. That way the code can paint to a component or to a BufferedImage. - Mark

From toddri at eden.rutgers.edu Tue Jan 31 21:57:44 2006 From: toddri at eden.rutgers.edu (Todd Riley) Date: Wed Feb 1 22:06:22 2006 Subject: [Biojava-l] Help needed to add "Number of Bits" vertical and column number labeling to DistributionLogos In-Reply-To: References: Message-ID: <43E023A8.7080201@eden.rutgers.edu> An HTML attachment was scrubbed... URL: http://portal.open-bio.org/pipermail/biojava-l/attachments/20060131/0814f9cb/attachment-0001.htm -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: image/jpeg Size: 35167 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/biojava-l/attachments/20060131/0814f9cb/attachment-0001.jpg -------------- next part -------------- A non-text attachment was scrubbed... Name: C:\DOCUME~1\Todd\LOCALS~1\Temp\msohtml1\01\clip_image002.jpg Type: image/jpeg Size: 17019 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/biojava-l/attachments/20060131/0814f9cb/clip_image002-0001.jpg
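Mark's suggestion of drawing straight onto a Graphics2D can be sketched with plain java.awt. Everything below (LogoAxes, yForBits, columnCenterX, drawAxes) is hypothetical and not BioJava API. It only draws the bracket, a rotated "2 bits" label and 1-based column numbers; the logo stacks themselves (for example, drawing code adapted from DistributionLogo, as Mark describes) would be painted into the column slots it lays out. The 2-bit scale assumes a DNA logo, whose maximum information content per column is log2(4) = 2 bits.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

/** Hypothetical helper: a bits axis with a bracket and 1-based column numbers for a sequence logo. */
final class LogoAxes {

    /** Pixel y for an information value: 0 bits at the bottom of the plot, maxBits at the top. */
    static int yForBits(double bits, int plotTop, int plotHeight, double maxBits) {
        return plotTop + plotHeight - (int) Math.round(bits / maxBits * plotHeight);
    }

    /** Pixel x of the centre of a 0-based logo column. */
    static int columnCenterX(int column, int plotLeft, int columnWidth) {
        return plotLeft + column * columnWidth + columnWidth / 2;
    }

    /** Draws the bracket, the rotated axis label and the column numbers onto any Graphics2D. */
    static void drawAxes(Graphics2D g, int plotLeft, int plotTop, int plotHeight,
                         int columnWidth, int columns, double maxBits, String label) {
        int yTop = yForBits(maxBits, plotTop, plotHeight, maxBits);
        int yBottom = yForBits(0.0, plotTop, plotHeight, maxBits);
        int x = plotLeft - 8;

        // Bracket: a vertical bar with short ticks pointing towards the plot.
        g.setStroke(new BasicStroke(1.5f));
        g.drawLine(x, yTop, x, yBottom);
        g.drawLine(x, yTop, x + 4, yTop);
        g.drawLine(x, yBottom, x + 4, yBottom);

        // Axis label, rotated to read bottom-to-top and centred on the bracket.
        FontMetrics fm = g.getFontMetrics();
        AffineTransform saved = g.getTransform();
        g.translate(x - 6, (yTop + yBottom) / 2);
        g.rotate(-Math.PI / 2);
        g.drawString(label, -fm.stringWidth(label) / 2, 0);
        g.setTransform(saved);

        // Column numbers (1-based), centred under each column.
        for (int i = 0; i < columns; i++) {
            String s = Integer.toString(i + 1);
            g.drawString(s, columnCenterX(i, plotLeft, columnWidth) - fm.stringWidth(s) / 2,
                         yBottom + fm.getAscent() + 4);
        }
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(400, 260, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
            g.setColor(Color.BLACK);
            drawAxes(g, 60, 20, 200, 40, 8, 2.0, "2 bits");
            // ... paint the logo columns into the same Graphics2D here ...
        } finally {
            g.dispose();
        }
    }
}
```

Because drawAxes accepts any Graphics2D, the same call can run inside a Swing component's paintComponent or on a BufferedImage that is later saved with javax.imageio.ImageIO.write, which is the flexibility Mark points out.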