From jburdick at keyfitz.org Tue May 8 15:15:46 2007
From: jburdick at keyfitz.org (Josh Burdick)
Date: Tue, 08 May 2007 15:15:46 -0400
Subject: [Biojava-dev] reading a subsequence from a .nib file
In-Reply-To:
References:
Message-ID: <1178651747.1738.137.camel@localhost.localdomain>

On Fri, 2007-04-20 at 10:44 +0800, mark.schreiber at novartis.com wrote:
> Hi Josh -
>
> Looks good. Just one thing, your JUnit test contains a hardcoded file
> path to the test file which means it is not portable. Could you modify
> that so that it loads the file from the classpath as a resource (see some
> of the IO unit tests for examples). Can you also provide the test file.
>

I'm not currently compiling all of BioJava, and didn't know how to add
the file as a resource. So I'm afraid I haven't set the pathname to
something sensible.

But I created a small test file which is a fragment of chromosome 18.
The relevant files are:

http://keyfitz.org/jburdick/read_nib_file_java/NibFile.java
http://keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java
http://keyfitz.org/jburdick/read_nib_file_java/chr18_chunk.nib
http://keyfitz.org/jburdick/read_nib_file_java/readme.txt

Josh

> Best regards,
>
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax +65 6722 2910
> [...]

From stephen at blackrim.net Wed May 9 11:45:34 2007
From: stephen at blackrim.net (Stephen A Smith)
Date: Wed, 09 May 2007 11:45:34 -0400
Subject: [Biojava-dev] [Off Topic] Google group
Message-ID: <4641EC9E.4010307@blackrim.net>

Hi all,

Just letting you know there is a Google group open now for discussions
of all things programming and evolutionary biology. You can find it here:
http://groups.google.com/group/evo_code. Figured the people at bio*
might be interested.

Take care
Stephen Smith

From darin.london at duke.edu Thu May 10 12:17:38 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Thu, 10 May 2007 12:17:38 -0400
Subject: [Biojava-dev] BOSC 2007 Second Call For Papers
Message-ID: <200705101617.l4AGHc7L002450@tenero.duhs.duke.edu>

The BOSC Organizing Committee are proud to announce BOSC 2007, occurring
in Vienna, Austria on July 19th and 20th. The conference this year
promises to be exciting, as the BOSC developers attempt to define and
solve currently intractable problems in Bioinformatics. Please refer to
the following website for complete information, and requests for
submissions. Thank you, and we hope to see you in Vienna.

http://open-bio.org/wiki/BOSC_2007

The BOSC organizing Committee

Please pass this email on to anyone that would be interested.

From jeff at dvss.net Thu May 10 21:30:11 2007
From: jeff at dvss.net (Jeff Szielenski)
Date: Thu, 10 May 2007 17:30:11 -0800
Subject: [Biojava-dev] Project Contribution.
Message-ID: <003901c7936b$ecd256e0$0601a8c0@laptop>

Hello,

I am currently studying bioinformatics at UIC. I am interested in
joining an open source project to start applying my knowledge. What if
any help do you guys need? I have a software engineering background and
just finished an introductory course in bioinformatics.

Jeff

From markjschreiber at gmail.com Thu May 10 22:54:36 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 11 May 2007 10:54:36 +0800
Subject: [Biojava-dev] Project Contribution.
In-Reply-To: <003901c7936b$ecd256e0$0601a8c0@laptop>
References: <003901c7936b$ecd256e0$0601a8c0@laptop>
Message-ID: <93b45ca50705101954j3f899f49r438fe8d587ba027@mail.gmail.com>

Hi -

We need volunteers to convert the example programs in the biojava.org
cookbook to use the newer biojavax APIs.

- Mark

On 5/11/07, Jeff Szielenski wrote:
> Hello,
>
> I am currently studying bioinformatics at UIC. I am interested in
> joining an open source project to start applying my knowledge. What if
> any help do you guys need? I have a software engineering background and
> just finished an introductory course in bioinformatics.
>
> Jeff
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From markjschreiber at gmail.com Sun May 13 23:14:43 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 14 May 2007 11:14:43 +0800
Subject: [Biojava-dev] DOM, JDom, Xerces?
Message-ID: <93b45ca50705132014n4b0ed35eu67153def42ac705c@mail.gmail.com>

Hello -

I am looking into XPath and XQuery as a way to rapidly process the
increasing number of bioinformatics XML formats. As is typical with
Java there are too many APIs to choose from.

Have people had good experiences with the internal java API (if so
JDK1.5 or 1.6), or is JDom better (even though it is not really DOM)?
How about Xerces and Xalan from Apache???

Is there much support for XQuery yet? I have seen there is a JSR for
official support of XQuery, is there an associated API for download?

Any comments would be greatly appreciated.

- Mark

From ayates at ebi.ac.uk Mon May 14 04:47:28 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Mon, 14 May 2007 09:47:28 +0100
Subject: [Biojava-dev] DOM, JDom, Xerces?
In-Reply-To: <93b45ca50705132014n4b0ed35eu67153def42ac705c@mail.gmail.com>
References: <93b45ca50705132014n4b0ed35eu67153def42ac705c@mail.gmail.com>
Message-ID: <38571708-E02B-4B75-A3AC-3E0E090DBB77@ebi.ac.uk>

Hey Mark,

I've tried using XPath before on a couple of projects & it is quite
nice. In terms of implementations the internal Java one isn't that
bad but I really like XOM. It's a very clear/clean API and is quite
easy to switch between using XPath back into the normal API.

The other program I've heard reasonable things about is
http://javadude.com/tools/antxr/index.html which is the XML equivalent
to antlr.
I haven't used it yet but it does look interesting.

Andy

On 14 May 2007, at 04:14, Mark Schreiber wrote:

> Hello -
>
> I am looking into XPath and XQuery as a way to rapidly process the
> increasing number of bioinformatics XML formats. As is typical with
> Java there are too many APIs to choose from.
>
> Have people had good experiences with the internal java API (if so
> JDK1.5 or 1.6), or is JDom better (even though it is not really DOM)?
> How about Xerces and Xalan from Apache???
>
> Is there much support for XQuery yet? I have seen there is a JSR for
> official support of XQuery, is there an associated API for download?
>
> Any comments would be greatly appreciated.
>
> - Mark

From darin.london at duke.edu Mon May 14 10:44:56 2007
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Mon, 14 May 2007 10:44:56 -0400
Subject: [Biojava-dev] BOSC 2007 Abstract Submission Deadline Extended
Message-ID: <200705141444.l4EEiuxQ026957@tenero.duhs.duke.edu>

Due to technical difficulties in sending out the 2nd call for papers,
the BOSC organizers are extending the deadline for abstract submissions
to Monday May 21st. The announcement day will remain the same so that it
remains before the Early Discount Date.

http://open-bio.org/wiki/BOSC_2007

The BOSC organizing Committee

Please pass this email on to anyone that would be interested.

From phidias51 at gmail.com Mon May 14 12:00:07 2007
From: phidias51 at gmail.com (Mark Fortner)
Date: Mon, 14 May 2007 09:00:07 -0700
Subject: [Biojava-dev] DOM, JDom, Xerces?
In-Reply-To: <38571708-E02B-4B75-A3AC-3E0E090DBB77@ebi.ac.uk>
References: <93b45ca50705132014n4b0ed35eu67153def42ac705c@mail.gmail.com> <38571708-E02B-4B75-A3AC-3E0E090DBB77@ebi.ac.uk>
Message-ID: <6e1d61f50705140900w1df2d5d7l5f39fb418668f15f@mail.gmail.com>

Mark & Andy,
I haven't used XOM before, but it sounds interesting. I HAVE used Saxon
and it has good support for both XPath and XQuery.

Regards,

Mark Fortner

On 5/14/07, Andy Yates wrote:
>
> Hey Mark,
>
> I've tried using XPath before on a couple of projects & it is quite
> nice. In terms of implementations the internal Java one isn't that
> bad but I really like XOM. It's a very clear/clean API and is quite
> easy to switch between using XPath back into the normal API.
>
> The other program I've heard reasonable things about is
> http://javadude.com/tools/antxr/index.html which is the XML equivalent
> to antlr. I haven't used it yet but it does look interesting
>
> Andy
>
> On 14 May 2007, at 04:14, Mark Schreiber wrote:
>
> > Hello -
> >
> > I am looking into XPath and XQuery as a way to rapidly process the
> > increasing number of bioinformatics XML formats. As is typical with
> > Java there are too many APIs to choose from.
> >
> > Have people had good experiences with the internal java API (if so
> > JDK1.5 or 1.6), or is JDom better (even though it is not really DOM)?
> > How about Xerces and Xalan from Apache???
> >
> > Is there much support for XQuery yet? I have seen there is a JSR for
> > official support of XQuery, is there an associated API for download?
> >
> > Any comments would be greatly appreciated.
> >
> > - Mark
>

From ayates at ebi.ac.uk Mon May 14 12:11:16 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Mon, 14 May 2007 17:11:16 +0100
Subject: [Biojava-dev] DOM, JDom, Xerces?
In-Reply-To: <6e1d61f50705140900w1df2d5d7l5f39fb418668f15f@mail.gmail.com>
References: <93b45ca50705132014n4b0ed35eu67153def42ac705c@mail.gmail.com> <38571708-E02B-4B75-A3AC-3E0E090DBB77@ebi.ac.uk> <6e1d61f50705140900w1df2d5d7l5f39fb418668f15f@mail.gmail.com>
Message-ID:

Hi Mark,

I've not used Saxon before but from what I can take from the docs it
seems quite nice. However if anyone is going to do some XML processing
give it a go. Recently I used XOM's document builder and I have to say
that I wrote the majority without a need to consult XOM's documentation
(the only bit I needed was figuring out how to print a pretty XML file
out). It's very powerful & very easy to read in code.

Also if you fancy having a look at some newer technologies Groovy has
some very nice ways of parsing and generating XML. Documentation is a
bit sparse in places but once you're in the groovy mindset (or bought
the book) it's fine (and to be honest the Groovy builder objects are
worth the pain). I'm moving most of my scripting to it now; it's like
programming in Perl but with the comfort of knowing that you're in Java
and can jump out into Java classes whenever you want ... like BioJava ;)

Andy

On 14 May 2007, at 17:00, Mark Fortner wrote:

> Mark & Andy,
> I haven't used XOM before, but it sounds interesting. I HAVE used
> Saxon and it has good support for both XPath and XQuery.
>
> Regards,
>
> Mark Fortner
>
> On 5/14/07, Andy Yates wrote:
>>
>> Hey Mark,
>>
>> I've tried using XPath before on a couple of projects & it is quite
>> nice. In terms of implementations the internal Java one isn't that
>> bad but I really like XOM. It's a very clear/clean API and is quite
>> easy to switch between using XPath back into the normal API.
>>
>> The other program I've heard reasonable things about is
>> http://javadude.com/tools/antxr/index.html which is the XML equivalent
>> to antlr. I haven't used it yet but it does look interesting
>>
>> Andy
>>
>> On 14 May 2007, at 04:14, Mark Schreiber wrote:
>>
>>> Hello -
>>>
>>> I am looking into XPath and XQuery as a way to rapidly process the
>>> increasing number of bioinformatics XML formats. As is typical with
>>> Java there are too many APIs to choose from.
>>>
>>> Have people had good experiences with the internal java API (if so
>>> JDK1.5 or 1.6), or is JDom better (even though it is not really DOM)?
>>> How about Xerces and Xalan from Apache???
>>>
>>> Is there much support for XQuery yet? I have seen there is a JSR for
>>> official support of XQuery, is there an associated API for download?
>>>
>>> Any comments would be greatly appreciated.
>>>
>>> - Mark

From bugzilla-daemon at portal.open-bio.org Thu May 17 04:19:45 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 17 May 2007 04:19:45 -0400
Subject: [Biojava-dev] [Bug 2297] New: XMLDistributionWriter closes OutputStreams
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2297

           Summary: XMLDistributionWriter closes OutputStreams
           Product: BioJava
           Version: live (CVS source)
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: dist/dp
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: mark.schreiber at novartis.com

The XMLDistributionWriter has a couple of places where it closes the
output stream after writing out a distribution. This is bad if you want
to write more than one and really bad if you use System.out!!

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
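
The stream-ownership problem described in bug 2297 can be illustrated with a minimal sketch. This is not the XMLDistributionWriter code: the class StreamOwnershipSketch, the method writeRecord, and the XML element it prints are all hypothetical stand-ins. It only shows the general pattern of flushing, rather than closing, a stream that the caller supplied.

```java
import java.io.OutputStream;
import java.io.PrintWriter;

public class StreamOwnershipSketch {

    /**
     * Hypothetical stand-in for a writer method (NOT the BioJava API).
     * The caller owns 'out', so this method must not close it.
     */
    public static void writeRecord(String name, double weight, OutputStream out) {
        // Wrap the caller's stream. Calling pw.close() here would also
        // close 'out', which is the behaviour the bug report complains about.
        PrintWriter pw = new PrintWriter(out);
        pw.println("<Distribution name=\"" + name + "\" weight=\"" + weight + "\"/>");
        // flush() pushes the buffered characters to 'out' and leaves it open.
        pw.flush();
    }

    public static void main(String[] args) {
        // Two records go to the same stream, and System.out survives.
        writeRecord("first", 0.25, System.out);
        writeRecord("second", 0.75, System.out);
        System.out.println("System.out is still usable");
    }
}
```

With this pattern, the code that opened the stream stays responsible for closing it, which is what makes writing several records (or writing to System.out) safe.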

From bugzilla-daemon at portal.open-bio.org Thu May 17 21:26:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 17 May 2007 21:26:11 -0400
Subject: [Biojava-dev] [Bug 2298] New: Use of Big Decimal in DistributionTrainer
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2298

           Summary: Use of Big Decimal in DistributionTrainer
           Product: BioJava
           Version: live (CVS source)
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: dist/dp
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: mark.schreiber at novartis.com

It would be more accurate if the Distribution API used BigDecimal
internally as much as possible to minimize rounding errors. Likely
places that need changing are Distribution, DistributionTrainer, and the
DP package.

An alternative would be to make a biojavax extension of these classes
that has methods that use BigDecimal directly.

This would be desirable for biojava 1.6

From bugzilla-daemon at portal.open-bio.org Thu May 17 23:44:41 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 17 May 2007 23:44:41 -0400
Subject: [Biojava-dev] [Bug 2297] XMLDistributionWriter closes OutputStreams
In-Reply-To:
Message-ID: <200705180344.l4I3if2R016019@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2297

mark.schreiber at novartis.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from mark.schreiber at novartis.com 2007-05-17 23:44 EST -------
Fixed to no longer close streams.

From bugzilla-daemon at portal.open-bio.org Fri May 18 09:54:07 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 May 2007 09:54:07 -0400
Subject: [Biojava-dev] [Bug 2298] Use of Big Decimal in DistributionTrainer
In-Reply-To:
Message-ID: <200705181354.l4IDs7Xf007500@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2298

holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

------- Comment #1 from holland at ebi.ac.uk 2007-05-18 09:54 EST -------
Do you want double changed to BigDecimal for _every_ occurrence of
double (e.g. weights, scores, etc.) or just for one of these
occurrences?

Modifying the existing classes is going to produce very inefficient code
as there is a lot of use of the public API internally, which would
require boxing/unboxing of double values to/from BigDecimal values. Such
changes are possible but would cause very slow code due to the overhead
of doing all these conversions.

Better quality code could be produced if we just rewrote the
distribution code completely with BigDecimals in mind. Extending the
existing modules would not suffice because the public API is too tightly
linked with the internal calculations, and to add in
BigDecimal-awareness would be tantamount to a complete rewrite anyway!

I think we can leave this until 1.6, but I'll leave it as open and
assigned so that we don't forget it.
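
The rounding concern behind bug 2298 can be seen in a small, self-contained sketch. It does not touch the BioJava Distribution classes; the class RoundingSketch and its two methods are hypothetical and only compare summing the same weight in double and in java.math.BigDecimal.

```java
import java.math.BigDecimal;

public class RoundingSketch {

    /** Adds 0.1 to itself n times using binary floating point. */
    public static double sumDouble(int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += 0.1; // 0.1 has no exact binary representation
        }
        return total;
    }

    /** Adds 0.1 to itself n times using exact decimal arithmetic. */
    public static BigDecimal sumDecimal(int n) {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal step = new BigDecimal("0.1"); // built from a String, so exact
        for (int i = 0; i < n; i++) {
            total = total.add(step);
        }
        return total;
    }

    public static void main(String[] args) {
        // The double sum drifts away from 1.0.
        System.out.println(sumDouble(10) == 1.0);                          // prints false
        // compareTo ignores scale, so 1.0 and 1 compare as equal.
        System.out.println(sumDecimal(10).compareTo(BigDecimal.ONE) == 0); // prints true
    }
}
```

The cost mentioned in the comment above shows up wherever a double-based public API meets BigDecimal internals: every crossing needs a conversion such as BigDecimal.valueOf(double) or doubleValue(), and the second of these reintroduces the rounding the change was meant to avoid.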

From bugzilla-daemon at portal.open-bio.org Thu May 24 03:10:21 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 24 May 2007 03:10:21 -0400
Subject: [Biojava-dev] [Bug 2301] New: Initialization error of org.biojava.bio.alignment.NeedlemanWunsch when considering affine gap penalty
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2301

           Summary: Initialization error of
                    org.biojava.bio.alignment.NeedlemanWunsch when
                    considering affine gap penalty
           Product: BioJava
           Version: live (CVS source)
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: alignment
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: cwu2 at eecs.wsu.edu
                CC: cwu2 at eecs.wsu.edu

In the class org.biojava.bio.alignment.NeedlemanWunsch, the
initialization of the first row and first column should use the
following code when considering affine gap penalties. Otherwise,
CostMatrix[1][0] will not get the right value! Thanks!

    for (i=1; i<=query.length();i++) {
      // CostMatrix[i][0] = CostMatrix[i-1][0] + delete;
      E[i][0] = Double.POSITIVE_INFINITY;
      CostMatrix[i][0] = F[i][0] = delete + i*gapExt;
    }
    for (j=1; j<=subject.length(); j++) {
      // CostMatrix[0][j] = CostMatrix[0][j-1] + insert;
      F[0][j] = Double.POSITIVE_INFINITY;
      CostMatrix[0][j] = E[0][j] = insert + j*gapExt;
    }

From rejarohit2004 at gmail.com Tue May 1 05:14:48 2007
From: rejarohit2004 at gmail.com (rohit reja)
Date: Tue, 01 May 2007 09:14:48 -0000
Subject: [Biojava-dev] query
Message-ID: <29c042ff0705010214q4f64bbe3r5f704665cec40693@mail.gmail.com>

Hello all,

I am a novice Java programmer with knowledge of core Java. I want to
build a biological database using BioJava. Do I need to learn J2EE
before learning BioJava? Is BioJava incorporated in the NetBeans IDE?

Regards
--
Rohit Reja
3rd -B.tech-Bioinformatics
VIT University
Vellore

From FLong at skcc.org Tue May 8 03:23:00 2007
From: FLong at skcc.org (Fred Long)
Date: Tue, 08 May 2007 07:23:00 -0000
Subject: [Biojava-dev] fixed and streamlined SmithWaterman.java
Message-ID:

I found that SmithWaterman was calling super() with the wrong
parameters, which caused getMatch() to return the wrong value. While
fixing that I removed a lot of the redundancy from SmithWaterman. Here
is the patch file if anybody is interested.

Disclaimer: I'm not a BioJava developer.

FL

------------------------------------
*** old/NeedlemanWunsch.java 2007-05-08 00:05:20.000000000 -0700
--- NeedlemanWunsch.java 2007-05-08 00:05:38.000000000 -0700
***************
*** 63,69 ****
    protected SubstitutionMatrix subMatrix;
    protected Alignment pairalign;
    protected String alignment;
!   private double insert, delete, gapExt, match, replace;
  
  
    /** Constructs a new Object with the given parameters based on the Needleman-Wunsch algorithm
--- 63,69 ----
    protected SubstitutionMatrix subMatrix;
    protected Alignment pairalign;
    protected String alignment;
!   protected double insert, delete, gapExt, match, replace;
  
  
    /** Constructs a new Object with the given parameters based on the Needleman-Wunsch algorithm
***************
*** 517,523 ****
      * target sequence
      * @return The score for the given substitution.
      */
!   private double matchReplace(Sequence query, Sequence subject, int i, int j) {
      try {
        return subMatrix.getValueAt(query.symbolAt(i), subject.symbolAt(j));
      } catch (Exception exc) {
--- 517,523 ----
      * target sequence
      * @return The score for the given substitution.
      */
!   protected double matchReplace(Sequence query, Sequence subject, int i, int j) {
      try {
        return subMatrix.getValueAt(query.symbolAt(i), subject.symbolAt(j));
      } catch (Exception exc) {
*** old/SmithWaterman.java 2007-05-08 00:05:20.000000000 -0700
--- SmithWaterman.java 2007-05-08 00:05:38.000000000 -0700
***************
*** 53,141 ****
  public class SmithWaterman extends NeedlemanWunsch
  {
  
-   private double match, replace, insert, delete, gapExt;
    private double[][] scoreMatrix;
  
    /** Constructs the new SmithWaterman alignment object. Alignments are only performed,
      * if the alphabet of the given SubstitutionMatrix equals the alpabet of
!     * both the query and the target Sequence. The alignment parameters here
!     * are expenses and not scores as they are in the NeedlemanWunsch object.
!     * scores are just given by multipliing the expenses with (-1). For example
!     * you could use parameters like "-2, 5, 3, 3, 0". If the expenses for gap extension
      * are equal to the cost of starting a gap (delete or insert), no affine gap penalties
      * are used, which saves memory.
      *
!     * @param match expenses for a match
!     * @param replace expenses for a replace operation
!     * @param insert expenses for a gap opening in the query sequence
!     * @param delete expenses for a gap opening in the target sequence
!     * @param gapExtend expenses for the extension of a gap which was started earlier.
      * @param matrix the SubstitutionMatrix object to use.
      */
    public SmithWaterman(double match, double replace, double insert, double delete, double gapExtend, SubstitutionMatrix matrix)
    {
!     super(insert, delete, gapExtend, match, replace, matrix);
!     this.match = -match;
!     this.replace = -replace;
!     this.insert = -insert;
!     this.delete = -delete;
!     this.gapExt = -gapExtend;
!     this.subMatrix = matrix;
!     this.alignment = "";
    }
  
-   /** Overrides the method inherited from the NeedlemanWunsch and
-     * sets the penalty for an insert operation to the specified value.
-     * Reason: internaly scores are used instead of penalties so that
-     * the value is muliplied with -1.
-     * @param ins costs for a single insert operation
-     */
-   public void setInsert(double ins) {
-     this.insert = -ins;
-   }
- 
-   /** Overrides the method inherited from the NeedlemanWunsch and
-     * sets the penalty for a delete operation to the specified value.
-     * Reason: internaly scores are used instead of penalties so that
-     * the value is muliplied with -1.
-     * @param del costs for a single deletion operation
-     */
-   public void setDelete(double del) {
-     this.delete = -del;
-   }
- 
-   /** Overrides the method inherited from the NeedlemanWunsch and
-     * sets the penalty for an extension of any gap (insert or delete) to the
-     * specified value.
-     * Reason: internaly scores are used instead of penalties so that
-     * the value is muliplied with -1.
-     * @param ge costs for any gap extension
-     */
-   public void setGapExt(double ge) {
-     this.gapExt = -ge;
-   }
- 
-   /** Overrides the method inherited from the NeedlemanWunsch and
-     * sets the penalty for a match operation to the specified value.
-     * Reason: internaly scores are used instead of penalties so that
-     * the value is muliplied with -1.
-     * @param ma costs for a single match operation
-     */
-   public void setMatch(double ma) {
-     this.match = -ma;
-   }
- 
-   /** Overrides the method inherited from the NeedlemanWunsch and
-     * sets the penalty for a replace operation to the specified value.
-     * Reason: internaly scores are used instead of penalties so that
-     * the value is muliplied with -1.
-     * @param rep costs for a single replace operation
-     */
-   public void setReplace(double rep) {
-     this.replace = -rep;
-   }
- 
    /** Overrides the method inherited from the NeedlemanWunsch and performs only a local alignment.
      * It finds only the longest common subsequence. This is good for the beginning, but it might
--- 53,79 ----
  public class SmithWaterman extends NeedlemanWunsch
  {
  
    private double[][] scoreMatrix;
  
    /** Constructs the new SmithWaterman alignment object. Alignments are only performed,
      * if the alphabet of the given SubstitutionMatrix equals the alpabet of
!     * both the query and the target Sequence. If the expenses for gap extension
      * are equal to the cost of starting a gap (delete or insert), no affine gap penalties
      * are used, which saves memory.
      *
!     * @param match expenses for a match (usually < 0)
!     * @param replace expenses for a replace operation (usually > 0)
!     * @param insert expenses for a gap opening in the query sequence (usually > 0)
!     * @param delete expenses for a gap opening in the target sequence (usually > 0)
!     * @param gapExtend expenses for the extension of a gap which was started earlier (usually >= 0)
      * @param matrix the SubstitutionMatrix object to use.
      */
    public SmithWaterman(double match, double replace, double insert, double delete, double gapExtend, SubstitutionMatrix matrix)
    {
!     super(match, replace, insert, delete, gapExtend, matrix);
    }
  
    /** Overrides the method inherited from the NeedlemanWunsch and performs only a local alignment.
      * It finds only the longest common subsequence. This is good for the beginning, but it might
***************
*** 186,193 ****
        for (i=1; i<=query.length(); i++)
          for (j=1; j<=subject.length(); j++)
          {
!           E[i][j] = Math.max(E[i][j-1], scoreMatrix[i][j-1] + insert) + gapExt;
!           F[i][j] = Math.max(F[i-1][j], scoreMatrix[i-1][j] + delete) + gapExt;
            scoreMatrix[i][j] = max(0.0, E[i][j], F[i][j], scoreMatrix[i-1][j-1] + matchReplace(query, subject, i, j));
            if (scoreMatrix[i][j] > scoreMatrix[maxI][maxJ])
            {
--- 124,131 ----
        for (i=1; i<=query.length(); i++)
          for (j=1; j<=subject.length(); j++)
          {
!           E[i][j] = Math.max(E[i][j-1], scoreMatrix[i][j-1] - insert) - gapExt;
!           F[i][j] = Math.max(F[i-1][j], scoreMatrix[i-1][j] - delete) - gapExt;
            scoreMatrix[i][j] = max(0.0, E[i][j], F[i][j], scoreMatrix[i-1][j-1] + matchReplace(query, subject, i, j));
            if (scoreMatrix[i][j] > scoreMatrix[maxI][maxJ])
            {
***************
*** 223,229 ****
            // Insert || finish gap if extended gap is opened
            } else if (scoreMatrix[i][j] == E[i][j] || gap_extend[0]) {
              //check if gap has been extended or freshly opened
!             gap_extend[0] = (E[i][j] != scoreMatrix[i][j-1] + insert + gapExt);
  
              align[0] = '-' + align[0];
              align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1];
--- 161,167 ----
            // Insert || finish gap if extended gap is opened
            } else if (scoreMatrix[i][j] == E[i][j] || gap_extend[0]) {
              //check if gap has been extended or freshly opened
!             gap_extend[0] = (E[i][j] != scoreMatrix[i][j-1] - insert - gapExt);
  
              align[0] = '-' + align[0];
              align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1];
***************
*** 232,238 ****
            // Delete || finish gap if extended gap is opened
            } else {
              //check if gap has been extended or freshly opened
!             gap_extend[1] = (F[i][j] != scoreMatrix[i-1][j] + delete + gapExt);
  
              align[0] = st.tokenizeSymbol(query.symbolAt(i--)) + align[0];
              align[1] = '-' + align[1];
--- 170,176 ----
            // Delete || finish gap if extended gap is opened
            } else {
              //check if gap has been extended or freshly opened
!             gap_extend[1] = (F[i][j] != scoreMatrix[i-1][j] - delete - gapExt);
  
              align[0] = st.tokenizeSymbol(query.symbolAt(i--)) + align[0];
              align[1] = '-' + align[1];
***************
*** 257,264 ****
            scoreMatrix[i][j] = max(
              0.0,
!             scoreMatrix[i-1][j] + delete,
!             scoreMatrix[i][j-1] + insert,
              scoreMatrix[i-1][j-1] + matchReplace(query, subject, i, j)
            );
--- 195,202 ----
            scoreMatrix[i][j] = max(
              0.0,
!             scoreMatrix[i-1][j] - delete,
!             scoreMatrix[i][j-1] - insert,
              scoreMatrix[i-1][j-1] + matchReplace(query, subject, i, j)
            );
***************
*** 291,297 ****
              align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1];
  
            // Insert
!           } else if (scoreMatrix[i][j] == scoreMatrix[i][j-1] + insert) {
              align[0] = '-' + align[0];
              align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1];
              path = ' ' + path;
--- 229,235 ----
              align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1];
  
            // Insert
!           } else if (scoreMatrix[i][j] == scoreMatrix[i][j-1] - insert) {
              align[0] = '-' + align[0];
              align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1];
              path = ' ' + path;
***************
*** 382,409 ****
      if ((y > z)) return y;
      return z;
    }
- 
-   /** This method computes the scores for the substution of the i-th symbol
-     * of query by the j-th symbol of subject.
-     *
-     * @param query The query sequence
-     * @param subject The target sequence
-     * @param i The position of the symbol under consideration within the
-     * query sequence (starting from one)
-     * @param j The position of the symbol under consideration within the
-     * target sequence
-     * @return The score for the given substitution.
-     */
-   private double matchReplace(Sequence query, Sequence subject, int i, int j) {
-     try {
-       return subMatrix.getValueAt(query.symbolAt(i), subject.symbolAt(j));
-     } catch (Exception exc) {
-       if (query.symbolAt(i).getMatches().contains(subject.symbolAt(j)) ||
-         subject.symbolAt(j).getMatches().contains(query.symbolAt(i)))
-         return match;
-       return replace;
-     }
-   }
- 
- 
  }
--- 320,323 ----

From blee34 at mail.gatech.edu Sat May 19 11:29:54 2007
From: blee34 at mail.gatech.edu (blee34 at mail.gatech.edu)
Date: Sat, 19 May 2007 11:29:54 -0400
Subject: [Biojava-dev] GSoC student
Message-ID: <1179588594.464f17f2a2f94@webmail.mail.gatech.edu>

Dear BioJava group,

Hi, my name is Bohyun Lee, and I will be participating in the BioJava
project this summer (as part of the Google Summer of Code program;
primary mentor: Richard Holland). I'm especially interested in phylogeny
reconstruction methods, and I'm planning to build some APIs for them
(including a NEXUS parser, tree building methods, distance calculation
methods, etc.).

I'll be starting my project this coming week and I just wanted to
introduce myself to the group before it starts. If any of you can take a
look at my project plan and share your thoughts about it, I would be
more than happy to listen.

I'm looking forward to being a part of the BioJava group this summer and
I wish a great summer to everyone as well.

Thank you very much.

Best regards,
Bohyun Lee
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BioJava_plan.doc
Type: application/msword
Size: 28672 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biojava-dev/attachments/20070519/2ea8ae19/attachment-0001.doc

From mark.schreiber at novartis.com Thu May 31 21:32:32 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 1 Jun 2007 09:32:32 +0800
Subject: [Biojava-dev] fixed and streamlined SmithWaterman.java
Message-ID:

Hi -

Could the person who maintains the SW code (Andreas Drager??) take a
look at this and check it in if required?

Thanks.

- Mark

"Fred Long"
Sent by: biojava-dev-bounces at lists.open-bio.org
05/08/2007 03:15 PM

To:
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] fixed and streamlined SmithWaterman.java

I found that SmithWaterman was calling super() with the wrong
parameters, which caused getMatch() to return the wrong value. While
fixing that I removed a lot of the redundancy from SmithWaterman. Here
is the patch file if anybody is interested.

Disclaimer: I'm not a BioJava developer.

FL

------------------------------------
[...]
- * @param ma costs for a single match operation - */ - public void setMatch(double ma) { - this.match = -ma; - } - - /** Overrides the method inherited from the NeedlemanWunsch and - * sets the penalty for a replace operation to the specified value. - * Reason: internaly scores are used instead of penalties so that - * the value is muliplied with -1. - * @param rep costs for a single replace operation - */ - public void setReplace(double rep) { - this.replace = -rep; - } - /** Overrides the method inherited from the NeedlemanWunsch and performs only a local alignment. * It finds only the longest common subsequence. This is good for the beginning, but it might --- 53,79 ---- public class SmithWaterman extends NeedlemanWunsch { private double[][] scoreMatrix; /** Constructs the new SmithWaterman alignment object. Alignments are only performed, * if the alphabet of the given SubstitutionMatrix equals the alpabet of ! * both the query and the target Sequence. If the expenses for gap extension * are equal to the cost of starting a gap (delete or insert), no affine gap penalties * are used, which saves memory. * ! * @param match expenses for a match (usually < 0) ! * @param replace expenses for a replace operation (usually > 0) ! * @param insert expenses for a gap opening in the query sequence (usually > 0) ! * @param delete expenses for a gap opening in the target sequence (usually > 0) ! * @param gapExtend expenses for the extension of a gap which was started earlier (usually >= 0) * @param matrix the SubstitutionMatrix object to use. */ public SmithWaterman(double match, double replace, double insert, double delete, double gapExtend, SubstitutionMatrix matrix) { ! super(match, replace, insert, delete, gapExtend, matrix); } /** Overrides the method inherited from the NeedlemanWunsch and performs only a local alignment. * It finds only the longest common subsequence. 
This is good for the beginning, but it might *************** *** 186,193 **** for (i=1; i<=query.length(); i++) for (j=1; j<=subject.length(); j++) { ! E[i][j] = Math.max(E[i][j-1], scoreMatrix[i][j-1] + insert) + gapExt; ! F[i][j] = Math.max(F[i-1][j], scoreMatrix[i-1][j] + delete) + gapExt; scoreMatrix[i][j] = max(0.0, E[i][j], F[i][j], scoreMatrix[i-1][j-1] + matchReplace(query, subject, i, j)); if (scoreMatrix[i][j] > scoreMatrix[maxI][maxJ]) { --- 124,131 ---- for (i=1; i<=query.length(); i++) for (j=1; j<=subject.length(); j++) { ! E[i][j] = Math.max(E[i][j-1], scoreMatrix[i][j-1] - insert) - gapExt; ! F[i][j] = Math.max(F[i-1][j], scoreMatrix[i-1][j] - delete) - gapExt; scoreMatrix[i][j] = max(0.0, E[i][j], F[i][j], scoreMatrix[i-1][j-1] + matchReplace(query, subject, i, j)); if (scoreMatrix[i][j] > scoreMatrix[maxI][maxJ]) { *************** *** 223,229 **** // Insert || finish gap if extended gap is opened } else if (scoreMatrix[i][j] == E[i][j] || gap_extend[0]) { //check if gap has been extended or freshly opened ! gap_extend[0] = (E[i][j] != scoreMatrix[i][j-1] + insert + gapExt); align[0] = '-' + align[0]; align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1]; --- 161,167 ---- // Insert || finish gap if extended gap is opened } else if (scoreMatrix[i][j] == E[i][j] || gap_extend[0]) { //check if gap has been extended or freshly opened ! gap_extend[0] = (E[i][j] != scoreMatrix[i][j-1] - insert - gapExt); align[0] = '-' + align[0]; align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1]; *************** *** 232,238 **** // Delete || finish gap if extended gap is opened } else { //check if gap has been extended or freshly opened ! gap_extend[1] = (F[i][j] != scoreMatrix[i-1][j] + delete + gapExt); align[0] = st.tokenizeSymbol(query.symbolAt(i--)) + align[0]; align[1] = '-' + align[1]; --- 170,176 ---- // Delete || finish gap if extended gap is opened } else { //check if gap has been extended or freshly opened ! 
gap_extend[1] = (F[i][j] != scoreMatrix[i-1][j] - delete - gapExt); align[0] = st.tokenizeSymbol(query.symbolAt(i--)) + align[0]; align[1] = '-' + align[1]; *************** *** 257,264 **** scoreMatrix[i][j] = max( 0.0, ! scoreMatrix[i-1][j] + delete, ! scoreMatrix[i][j-1] + insert, scoreMatrix[i-1][j-1] + matchReplace(query, subject, i, j) ); --- 195,202 ---- scoreMatrix[i][j] = max( 0.0, ! scoreMatrix[i-1][j] - delete, ! scoreMatrix[i][j-1] - insert, scoreMatrix[i-1][j-1] + matchReplace(query, subject, i, j) ); *************** *** 291,297 **** align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1]; // Insert ! } else if (scoreMatrix[i][j] == scoreMatrix[i][j-1] + insert) { align[0] = '-' + align[0]; align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1]; path = ' ' + path; --- 229,235 ---- align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1]; // Insert ! } else if (scoreMatrix[i][j] == scoreMatrix[i][j-1] - insert) { align[0] = '-' + align[0]; align[1] = st.tokenizeSymbol(subject.symbolAt(j--)) + align[1]; path = ' ' + path; *************** *** 382,409 **** if ((y > z)) return y; return z; } - - /** This method computes the scores for the substution of the i-th symbol - * of query by the j-th symbol of subject. - * - * @param query The query sequence - * @param subject The target sequence - * @param i The position of the symbol under consideration within the - * query sequence (starting from one) - * @param j The position of the symbol under consideration within the - * target sequence - * @return The score for the given substitution. 
- */ - private double matchReplace(Sequence query, Sequence subject, int i, int j) { - try { - return subMatrix.getValueAt(query.symbolAt(i), subject.symbolAt(j)); - } catch (Exception exc) { - if (query.symbolAt(i).getMatches().contains(subject.symbolAt(j)) || - subject.symbolAt(j).getMatches().contains(query.symbolAt(i))) - return match; - return replace; - } - } - - } --- 320,323 ---- _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev
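
For readers following the sign change in the patch above, here is a minimal standalone sketch of the patched recurrence, in which gap costs are kept as positive "expenses" and subtracted from the score. It does not use BioJava: the class name LocalScoreSketch, the method names, the plain String inputs and the simple match/mismatch scoring (standing in for the SubstitutionMatrix lookup) are all assumptions made for illustration, and the border initialization is one plausible choice rather than necessarily what SmithWaterman.java does.

```java
import java.util.Arrays;

/** Illustrative only; not part of BioJava and not the patched class itself. */
final class LocalScoreSketch {

  /** Hypothetical stand-in for a substitution-matrix lookup (a score: higher is better). */
  static double substitution(char a, char b, double matchScore, double mismatchPenalty) {
    return a == b ? matchScore : -mismatchPenalty;
  }

  /**
   * Best local alignment score with affine gaps. insert, delete and gapExt are
   * positive costs that are subtracted, mirroring the E/F lines of the patch.
   */
  static double bestLocalScore(String query, String subject,
                               double matchScore, double mismatchPenalty,
                               double insert, double delete, double gapExt) {
    int n = query.length();
    int m = subject.length();
    double[][] h = new double[n + 1][m + 1]; // best score of an alignment ending at (i, j)
    double[][] e = new double[n + 1][m + 1]; // ending with a gap in the query
    double[][] f = new double[n + 1][m + 1]; // ending with a gap in the subject
    // Assumption: no gap can be extended from outside the matrix.
    for (double[] row : e) Arrays.fill(row, Double.NEGATIVE_INFINITY);
    for (double[] row : f) Arrays.fill(row, Double.NEGATIVE_INFINITY);

    double best = 0.0;
    for (int i = 1; i <= n; i++) {
      for (int j = 1; j <= m; j++) {
        // Opening a gap costs (insert or delete) + gapExt; extending costs gapExt.
        e[i][j] = Math.max(e[i][j - 1], h[i][j - 1] - insert) - gapExt;
        f[i][j] = Math.max(f[i - 1][j], h[i - 1][j] - delete) - gapExt;
        double diag = h[i - 1][j - 1]
            + substitution(query.charAt(i - 1), subject.charAt(j - 1), matchScore, mismatchPenalty);
        // The 0.0 floor is what makes the alignment local.
        h[i][j] = Math.max(0.0, Math.max(diag, Math.max(e[i][j], f[i][j])));
        best = Math.max(best, h[i][j]);
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Eight matches at +2 each, minus one opened gap (2 + 1): prints 13.0
    System.out.println(bestLocalScore("ACGTACGT", "ACGTTACGT", 2, 3, 2, 2, 1));
  }
}
```

With the pre-patch convention the same costs would have to be negated before being added, which is the bookkeeping the removed setters in SmithWaterman did by hand. Keeping them positive and subtracting lets the subclass reuse the inherited fields unchanged.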