From holland at ebi.ac.uk Wed Jan 2 06:52:12 2008
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 02 Jan 2008 11:52:12 +0000
Subject: [Biojava-dev] BioJava 3 design discussion coming to an end
Message-ID: <477B7AEC.9000401@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all.

At the end of January I will be taking the contents of our BioJava 3
discussion wiki page and compiling them into a more formal design proposal.

If you have made any comments elsewhere (e.g. by email) which you would
like to be considered in the final design proposal, then please add them
to the wiki page (or its associated Talk page) before the end of the month.

(I won't be trawling through email archives looking for comments so you
really must copy your comments across to the wiki if you want them to be
included!).

The wiki address is:

http://www.biojava.org/wiki/BioJava3_Proposal

cheers,
Richard

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHe3rs4C5LeMEKA/QRAgHqAJ0VR2utTbzjfPYNPXINv26yc1PRNgCZAUnX
978uKqgbePpnHm+3Ynfp7X4=
=8nqF
-----END PGP SIGNATURE-----

From ap3 at sanger.ac.uk Sun Jan 6 07:41:37 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Sun, 6 Jan 2008 12:41:37 +0000 (GMT)
Subject: [Biojava-dev] bioperl like blastparser
Message-ID:

Hi Michael,

I just had a look at your patch for the query length.
Several of the unit tests are now failing at
org.biojava.bio.program.ssbind.SSBindCase.testResultGetAnnotation(SSBindCase.java:143)

The problem is that most blast related unit tests extend the SSBindCase,
which expects a fixed number of attributes. With the new patch some of the
blast-flavors have the additional queryLength attribute.
Could you have a look at the behaviour of the parser for some of the files where the tests now fail? If you think the new behaviour of the parser is correct, we can simply update the tests to accept the different number of attributes. Thanks, Andreas -------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From markjschreiber at gmail.com Mon Jan 7 03:05:33 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 7 Jan 2008 16:05:33 +0800 Subject: [Biojava-dev] Biojava - svn migration In-Reply-To: References: <93b45ca50712271748x3019ce27m2d45008c8ce13ece@mail.gmail.com> Message-ID: <93b45ca50801070005n10338f8q572b1295bc9ccdff@mail.gmail.com> Hi - This URL also works on Windows Vista. I made a secure tunnel using plink which is part of the PuTTY package. Thanks, - Mark On Dec 28, 2007 12:34 PM, Michael Heuer wrote: > Hello Mark, Andreas > > I was able to check out on linux with commandline subversion and > in eclipse with the following URL: > > svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk > > michael > > > > On Fri, 28 Dec 2007, Mark Schreiber wrote: > > > Hi all - > > > > I am trying to do a SVN check out with Netbeans but the connection > > just seems to hang without doing anything (same happens with command > > line?). I am using the path specified in the biojava wiki but the > > conversation below suggests that the actual path may be different. > > What then is the 'home' path? > > > > Also, does the SSH+SVN command need to be prepended by an SSH command > > (or plink -ssh on windows)? > > > > - Mark > > > > On Dec 28, 2007 5:37 AM, Hilmar Lapp wrote: > > > I see. Makes sense. 
-hilmar > > > > > > > > > On Dec 27, 2007, at 4:30 PM, Jason Stajich wrote: > > > > > > > My idea was that just like > > > > /home/reposiitory/biojava > > > > we'd put the SVN in > > > > /home/svn-repository/biojava > > > > > > > > So each of the biojava SVN sub-projects ought would be > > > > /home/svn-repository/biojava/biojava-live > > > > /home/svn-repository/biojava/biojava-ensj > > > > > > > > /home/svn is really the homedir for the svn user and som utility > > > > stuff like the svn login passwords so I think it is better not to > > > > put the repos in there. > > > > > > > > -jason > > > > On Dec 27, 2007, at 1:31 PM, Hilmar Lapp wrote: > > > > > > > >> > > > >> On Dec 27, 2007, at 12:32 PM, Chris Fields wrote: > > > >> > > > >>> > > > >>> I agree, but there is already a /home/svn directory which appears > > > >>> related to blipkit. We would need to move the blipkit stuff into > > > >>> it's own subdir and go from there. > > > >>> > > > >> > > > >> > > > >> I.e., it was set up such that blipkit would be the only project > > > >> with in it? I'd assume that was by mistake. Also, the last update > > > >> that I have is that blipkit hasn't been active for a while, though > > > >> I'm not sure. > > > >> > > > >> ChrisM - do you recall the decisions leading to the blipkit svn > > > >> setup? Could it be moved to a subdirectory as ChrisF suggests? 
> > > >> > > > >> -hilmar > > > >> -- > > > >> =========================================================== > > > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : > > > >> =========================================================== > > > >> > > > >> > > > >> > > > > > > > > > > -- > > > =========================================================== > > > : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : > > > =========================================================== > > > > > > > > > > > > > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > From ayates at ebi.ac.uk Mon Jan 7 04:34:33 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 7 Jan 2008 09:34:33 +0000 Subject: [Biojava-dev] Error while reading byte data for creating a Trace. In-Reply-To: <2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu> References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk> <2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu> Message-ID: <04BE1C71-B7CF-4428-86C7-300E4283DAE8@ebi.ac.uk> Hi, As far as I am aware there isn't a problem with the current ABI parser however if you could send a code snippit of reading in the byte array & the stack trace of the index out of bounds exception that would be most helpful Andy On 5 Nov 2007, at 17:59, abhi232 at cc.gatech.edu wrote: > Hi all, > I am having a byte array which is having the data from an .ab1 > file.The > biojava library provides a class called as ABITrace which takes as > input > either a byte[] array , a file or a url.If i use the later > parameters (the > file or the url )the program works but if I pass the byte array to the > constructor I get java.lang.arrayIndexOutOfBound.Exception.Is there a > problem with the ABITrace class or how can I bypass this particular > error. 
> I am printing the length of the byte array and it comes to > 144930...Can > that cause a problem in my code? > > Thanks in advance. > Abhinav > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From markjschreiber at gmail.com Mon Jan 7 04:43:14 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 7 Jan 2008 17:43:14 +0800 Subject: [Biojava-dev] JUnit Message-ID: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> Hi all - What do people think about adding the JUnit jar to the test directory of the biojava-live repository and make the appropriate changes to the ant classpath? This would make it easier for people to test the package directly from the ant build rather than having to specifically place junit on the system classpath. It would probably make it more likely that people run and contribute tests as well. If we add JUnit 4.1 then it will allow the creation of Unit tests by just annotating class methods. If there are no objections I will add it in the next few days. - Mark From ayates at ebi.ac.uk Mon Jan 7 04:46:56 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 7 Jan 2008 09:46:56 +0000 Subject: [Biojava-dev] JUnit In-Reply-To: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> References: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> Message-ID: <42D03EDC-6AEE-4FF3-A3BC-12B304AE91EE@ebi.ac.uk> Yeah I'm happy for that to happen. Just on a side note is Junit 4 compatible with Junit 3's tests? Otherwise will we have to maintain two sets of unit test directories depending on the age of the test? Andy On 7 Jan 2008, at 09:43, Mark Schreiber wrote: > Hi all - > > What do people think about adding the JUnit jar to the test directory > of the biojava-live repository and make the appropriate changes to the > ant classpath? 
This would make it easier for people to test the
> package directly from the ant build rather than having to specifically
> place junit on the system classpath. It would probably make it more
> likely that people run and contribute tests as well.
>
> If we add JUnit 4.1 then it will allow the creation of Unit tests by
> just annotating class methods.
>
> If there are no objections I will add it in the next few days.
>
> - Mark
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From markjschreiber at gmail.com Mon Jan 7 04:48:31 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 7 Jan 2008 17:48:31 +0800
Subject: [Biojava-dev] JUnit
In-Reply-To: <42D03EDC-6AEE-4FF3-A3BC-12B304AE91EE@ebi.ac.uk>
References: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> <42D03EDC-6AEE-4FF3-A3BC-12B304AE91EE@ebi.ac.uk>
Message-ID: <93b45ca50801070148kc1fb1bel88b9c365fca4d27a@mail.gmail.com>

From my preliminary tests it seems to be compatible.

It seems that version 4.4 is out now. It allows all kinds of strange
assertions, assumptions and theories.

- Mark

On Jan 7, 2008 5:46 PM, Andy Yates wrote:
> Yeah I'm happy for that to happen. Just on a side note is Junit 4
> compatible with Junit 3's tests? Otherwise will we have to maintain
> two sets of unit test directories depending on the age of the test?
>
> Andy
>
>
> On 7 Jan 2008, at 09:43, Mark Schreiber wrote:
>
> > Hi all -
> >
> > What do people think about adding the JUnit jar to the test directory
> > of the biojava-live repository and make the appropriate changes to the
> > ant classpath? This would make it easier for people to test the
> > package directly from the ant build rather than having to specifically
> > place junit on the system classpath. It would probably make it more
> > likely that people run and contribute tests as well.
> > > > If we add JUnit 4.1 then it will allow the creation of Unit tests by > > just annotating class methods. > > > > If there are no objections I will add it in the next few days. > > > > - Mark > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > From ayates at ebi.ac.uk Mon Jan 7 04:50:02 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 7 Jan 2008 09:50:02 +0000 Subject: [Biojava-dev] JUnit In-Reply-To: <93b45ca50801070148kc1fb1bel88b9c365fca4d27a@mail.gmail.com> References: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> <42D03EDC-6AEE-4FF3-A3BC-12B304AE91EE@ebi.ac.uk> <93b45ca50801070148kc1fb1bel88b9c365fca4d27a@mail.gmail.com> Message-ID: Sounds good to me :) Andy On 7 Jan 2008, at 09:48, Mark Schreiber wrote: >> From my preliminary tests it seems to be compatable. > > It seems that version 4.4 is out now. Allows all kinds of strange > assertions, assumptions and theorys. > > - Mark > > On Jan 7, 2008 5:46 PM, Andy Yates wrote: >> Yeah I'm happy for that to happen. Just on a side note is Junit 4 >> compatible with Junit 3's tests? Otherwise will we have to maintain >> two sets of unit test directories depending on the age of the test? >> >> Andy >> >> >> On 7 Jan 2008, at 09:43, Mark Schreiber wrote: >> >>> Hi all - >>> >>> What do people think about adding the JUnit jar to the test >>> directory >>> of the biojava-live repository and make the appropriate changes to >>> the >>> ant classpath? This would make it easier for people to test the >>> package directly from the ant build rather than having to >>> specifically >>> place junit on the system classpath. It would probably make it more >>> likely that people run and contribute tests as well. >>> >>> If we add JUnit 4.1 then it will allow the creation of Unit tests by >>> just annotating class methods. 
>>> >>> If there are no objections I will add it in the next few days. >>> >>> - Mark >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> From ap3 at sanger.ac.uk Mon Jan 7 05:08:42 2008 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Mon, 7 Jan 2008 10:08:42 +0000 Subject: [Biojava-dev] JUnit In-Reply-To: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> References: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> Message-ID: <034C9BC2-8AA5-4220-BB34-C3D86E991200@sanger.ac.uk> > What do people think about adding the JUnit jar to the test directory > of the biojava-live repository and make the appropriate changes to the > ant classpath? This would make it easier for people to test the > I would suggest to move all the jar files where we have dependencies on into a common subdirectory. e.g something called "libs" or "jars" Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 ----------------------------------------------------------------------- -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From michaelgang at gmail.com Mon Jan 7 05:16:22 2008 From: michaelgang at gmail.com (Michael Gang) Date: Mon, 7 Jan 2008 12:16:22 +0200 Subject: [Biojava-dev] Fwd: bioperl like blastparser In-Reply-To: <6994d82b0801070050t5d9b513fhc53ff758554116ff@mail.gmail.com> References: <6994d82b0801070050t5d9b513fhc53ff758554116ff@mail.gmail.com> Message-ID: <6994d82b0801070216q1df26e72ic131592048100f3f@mail.gmail.com> Hi Andreas, You are correct. The junit.jar library was missing in my ant_home. 
Eclipse reported that it was running the tests, but did not run any.
I have now corrected this and can see that tests are failing.
I ran the program BlastEcho.java manually on the blast test files and
on the ncbi blast.
Judging by manual curation it worked well, but on wu_blast it did not
parse the query length.
The reason is that in wu_blast the query length line has just 8 spaces
at the beginning instead of 9.
So I corrected the line which identifies the query length at
org.biojava.bio.program.sax.BlastSaxParser line 68 to:

    if (poLine.matches("^\\s+\\(\\d+\\sletters\\)\\s*$")) {

Now it also works on wu_blast.
It would now be a good idea to update the blast tests regarding the
number of arguments and see whether they still fail.

Thanks in advance,
Michael

On Jan 6, 2008 2:41 PM, Andreas Prlic wrote:
> Hi Michael,
>
> I just had a look at your patch for the query length.
> Several of the unit tests are now failing at
> org.biojava.bio.program.ssbind.SSBindCase.testResultGetAnnotation(SSBindCase.java:143)
>
> The problem is that most blast related unit tests extend the SSBindCase,
> which expects a fixed number of attributes. With the new patch some of the
> blast-flavors have the additional queryLength attribute.
>
> Could you have a look at the behaviour of the parser for some of the files
> where the tests now fail? If you think the new behaviour of the
> parser is correct, we can simply update the tests to accept the different
> number of attributes.
>
> Thanks,
> Andreas
>
>
> --------------------------------------------------
>
> Andreas Prlic Wellcome Trust Sanger Institute
> Hinxton, Cambridge CB10 1SA, UK
>
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
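
The query-length check quoted in the message above can be tried in isolation. What follows is only a minimal sketch and not the actual BlastSaxParser code: the class name QueryLengthLine and the method isQueryLengthLine are hypothetical, and only the regular expression is taken from the message. It shows that the pattern accepts any amount of leading whitespace, so both the 8-space and the 9-space form of the line match.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; only the regular expression comes from the message above. */
public class QueryLengthLine {

    // One or more leading whitespace characters, then "(<digits> letters)",
    // then optional trailing whitespace.
    private static final Pattern QUERY_LENGTH =
            Pattern.compile("^\\s+\\(\\d+\\sletters\\)\\s*$");

    /** Returns true if the line has the shape of a query-length line. */
    public static boolean isQueryLengthLine(String line) {
        return QUERY_LENGTH.matcher(line).matches();
    }

    public static void main(String[] args) {
        String eightSpaces = "        (120 letters)";   // 8 leading spaces
        String nineSpaces  = "         (120 letters)";  // 9 leading spaces
        System.out.println(isQueryLengthLine(eightSpaces));     // true
        System.out.println(isQueryLengthLine(nineSpaces));      // true
        System.out.println(isQueryLengthLine("(120 letters)")); // false: no leading whitespace
    }
}
```

Because the digits are matched with \d+ only, a count written with a thousands separator, such as "(1,234 letters)", would not match this pattern.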
> From holland at ebi.ac.uk Mon Jan 7 07:01:55 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Mon, 7 Jan 2008 12:01:55 -0000 (GMT) Subject: [Biojava-dev] Error while reading byte data for creating a Trace. In-Reply-To: <04BE1C71-B7CF-4428-86C7-300E4283DAE8@ebi.ac.uk> References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk> <2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu> <04BE1C71-B7CF-4428-86C7-300E4283DAE8@ebi.ac.uk> Message-ID: <50442.80.42.95.78.1199707315.squirrel@webmail.ebi.ac.uk> This problem was resolved back in November. For some reason during the last couple of weeks the BioJava mailing list has been sending out occasional duplicate copies of emails sent several months ago! This was one of them. cheers, Richard On Mon, January 7, 2008 9:34 am, Andy Yates wrote: > Hi, > > As far as I am aware there isn't a problem with the current ABI parser > however if you could send a code snippit of reading in the byte array > & the stack trace of the index out of bounds exception that would be > most helpful > > Andy > > On 5 Nov 2007, at 17:59, abhi232 at cc.gatech.edu wrote: > >> Hi all, >> I am having a byte array which is having the data from an .ab1 >> file.The >> biojava library provides a class called as ABITrace which takes as >> input >> either a byte[] array , a file or a url.If i use the later >> parameters (the >> file or the url )the program works but if I pass the byte array to the >> constructor I get java.lang.arrayIndexOutOfBound.Exception.Is there a >> problem with the ABITrace class or how can I bypass this particular >> error. >> I am printing the length of the byte array and it comes to >> 144930...Can >> that cause a problem in my code? >> >> Thanks in advance. 
>> Abhinav >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- Richard Holland BioMart (http://www.biomart.org/) EMBL-EBI Hinxton, Cambridgeshire CB10 1SD, UK From ayates at ebi.ac.uk Mon Jan 7 07:18:50 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 7 Jan 2008 12:18:50 +0000 Subject: [Biojava-dev] Error while reading byte data for creating a Trace. In-Reply-To: <50442.80.42.95.78.1199707315.squirrel@webmail.ebi.ac.uk> References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk> <2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu> <04BE1C71-B7CF-4428-86C7-300E4283DAE8@ebi.ac.uk> <50442.80.42.95.78.1199707315.squirrel@webmail.ebi.ac.uk> Message-ID: <065714BD-6D4F-4B5F-8AAE-E6C47C9405AB@ebi.ac.uk> Oh for ... :). Thought I'd seen this one before Andy On 7 Jan 2008, at 12:01, Richard Holland wrote: > This problem was resolved back in November. For some reason during the > last couple of weeks the BioJava mailing list has been sending out > occasional duplicate copies of emails sent several months ago! This > was > one of them. 
> > cheers, > Richard > > On Mon, January 7, 2008 9:34 am, Andy Yates wrote: >> Hi, >> >> As far as I am aware there isn't a problem with the current ABI >> parser >> however if you could send a code snippit of reading in the byte array >> & the stack trace of the index out of bounds exception that would be >> most helpful >> >> Andy >> >> On 5 Nov 2007, at 17:59, abhi232 at cc.gatech.edu wrote: >> >>> Hi all, >>> I am having a byte array which is having the data from an .ab1 >>> file.The >>> biojava library provides a class called as ABITrace which takes as >>> input >>> either a byte[] array , a file or a url.If i use the later >>> parameters (the >>> file or the url )the program works but if I pass the byte array to >>> the >>> constructor I get java.lang.arrayIndexOutOfBound.Exception.Is >>> there a >>> problem with the ABITrace class or how can I bypass this particular >>> error. >>> I am printing the length of the byte array and it comes to >>> 144930...Can >>> that cause a problem in my code? >>> >>> Thanks in advance. >>> Abhinav >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > -- > Richard Holland > BioMart (http://www.biomart.org/) > EMBL-EBI > Hinxton, Cambridgeshire CB10 1SD, UK From ap3 at sanger.ac.uk Mon Jan 7 16:54:21 2008 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Mon, 7 Jan 2008 21:54:21 +0000 (GMT) Subject: [Biojava-dev] bioperl like blastparser Message-ID: Hi Michael, thanks for your patch, I commited it to the new svn repository and updated the unit tests to now either take 4 or 5 args. 
Andreas -------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From heuermh at acm.org Mon Jan 7 19:36:04 2008 From: heuermh at acm.org (Michael Heuer) Date: Mon, 7 Jan 2008 19:36:04 -0500 (EST) Subject: [Biojava-dev] JUnit In-Reply-To: <034C9BC2-8AA5-4220-BB34-C3D86E991200@sanger.ac.uk> Message-ID: Andreas Prlic wrote: > > What do people think about adding the JUnit jar to the test directory > > of the biojava-live repository and make the appropriate changes to the > > ant classpath? This would make it easier for people to test the > > > > I would suggest to move all the jar files where we have dependencies > on into a common subdirectory. > e.g something called "libs" or "jars" Using maven would resolve all of these issues. Or alternatively, a maven build can create an ant build.xml that downloads its dependencies from the maven central repository http://maven.apache.org/plugins/maven-ant-plugin/ or there is Ivy for ant, which can be configured to use the maven central repository http://ant.apache.org/ivy/ The 'lib' directory doesn't really have a place any more. michael From markjschreiber at gmail.com Mon Jan 7 22:17:35 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 8 Jan 2008 11:17:35 +0800 Subject: [Biojava-dev] JUnit In-Reply-To: References: <034C9BC2-8AA5-4220-BB34-C3D86E991200@sanger.ac.uk> Message-ID: <93b45ca50801071917t1cef45epf8772b4370ef3f97@mail.gmail.com> Hi - I have added the junit jar and modified the build.xml I will leave the decision about a lib directory etc for some more debate. 
- Mark On Jan 8, 2008 8:36 AM, Michael Heuer wrote: > Andreas Prlic wrote: > > > > What do people think about adding the JUnit jar to the test directory > > > of the biojava-live repository and make the appropriate changes to the > > > ant classpath? This would make it easier for people to test the > > > > > > > I would suggest to move all the jar files where we have dependencies > > on into a common subdirectory. > > e.g something called "libs" or "jars" > > Using maven would resolve all of these issues. > > Or alternatively, a maven build can create an ant build.xml that downloads > its dependencies from the maven central repository > > http://maven.apache.org/plugins/maven-ant-plugin/ > > or there is Ivy for ant, which can be configured to use the maven central > repository > > http://ant.apache.org/ivy/ > > The 'lib' directory doesn't really have a place any more. > > michael > > From heuermh at acm.org Mon Jan 7 23:55:42 2008 From: heuermh at acm.org (Michael Heuer) Date: Mon, 7 Jan 2008 23:55:42 -0500 (EST) Subject: [Biojava-dev] JUnit In-Reply-To: <93b45ca50801071917t1cef45epf8772b4370ef3f97@mail.gmail.com> Message-ID: Mark Schreiber wrote: > I will leave the decision about a lib directory etc for some more debate. Now that we have a subversion repository in place, I would be happy to create a maven-based build out on branch for consideration at some point. Ideally this would happen after refactoring/cleanup/purge so I have less work to do. ;) michael From michaelgang at gmail.com Tue Jan 8 03:23:56 2008 From: michaelgang at gmail.com (Michael Gang) Date: Tue, 8 Jan 2008 10:23:56 +0200 Subject: [Biojava-dev] read fasta file Message-ID: <6994d82b0801080023y3cdcc005g57b08c6566c37445@mail.gmail.com> Dear All, I want to read a fasta file of dna (the accessions are internal to our company and may not be like the convention), make manipulations on it and write it to another file. 
When I take the example from the book "Biojava in Anger" it works
fine, but I get warnings that the SeqIOTools type is deprecated.
When using the RichSequence.IOTools class, the fasta header is changed
when the fasta is written (it adds the lcl: prefix).
I want the fasta header in the output file to be the same as in the input file.
Will the SeqIOTools type continue to be supported?
If not, is there another way to solve the problem?

Thanks in advance,
Michael

From holland at ebi.ac.uk Tue Jan 8 03:51:32 2008
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 08 Jan 2008 08:51:32 +0000
Subject: [Biojava-dev] read fasta file
In-Reply-To: <6994d82b0801080023y3cdcc005g57b08c6566c37445@mail.gmail.com>
References: <6994d82b0801080023y3cdcc005g57b08c6566c37445@mail.gmail.com>
Message-ID: <47833994.7090608@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

SeqIOTools is deprecated - this means that it _may_ get dropped in a
future release and so you can't rely on it being present in any future
release.

RichSequence.IOTools follows the FASTA format exactly, which requires a
namespace prefix in the header, and it will change the existing header
if it does not already meet the FASTA standard. There is currently no
way to stop it from doing that, although you might want to raise a bug
report so that it goes on our list of things to change. You can do that
here: http://bugzilla.open-bio.org/enter_bug.cgi?product=BioJava

cheers,
Richard


Michael Gang wrote:
> Dear All,
>
> I want to read a fasta file of dna (the accessions are internal to our
> company and may not be like the convention), make manipulations on it
> and write it to another file.
> When i take the example from the book "Biojava in Anger" it works
> fine, but I get warnings that the SeqIOTools type is deprecated.
> When using the RichSequence.IOTools package I have problems that when
> writing the fasta it changes the fasta header (it adds the lcl:
> prefix).
> I want that the fasta header will be in the output file like in the input file.
> Will the SeqIOTools type supported further ?
> If not, is there another way to solve the problem ?
>
> Thanks in advance,
> Michael
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHgzmT4C5LeMEKA/QRAoF8AJ9SLAMGvm7SpByOyfL1/7tUZ9NbZgCgjeTq
FjmCDFlMygy68q1zkbpwX2o=
=bTSb
-----END PGP SIGNATURE-----

From bugzilla-daemon at portal.open-bio.org Tue Jan 8 10:00:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Jan 2008 10:00:58 -0500
Subject: [Biojava-dev] [Bug 2432] New: non conventional fasta header && RichSequence.IOTools
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2432

Summary: non conventional fasta header && RichSequence.IOTools
Product: BioJava
Version: unspecified
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: seq.io
AssignedTo: biojava-dev at biojava.org
ReportedBy: michaelgang at gmail.com


When reading a fasta file with a non-conventional header (for example
company-internal accessions) and writing it with RichSequence.IOTools, the
fasta header gets changed.
With the deprecated SeqIOTools it works.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
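
For the case described in this report, one possible stopgap is to handle the FASTA text directly so that header lines are never rewritten. The following is only a minimal sketch under stated assumptions: it does not use BioJava at all (so no Sequence objects are built), the class name FastaPassThrough and its copy method are hypothetical, and the "manipulation" shown (upper-casing the residues) is just a placeholder for whatever processing is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Locale;

/** Hypothetical plain-Java helper; it does not use any BioJava classes. */
public class FastaPassThrough {

    /**
     * Copies FASTA text from in to out. Header lines (starting with '>')
     * are written unchanged; residue lines are upper-cased as a
     * placeholder manipulation.
     */
    public static void copy(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith(">")) {
                out.write(line);                           // header kept verbatim
            } else {
                out.write(line.toUpperCase(Locale.ROOT));  // placeholder manipulation
            }
            out.write('\n');
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Made-up record with a non-conventional, company-style header.
        String input = ">ACME_000123 internal clone 7\nacgtacgt\nttga\n";
        StringWriter output = new StringWriter();
        copy(new StringReader(input), output);
        System.out.print(output);
    }
}
```

The trade-off is that none of BioJava's alphabet checking or sequence model is available this way; it only guarantees that the header in the output file is identical to the one in the input file.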
From minhduc.cao at gmail.com Tue Jan 8 19:57:43 2008
From: minhduc.cao at gmail.com (Minh Duc, Cao)
Date: Wed, 9 Jan 2008 11:57:43 +1100
Subject: [Biojava-dev] Problem with read RichFormat file from an applet
Message-ID:

Hi,

I used IOTools.readFastaDNA(in,null) to read a Fasta file and, for a
stand alone application, it works perfectly. However, when the code is
called from an applet, the following exception is thrown:

Exception in thread "Thread-8" java.lang.ExceptionInInitializerError
    at org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java:1813)
    at org.biojava.bio.seq.SimpleFeatureHolder.(SimpleFeatureHolder.java:54)
    at org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature(RichFeature.java:167)
    at org.biojavax.bio.seq.io.RichSeqIOAdapter.(RichSeqIOAdapter.java:61)
    at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:100)
    at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:81)
    at org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder(SimpleRichSequenceBuilderFactory.java:68)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
    at org.biojavax.bio.seq.io.RichStreamReader.nextSequence(RichStreamReader.java:92)
    at dnaPlatform.function.ReadFormatFileFunction.guessFormat(ReadFormatFileFunction.java:134)
    at dnaPlatform.gui.RunFunction.run(MainPanel.java:929)
Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission createClassLoader)
    at java.security.AccessControlContext.checkPermission(Unknown Source)
    at java.security.AccessController.checkPermission(Unknown Source)
    at java.lang.SecurityManager.checkPermission(Unknown Source)
    at java.lang.SecurityManager.checkCreateClassLoader(Unknown Source)
    at java.lang.ClassLoader.(Unknown Source)
    at org.biojava.utils.bytecode.GeneratedClassLoader.(GeneratedClassLoader.java:29)
    at org.biojava.utils.walker.WalkerFactory.(WalkerFactory.java:68)
    at org.biojava.utils.walker.WalkerFactory.getInstance(WalkerFactory.java:51)
    at org.biojava.utils.walker.WalkerFactory.getInstance(WalkerFactory.java:58)
    at org.biojava.bio.seq.FeatureFilter$OnlyChildren.(FeatureFilter.java:1270)
    ... 11 more

Note that the applet is signed and can read files from the client hard
disk if another method is used.

Does anyone have an idea how I can go about fixing this problem?

Thank you very much
Minh

From markjschreiber at gmail.com Wed Jan 9 03:30:45 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 9 Jan 2008 16:30:45 +0800
Subject: [Biojava-dev] read fasta file
In-Reply-To: <47833994.7090608@ebi.ac.uk>
References: <6994d82b0801080023y3cdcc005g57b08c6566c37445@mail.gmail.com> <47833994.7090608@ebi.ac.uk>
Message-ID: <93b45ca50801090030t6ffd9907ieb1082fd20c5b713@mail.gmail.com>

Along these lines I have plans to add a way to format the Fasta header
in the RichSequence.IOTools so that the content of it can be
customised. Currently it follows the NCBI model and tries to add
everything it can.

I would be interested in proposals for a template mechanism.

- Mark

On Jan 8, 2008 4:51 PM, Richard Holland wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> SeqIOTools is deprecated - this means that it _may_ get dropped in a
> future release and so you can't rely on it being present in any future
> release.
>
> RichSequence.IOTools follows the FASTA format exactly, which requires a
> namespace prefix in the header, and it will change the existing header
> if it does not already meet the FASTA standard. There is currently no
> way to stop it from doing that, although you might want to raise a bug
> report so that it goes on our list of things to change.
You can do that > here: http://bugzilla.open-bio.org/enter_bug.cgi?product=BioJava > > cheers, > Richard > > > Michael Gang wrote: > > Dear All, > > > > I want to read a fasta file of dna (the accessions are internal to our > > company and may not be like the convention), make manipulations on it > > and write it to another file. > > When i take the example from the book "Biojava in Anger" it works > > fine, but I get warnings that the SeqIOTools type is deprecated. > > When using the RichSequence.IOTools package I have problems that when > > writing the fasta it changes the fasta header (it adds the lcl: > > prefix). > > I want that the fasta header will be in the output file like in the input file. > > Will the SeqIOTools type supported further ? > > If not, is there another way to solve the problem ? > > > > Thanks in advance, > > Michael > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > - -- > Richard Holland (BioMart) > EMBL EBI, Wellcome Trust Genome Campus, > Hinxton, Cambridgeshire CB10 1SD, UK > Tel. 
+44 (0)1223 494416
>
> http://www.biomart.org/
> http://www.biojava.org/
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFHgzmT4C5LeMEKA/QRAoF8AJ9SLAMGvm7SpByOyfL1/7tUZ9NbZgCgjeTq
> FjmCDFlMygy68q1zkbpwX2o=
> =bTSb
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From holland at ebi.ac.uk Wed Jan 9 03:38:12 2008
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 09 Jan 2008 08:38:12 +0000
Subject: [Biojava-dev] Problem with read RichFormat file from an applet
In-Reply-To:
References:
Message-ID: <478487F4.3090209@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello.

This is the root of your problem:

Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission createClassLoader)
    at org.biojava.utils.bytecode.GeneratedClassLoader.(GeneratedClassLoader.java:29)
    at org.biojava.utils.walker.WalkerFactory.(WalkerFactory.java:68)

The applet runtime environment is not allowing BioJava to create a custom class loader. It's not to do with disk access at all unfortunately.

I don't know of a solution myself as I've not done much work with applets.

Does anyone else on this list have any suggestions?

cheers,
Richard

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHhIfz4C5LeMEKA/QRAhZqAJ9k36tFYC7wdBt6eScgCn5MK9uVZwCeIVHU
R0e4dCpmpjJnHOrfjfw0wYc=
=WayD
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com Wed Jan 9 03:50:11 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 9 Jan 2008 16:50:11 +0800
Subject: [Biojava-dev] Problem with read RichFormat file from an applet
In-Reply-To: <478487F4.3090209@ebi.ac.uk>
References: <478487F4.3090209@ebi.ac.uk>
Message-ID: <93b45ca50801090050j40455c2bid1af46c277a62582@mail.gmail.com>

Consulting a good book on the Java security model may reveal a way that you can modify the policy to allow this.

However, I think you should give serious consideration to why you would want to use an applet in any context. The technology has major limitations and has long since been superseded, either by servlets or other technologies for server-side stuff, or by Web Start for client-side apps distributed from a server.

- Mark

From jcope at cableone.net Wed Jan 9 10:59:16 2008
From: jcope at cableone.net (Jeff Cope)
Date: Wed, 9 Jan 2008 08:59:16 -0700
Subject: [Biojava-dev] BioJava Development
In-Reply-To:
References:
Message-ID: <000801c852d8$966eb370$6402a8c0@roadrunner>

Hi,

My name is Jeff Cope, and I'm currently working on a bioinformatics project for a professor at BSU. What we have so far is mostly in Java, but with the data calculations taking place using functionality found in the BioPython library (Molecular Weight, Instability Index, Isoelectric Point, Aromaticity, GRAVY, etc...). Anywho, currently we are only looking at protein sequences, and thought that we could help you out on the BioJava project by seeing if that functionality could be added into your library instead...

So I guess my question is, now that I'm signed up on the developers mail list, what would I need to do to get started (assuming you want my help), and what kind of programming ground rules do you have...
I feel pretty comfortable in Java code, and if you would like to see an example of my source code, and some of the work I've done so far, you can find it here:
Current project:
http://trac.boisestate.edu/protcalc/
Source Code:
http://trac.boisestate.edu/protcalc/src/
API docs:
http://trac.boisestate.edu/protcalc/docs/

Thanks,
Jeff Cope

-----Original Message-----
From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of biojava-dev-request at lists.open-bio.org
Sent: Tuesday, January 08, 2008 1:52 AM
To: biojava-dev at lists.open-bio.org
Subject: biojava-dev Digest, Vol 59, Issue 2

Send biojava-dev mailing list submissions to
    biojava-dev at lists.open-bio.org

To subscribe or unsubscribe via the World Wide Web, visit
    http://lists.open-bio.org/mailman/listinfo/biojava-dev
or, via email, send a message with subject or body 'help' to
    biojava-dev-request at lists.open-bio.org

You can reach the person managing the list at
    biojava-dev-owner at lists.open-bio.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of biojava-dev digest..."

Today's Topics:

   1. Fwd: bioperl like blastparser (Michael Gang)
   2. Re: Error while reading byte data for creating a Trace. (Richard Holland)
   3. Re: Error while reading byte data for creating a Trace. (Andy Yates)
   4. Re: bioperl like blastparser (Andreas Prlic)
   5. Re: JUnit (Michael Heuer)
   6. Re: JUnit (Mark Schreiber)
   7. Re: JUnit (Michael Heuer)
   8. read fasta file (Michael Gang)
   9. Re: read fasta file (Richard Holland)

----------------------------------------------------------------------

Message: 1
Date: Mon, 7 Jan 2008 12:16:22 +0200
From: "Michael Gang"
Subject: [Biojava-dev] Fwd: bioperl like blastparser
To: biojava-dev at biojava.org
Message-ID: <6994d82b0801070216q1df26e72ic131592048100f3f at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi Andreas,

You are correct.
The junit.jar library was missing in my ant_home.
Eclipse reported that it was running the tests, but did not run any.
I have now corrected it and see that tests are failing.
I ran the program BlastEcho.java manually on the blast test files and on the ncbi blast.
Judging by manual curation it worked well, but for wu_blast it did not parse the query length.
The reason is that in wu_blast the query length line has just 8 spaces at the beginning instead of 9.
So I corrected the line which identifies the query length at org.biojava.bio.program.sax.BlastSaxParser line 68 to:
if (poLine.matches("^\\s+\\(\\d+\\sletters\\)\\s*$")) {

Now it also works on wu_blast.
It would now be a good idea to update the blast tests regarding the number of arguments and see if they still fail.

Thanks in advance,
Michael

On Jan 6, 2008 2:41 PM, Andreas Prlic wrote:
> Hi Michael,
>
> I just had a look at your patch for the query length.
> Several of the unit tests are now failing at
> org.biojava.bio.program.ssbind.SSBindCase.testResultGetAnnotation(SSBindCase.java:143)
>
> The problem is that most blast related unit tests extend the SSBindCase,
> which expects a fixed number of attributes. With the new patch some of the
> blast-flavors have the additional queryLength attribute.
>
> Could you have a look at the behaviour of the parser for some of the files
> where the tests now fail? If you think the new behaviour of the
> parser is correct, we can simply update the tests to accept the different
> number of attributes.
>
> Thanks,
> Andreas
>
>
> --------------------------------------------------
>
> Andreas Prlic Wellcome Trust Sanger Institute
> Hinxton, Cambridge CB10 1SA, UK
>
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
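
The whitespace fix described in Message 1 can be checked in isolation. What follows is only a minimal sketch and not the BlastSaxParser code itself: the class name QueryLengthLineCheck, the helper isQueryLengthLine and the sample lines (with a made-up length of 438) are assumptions for illustration. Only the regular expression is taken from the message. Because the pattern starts with \s+, it should accept both the 8-space (wu_blast) and the 9-space (ncbi blast) indentation.

```java
import java.util.regex.Pattern;

public class QueryLengthLineCheck {

    // Regular expression quoted in Message 1. The leading \s+ accepts any
    // non-zero amount of whitespace rather than a fixed number of spaces.
    private static final Pattern QUERY_LENGTH_LINE =
            Pattern.compile("^\\s+\\(\\d+\\sletters\\)\\s*$");

    // Hypothetical helper: true if the whole line looks like "   (N letters)".
    public static boolean isQueryLengthLine(String line) {
        return QUERY_LENGTH_LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Sample lines are invented for this sketch, not copied from real output.
        String nineSpaces = "         (438 letters)";
        String eightSpaces = "        (438 letters)";
        String noIndent = "(438 letters)";

        System.out.println("9 spaces:  " + isQueryLengthLine(nineSpaces));
        System.out.println("8 spaces:  " + isQueryLengthLine(eightSpaces));
        System.out.println("no indent: " + isQueryLengthLine(noIndent));
    }
}
```

A line with no leading whitespace is rejected, since \s+ needs at least one whitespace character. Whether real reports ever print the length in another form is not something this sketch tries to cover.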
>

------------------------------

Message: 2
Date: Mon, 7 Jan 2008 12:01:55 -0000 (GMT)
From: "Richard Holland"
Subject: Re: [Biojava-dev] Error while reading byte data for creating a Trace.
To: "Andy Yates"
Cc: biojava-l at biojava.org, biojava-dev at biojava.org, abhi232 at cc.gatech.edu
Message-ID: <50442.80.42.95.78.1199707315.squirrel at webmail.ebi.ac.uk>
Content-Type: text/plain;charset=iso-8859-1

This problem was resolved back in November. For some reason during the last couple of weeks the BioJava mailing list has been sending out occasional duplicate copies of emails sent several months ago! This was one of them.

cheers,
Richard

On Mon, January 7, 2008 9:34 am, Andy Yates wrote:
> Hi,
>
> As far as I am aware there isn't a problem with the current ABI parser
> however if you could send a code snippit of reading in the byte array
> & the stack trace of the index out of bounds exception that would be
> most helpful
>
> Andy
>
> On 5 Nov 2007, at 17:59, abhi232 at cc.gatech.edu wrote:
>
>> Hi all,
>> I am having a byte array which is having the data from an .ab1
>> file.The
>> biojava library provides a class called as ABITrace which takes as
>> input
>> either a byte[] array , a file or a url.If i use the later
>> parameters (the
>> file or the url )the program works but if I pass the byte array to the
>> constructor I get java.lang.arrayIndexOutOfBound.Exception.Is there a
>> problem with the ABITrace class or how can I bypass this particular
>> error.
>> I am printing the length of the byte array and it comes to
>> 144930...Can
>> that cause a problem in my code?
>>
>> Thanks in advance.
>> Abhinav
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

--
Richard Holland
BioMart (http://www.biomart.org/)
EMBL-EBI
Hinxton, Cambridgeshire CB10 1SD, UK

------------------------------

Message: 3
Date: Mon, 7 Jan 2008 12:18:50 +0000
From: Andy Yates
Subject: Re: [Biojava-dev] Error while reading byte data for creating a Trace.
To: "Richard Holland"
Cc: biojava-l at biojava.org, biojava-dev at biojava.org, abhi232 at cc.gatech.edu
Message-ID: <065714BD-6D4F-4B5F-8AAE-E6C47C9405AB at ebi.ac.uk>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Oh for ... :). Thought I'd seen this one before

Andy

------------------------------

Message: 4
Date: Mon, 7 Jan 2008 21:54:21 +0000 (GMT)
From: Andreas Prlic
Subject: Re: [Biojava-dev] bioperl like blastparser
To: michaelgang at gmail.com
Cc: biojava-dev at biojava.org
Message-ID:
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

Hi Michael,

thanks for your patch, I committed it to the new svn repository and updated the unit tests to now either take 4 or 5 args.

Andreas

--------------------------------------------------

Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

------------------------------

Message: 5
Date: Mon, 7 Jan 2008 19:36:04 -0500 (EST)
From: Michael Heuer
Subject: Re: [Biojava-dev] JUnit
To: Andreas Prlic
Cc: biojava-dev at biojava.org
Message-ID:
Content-Type: TEXT/PLAIN; charset=US-ASCII

Andreas Prlic wrote:

> > What do people think about adding the JUnit jar to the test directory
> > of the biojava-live repository and make the appropriate changes to the
> > ant classpath? This would make it easier for people to test the
> >
>
> I would suggest to move all the jar files where we have dependencies
> on into a common subdirectory.
> e.g something called "libs" or "jars"

Using maven would resolve all of these issues.

Or alternatively, a maven build can create an ant build.xml that downloads its dependencies from the maven central repository

http://maven.apache.org/plugins/maven-ant-plugin/

or there is Ivy for ant, which can be configured to use the maven central repository

http://ant.apache.org/ivy/

The 'lib' directory doesn't really have a place any more.

michael

------------------------------

Message: 6
Date: Tue, 8 Jan 2008 11:17:35 +0800
From: "Mark Schreiber"
Subject: Re: [Biojava-dev] JUnit
To: "Michael Heuer"
Cc: biojava-dev at biojava.org
Message-ID: <93b45ca50801071917t1cef45epf8772b4370ef3f97 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi -

I have added the junit jar and modified the build.xml

I will leave the decision about a lib directory etc for some more debate.

- Mark

------------------------------

Message: 7
Date: Mon, 7 Jan 2008 23:55:42 -0500 (EST)
From: Michael Heuer
Subject: Re: [Biojava-dev] JUnit
To: Mark Schreiber
Cc: biojava-dev at biojava.org
Message-ID:
Content-Type: TEXT/PLAIN; charset=US-ASCII

Mark Schreiber wrote:

> I will leave the decision about a lib directory etc for some more debate.

Now that we have a subversion repository in place, I would be happy to create a maven-based build out on branch for consideration at some point. Ideally this would happen after refactoring/cleanup/purge so I have less work to do. ;)

michael

------------------------------

Message: 8
Date: Tue, 8 Jan 2008 10:23:56 +0200
From: "Michael Gang"
Subject: [Biojava-dev] read fasta file
To: biojava-dev at biojava.org
Message-ID: <6994d82b0801080023y3cdcc005g57b08c6566c37445 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Dear All,

I want to read a fasta file of dna (the accessions are internal to our company and may not be like the convention), make manipulations on it and write it to another file.
When I take the example from the book "Biojava in Anger" it works fine, but I get warnings that the SeqIOTools type is deprecated.
When using the RichSequence.IOTools package I have the problem that when writing the fasta it changes the fasta header (it adds the lcl: prefix).
I want the fasta header in the output file to be the same as in the input file.
Will the SeqIOTools type be supported in future?
If not, is there another way to solve the problem?

Thanks in advance,
Michael

------------------------------

Message: 9
Date: Tue, 08 Jan 2008 08:51:32 +0000
From: Richard Holland
Subject: Re: [Biojava-dev] read fasta file
To: Michael Gang
Cc: biojava-dev at biojava.org
Message-ID: <47833994.7090608 at ebi.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

SeqIOTools is deprecated - this means that it _may_ get dropped in a future release and so you can't rely on it being present in any future release.

RichSequence.IOTools follows the FASTA format exactly, which requires a namespace prefix in the header, and it will change the existing header if it does not already meet the FASTA standard. There is currently no way to stop it from doing that, although you might want to raise a bug report so that it goes on our list of things to change. You can do that here: http://bugzilla.open-bio.org/enter_bug.cgi?product=BioJava

cheers,
Richard

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHgzmT4C5LeMEKA/QRAoF8AJ9SLAMGvm7SpByOyfL1/7tUZ9NbZgCgjeTq
FjmCDFlMygy68q1zkbpwX2o=
=bTSb
-----END PGP SIGNATURE-----

------------------------------

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

End of biojava-dev Digest, Vol 59, Issue 2
******************************************

From holland at ebi.ac.uk Wed Jan 9 12:09:13 2008
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 09 Jan 2008 17:09:13 +0000
Subject: [Biojava-dev] BioJava Development
In-Reply-To: <000801c852d8$966eb370$6402a8c0@roadrunner>
References: <000801c852d8$966eb370$6402a8c0@roadrunner>
Message-ID: <4784FFB9.1050402@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Jeff.

Thanks for volunteering! Sounds like you've got a useful set of tools which we would definitely appreciate as new BioJava features.

We're currently planning a Big Reorganisation for a new BioJava 3 from the ground-up. Details will be published some time in February I expect, if not then early March. See this Wiki for things that are currently being considered (most will make it into the plan, some may not, but I won't know until I've written it up and identified the conflicting areas):

http://www.biojava.org/wiki/BioJava3_Proposal
(also see the associated Discussion page for further comments)

If I were you, I'd hang on until the final plan is published. It will contain everything you need to know on how to write modules for the new BioJava 3.

However, if you're in a rush to get it into the current BioJava 2 release, then take a look round the JavaDocs to see what is present and what is not. You'll soon get a good idea of how things are organised and what features are absent. We also have a bugzilla page with some unresolved bugs - always a good starting point to learn how a system works and where the opportunities for development are!

http://bugzilla.open-bio.org/buglist.cgi?product=BioJava&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED

cheers,
Richard

Jeff Cope wrote:
> Hi,
>
> My name is Jeff Cope, and I'm currently working on a bioinformatics
> project for a professor at BSU. What we have so far is mostly in java, but
> with the data calculations taking place using functionality found in the
> BioPython library (Molecular Weight, Instability Index, Isoelectric Point,
> Aromaticity, GRAVY, etc...). Anywho, currently we are only looking at
> protein sequences, and thought that we could help you out on the BioJava
> project by seeing if that functionality could be added into your library
> instead...
>
> So I guess my question is, now that I'm signed up on the developers
> mail list, what would I need to do to get started (assuming you want my
> help), and what kind of programming ground rules do you have...
>
> I feel pretty comfortable in Java code, and if you would like to see
> an example of my source code, and some of the work I've done so far, you can
> find it here:
> Current project:
> http://trac.boisestate.edu/protcalc/
> Source Code:
> http://trac.boisestate.edu/protcalc/src/
> API docs:
> http://trac.boisestate.edu/protcalc/docs/
>
>
> Thanks,
> Jeff Cope
>

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel.
+44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHhP+44C5LeMEKA/QRAroBAJ9oU/by7joNIkdpkOoEtPFzcP+6ZwCfcVYV
YpEnZK4o2READMXOsaE9oMo=
=QZDP
-----END PGP SIGNATURE-----

From minhduc.cao at gmail.com Wed Jan 9 14:47:28 2008
From: minhduc.cao at gmail.com (Minh Duc, Cao)
Date: Thu, 10 Jan 2008 06:47:28 +1100
Subject: [Biojava-dev] Problem with read RichFormat file from an applet
In-Reply-To: <93b45ca50801090050j40455c2bid1af46c277a62582@mail.gmail.com>
References: <478487F4.3090209@ebi.ac.uk> <93b45ca50801090050j40455c2bid1af46c277a62582@mail.gmail.com>
Message-ID:

Hi,

I figured out the problem. Once the biojava jar file is signed, the applet can read files without any problems.

Many thanks to you both. Thanks also to Mark for the suggestions; I will certainly try WebStart out.

Cheers
Minh

On Jan 9, 2008 7:50 PM, Mark Schreiber wrote:
> Consulting a good book on the Java security model may reveal a way
> that you can modify the policy to allow this.
>
> However, I think you should give serious consideration to why you
> would want to use an applet in any context. The technology has major
> limitations and has long since been superseded by either servlets or
> other technologies for server-side stuff, or WebStart for client-side
> apps distributed from a server.
>
> - Mark
>
> On Jan 9, 2008 4:38 PM, Richard Holland wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hello.
> >
> > This is the root of your problem:
> >
> > Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission createClassLoader)
> >     at org.biojava.utils.bytecode.GeneratedClassLoader.<init>(GeneratedClassLoader.java:29)
> >     at org.biojava.utils.walker.WalkerFactory.<init>(WalkerFactory.java:68)
> >
> > The applet runtime environment is not allowing BioJava to create a
> > custom class loader.
It's not to do with disk access at all > unfortunately. > > > > I don't know of a solution myself as I've not done much work with > applets. > > > > Does anyone else on this list have any suggestions? > > > > cheers, > > Richard > > > > > > > > Minh Duc, Cao wrote: > > > Hi, > > > > > > I used IOTools.readFastaDNA(in,null) to read Fasta file and, for a > stand > > > alone application, it works perfectly. However, when the code is > called from > > > an applet, the following exception is thrown > > > > > > Exception in thread "Thread-8" java.lang.ExceptionInInitializerError > > > at org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java > :1813) > > > at org.biojava.bio.seq.SimpleFeatureHolder.( > > > SimpleFeatureHolder.java:54) > > > at org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature( > > > RichFeature.java:167) > > > at org.biojavax.bio.seq.io.RichSeqIOAdapter.( > RichSeqIOAdapter.java > > > :61) > > > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.( > > > SimpleRichSequenceBuilder.java:100) > > > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.( > > > SimpleRichSequenceBuilder.java:81) > > > at > > > > org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder > > > (SimpleRichSequenceBuilderFactory.java:68) > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > > RichStreamReader.java:109) > > > at org.biojavax.bio.seq.io.RichStreamReader.nextSequence( > > > RichStreamReader.java:92) > > > at dnaPlatform.function.ReadFormatFileFunction.guessFormat( > > > ReadFormatFileFunction.java:134) > > > at dnaPlatform.gui.RunFunction.run(MainPanel.java:929) > > > Caused by: java.security.AccessControlException: access denied ( > > > java.lang.RuntimePermission createClassLoader) > > > at java.security.AccessControlContext.checkPermission(Unknown > Source) > > > at java.security.AccessController.checkPermission(Unknown Source) > > > at java.lang.SecurityManager.checkPermission(Unknown Source) > > > at 
java.lang.SecurityManager.checkCreateClassLoader(Unknown > Source) > > > at java.lang.ClassLoader.(Unknown Source) > > > at org.biojava.utils.bytecode.GeneratedClassLoader.( > > > GeneratedClassLoader.java:29) > > > at org.biojava.utils.walker.WalkerFactory.( > WalkerFactory.java:68) > > > at org.biojava.utils.walker.WalkerFactory.getInstance( > WalkerFactory.java > > > :51) > > > at org.biojava.utils.walker.WalkerFactory.getInstance( > WalkerFactory.java > > > :58) > > > at org.biojava.bio.seq.FeatureFilter$OnlyChildren.( > > > FeatureFilter.java:1270) > > > ... 11 more > > > > > > It is noted that the applet is signed and can read files from client > > > harddisk if other method is used. > > > > > > Do anyone have an idea how can I go about to fix this problem? > > > > > > Thank you very much > > > > > > Minh > > > _______________________________________________ > > > biojava-dev mailing list > > > biojava-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > - -- > > Richard Holland (BioMart) > > EMBL EBI, Wellcome Trust Genome Campus, > > Hinxton, Cambridgeshire CB10 1SD, UK > > Tel. 
+44 (0)1223 494416 > > > > http://www.biomart.org/ > > http://www.biojava.org/ > > -----BEGIN PGP SIGNATURE----- > > Version: GnuPG v1.4.2.2 (GNU/Linux) > > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > > > iD8DBQFHhIfz4C5LeMEKA/QRAhZqAJ9k36tFYC7wdBt6eScgCn5MK9uVZwCeIVHU > > R0e4dCpmpjJnHOrfjfw0wYc= > > =WayD > > -----END PGP SIGNATURE----- > > > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > From markjschreiber at gmail.com Thu Jan 10 02:04:00 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 10 Jan 2008 15:04:00 +0800 Subject: [Biojava-dev] BioJava Development In-Reply-To: <4784FFB9.1050402@ebi.ac.uk> References: <000801c852d8$966eb370$6402a8c0@roadrunner> <4784FFB9.1050402@ebi.ac.uk> Message-ID: <93b45ca50801092304p6c97d4adve40f1f83959b2c32@mail.gmail.com> Hi Jeff - I think the type of functionality you describe would be best placed in the proteomic package where there is already some stuff for calculating MW and PI. Probably the reorganisation for BJ3 won't mean much change for your code as these types of algorithms can usually be readily ported to any new system. - Mark On Jan 10, 2008 1:09 AM, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi Jeff. > > Thanks for volunteering! Sounds like you've got a useful set of tools > which we would definitely appreciate as new BioJava features. > > We're currently planning a Big Reorganisation for a new BioJava 3 from > the ground-up. Details will be published some time in February I expect, > if not then early March. 
See this Wiki for things that are currently > being considered (most will make it into the plan, some may not, but I > won't know until I've written it up and identified the conflicting areas): > > http://www.biojava.org/wiki/BioJava3_Proposal > > (also see the associated Discussion page for further comments) > > If I were you, I'd hang on until the final plan is published. It will > contain everything you need to know on how to write modules for the new > BioJava 3. > > However, if you're in a rush to get it into the current BioJava 2 > release, then take a look round the JavaDocs to see what is present and > what is not. You'll soon get a good idea of how things are organised and > what features are absent. We also have a bugzilla page with some > unresolved bugs - always a good starting point to learn how a system > works and where the opportunities for development are! > http://bugzilla.open-bio.org/buglist.cgi?product=BioJava&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED > > cheers, > Richard > > > Jeff Cope wrote: > > Hi, > > > > My name is Jeff Cope, and I'm currently working on a bioinformatics > > project for a professor at BSU. What we have so far is mostly in java, but > > with the data calculations taking place using functionality found in the > > BioPython library (Molecular Weight, Instability Index, Isoelectric Point, > > Aromaticity, GRAVY, etc...). Anywho, currently we are only looking at > > protein sequences, and thought that we could help you out on the BioJava > > project by seeing if that functionality could be added into your library > > instead... > > > > So I guess my question is, now that I'm signed up on the developers > > mail list, what would I need to do to get started (assuming you want my > > help), and what kind of programming ground rules do you have... 
> >
> > I feel pretty comfortable in Java code, and if you would like to see
> > an example of my source code, and some of the work I've done so far, you can
> > find it here:
> > Current project:
> > http://trac.boisestate.edu/protcalc/
> > Source Code:
> > http://trac.boisestate.edu/protcalc/src/
> > API docs:
> > http://trac.boisestate.edu/protcalc/docs/
> >
> >
> > Thanks,
> > Jeff Cope
>
> - --
> Richard Holland (BioMart)
> EMBL EBI, Wellcome Trust Genome Campus,
> Hinxton, Cambridgeshire CB10 1SD, UK
> Tel.
+44 (0)1223 494416
>
> http://www.biomart.org/
> http://www.biojava.org/
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFHhP+44C5LeMEKA/QRAroBAJ9oU/by7joNIkdpkOoEtPFzcP+6ZwCfcVYV
> YpEnZK4o2READMXOsaE9oMo=
> =QZDP
>
> -----END PGP SIGNATURE-----
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From michaelgang at gmail.com Thu Jan 10 07:44:49 2008
From: michaelgang at gmail.com (Michael Gang)
Date: Thu, 10 Jan 2008 14:44:49 +0200
Subject: [Biojava-dev] problem with blast parser
Message-ID: <6994d82b0801100444u653d412ara835691fe316ae2a@mail.gmail.com>

Hi All,
I've observed the following problem with the blast parser.
When parsing a blast report with more than one query, it skips part of the queries.
When I tried to understand what the problem is, I came to the following
conclusion (there is a good chance that I am wrong).
When the BlastSaxParser, in the function hitsSectionReached, calls the
line: oHits.parse(oContents,poLine,"Database:");
it only returns when it gets to the line :Database= but this is in
the middle of the next query (the query began with the line "BLASTX
2.2.16 [Mar-25-2007]").
I also tested it with the BlastEcho program.
Has anyone observed the same problem?
Can someone help me with this issue?

Thanks in Advance,
Michael

From holland at ebi.ac.uk Thu Jan 10 07:50:21 2008
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 10 Jan 2008 12:50:21 +0000
Subject: [Biojava-dev] problem with blast parser
In-Reply-To: <6994d82b0801100444u653d412ara835691fe316ae2a@mail.gmail.com>
References: <6994d82b0801100444u653d412ara835691fe316ae2a@mail.gmail.com>
Message-ID: <4786148D.5010003@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello.

This is related to your previous email about it not being able to read all data (e.g. query length).
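One way to sidestep the multi-query symptom reported in this thread, sketched here under stated assumptions: split the report into one chunk per query before handing each chunk to the parser. BlastReportSplitter and its header pattern are hypothetical and not part of BioJava; the only detail taken from the report is that each query starts with a line such as "BLASTX 2.2.16 [Mar-25-2007]".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical pre-processing helper for illustration only; not part of BioJava. */
final class BlastReportSplitter {

    // Assumed header shape: program name, version token, bracketed date,
    // e.g. "BLASTX 2.2.16 [Mar-25-2007]".
    private static final Pattern HEADER =
            Pattern.compile("^T?BLAST[NPX]\\s+\\S+\\s+\\[.+\\]\\s*$");

    /** Splits a concatenated report into one chunk per header line. */
    static List<String> split(String report) {
        List<String> chunks = new ArrayList<>();
        if (report.isEmpty()) {
            return chunks;
        }
        StringBuilder current = new StringBuilder();
        for (String line : report.split("\\R")) {
            // A new header closes the chunk collected so far.
            if (HEADER.matcher(line).matches() && current.length() > 0) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            current.append(line).append('\n');
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String report = "BLASTX 2.2.16 [Mar-25-2007]\nQuery= first\nDatabase: nr\n"
                + "BLASTX 2.2.16 [Mar-25-2007]\nQuery= second\nDatabase: nr\n";
        System.out.println(split(report).size() + " chunk(s)");
    }
}
```

Each chunk could then be parsed on its own, so a parser that reads past the end of one query cannot consume the start of the next. This only works around the behaviour; it does not fix the parser itself.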
I believe Andreas Prlic (copied on this) is looking into this, and some other people might be as well. One of them will get back to you.

cheers,
Richard

Michael Gang wrote:
> Hi All,
> I've observed the following problem with the blast parser.
> When parsing a blast report with more than one query, it skips part of the queries.
> When I tried to understand what the problem is, I came to the following
> conclusion (there is a good chance that I am wrong).
> When the BlastSaxParser, in the function hitsSectionReached, calls the
> line: oHits.parse(oContents,poLine,"Database:");
> it only returns when it gets to the line :Database= but this is in
> the middle of the next query (the query began with the line "BLASTX
> 2.2.16 [Mar-25-2007]").
> I also tested it with the BlastEcho program.
> Has anyone observed the same problem?
> Can someone help me with this issue?
>
> Thanks in Advance,
> Michael
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel.
+44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHhhSM4C5LeMEKA/QRAvYuAJ9r2Tkx7DzSmgAZ6sfLEnFewmSfNgCfWpvf
ZL5W/VHtWS6vDZe00Yc1MoY=
=4Qbs
-----END PGP SIGNATURE-----

From ap3 at sanger.ac.uk Sun Jan 13 08:44:42 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Sun, 13 Jan 2008 13:44:42 +0000
Subject: [Biojava-dev] anonymous svn
Message-ID:

Hi,

It seems the anonymous svn access is making progress, but I need to do some admin work on the svn repository in the next few hours:

The decision from open-bio was to move the svn repository data store from Berkeley DB to FSFS, which will make the replication of the developer repository onto the server that will provide the anonymous access much smoother.

So checkouts and commits will not work for 1 or 2 hours now, but should be fine again afterwards.

>> http://svnbook.red-bean.com/en/1.4/svn-book.html#svn.reposadmin.basics.backends
>>
>> http://subversion.tigris.org/faq.html#bdb-fsfs-convert

Andreas

-----------------------------------------------------------------------

Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From bugzilla-daemon at portal.open-bio.org Sun Jan 13 22:31:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 13 Jan 2008 22:31:53 -0500
Subject: [Biojava-dev] [Bug 2434] New: bio.seq.io.UniProtFormat error?
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2434

Summary: bio.seq.io.UniProtFormat error?
Product: BioJava
Version: 1.5
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: seq.io
AssignedTo: biojava-dev at biojava.org
ReportedBy: lisujun at gmail.com

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
    at org.biojavax.bio.seq.io.RichStreamReader.nextSequence(RichStreamReader.java:92)
    at DeleteHighAbundance.getDescription(DeleteHighAbundance.java:41)
    at DeleteHighAbundance.main(DeleteHighAbundance.java:47)
Caused by: org.biojava.bio.seq.io.ParseException:

A Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/

Format_object=org.biojavax.bio.seq.io.UniProtFormat
Accession=null
Id=
Comments=Bad ID line
Parse_block=ID IPI00000001.1 IPI; PRT; 577 AA.
Stack trace follows ....

    at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:286)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 3 more

========
When I used RichSequence.IOTools.readUniprot to read the IPI DAT file (UniProt format), this error happened.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From ap3 at sanger.ac.uk Mon Jan 14 12:26:30 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon, 14 Jan 2008 17:26:30 +0000
Subject: [Biojava-dev] biojava svn migration complete
Message-ID: <1B7EE4F0-142D-47BC-8143-677627AAC1AC@sanger.ac.uk>

Hi!

The BioJava SVN migration has been completed. Thanks a lot to everyone who has made contributions to this!!
The new anonymous checkout of BioJava is now possible via

svn co svn://code.open-bio.org/biojava/biojava-live/trunk biojava-live

Developers can obtain a checkout from

svn co svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk/ ./biojava-live

and it is possible to browse the repository online at

http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk

Also the automated builds have been updated

http://www.spice-3d.org/cruise/

see http://biojava.org/wiki/CVS_to_SVN_Migration for more details.

Andreas

-----------------------------------------------------------------------

Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From jcope at cableone.net Tue Jan 15 11:27:38 2008
From: jcope at cableone.net (Jeff Cope)
Date: Tue, 15 Jan 2008 09:27:38 -0700
Subject: [Biojava-dev] BioJava Development
In-Reply-To: <93b45ca50801092304p6c97d4adve40f1f83959b2c32@mail.gmail.com>
References: <000801c852d8$966eb370$6402a8c0@roadrunner> <4784FFB9.1050402@ebi.ac.uk> <93b45ca50801092304p6c97d4adve40f1f83959b2c32@mail.gmail.com>
Message-ID: <000001c85793$8b8c83f0$6402a8c0@roadrunner>

Hi Richard and Mark,

Thanks for the quick response, and for the help in pointing out where the new classes should go. Here are the classes I wanted to propose and where I thought they should be. Due to time constraints, this would probably be something that I could put into BioJava 2, and I would be willing to translate it to BioJava 3 when the time comes.
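A rough sketch of how the aromaticity, GRAVY and amino-acid composition calculations proposed in this message might look in plain Java. It works on ordinary strings of one-letter codes rather than on BioJava sequence objects, and the class name ProteinStats and its method names are assumptions made for illustration. Values are returned as fractions between 0 and 1 rather than percentages, and GRAVY uses the Kyte-Doolittle hydropathy scale. The instability index is left out because it needs a much larger dipeptide weight table.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not BioJava or BioPython code. */
final class ProteinStats {

    // The 20 standard residues and their Kyte-Doolittle hydropathy values,
    // stored in the same order.
    private static final String RESIDUES = "ARNDCQEGHILKMFPSTWYV";
    private static final double[] HYDROPATHY = {
        1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
        3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2
    };

    /** Fraction of the sequence made up by each residue that occurs in it. */
    static Map<Character, Double> composition(String seq) {
        Map<Character, Double> fractions = new TreeMap<>();
        if (seq.isEmpty()) {
            return fractions;
        }
        double step = 1.0 / seq.length();
        for (char c : seq.toUpperCase(Locale.ROOT).toCharArray()) {
            fractions.merge(c, step, Double::sum);
        }
        return fractions;
    }

    /** Summed fractions of Phe (F), Trp (W) and Tyr (Y). */
    static double aromaticity(String seq) {
        Map<Character, Double> f = composition(seq);
        return f.getOrDefault('F', 0.0)
                + f.getOrDefault('W', 0.0)
                + f.getOrDefault('Y', 0.0);
    }

    /** Grand average of hydropathy: summed hydropathy divided by length. */
    static double gravy(String seq) {
        if (seq.isEmpty()) {
            throw new IllegalArgumentException("Empty sequence");
        }
        double total = 0.0;
        for (char c : seq.toUpperCase(Locale.ROOT).toCharArray()) {
            int i = RESIDUES.indexOf(c);
            if (i < 0) {
                throw new IllegalArgumentException("Unknown residue: " + c);
            }
            total += HYDROPATHY[i];
        }
        return total / seq.length();
    }

    public static void main(String[] args) {
        String seq = "MKWVTFISLLFLFSSAYS";
        System.out.println("Aromaticity: " + aromaticity(seq));
        System.out.println("GRAVY: " + gravy(seq));
        System.out.println("Composition: " + composition(seq));
    }
}
```

Keeping the calculations on plain strings separates the arithmetic from the question of how characters are turned into BioJava protein sequences, so the same methods could later be wrapped around whatever sequence type the library settles on.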
As Mark had suggested, it appears that the org.biojava.bio.proteomics package would be the ideal area for the added classes, so I would be translating the functionality for most of the classes from the BioPython library.

AromaticityCalc.java - Returns the sum of the percentages of the AA sequence that are made up of the AAs Tyrosine, Tryptophan, and Phenylalanine.

GRAVYCalc.java (Grand Average of Hydropathy) - Returns the sum of the hydropathy values of the AA sequence divided by its length.

InstabilityIndexCalc.java - Returns an estimated stability for a protein sequence (<= 40 is considered stable).

AACompositionCalc.java - Breaks down protein sequences, and returns the percentage of the sequence that each individual AA makes up.

The functionality for Aromaticity, GRAVY, and Instability Index would all come from the biopython library (biopython-1.43/Bio/SeqUtils/ProtParam.py), and AAComposition is something that we worked on together that was useful for our application. Most of the classes are pretty simple; I think the most difficult part would be figuring out how the strings of characters are translated to protein sequences in BioJava.

Anyway, let me know if this sounds like it would be useful...

Thanks,
Jeff

-----Original Message-----
From: Mark Schreiber [mailto:markjschreiber at gmail.com]
Sent: Thursday, January 10, 2008 12:04 AM
To: Richard Holland
Cc: Jeff Cope; biojava-dev at lists.open-bio.org
Subject: Re: [Biojava-dev] BioJava Development

Hi Jeff -

I think the type of functionality you describe would be best placed in the proteomic package where there is already some stuff for calculating MW and PI.

Probably the reorganisation for BJ3 won't mean much change for your code as these types of algorithms can usually be readily ported to any new system.

- Mark

On Jan 10, 2008 1:09 AM, Richard Holland wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Jeff.
>
> Thanks for volunteering!
Sounds like you've got a useful set of tools > which we would definitely appreciate as new BioJava features. > > We're currently planning a Big Reorganisation for a new BioJava 3 from > the ground-up. Details will be published some time in February I expect, > if not then early March. See this Wiki for things that are currently > being considered (most will make it into the plan, some may not, but I > won't know until I've written it up and identified the conflicting areas): > > http://www.biojava.org/wiki/BioJava3_Proposal > > (also see the associated Discussion page for further comments) > > If I were you, I'd hang on until the final plan is published. It will > contain everything you need to know on how to write modules for the new > BioJava 3. > > However, if you're in a rush to get it into the current BioJava 2 > release, then take a look round the JavaDocs to see what is present and > what is not. You'll soon get a good idea of how things are organised and > what features are absent. We also have a bugzilla page with some > unresolved bugs - always a good starting point to learn how a system > works and where the opportunities for development are! > http://bugzilla.open-bio.org/buglist.cgi?product=BioJava&bug_status=NEW&bug_ status=ASSIGNED&bug_status=REOPENED > > cheers, > Richard > > > Jeff Cope wrote: > > Hi, > > > > My name is Jeff Cope, and I'm currently working on a bioinformatics > > project for a professor at BSU. What we have so far is mostly in java, but > > with the data calculations taking place using functionality found in the > > BioPython library (Molecular Weight, Instability Index, Isoelectric Point, > > Aromaticity, GRAVY, etc...). Anywho, currently we are only looking at > > protein sequences, and thought that we could help you out on the BioJava > > project by seeing if that functionality could be added into your library > > instead... 
> > > > So I guess my question is, now that I'm signed up on the developers > > mail list, what would I need to do to get started (assuming you want my > > help), and what kind of programming ground rules do you have... > > > > I feel pretty comfortable in Java code, and if you would like to see > > an example of my source code, and some of the work I've done so far, you can > > find it here: > > Current project: > > http://trac.boisestate.edu/protcalc/ > > Source Code: > > http://trac.boisestate.edu/protcalc/src/ > > API docs: > > http://trac.boisestate.edu/protcalc/docs/ > > > > > > Thanks, > > Jeff Cope > > > > -----Original Message----- > > From: biojava-dev-bounces at lists.open-bio.org > > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of > > biojava-dev-request at lists.open-bio.org > > Sent: Tuesday, January 08, 2008 1:52 AM > > To: biojava-dev at lists.open-bio.org > > Subject: biojava-dev Digest, Vol 59, Issue 2 > > > > Send biojava-dev mailing list submissions to > > biojava-dev at lists.open-bio.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > or, via email, send a message with subject or body 'help' to > > biojava-dev-request at lists.open-bio.org > > > > You can reach the person managing the list at > > biojava-dev-owner at lists.open-bio.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of biojava-dev digest..." > > > > > > Today's Topics: > > > > 1. Fwd: bioperl like blastparser (Michael Gang) > > 2. Re: Error while reading byte data for creating a Trace. > > (Richard Holland) > > 3. Re: Error while reading byte data for creating a Trace. > > (Andy Yates) > > 4. Re: bioperl like blastparser (Andreas Prlic) > > 5. Re: JUnit (Michael Heuer) > > 6. Re: JUnit (Mark Schreiber) > > 7. Re: JUnit (Michael Heuer) > > 8. read fasta file (Michael Gang) > > 9. 
Re: read fasta file (Richard Holland) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Mon, 7 Jan 2008 12:16:22 +0200 > > From: "Michael Gang" > > Subject: [Biojava-dev] Fwd: bioperl like blastparser > > To: biojava-dev at biojava.org > > Message-ID: > > <6994d82b0801070216q1df26e72ic131592048100f3f at mail.gmail.com> > > Content-Type: text/plain; charset=ISO-8859-1 > > > > Hi Andreas, > > > > You are correct. > > The junit.jar library was missing in my ant_home. > > Eclipse reported that it was running the tests, but did not run any. > > Now I have corrected it and see that tests are failing. > > I ran the program BlastEcho.java manually on the blast test files and > > on the ncbi blast. > > Judging by manual curation it worked well, but on wu_blast it did > > not parse the query length. > > The reason is that in wu_blast the query length line has just 8 spaces > > at the beginning instead of 9. > > So I corrected the line which identifies the querylength at > > org.biojava.bio.program.sax.BlastSaxParser > > line 68 to: if (poLine.matches("^\\s+\\(\\d+\\sletters\\)\\s*$")) { > > > > Now it also works on wu_blast. > > It would now be a good idea to update the blast tests regarding the > > number of arguments and see if they still fail. > > > > Thanks in advance, > > Michael > > > > > > > > On Jan 6, 2008 2:41 PM, Andreas Prlic wrote: > >> Hi Michael, > >> > >> I just had a look at your patch for the query length. > >> Several of the unit tests are now failing at > >> > >> org.biojava.bio.program.ssbind.SSBindCase.testResultGetAnnotation(SSBindCase.java:143) > >> The problem is that most blast related unit tests extend the SSBindCase, > >> which expects a fixed number of attributes. With the new patch some of the > >> blast-flavors have the additional queryLength attribute. > >> > >> Could you have a look at the behaviour of the parser for some of the files > >> where the tests now fail?
If you think the new behaviour of the > >> parser is correct, we can simply update the tests to accept the different > >> number of attributes. > >> > >> Thanks, > >> Andreas > >> > >> > >> -------------------------------------------------- > >> > >> Andreas Prlic Wellcome Trust Sanger Institute > >> Hinxton, Cambridge CB10 1SA, UK > >> > >> > >> > >> > >> -- > >> The Wellcome Trust Sanger Institute is operated by Genome Research > >> Limited, a charity registered in England with number 1021457 and a > >> company registered in England with number 2742969, whose registered > >> office is 215 Euston Road, London, NW1 2BE. > >> > > > > > > ------------------------------ > > > > Message: 2 > > Date: Mon, 7 Jan 2008 12:01:55 -0000 (GMT) > > From: "Richard Holland" > > Subject: Re: [Biojava-dev] Error while reading byte data for creating > > a Trace. > > To: "Andy Yates" > > Cc: biojava-l at biojava.org, biojava-dev at biojava.org, > > abhi232 at cc.gatech.edu > > Message-ID: <50442.80.42.95.78.1199707315.squirrel at webmail.ebi.ac.uk> > > Content-Type: text/plain;charset=iso-8859-1 > > > > This problem was resolved back in November. For some reason during the > > last couple of weeks the BioJava mailing list has been sending out > > occasional duplicate copies of emails sent several months ago! This was > > one of them. 
> > > > cheers, > > Richard > > > > On Mon, January 7, 2008 9:34 am, Andy Yates wrote: > >> Hi, > >> > >> As far as I am aware there isn't a problem with the current ABI parser > >> however if you could send a code snippit of reading in the byte array > >> & the stack trace of the index out of bounds exception that would be > >> most helpful > >> > >> Andy > >> > >> On 5 Nov 2007, at 17:59, abhi232 at cc.gatech.edu wrote: > >> > >>> Hi all, > >>> I am having a byte array which is having the data from an .ab1 > >>> file.The > >>> biojava library provides a class called as ABITrace which takes as > >>> input > >>> either a byte[] array , a file or a url.If i use the later > >>> parameters (the > >>> file or the url )the program works but if I pass the byte array to the > >>> constructor I get java.lang.arrayIndexOutOfBound.Exception.Is there a > >>> problem with the ABITrace class or how can I bypass this particular > >>> error. > >>> I am printing the length of the byte array and it comes to > >>> 144930...Can > >>> that cause a problem in my code? > >>> > >>> Thanks in advance. > >>> Abhinav > >>> _______________________________________________ > >>> biojava-dev mailing list > >>> biojava-dev at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> _______________________________________________ > >> biojava-dev mailing list > >> biojava-dev at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > > > > > > - -- > Richard Holland (BioMart) > EMBL EBI, Wellcome Trust Genome Campus, > Hinxton, Cambridgeshire CB10 1SD, UK > Tel. 
+44 (0)1223 494416 > > http://www.biomart.org/ > http://www.biojava.org/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFHhP+44C5LeMEKA/QRAroBAJ9oU/by7joNIkdpkOoEtPFzcP+6ZwCfcVYV > YpEnZK4o2READMXOsaE9oMo= > =QZDP > > -----END PGP SIGNATURE----- > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From bugzilla-daemon at portal.open-bio.org Wed Jan 16 10:16:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jan 2008 10:16:30 -0500 Subject: [Biojava-dev] [Bug 2435] New: Mistake in createRecord( ) of GFF3Parser Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2435 Summary: Mistake in createRecord( ) of GFF3Parser Product: BioJava Version: live (CVS source) Platform: PC OS/Version: Linux Status: NEW Severity: major Priority: P3 Component: bio AssignedTo: biojava-dev at biojava.org ReportedBy: pudimat at gmail.com CC: pudimat at gmail.com When setting the fields in a new GFF3Record, the source field is set twice, however the second time it is set to the GFF type value. Thus, a record has its type in the source field, and the type field has value "any". See line: 256 in GFF3Parser -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jan 16 12:07:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jan 2008 12:07:36 -0500 Subject: [Biojava-dev] [Bug 2435] Mistake in createRecord( ) of GFF3Parser In-Reply-To: Message-ID: <200801161707.m0GH7a8C005347@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2435 ------- Comment #1 from pudimat at gmail.com 2008-01-16 12:07 EST ------- Another error in line 344 (method parseAttribute() ): when splitting the key-value-pair of an attribute, the separating '=' is the first symbol of the attribute value. Reason: attValList = attVal.substring(spaceIndex).trim(); must be changed to attValList = attVal.substring(spaceIndex+1).trim(); -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Jan 17 07:36:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 17 Jan 2008 07:36:06 -0500 Subject: [Biojava-dev] [Bug 2435] Mistake in createRecord( ) of GFF3Parser In-Reply-To: Message-ID: <200801171236.m0HCa6eE030734@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2435 holland at ebi.ac.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from holland at ebi.ac.uk 2008-01-17 07:36 EST ------- I have fixed this in the new subversion repository. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
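The off-by-one reported in Comment #1 of Bug 2435 can be reproduced outside BioJava. The following is only a minimal sketch, not the GFF3Parser code: the class and method names are hypothetical, and it assumes the separator position is found with indexOf('='), which the bug report does not confirm.

```java
final class AttributeSplitSketch {

    /** Hypothetical helper: returns the value part of a "key=value" attribute. */
    public static String valueOf(String attVal) {
        int sepIndex = attVal.indexOf('=');
        if (sepIndex < 0) {
            return ""; // no separator, so there is no value to return
        }
        // substring(sepIndex) would start AT the '=', so skip it with + 1.
        return attVal.substring(sepIndex + 1).trim();
    }

    /** The off-by-one variant described in the bug comment, kept for comparison. */
    public static String valueOfBuggy(String attVal) {
        int sepIndex = attVal.indexOf('=');
        return sepIndex < 0 ? "" : attVal.substring(sepIndex).trim();
    }

    public static void main(String[] args) {
        String att = "ID=gene00001";
        System.out.println(valueOfBuggy(att)); // prints "=gene00001"
        System.out.println(valueOf(att));      // prints "gene00001"
    }
}
```

String.substring(int) includes the character at the start index, which is why the separator ends up as the first symbol of the value unless the index is advanced by one.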
From bugzilla-daemon at portal.open-bio.org Sun Jan 20 02:17:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jan 2008 02:17:42 -0500 Subject: [Biojava-dev] [Bug 2360] saving of ProfileHmm cause NullPointerException In-Reply-To: Message-ID: <200801200717.m0K7Hg0H009027@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2360 mark.schreiber at novartis.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from mark.schreiber at novartis.com 2008-01-20 02:17 EST ------- Bug was caused by serialization of an untrained distribution with no weights. Now fixed with Unit test -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jan 20 02:26:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jan 2008 02:26:00 -0500 Subject: [Biojava-dev] [Bug 2371] ChromatogramFactory.create fails on Windows In-Reply-To: Message-ID: <200801200726.m0K7Q013009324@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2371 mark.schreiber at novartis.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WORKSFORME ------- Comment #1 from mark.schreiber at novartis.com 2008-01-20 02:25 EST ------- This works using biojava 1.6 RC-1 on Windows Vista. I can't replicate the error using your chromatogram. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jan 20 03:28:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jan 2008 03:28:33 -0500 Subject: [Biojava-dev] [Bug 2164] Restriction Mapper - Thread (or dual core cpu) problem In-Reply-To: Message-ID: <200801200828.m0K8SXrV011671@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2164 ------- Comment #10 from mark.schreiber at novartis.com 2008-01-20 03:28 EST ------- As suggested I have added back the synchronized blocks, as they are certainly needed even though this doesn't fully solve the problem. Interestingly, on Windows Vista on a dual core CPU a race condition develops and never seems to resolve (not even a stack trace)! Notably this was on a short sequence, not one likely to be a PackedSymbolList as mentioned below. I wonder if there is a problem with the SimpleThreadPool. Maybe a switch to a normal Java thread pool might be better? (In reply to comment #9) > Here are the comments at the top of org.biojava.bio.symbol.PackedSymbolList, > and I quote: > "WARNING: these variables constitute an opportunity > for things to go wrong when doing multithreaded access > via symbolAt(). Keep SymbolAt() synchronized so they > don't get changed during a lookup! Naaasssty." -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
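For readers wondering what "a normal Java thread pool" in Comment #10 might look like, here is a minimal sketch using java.util.concurrent (Java 5+). It is not the RestrictionMapper code: PoolSketch, countBase and countAll are hypothetical names, the per-sequence work is a trivial stand-in, and reading "normal Java thread pool" as an ExecutorService is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PoolSketch {

    /** Stand-in for real per-sequence work: counts occurrences of one base. */
    public static int countBase(String seq, char base) {
        int n = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) == base) {
                n++;
            }
        }
        return n;
    }

    /** Runs one task per sequence on a fixed pool and collects results in order. */
    public static List<Integer> countAll(List<String> seqs, final char base) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> pending = new ArrayList<Future<Integer>>();
            for (final String seq : seqs) {
                Callable<Integer> task = () -> countBase(seq, base);
                pending.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<Integer>();
            for (Future<Integer> f : pending) {
                counts.add(f.get()); // blocks until that task has finished
            }
            return counts;
        } finally {
            // Without shutdown(), the pool's idle non-daemon workers keep the JVM alive.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countAll(Arrays.asList("AACG", "TTTT", "A"), 'A')); // prints [2, 0, 1]
    }
}
```

The shutdown() call in the finally block is the part most relevant to a program that appears to hang after its work is done: the default worker threads are not daemon threads, so the process does not exit while they are still waiting for tasks.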
From bugzilla-daemon at portal.open-bio.org Mon Jan 21 04:47:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jan 2008 04:47:37 -0500 Subject: [Biojava-dev] [Bug 2164] Restriction Mapper - Thread (or dual core cpu) problem In-Reply-To: Message-ID: <200801210947.m0L9lbY5028968@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2164 ------- Comment #11 from andyyatz at gmail.com 2008-01-21 04:47 EST ------- If we're looking at Java5+ features here then maybe something like: http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/locks/ReadWriteLock.html This is a far superior solution to synchronized blocks, as it distinguishes between reading something and altering it. The locks can be held for as little or as long as required; we only need to make sure that the guarded code runs in a try {} finally {} block, so that the lock is always released and we never permanently lock out a resource. (In reply to comment #10) > As suggested I have added back the synchronized blocks, as they are certainly > needed even though this doesn't fully solve the problem. Interestingly, on > Windows Vista on a dual core CPU a race condition develops and never seems to > resolve (not even a stack trace)! Notably this was on a short sequence, not one > likely to be a PackedSymbolList as mentioned below. > > I wonder if there is a problem with the SimpleThreadPool. Maybe a switch to a > normal Java thread pool might be better? > > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
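The ReadWriteLock pattern suggested in Comment #11 can be sketched as follows. This is an illustration only, not PackedSymbolList: GuardedSymbols is a hypothetical class that wraps a StringBuilder, and its symbolAt() is 0-based purely for simplicity.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class GuardedSymbols {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final StringBuilder symbols;

    public GuardedSymbols(String initial) {
        this.symbols = new StringBuilder(initial);
    }

    /** Readers share the read lock, so concurrent lookups do not block each other. */
    public char symbolAt(int index) {
        lock.readLock().lock();
        try {
            return symbols.charAt(index);
        } finally {
            lock.readLock().unlock(); // always released, even if charAt throws
        }
    }

    /** Writers take the exclusive write lock, which waits for readers to finish. */
    public void append(String more) {
        lock.writeLock().lock();
        try {
            symbols.append(more);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public int length() {
        lock.readLock().lock();
        try {
            return symbols.length();
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        GuardedSymbols s = new GuardedSymbols("ACGT");
        s.append("A");
        System.out.println(s.symbolAt(4)); // prints A
        System.out.println(s.length());    // prints 5
    }
}
```

Compared with a synchronized symbolAt(), any number of readers can proceed together and only mutation is exclusive. Whether that matters in practice depends on how often the underlying data is actually written.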
From bugzilla-daemon at portal.open-bio.org Mon Jan 21 14:15:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jan 2008 14:15:24 -0500 Subject: [Biojava-dev] [Bug 2164] Restriction Mapper - Thread (or dual core cpu) problem In-Reply-To: Message-ID: <200801211915.m0LJFOcP031316@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2164 ------- Comment #12 from gwaldon at geneinfinity.org 2008-01-21 14:15 EST ------- I have seen a similar problem (at least producing similar log) and it was solved by adding pool.stopThreads() at the end of your code: SequenceIterator iter = SeqIOTools.readFastaDNA(br); SimpleThreadPool pool = new SimpleThreadPool(); RestrictionMapper mapper = new RestrictionMapper(pool); mapper.addEnzyme(RestrictionEnzymeManager.getEnzyme("MseI")); mapper.addEnzyme(RestrictionEnzymeManager.getEnzyme("HpaII")); mapper.addEnzyme(RestrictionEnzymeManager.getEnzyme("AluI")); Sequence seq; while(iter.hasNext()){ seq = iter.nextSequence(); mapper.annotate(seq); } pool.stopThreads(); Hope it helps. - George -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From ap3 at sanger.ac.uk Tue Jan 22 03:25:49 2008 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue, 22 Jan 2008 08:25:49 +0000 Subject: [Biojava-dev] biojava looking for maintainers Message-ID: <20765D52-2C0E-40C6-A769-7D13CF1DB489@sanger.ac.uk> Hi, BioJava is a widely used Java library that provides standard APIs, parsers, and solutions for common bioinformatics problems. It is used in a number of applications and referenced in many scientific publications. 
See here for an overview of these: http://biojava.org/wiki/BioJava:BioJavaInside In order to continue our first-class efforts to serve the community and to further the quality of our source code we are looking for motivated individuals who want to claim responsibility for some of the core libraries and take over maintenance of these. Please see here for a list of modules for which we are currently looking for maintainers: http://biojava.org/wiki/Maintainers_wanted If you want to become a BioJava maintainer, please post to the biojava-dev mailing list. As always - happy biojavaing, Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 ----------------------------------------------------------------------- -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From felipe.albrecht at gmail.com Wed Jan 23 01:58:04 2008 From: felipe.albrecht at gmail.com (Felipe Albrecht) Date: Wed, 23 Jan 2008 04:58:04 -0200 Subject: [Biojava-dev] Pairwise Alignment methods Message-ID: Hello all, I have a simple question about the pairwise alignment classes (SmithWaterman and NeedlemanWunsch): why does an alignment require two Sequences and not two SymbolLists? For example, I have a collection of SymbolLists to align against each other, so I have to create "dummy" Sequences just to do the alignment. Reading the source, I saw that the only field used that is exclusive to Sequence is the name, for the alignment output, but if I only need the alignment result, it is useless. Would it be possible to overload pairwiseAlignment to accept SymbolLists, or to add a new method that takes two SymbolLists and returns the alignment score?
Thank you Felipe Albrecht From markjschreiber at gmail.com Wed Jan 23 03:50:05 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 23 Jan 2008 16:50:05 +0800 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: References: Message-ID: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> Hi Felipe - I agree this is a barrier to ease of use. Even if Sequences are required internally for some obscure reason there is no reason why dummy Sequences cannot be made inside the aligner. These sequences could be given names like 'query' and 'subject' or even 'seq1' and 'seq2'. I will take a look at adding some methods. Best regards, - Mark On Jan 23, 2008 2:58 PM, Felipe Albrecht wrote: > Hello all, > > I have a simple question about pairwise alignment classes (SmithWaterman and > NeedlemanWunsch): > Why it is necessary two Sequence for alignment and not two SymbolList? > > Example, I have a SymbolList collection to align between then, > by this way I need to create some "dummies" Sequence for to do the > alignment. > > Reading the source, I saw that the unique field that is exclusive to > Sequence is the name, for the alignment output, > but if I need only the alignment result, it is useless. > > It is not possible to override the pairwiseAlignment to accept SymbolList or > may be a new method that the parameters are 2 SymbolList and returns the > alignment score? > > Thank you > > Felipe Albrecht > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From dankoc at gmail.com Wed Jan 23 16:56:19 2008 From: dankoc at gmail.com (Charles Danko) Date: Wed, 23 Jan 2008 16:56:19 -0500 Subject: [Biojava-dev] Direct access to public genome databases Message-ID: <8adccabf0801231356o16c51b55s43b3637459277f66@mail.gmail.com> Hello, Direct access to public genome databases (i.e. a class to import sequence, annotations, etc. 
and create the applicable biojava object) would be a very useful addition to BioJava. The Ensj project doesn't look like it has been updated since official support was dropped. Are there any plans to work these features into BioJava? Have I missed features that already exist? Depending on the amount of time required, I may be willing to contribute to such an endeavor -- particularly for the purpose of importing sequence. I have quite a bit of experience working with java, but not much in a collaborative environment. Best, Charles From heuermh at acm.org Wed Jan 23 20:06:53 2008 From: heuermh at acm.org (Michael Heuer) Date: Wed, 23 Jan 2008 20:06:53 -0500 (EST) Subject: [Biojava-dev] Direct access to public genome databases In-Reply-To: <8adccabf0801231356o16c51b55s43b3637459277f66@mail.gmail.com> Message-ID: Charles Danko wrote: > Hello, > > Direct access to public genome databases (i.e. a class to import > sequence, annotations, etc. and create the applicable biojava object) > would be a very useful addition to BioJava. The Ensj project doesn't > look like it has been updated since official support was dropped. Are > there any plans to work these features into BioJava? Have I missed > features that already exist? > > Depending on the amount of time required, I may be willing to > contribute to such an endeavor -- particularly for the purpose of > importing sequence. I have quite a bit of experience working with > java, but not much in a collaborative environment. What sort of client API would you have in mind? I'm not a Taverna expert, but it seems to me that access to third-party data resources is already well covered with the web services available there. > http://taverna.sourceforge.net/ > http://www.mygrid.org.uk/wiki/Mygrid/BiologicalWebServices Or simply call the web services directly. 
michael From markjschreiber at gmail.com Wed Jan 23 23:00:27 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 23 Jan 2008 23:00:27 -0500 Subject: [Biojava-dev] Direct access to public genome databases In-Reply-To: References: <8adccabf0801231356o16c51b55s43b3637459277f66@mail.gmail.com> Message-ID: <93b45ca50801232000p1cfd83c0o47f7b714abfd2d01@mail.gmail.com> Hi - From personal experience you can access data from NCBI and KEGG using their webservice APIs and (depending on the result) use the biojava parsers to return biojava objects. The question is, should this activity be wrapped into biojava (i.e. should the webservice stuff be inside biojava)? Pros: Happens behind the scenes, users don't need to know about WS, possible uniform interface to several sources. Cons: Lots more dependencies on WS jars (take a look at JAX-WS for example) and WS client jars. I'm interested in hearing more pros and cons from other people. This is timely given the upcoming webservices meet up in Tokyo. Best regards, - Mark On Jan 23, 2008 8:06 PM, Michael Heuer wrote: > Charles Danko wrote: > > > Hello, > > > > Direct access to public genome databases (i.e. a class to import > > sequence, annotations, etc. and create the applicable biojava object) > > would be a very useful addition to BioJava. The Ensj project doesn't > > look like it has been updated since official support was dropped. Are > > there any plans to work these features into BioJava? Have I missed > > features that already exist? > > > > Depending on the amount of time required, I may be willing to > > contribute to such an endeavor -- particularly for the purpose of > > importing sequence. I have quite a bit of experience working with > > java, but not much in a collaborative environment. > > What sort of client API would you have in mind? > > > I'm not a Taverna expert, but it seems to me that access to third-party > data resources is already well covered with the web services available > there.
> > > http://taverna.sourceforge.net/ > > http://www.mygrid.org.uk/wiki/Mygrid/BiologicalWebServices > > Or simply call the web services directly. > > michael > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From markjschreiber at gmail.com Thu Jan 24 02:50:32 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 24 Jan 2008 15:50:32 +0800 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> Message-ID: <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> Hi Felipe - Thanks for the input on this. As a general rule the GC should never be called from code. Generally this degrades performance of the JVM. Unless there is a very good reason I will remove this. Probably you are right a method parameter may work better. - Mark On Jan 24, 2008 1:47 PM, Felipe Albrecht wrote: > Hello, > > I think that it can be solved by a simple way: > Implement (or just copy and cut) a pairwiseAlignment utilizing SymboList as > parameters and do no creating a alignment, just the calculating it and > returning the value. > > Another thing that is a bit stange for me, is the utilization of garbage > collector direcly, that is: The field "scoreMatrix" is a class field, why at > the end of pairwiseAlignment it is set to null and the garbage collector > run? It is not better (and simpler) to use scoreMatrix as method variable? > > I'm annexing the class code with my changes that is doing well the (4^8) * > (4^8) SymbolList pairwise alignments that I am needing :-) > > Thank you, > > Felipe Albrecht > > > > > > On Jan 23, 2008 6:50 AM, Mark Schreiber wrote: > > Hi Felipe - > > > > I agree this is a barrier to ease of use. 
Even if Sequences are > > required internally for some obscure reason there is no reason why > > dummy Sequences cannot be made inside the aligner. These sequences > > could be given names like 'query' and 'subject' or even 'seq1' and > > 'seq2'. > > > > I will take a look at adding some methods. > > > > Best regards, > > > > - Mark > > > > > > > > > > On Jan 23, 2008 2:58 PM, Felipe Albrecht > wrote: > > > Hello all, > > > > > > I have a simple question about pairwise alignment classes (SmithWaterman > and > > > NeedlemanWunsch): > > > Why it is necessary two Sequence for alignment and not two SymbolList? > > > > > > Example, I have a SymbolList collection to align between then, > > > by this way I need to create some "dummies" Sequence for to do the > > > alignment. > > > > > > Reading the source, I saw that the unique field that is exclusive to > > > Sequence is the name, for the alignment output, > > > but if I need only the alignment result, it is useless. > > > > > > It is not possible to override the pairwiseAlignment to accept > SymbolList or > > > may be a new method that the parameters are 2 SymbolList and returns the > > > alignment score? > > > > > > Thank you > > > > > > Felipe Albrecht > > > _______________________________________________ > > > biojava-dev mailing list > > > biojava-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > From ap3 at sanger.ac.uk Thu Jan 24 03:40:39 2008 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Thu, 24 Jan 2008 08:40:39 +0000 (GMT) Subject: [Biojava-dev] Direct access to public genome databases In-Reply-To: References: Message-ID: Hi, >> Direct access to public genome databases (i.e. a class to import >> sequence, annotations, etc. and create the applicable biojava object) >> would be a very useful addition to BioJava. The Ensj project doesn't >> look like it has been updated since official support was dropped. 
Are >> there any plans to work these features into BioJava? Have I missed >> features that already exist? >> >> Depending on the amount of time required, I may be willing to >> contribute to such an endeavor -- particularly for the purpose of >> importing sequence. I have quite a bit of experience working with >> java, but not much in a collaborative environment. > Ensembl provides access to more and more of its data via DAS, the Distributed Annotation System. DAS is a RESTful protocol to access data from distributed sites over the internet. http://www.ensembl.org/info/using/external_data/das/ensembl_das.html It is quite heavily used; for a list of available DAS services, see http://www.dasregistry.org Andreas -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From bugzilla-daemon at portal.open-bio.org Thu Jan 24 07:22:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 24 Jan 2008 07:22:41 -0500 Subject: [Biojava-dev] [Bug 2164] Restriction Mapper - Thread (or dual core cpu) problem In-Reply-To: Message-ID: <200801241222.m0OCMfZ2026239@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2164 ------- Comment #13 from mark.schreiber at novartis.com 2008-01-24 07:22 EST ------- Using the stopThreads() method prevents the race condition. The other option is to make the threads daemon threads, which do not keep the JVM running once all other threads have finished. I have tested this on a dual core machine with Windows Vista on a 1 million base pair sequence and there is no problem. Unless someone can determine that this bug remains on other operating systems I will close the bug. Please use the current version of biojava from SVN for testing.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From felipe.albrecht at gmail.com Thu Jan 24 08:05:39 2008 From: felipe.albrecht at gmail.com (Felipe Albrecht) Date: Thu, 24 Jan 2008 11:05:39 -0200 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> Message-ID: If you prefer, I can send a diff and should I do the same thing in SequenceAlignment and NeedlemanWunsch classes? Thank you, Felipe Albrecht On Jan 24, 2008 5:50 AM, Mark Schreiber wrote: > Hi Felipe - > > Thanks for the input on this. As a general rule the GC should never be > called from code. Generally this degrades performance of the JVM. > Unless there is a very good reason I will remove this. Probably you > are right a method parameter may work better. > > - Mark > > On Jan 24, 2008 1:47 PM, Felipe Albrecht > wrote: > > Hello, > > > > I think that it can be solved by a simple way: > > Implement (or just copy and cut) a pairwiseAlignment utilizing SymboList > as > > parameters and do no creating a alignment, just the calculating it and > > returning the value. > > > > Another thing that is a bit stange for me, is the utilization of garbage > > collector direcly, that is: The field "scoreMatrix" is a class field, > why at > > the end of pairwiseAlignment it is set to null and the garbage collector > > run? It is not better (and simpler) to use scoreMatrix as method > variable? 
> > > > I'm annexing the class code with my changes that is doing well the (4^8) > * > > (4^8) SymbolList pairwise alignments that I am needing :-) > > > > Thank you, > > > > Felipe Albrecht > > > > > > > > > > > > On Jan 23, 2008 6:50 AM, Mark Schreiber > wrote: > > > Hi Felipe - > > > > > > I agree this is a barrier to ease of use. Even if Sequences are > > > required internally for some obscure reason there is no reason why > > > dummy Sequences cannot be made inside the aligner. These sequences > > > could be given names like 'query' and 'subject' or even 'seq1' and > > > 'seq2'. > > > > > > I will take a look at adding some methods. > > > > > > Best regards, > > > > > > - Mark > > > > > > > > > > > > > > > On Jan 23, 2008 2:58 PM, Felipe Albrecht > > wrote: > > > > Hello all, > > > > > > > > I have a simple question about pairwise alignment classes > (SmithWaterman > > and > > > > NeedlemanWunsch): > > > > Why it is necessary two Sequence for alignment and not two > SymbolList? > > > > > > > > Example, I have a SymbolList collection to align between then, > > > > by this way I need to create some "dummies" Sequence for to do the > > > > alignment. > > > > > > > > Reading the source, I saw that the unique field that is exclusive to > > > > Sequence is the name, for the alignment output, > > > > but if I need only the alignment result, it is useless. > > > > > > > > It is not possible to override the pairwiseAlignment to accept > > SymbolList or > > > > may be a new method that the parameters are 2 SymbolList and returns > the > > > > alignment score? 
> > > > > > > > Thank you > > > > > > > > Felipe Albrecht > > > > _______________________________________________ > > > > biojava-dev mailing list > > > > biojava-dev at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > > > > > > From markjschreiber at gmail.com Thu Jan 24 08:35:43 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 24 Jan 2008 21:35:43 +0800 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> Message-ID: <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com> Hi - I have just commited changes that let you use SymbolLists in all parts of the NW and SW SequenceAlignment objects. As you suggested I made the matrix a method local variable. I also removed calls to the garbage collector. This can be checked out from SVN. - Mark On Jan 24, 2008 9:05 PM, Felipe Albrecht wrote: > If you prefer, I can send a diff and should I do the same thing in > SequenceAlignment and NeedlemanWunsch classes? > > Thank you, > > Felipe Albrecht > > > > On Jan 24, 2008 5:50 AM, Mark Schreiber < markjschreiber at gmail.com> wrote: > > Hi Felipe - > > > > Thanks for the input on this. As a general rule the GC should never be > > called from code. Generally this degrades performance of the JVM. > > Unless there is a very good reason I will remove this. Probably you > > are right a method parameter may work better. > > > > - Mark > > > > On Jan 24, 2008 1:47 PM, Felipe Albrecht > wrote: > > > Hello, > > > > > > > > > > > > I think that it can be solved by a simple way: > > > Implement (or just copy and cut) a pairwiseAlignment utilizing SymboList > as > > > parameters and do no creating a alignment, just the calculating it and > > > returning the value. 
> > > > > It is not possible to override the pairwiseAlignment to accept
> > > > > SymbolList or may be a new method that the parameters are 2
> > > > > SymbolList and returns the alignment score?
> > > > >
> > > > > Thank you
> > > > >
> > > > > Felipe Albrecht
> > > > > _______________________________________________
> > > > > biojava-dev mailing list
> > > > > biojava-dev at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev

From felipe.albrecht at gmail.com Thu Jan 24 14:40:48 2008
From: felipe.albrecht at gmail.com (Felipe Albrecht)
Date: Thu, 24 Jan 2008 17:40:48 -0200
Subject: [Biojava-dev] Pairwise Alignment methods
In-Reply-To: <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com>
References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com>
	<93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com>
	<93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com>
Message-ID:

Hello,

I saw the commit, and I don't think this solution is the best one.
It creates two Sequences internally, while the programmer will probably
not use the other alignment information, only the score.

For that reason, I think that if you have two SymbolLists, the method
should just do the alignment and return the score, as I did. If the
programmer wants the "visual alignment", he should create the
SimpleSequences externally; the method should not have to do it.

IMHO, one [serious] problem in biojava is memory consumption: it has no
"lightweight" classes or methods that do things quickly. So it may be a
good choice to have a method that simply gives the alignment score and
does not do the other things, like backtracking. Another thing: the cost
of "instanceof" is high.

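The score-only method proposed in the message above can be sketched roughly as follows. This is a minimal illustration, not the code Felipe attached and not the BioJava implementation: plain Strings stand in for SymbolLists, a match/mismatch pair stands in for a substitution matrix, the gap penalty is linear, and the class and method names are hypothetical. Because no backtracking is done, only two rows of the dynamic-programming matrix are kept.

```java
/**
 * Score-only local alignment (Smith-Waterman recurrence) with a linear
 * gap penalty. Illustrative sketch: Strings replace BioJava SymbolLists,
 * and match/mismatch scores replace a substitution matrix.
 */
final class ScoreOnlySmithWaterman {

    private ScoreOnlySmithWaterman() {}

    /**
     * @param match    score added for identical characters (positive)
     * @param mismatch score added for different characters (usually negative)
     * @param gap      score added per gap position (usually negative)
     * @return the best local alignment score, never below 0
     */
    static int score(String a, String b, int match, int mismatch, int gap) {
        // Only the previous and the current row are needed without backtracking.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = 0;
            for (int j = 1; j <= b.length(); j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                int diag = prev[j - 1] + sub;   // align a[i-1] with b[j-1]
                int up = prev[j] + gap;         // gap in b
                int left = curr[j - 1] + gap;   // gap in a
                int cell = Math.max(0, Math.max(diag, Math.max(up, left)));
                curr[j] = cell;
                if (cell > best) {
                    best = cell;
                }
            }
            // Reuse the two arrays instead of allocating a new row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return best;
    }
}
```

A call such as ScoreOnlySmithWaterman.score("ACGT", "ACGT", 2, -1, -2) returns only the number; memory use grows with the length of the second sequence rather than with the product of both lengths, which is the trade-off under discussion.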
Thank you, Felipe Albrecht On Jan 24, 2008 11:35 AM, Mark Schreiber wrote: > Hi - > > I have just commited changes that let you use SymbolLists in all parts > of the NW and SW SequenceAlignment objects. > > As you suggested I made the matrix a method local variable. I also > removed calls to the garbage collector. > > This can be checked out from SVN. > > - Mark > > On Jan 24, 2008 9:05 PM, Felipe Albrecht > wrote: > > If you prefer, I can send a diff and should I do the same thing in > > SequenceAlignment and NeedlemanWunsch classes? > > > > Thank you, > > > > Felipe Albrecht > > > > > > > > On Jan 24, 2008 5:50 AM, Mark Schreiber < markjschreiber at gmail.com> > wrote: > > > Hi Felipe - > > > > > > Thanks for the input on this. As a general rule the GC should never be > > > called from code. Generally this degrades performance of the JVM. > > > Unless there is a very good reason I will remove this. Probably you > > > are right a method parameter may work better. > > > > > > - Mark > > > > > > On Jan 24, 2008 1:47 PM, Felipe Albrecht > > wrote: > > > > Hello, > > > > > > > > > > > > > > > > > I think that it can be solved by a simple way: > > > > Implement (or just copy and cut) a pairwiseAlignment utilizing > SymboList > > as > > > > parameters and do no creating a alignment, just the calculating it > and > > > > returning the value. > > > > > > > > Another thing that is a bit stange for me, is the utilization of > garbage > > > > collector direcly, that is: The field "scoreMatrix" is a class > field, > > why at > > > > the end of pairwiseAlignment it is set to null and the garbage > collector > > > > run? It is not better (and simpler) to use scoreMatrix as method > > variable? 
> > > > > > > > > > > > Thank you > > > > > > > > > > > > Felipe Albrecht > > > > > > _______________________________________________ > > > > > > biojava-dev mailing list > > > > > > biojava-dev at lists.open-bio.org > > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > > > > > > > > > > > > > > > > > > > > > From dankoc at gmail.com Thu Jan 24 15:17:38 2008 From: dankoc at gmail.com (Charles Danko) Date: Thu, 24 Jan 2008 15:17:38 -0500 Subject: [Biojava-dev] Direct access to public genome databases In-Reply-To: References: Message-ID: <8adccabf0801241217gfa82d47r2728b08c6bcde862@mail.gmail.com> DAS looks just wonderful, and I am very glad to be made aware of it ? it seems like a much better solution than my initial, highly naive reaction (accessing public SQL connections). As I understand it, the easiest way to access DAS services in Java is via an API such as JAX-WS? Jumping into the Javadocs, and looking over a JAX-WS tutorial that I found here: http://java.sun.com/webservices/docs/2.0/tutorial/doc/ it looks there is a lot to this. In this sense, a BioJava class that that takes care of much of the connecting, data transfer, and parsing would be a welcome convenience for users not already familiar with this API (like me :). Even for those who are well-versed in all of this, such a class would allow DB access with a lot less code. Given the frequency that most of us accesses these public databases, this seems like a worthy goal to me!! Since its so easy to understand the basics of constructing a DAS request URL, even something as simple as a class that takes a pre-formed URL (in the constructor) and acts as an iterator over whatever information is in the result, would be very useful. Best, Charles On Jan 24, 2008 3:40 AM, Andreas Prlic wrote: > Hi, > > >> Direct access to public genome databases (i.e. a class to import > >> sequence, annotations, etc. 
and create the applicable biojava object) > >> would be a very useful addition to BioJava. The Ensj project doesn't > >> look like it has been updated since official support was dropped. Are > >> there any plans to work these features into BioJava? Have I missed > >> features that already exist? > >> > >> Depending on the amount of time required, I may be willing to > >> contribute to such an endeavor -- particularly for the purpose of > >> importing sequence. I have quite a bit of experience working with > >> java, but not much in a collaborative environment. > > > > Ensembl provides access to more and more of its data via DAS, > the Distributed Annotation System. DAS is a RESTful protocol to > access data from distributed sites over the internet. > > http://www.ensembl.org/info/using/external_data/das/ensembl_das.html > > it is quite heavily used and to see a list of available DAS services > see > http://www.dasregistry.org > > Andreas > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > From markjschreiber at gmail.com Thu Jan 24 20:26:41 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Fri, 25 Jan 2008 09:26:41 +0800 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com> Message-ID: <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com> Hi Felipe - I agree your method is more efficient but I think it violates the SequenceAlignment interface which would cause compatibility problems. I also wonder what should happen if a user calls the getAlignment() method if you have only calculated a score. 

instanceof is potentially expensive, but it is nothing compared to
actually performing the SmithWaterman alignment.

Biojava is somewhat memory heavy, but this is largely because it is
object oriented. Certainly something in C would be lighter and faster,
but the whole point of using Java is the relative benefit of
object-oriented design. While ultra-optimized algorithms were once a
major feature of bioinformatics, this is becoming less necessary as
standard desktops are now equivalent to the supercomputers of 5 years
ago.

I actually find the SW and NW to be reasonably fast. This is because
all the heavy lifting is done in loops that the JVM presumably
compiles and executes natively.

- Mark

On Jan 25, 2008 3:40 AM, Felipe Albrecht wrote:
> Hello,
>
> I saw the commit and I think that this solution is not the better.
> I think it because you are creating internally two Sequence and probably the
> programmer will not use others alignment information, he will use only the
> score.
>
> Because it, I think that if you have 2 SymbolList, just do the alignment and
> return the score, as I did. Otherwise, If the programmer want the "visual
> alignment", he should create externally the SimpleSequences, it is, not the
> method must do it.
>
> IMHO, one [serious] problem in biojava is the memory consumption, it have
> not "lightweight" classes or methods that do the things quickly. Because it,
> may be is a good choice to have a method that simply gives the alignment
> score, and not do the others things, like backtracking. Another think, the
> cost of the "instanceof" is high.
>
> Thank you,
>
> Felipe Albrecht
>
> On Jan 24, 2008 11:35 AM, Mark Schreiber wrote:
> > Hi -
> >
> > I have just commited changes that let you use SymbolLists in all parts
> > of the NW and SW SequenceAlignment objects.
> >
> > As you suggested I made the matrix a method local variable. I also
> > removed calls to the garbage collector.
> >
> > This can be checked out from SVN.
> > > > > > > Thank you
> > > > > > >
> > > > > > > Felipe Albrecht
> > > > > > > _______________________________________________
> > > > > > > biojava-dev mailing list
> > > > > > > biojava-dev at lists.open-bio.org
> > > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev

From felipe.albrecht at gmail.com Thu Jan 24 21:06:53 2008
From: felipe.albrecht at gmail.com (Felipe Albrecht)
Date: Fri, 25 Jan 2008 00:06:53 -0200
Subject: [Biojava-dev] Pairwise Alignment methods
In-Reply-To: <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com>
References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com>
	<93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com>
	<93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com>
	<93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com>
Message-ID:

Hi,

Is it not possible to add to the SequenceAlignment interface something
like:
"double doAlignmentAndGetTheScore(SymbolList symbolList1, SymbolList
symbolList2)"?
Okay, the name is horrible, but you know what it means.

> While ultra optimized algorithms where once a major
> feature of bioinformatics this is becoming less necessary as standard
> desktops are now equivalent to the super computers of 5 years ago.

Okay, but do not forget that bioinformatics data size is growing faster
than computer processing power and main memory capacity.

What I'm trying to say is that the current methods are fast [and light]
enough to do 1, 10, 100 or 1000 pairwise alignments, but not 10k, 100k
or, in my case, 65k * 65k.

Really, I don't see a problem with having optimized functions for
specific operations, as in the Unix philosophy: "write small programs
for specific things; for big things, join them" (something like
that :-) ).

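To put the "65k * 65k" figure from the message above in numbers: the thread mentions (4^8) * (4^8) alignments, and 4^8 = 65,536. The sketch below works out the counts. The class name and the throughput figure in the usage note are assumptions made for illustration, not measurements from BioJava.

```java
/** Back-of-the-envelope workload for all-against-all pairwise alignment. */
final class PairwiseWorkload {

    private PairwiseWorkload() {}

    /** Every sequence against every sequence, itself included: n * n. */
    static long orderedPairs(long n) {
        return n * n;
    }

    /** Each distinct pair counted once: n * (n - 1) / 2. */
    static long unorderedPairs(long n) {
        return n * (n - 1) / 2;
    }

    /** Hours needed for the given number of alignments at an assumed throughput. */
    static double hours(long alignments, double alignmentsPerSecond) {
        return alignments / alignmentsPerSecond / 3600.0;
    }
}
```

For n = 65,536, orderedPairs gives 4,294,967,296 alignments and unorderedPairs gives 2,147,450,880 (the smaller count applies only if the score is symmetric and self-comparisons are not needed). At a purely hypothetical rate of 100,000 alignments per second, the full 4.29 billion would take roughly 11.9 hours, which is why per-alignment overhead matters at this scale.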
Thank you, Felipe Albrecht On Jan 24, 2008 11:26 PM, Mark Schreiber wrote: > Hi Felipe - > > I agree your method is more efficient but I think it violates the > SequenceAlignment interface which would cause compatibility problems. > I also wonder what should happen if a user calls the getAlignment() > method if you have only calculated a score. > > instanceof is potentially expensive but it is nothing compared to > actually performing the SmithWaterman. > > Biojava is somewhat memory heavy but this is largely because it is > object oriented. Certainly something in C would be lighter and faster > but the whole point in using Java is the relative benefits of object > oriented design. While ultra optimized algorithms where once a major > feature of bioinformatics this is becoming less necessary as standard > desktops are now equivalent to the super computers of 5 years ago. > > I actually find the SW and NW to be reasonably fast. This is because > all the heavy lifting is done in loops that the JVM presumably > compiles and executes natively. > > - Mark > > > On Jan 25, 2008 3:40 AM, Felipe Albrecht > wrote: > > Hello, > > > > I saw the commit and I think that this solution is not the better. > > I think it because you are creating internally two Sequence and probably > the > > programmer will not use others alignment information, he will use only > the > > score. > > > > Because it, I think that if you have 2 SymbolList, just do the alignment > and > > return the score, as I did.Otherwise, If the programmer want the "visual > > alignment", he should create externally the SimpleSequences, it is, not > the > > method must do it. > > > > IMHO, one [serious] problem in biojava is the memory consumption, it > have > > not "lightweight" classes or methods that do the things quickly. Because > it, > > may be is a good choice to have a method that simply gives the alignment > > score, and not do the others things, like backtracking. 
> > > > > > > > Reading the source, I saw that the unique field that is
> > > > > > > > exclusive to Sequence is the name, for the alignment output,
> > > > > > > > but if I need only the alignment result, it is useless.
> > > > > > > >
> > > > > > > > It is not possible to override the pairwiseAlignment to accept
> > > > > > > > SymbolList or may be a new method that the parameters are 2
> > > > > > > > SymbolList and returns the alignment score?
> > > > > > > >
> > > > > > > > Thank you
> > > > > > > >
> > > > > > > > Felipe Albrecht
> > > > > > > > _______________________________________________
> > > > > > > > biojava-dev mailing list
> > > > > > > > biojava-dev at lists.open-bio.org
> > > > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev

From markjschreiber at gmail.com Thu Jan 24 22:43:30 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 25 Jan 2008 11:43:30 +0800
Subject: [Biojava-dev] Pairwise Alignment methods
In-Reply-To:
References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com>
	<93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com>
	<93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com>
	<93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com>
Message-ID: <93b45ca50801241943l10e634fal3d10bfc739af5a1d@mail.gmail.com>

On Jan 25, 2008 10:06 AM, Felipe Albrecht wrote:
> Hi,
>
> is not possible to add into the SequenceAlignment interface something like:
> "double doAlignmentAndGetTheScore(SymbolList symbolList1, SymbolList
> symbolList2)"?
> Okay, the name is horrible, but you know what it means.
>

You could, but you would break backwards compatibility with anyone who
implemented this interface previously. We have sometimes done this in
biojava, but we would need to make sure it breaks no one's code.
Another option would be to extend the interface with another that adds
this method (not very tidy, I know).

>
> > While ultra optimized algorithms where once a major
> > feature of bioinformatics this is becoming less necessary as standard
> > desktops are now equivalent to the super computers of 5 years ago.
>
> Okay, but do not forget that the bioinformatics data size is growing faster
> than the computer processing and main memory capacity.
>
> What im trying to say is that the actual methods are fast [and light] enough
> for do 1, 10, 100, 1000 pairwise alignments, but not for 10k, 100k or in my
> case, 65k * 65k.

One could also argue that Smith-Waterman is not ideal for large
sequences. I think it is O(NM) for sequences of lengths N and M.

> Really, I dont see problems of having optimized functions for specifics
> operations, as unix phylosofies: "do small programs for specifics things,
> for big things join then" (Something like it :-) ).
>

Yes, this would be an argument for a workflow or service-oriented
architecture built from multiple interoperable biojava sub-projects.
This is obviously not what biojava is. Indeed, biojava is not even an
application; you just use it to build applications. Maybe for your use
case you could use biojava to handle the I/O and then do the more
efficient SW using your own code. BioJava is a collection of objects
that are (somewhat) related and interoperable. It doesn't mean you
have to use biojava throughout your application.

- Mark

> On Jan 24, 2008 11:26 PM, Mark Schreiber wrote:
> > Hi Felipe -
> >
> > I agree your method is more efficient but I think it violates the
> > SequenceAlignment interface which would cause compatibility problems.
> > I also wonder what should happen if a user calls the getAlignment()
> > method if you have only calculated a score.
> >
> > instanceof is potentially expensive but it is nothing compared to
> > actually performing the SmithWaterman.
> > > > > > > > > > > > > > > > > > It is not possible to override the pairwiseAlignment to > accept > > > > > > > SymbolList or > > > > > > > > > may be a new method that the parameters are 2 SymbolList and > > > returns > > > > > the > > > > > > > > > alignment score? > > > > > > > > > > > > > > > > > > Thank you > > > > > > > > > > > > > > > > > > Felipe Albrecht > > > > > > > > > _______________________________________________ > > > > > > > > > biojava-dev mailing list > > > > > > > > > biojava-dev at lists.open-bio.org > > > > > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From felipe.albrecht at gmail.com Thu Jan 24 23:25:23 2008 From: felipe.albrecht at gmail.com (Felipe Albrecht) Date: Fri, 25 Jan 2008 02:25:23 -0200 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: <93b45ca50801241943l10e634fal3d10bfc739af5a1d@mail.gmail.com> References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com> <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com> <93b45ca50801241943l10e634fal3d10bfc739af5a1d@mail.gmail.com> Message-ID: Hello again :-) On Jan 25, 2008 1:43 AM, Mark Schreiber wrote: > On Jan 25, 2008 10:06 AM, Felipe Albrecht > wrote: > > Hi, > > > > is not possible to add into the SequenceAlignment interface something > like: > > "double doAlignmentAndGetTheScore(SymbolList symbolList1, SymbolList > > symbolList2)"? > > Okay, the name is horrible, but you know what it means. > > > > You could but you would break backwards compatibility with anyone who > implemented this interface previously. We have sometimes done this in > biojava but we would need to make sure it will break no ones code. 
> Another option would be to extend the interface with another that adds
> this method (not very tidy I know).

If I'm correct, SequenceAlignment is an abstract class, so we can define the method there with an empty implementation and let SmithWaterman and the other classes implement it. Anyone who has already implemented SequenceAlignment will not see anything different.

> > > While ultra optimized algorithms were once a major
> > > feature of bioinformatics this is becoming less necessary as standard
> > > desktops are now equivalent to the super computers of 5 years ago.
> >
> > Okay, but do not forget that bioinformatics data size is growing faster
> > than computer processing and main memory capacity.
> >
> > What I'm trying to say is that the current methods are fast [and light] enough
> > to do 1, 10, 100, 1000 pairwise alignments, but not 10k, 100k or, in my
> > case, 65k * 65k.
>
> One could also argue that Smith Waterman is not ideal for large
> sequences. I think it is O(NM) or something.

I'm not comparing two sequences of 65k bases each; I'm aligning 65k short sequences against each other.

> > Really, I don't see a problem with having optimized functions for specific
> > operations, as in the Unix philosophy: "write small programs for specific things,
> > for big things join them" (something like that :-) ).
>
> Yes, this would be an argument for workflow or service oriented
> architecture built from multiple interoperable biojava sub-projects.
> This is obviously not what biojava is. Indeed biojava is not even an
> application; you just use it to build applications. Maybe for your use
> case you could use biojava to handle the I/O and then do the more
> efficient SW using your own code. BioJava is a collection of objects
> that are (somewhat) related and interoperable. It doesn't mean you
> have to use biojava throughout your application.
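A score-only Smith-Waterman of the kind discussed above might look roughly like the sketch below. It is not BioJava code: the class name, the use of plain Strings instead of SymbolLists, the linear gap penalty and the scoring values are all assumptions made for illustration. It keeps only two rows of the matrix, so it cannot backtrack and therefore cannot return an alignment, only the score.

```java
// Illustrative sketch only; hypothetical class, not part of BioJava.
final class ScoreOnlySmithWaterman {

    /**
     * Best local alignment score of a and b with a linear gap penalty.
     * mismatch and gap are expected to be negative numbers.
     * Time is O(N*M); memory is O(M) because only two rows are kept.
     */
    static int score(String a, String b, int match, int mismatch, int gap) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = 0;
            for (int j = 1; j <= b.length(); j++) {
                int diag = prev[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = prev[j] + gap;        // gap in b
                int left = curr[j - 1] + gap;  // gap in a
                int cell = Math.max(0, Math.max(diag, Math.max(up, left)));
                curr[j] = cell;
                if (cell > best) {
                    best = cell;
                }
            }
            // Reuse the two arrays instead of allocating a new row each time.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(score("ACGTACGT", "TTACGTAA", 2, -1, -2));
    }
}
```

For many short sequences compared all against all, the two row buffers could also be allocated once and reused across calls; that is left out here to keep the sketch short.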
Okay, now I understand: biojava is not a library for writing bioinformatics applications, but for interconnecting them. So biojava as it currently stands is not appropriate for the application I am developing. I will develop some "optimized" classes and functions for my own use, and when they are ready I will announce them on this mailing list and ask whether you want to merge them into biojava. If the biojava team needs somebody to improve some biojava functions, especially sequences and sequence I/O, just ask me.

Thank you

Felipe Albrecht

> - Mark
>
> > > On Jan 24, 2008 11:26 PM, Mark Schreiber wrote:
> > > Hi Felipe -
> > >
> > > I agree your method is more efficient but I think it violates the
> > > SequenceAlignment interface which would cause compatibility problems.
> > > I also wonder what should happen if a user calls the getAlignment()
> > > method if you have only calculated a score.
> > >
> > > instanceof is potentially expensive but it is nothing compared to
> > > actually performing the SmithWaterman.
> > >
> > > Biojava is somewhat memory heavy but this is largely because it is
> > > object oriented. Certainly something in C would be lighter and faster
> > > but the whole point in using Java is the relative benefits of object
> > > oriented design. While ultra optimized algorithms were once a major
> > > feature of bioinformatics this is becoming less necessary as standard
> > > desktops are now equivalent to the super computers of 5 years ago.
> > >
> > > I actually find the SW and NW to be reasonably fast. This is because
> > > all the heavy lifting is done in loops that the JVM presumably
> > > compiles and executes natively.
> > >
> > > - Mark
> > >
> > > On Jan 25, 2008 3:40 AM, Felipe Albrecht wrote:
> > > > Hello,
> > > >
> > > > I saw the commit and I think that this solution is not the better.
> > > > I think it because you are creating internally two Sequence and > probably > > the > > > > programmer will not use others alignment information, he will use > only > > the > > > > score. > > > > > > > > Because it, I think that if you have 2 SymbolList, just do the > alignment > > and > > > > return the score, as I did.Otherwise, If the programmer want the > "visual > > > > alignment", he should create externally the SimpleSequences, it is, > not > > the > > > > method must do it. > > > > > > > > IMHO, one [serious] problem in biojava is the memory consumption, it > > > have > > > > not "lightweight" classes or methods that do the things quickly. > Because > > it, > > > > may be is a good choice to have a method that simply gives the > alignment > > > > score, and not do the others things, like backtracking. Another > think, > > the > > > > cost of the "instanceof" is high. > > > > > > > > Thank you, > > > > > > > > Felipe Albrecht > > > > > > > > > > > > > > > > > > > > On Jan 24, 2008 11:35 AM, Mark Schreiber > > wrote: > > > > > Hi - > > > > > > > > > > I have just commited changes that let you use SymbolLists in all > parts > > > > > of the NW and SW SequenceAlignment objects. > > > > > > > > > > As you suggested I made the matrix a method local variable. I also > > > > > removed calls to the garbage collector. > > > > > > > > > > This can be checked out from SVN. > > > > > > > > > > - Mark > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 24, 2008 9:05 PM, Felipe Albrecht > > > > wrote: > > > > > > If you prefer, I can send a diff and should I do the same thing > in > > > > > > SequenceAlignment and NeedlemanWunsch classes? > > > > > > > > > > > > Thank you, > > > > > > > > > > > > Felipe Albrecht > > > > > > > > > > > > > > > > > > > > > > > > On Jan 24, 2008 5:50 AM, Mark Schreiber < > markjschreiber at gmail.com > > > > > wrote: > > > > > > > Hi Felipe - > > > > > > > > > > > > > > Thanks for the input on this. 
As a general rule the GC should > > never be > > > > > > > called from code. Generally this degrades performance of the > JVM. > > > > > > > Unless there is a very good reason I will remove this. > Probably > > you > > > > > > > are right a method parameter may work better. > > > > > > > > > > > > > > - Mark > > > > > > > > > > > > > > On Jan 24, 2008 1:47 PM, Felipe Albrecht > > > > > > > > wrote: > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think that it can be solved by a simple way: > > > > > > > > Implement (or just copy and cut) a pairwiseAlignment > utilizing > > > > SymboList > > > > > > as > > > > > > > > parameters and do no creating a alignment, just the > calculating > > it > > > > and > > > > > > > > returning the value. > > > > > > > > > > > > > > > > Another thing that is a bit stange for me, is the > utilization of > > > > garbage > > > > > > > > collector direcly, that is: The field "scoreMatrix" is a > class > > > > field, > > > > > > why at > > > > > > > > the end of pairwiseAlignment it is set to null and the > garbage > > > > collector > > > > > > > > run? It is not better (and simpler) to use scoreMatrix as > method > > > > > > variable? > > > > > > > > > > > > > > > > I'm annexing the class code with my changes that is doing > well > > the > > > > (4^8) > > > > > > * > > > > > > > > (4^8) SymbolList pairwise alignments that I am needing :-) > > > > > > > > > > > > > > > > Thank you, > > > > > > > > > > > > > > > > Felipe Albrecht > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 23, 2008 6:50 AM, Mark Schreiber < > > markjschreiber at gmail.com > > > > > > > > > > > wrote: > > > > > > > > > Hi Felipe - > > > > > > > > > > > > > > > > > > I agree this is a barrier to ease of use. 
Even if > Sequences > > are > > > > > > > > > required internally for some obscure reason there is no > reason > > why > > > > > > > > > dummy Sequences cannot be made inside the aligner. These > > > > sequences > > > > > > > > > could be given names like 'query' and 'subject' or even > 'seq1' > > and > > > > > > > > > 'seq2'. > > > > > > > > > > > > > > > > > > I will take a look at adding some methods. > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > > > > > > > > - Mark > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 23, 2008 2:58 PM, Felipe Albrecht > > > > < felipe.albrecht at gmail.com > > > > > > > > > wrote: > > > > > > > > > > Hello all, > > > > > > > > > > > > > > > > > > > > I have a simple question about pairwise alignment > classes > > > > > > (SmithWaterman > > > > > > > > and > > > > > > > > > > NeedlemanWunsch): > > > > > > > > > > Why it is necessary two Sequence for alignment and not > two > > > > > > SymbolList? > > > > > > > > > > > > > > > > > > > > Example, I have a SymbolList collection to align between > > then, > > > > > > > > > > by this way I need to create some "dummies" Sequence > for to > > do > > > > the > > > > > > > > > > alignment. > > > > > > > > > > > > > > > > > > > > Reading the source, I saw that the unique field that is > > > > exclusive to > > > > > > > > > > Sequence is the name, for the alignment output, > > > > > > > > > > but if I need only the alignment result, it is useless. > > > > > > > > > > > > > > > > > > > > It is not possible to override the pairwiseAlignment to > > accept > > > > > > > > SymbolList or > > > > > > > > > > may be a new method that the parameters are 2 SymbolList > and > > > > returns > > > > > > the > > > > > > > > > > alignment score? 
From markjschreiber at gmail.com Fri Jan 25 00:40:20 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 25 Jan 2008 13:40:20 +0800
Subject: [Biojava-dev] Pairwise Alignment methods
In-Reply-To:
References: <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com> <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com> <93b45ca50801241943l10e634fal3d10bfc739af5a1d@mail.gmail.com>
Message-ID: <93b45ca50801242140j299b91e7td9fb295a7393ced9@mail.gmail.com>

On Jan 25, 2008 12:25 PM, Felipe Albrecht wrote:
> Hello again :-)
>
> On Jan 25, 2008 1:43 AM, Mark Schreiber wrote:
> > On Jan 25, 2008 10:06 AM, Felipe Albrecht wrote:
> > > Hi,
> > >
> > > is not possible to add into the SequenceAlignment interface something like:
> > > "double doAlignmentAndGetTheScore(SymbolList symbolList1, SymbolList symbolList2)"?
> > > Okay, the name is horrible, but you know what it means.
>
> If I'm correct, the SequenceAlignment is an abstract class, so, we can
> define there with an empty implementation, and SmithWaterman and others
> classes implements it. Anyone that implemented SequenceAlignment will not
> see anything different.

OK in that case adding the method would be OK, even desirable.
Probably this would be the best way to merge in your code.

> Okay, now I understood, biojava is not a library for bioinformatics
> applications, but for interconnect bioinformatics applications.
> So, biojava

Actually it is a library for bioinformatics that you use to build bioinformatics applications. It is possibly not as loosely coupled as you might like for your purpose. It is definitely not as loosely coupled as the Unix collection of executables or an SOA system. Due to the heavy use of interfaces and abstract classes there is some room for custom code. For example, you can recode the SmithWaterman object to be optimal for your needs and then create an application where you use your class in place of the normal biojava SmithWaterman.

> in the actual way is not appropriate for the application that I am
> developing. I will develop some "optimized" classes and functions for my use
> and when it will be ready I will announce in this mailing list and ask if
> want to merge in biojava. If biojava team needs somebody to improve some
> biojava functions, specially sequences and sequences IO, can ask me.

Code improvements and optimizations are always welcome, especially if current interfaces can be preserved (that way the end user gets the improvement without having to change their code). I always advise potential optimizers to use a profiler because it is sometimes hard to predict how the JVM will behave. For example, JIT compilation may mean that parts of the code that are theoretically CPU intensive are not the actual CPU bottleneck once the JVM has compiled them.

- Mark
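The compatibility point raised in this thread (adding a method to an abstract class without breaking people who already extended it) can be illustrated with a small sketch. Everything in it is hypothetical: Aligner, LegacyAligner and ScoringAligner are stand-ins invented for this example and do not reflect the real SequenceAlignment signatures, and the scoring rule is a toy.

```java
// Hypothetical stand-in for an alignment base class; not BioJava's real API.
abstract class Aligner {

    /** The method subclasses were already required to implement. */
    abstract String align(String query, String subject);

    /**
     * Added later. Because it has a body, subclasses written before it
     * existed still compile; they simply report that they cannot score.
     */
    int score(String query, String subject) {
        throw new UnsupportedOperationException("score-only mode not implemented");
    }
}

/** Written before score() was added; needs no change. */
final class LegacyAligner extends Aligner {
    @Override
    String align(String query, String subject) {
        return query + "\n" + subject;
    }
}

/** Opts in to the new method. Toy scoring rule: count equal positions. */
final class ScoringAligner extends Aligner {
    @Override
    String align(String query, String subject) {
        return query + "\n" + subject;
    }

    @Override
    int score(String query, String subject) {
        int n = Math.min(query.length(), subject.length());
        int same = 0;
        for (int i = 0; i < n; i++) {
            if (query.charAt(i) == subject.charAt(i)) {
                same++;
            }
        }
        return same;
    }
}

final class AlignerDemo {
    public static void main(String[] args) {
        System.out.println(new ScoringAligner().score("ACGT", "ACGA"));
        try {
            new LegacyAligner().score("ACGT", "ACGA");
        } catch (UnsupportedOperationException e) {
            System.out.println("legacy aligner: " + e.getMessage());
        }
    }
}
```

The default body throws rather than being literally empty. That is a design choice made for this sketch: a default that quietly returned 0 would look like a valid score.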
From felipe.albrecht at gmail.com Fri Jan 25 01:39:45 2008
From: felipe.albrecht at gmail.com (Felipe Albrecht)
Date: Fri, 25 Jan 2008 04:39:45 -0200
Subject: [Biojava-dev] Pairwise Alignment methods
In-Reply-To: <93b45ca50801242140j299b91e7td9fb295a7393ced9@mail.gmail.com>
References: <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com> <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com> <93b45ca50801241943l10e634fal3d10bfc739af5a1d@mail.gmail.com> <93b45ca50801242140j299b91e7td9fb295a7393ced9@mail.gmail.com>
Message-ID:

Okay, I agree with what you said.

I was looking at the SequenceAlignment source and noticed something strange. In the formatOutput method the editDistance is multiplied by -1, but if you call the NeedlemanWunsch pairwiseAlignment method, the editDistance is returned without any multiplication. That is, the score/editDistance reported by formatOutput differs from the one returned by NeedlemanWunsch pairwiseAlignment. Which one is correct?
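One way two such values can legitimately differ by a factor of -1 is that the same global alignment can be expressed either as a cost to minimise (an edit distance) or as a score to maximise. The sketch below shows only that relationship; it says nothing about which convention BioJava intends. The class name, the String inputs and the cost values are assumptions for illustration.

```java
// Illustrative sketch only; hypothetical class, not part of BioJava.
final class GlobalAlignmentSign {

    /** Needleman-Wunsch as a minimisation: lower is better (an edit distance). */
    static int minCost(String a, String b, int matchCost, int replaceCost, int gapCost) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) d[i][0] = i * gapCost;
        for (int j = 1; j <= b.length(); j++) d[0][j] = j * gapCost;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = d[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? matchCost : replaceCost);
                int gap = Math.min(d[i - 1][j], d[i][j - 1]) + gapCost;
                d[i][j] = Math.min(sub, gap);
            }
        }
        return d[a.length()][b.length()];
    }

    /** The same recurrence as a maximisation over negated costs: higher is better. */
    static int maxScore(String a, String b, int matchCost, int replaceCost, int gapCost) {
        int[][] s = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) s[i][0] = -i * gapCost;
        for (int j = 1; j <= b.length(); j++) s[0][j] = -j * gapCost;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = s[i - 1][j - 1]
                        - (a.charAt(i - 1) == b.charAt(j - 1) ? matchCost : replaceCost);
                int gap = Math.max(s[i - 1][j], s[i][j - 1]) - gapCost;
                s[i][j] = Math.max(sub, gap);
            }
        }
        return s[a.length()][b.length()];
    }

    public static void main(String[] args) {
        int cost = minCost("kitten", "sitting", 0, 1, 1);
        int score = maxScore("kitten", "sitting", 0, 1, 1);
        System.out.println("cost=" + cost + " score=" + score);
    }
}
```

With identical parameters the two results are always exact negatives of each other, so a report that shows one and an API that returns the other can both be consistent, provided each is documented as a distance or as a score.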
Thank you again

Felipe Albrecht
> > > > > > > > > > > > > > > > > > > > > > > > Example, I have a SymbolList collection to align > between > > > > then, > > > > > > > > > > > > by this way I need to create some "dummies" > Sequence > > for to > > > > do > > > > > > the > > > > > > > > > > > > alignment. > > > > > > > > > > > > > > > > > > > > > > > > Reading the source, I saw that the unique field that > is > > > > > > exclusive to > > > > > > > > > > > > Sequence is the name, for the alignment output, > > > > > > > > > > > > but if I need only the alignment result, it is > useless. > > > > > > > > > > > > > > > > > > > > > > > > It is not possible to override the pairwiseAlignment > to > > > > accept > > > > > > > > > > SymbolList or > > > > > > > > > > > > may be a new method that the parameters are 2 > SymbolList > > and > > > > > > returns > > > > > > > > the > > > > > > > > > > > > alignment score? > > > > > > > > > > > > > > > > > > > > > > > > Thank you > > > > > > > > > > > > > > > > > > > > > > > > Felipe Albrecht > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > biojava-dev mailing list > > > > > > > > > > > > biojava-dev at lists.open-bio.org > > > > > > > > > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From ap3 at sanger.ac.uk Fri Jan 25 04:49:22 2008 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Fri, 25 Jan 2008 09:49:22 +0000 Subject: [Biojava-dev] Direct access to public genome databases In-Reply-To: <8adccabf0801241217gfa82d47r2728b08c6bcde862@mail.gmail.com> References: <8adccabf0801241217gfa82d47r2728b08c6bcde862@mail.gmail.com> Message-ID: <80544F81-E3DE-4212-8C62-2CA4865E4758@sanger.ac.uk> > DAS looks just wonderful, and I am very glad to be made aware of it ? 
> it seems like a much better solution than my initial, highly naive > reaction (accessing public SQL connections). don't think that this is naive. There is also ensembldb.ensembl.org which is a public mysql server if you prefer sql.... > > As I understand it, the easiest way to access DAS services in Java is > via an API such as JAX-WS? Jumping into the Javadocs, and looking > over a JAX-WS tutorial that I found here: > http://java.sun.com/webservices/docs/2.0/tutorial/doc/ it looks there > is a lot to this. I have not tried JAX-WS yet, so I can not comment on that, but if you want a library that makes it easier to talk to DAS servers, you can use my DAS client library at: http://www.spice-3d.org/dasobert/ Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 ----------------------------------------------------------------------- -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From ayates at ebi.ac.uk Fri Jan 25 04:58:37 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Fri, 25 Jan 2008 09:58:37 +0000 Subject: [Biojava-dev] Direct access to public genome databases In-Reply-To: <80544F81-E3DE-4212-8C62-2CA4865E4758@sanger.ac.uk> References: <8adccabf0801241217gfa82d47r2728b08c6bcde862@mail.gmail.com> <80544F81-E3DE-4212-8C62-2CA4865E4758@sanger.ac.uk> Message-ID: <4799B2CD.3010603@ebi.ac.uk> Andreas Prlic wrote: > >> DAS looks just wonderful, and I am very glad to be made aware of it - >> it seems like a much better solution than my initial, highly naive >> reaction (accessing public SQL connections). > > don't think that this is naive. There is also ensembldb.ensembl.org > which is a public mysql server if you prefer sql....
I have to agree. There's a lot to be said about using a database directly (in fact my group does just that). But it is potentially a more fragile solution than using a public api. >> >> As I understand it, the easiest way to access DAS services in Java is >> via an API such as JAX-WS? Jumping into the Javadocs, and looking >> over a JAX-WS tutorial that I found here: >> http://java.sun.com/webservices/docs/2.0/tutorial/doc/ it looks there >> is a lot to this. > > I have not tried JAX-WS yet, so I can not comment on that, but if you > want a library > that makes it easier to talk to DAS servers, you can use my > DAS client library at: http://www.spice-3d.org/dasobert/ I would stay away from JAX-WS. It's a web services framework which is more for SOAP access than anything else. Even when using it for SOAP remoting I've found it less than intuitive (maybe later version have gotten better about this especially with the Java6 compiler api hopefully removing any need for explicit annotation processing). Chances are it would be easier to use JAX-WS when the REST service support comes in but until then I would say stay away from it. Andreas' library is a far superior solution :) Andy From bugzilla-daemon at portal.open-bio.org Fri Jan 25 07:28:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 25 Jan 2008 07:28:08 -0500 Subject: [Biojava-dev] [Bug 2432] non conventional fasta header && RichSequence.IOTools In-Reply-To: Message-ID: <200801251228.m0PCS8A5003219@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2432 mark.schreiber at novartis.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from mark.schreiber at novartis.com 2008-01-25 07:28 EST ------- I have added a class called FastaHeader. This class lets you specify which fields you want to see in the fasta header output. 
There are overloaded writeFasta methods in RichSequence.IOTools that let you use this class easily. For example the following test program reads in a fasta file from Genbank with a full header and outputs it with only the accession number and description after the '>'

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */
package io;

import java.io.BufferedReader;
import java.io.FileReader;
import org.biojavax.bio.seq.RichSequenceIterator;
import org.biojavax.bio.seq.io.FastaHeader;
import static org.biojavax.bio.seq.RichSequence.IOTools;

/**
 *
 * @author Mark
 */
public class WriteFasta {
    public static void main(String[] args) throws Exception{
        BufferedReader br = new BufferedReader(
                new FileReader("files/dna.fasta"));
        RichSequenceIterator iter = IOTools.readFastaDNA(br, null);
        //IOTools.writeFasta(System.out, iter, null);
        FastaHeader header = new FastaHeader();
        header.setShowDescription(true);
        header.setShowIdentifier(false);
        header.setShowNamespace(false);
        header.setShowName(false);
        header.setShowVersion(false);
        IOTools.writeFasta(System.out, iter, null, header);
    }
}

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From miwalsh125 at gmail.com Tue Jan 29 10:22:07 2008 From: miwalsh125 at gmail.com (michael walsh) Date: Tue, 29 Jan 2008 10:22:07 -0500 Subject: [Biojava-dev] Genbank Feature extraction question. Message-ID: BioJava, I am writing a program that extracts features from a Genbank file using BioJava The program needs to extract the feature locations from the file. This is easy enough to do but the location information returned by the program does not specify whether or not the location is on the complementary strand of DNA. This is some of my code:
FeatureHolder fltrHold = seq.filter(codeFltr);//codeFltr is a filter that retrieves coding features only.
//iterate over the Features in fh for (Iterator i = fltrHold.features(); i.hasNext(); ){ Feature f = (Feature)i.next(); System.out.println(f.getLocation().toString()); An example of the output of the print command is: join:[32775..32948,31801..32052] However, the entry in the Genbank file reads: complement(join(31801..32052,32775..32948)) Is there any way to get my program to output the fact that a features location is on the complementary strand of DNA? Any help that anyone can give me would be greatly appreciated. Sincerely, M Walsh From holland at ebi.ac.uk Wed Jan 30 03:53:29 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Wed, 30 Jan 2008 08:53:29 +0000 Subject: [Biojava-dev] Genbank Feature extraction question. In-Reply-To: References: Message-ID: <47A03B09.1030905@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello. If you are using the BioJavaX parsers as opposed to the older deprecated ones, then yes, the information is there. Your parser will be returning instances of RichSequence (there will be a nextRichSequence or getRichSequence method depending on the way in which you are using the parser and/or iterator). Features on a RichSequence are all instances of RichFeature. Locations for RichFeatures are all instances of RichLocation. If you cast appropriately (or locate and find the getRich* equivalents of the get* methods you already using on the non-Rich interfaces) then you will find that RichLocation does have a getStrand method which will give you the information you need. cheers, Richard michael walsh wrote: > BioJava, > > I am writing a program that extracts features from a Genbank file using > BioJava The program needs to extract the feature locations from the file. > This is easy enough to do but the location information returned by the > program does not specify whether or not the location is on the complementary > strand of DNA. 
> > This is some of my code: > FeatureHolder fltrHold = seq.filter(codeFltr);//codeFltr is a filter that > retrieves coding features only. > //iterate over the Features in fh > for (Iterator i = fltrHold.features(); i.hasNext(); ){ > Feature f = (Feature)i.next(); > System.out.println(f.getLocation().toString()); > > An example of the output of the print command is: > join:[32775..32948,31801..32052] > > However, the entry in the Genbank file reads: > complement(join(31801..32052,32775..32948)) > > Is there any way to get my program to output the fact that a features > location is on the complementary strand of DNA? > > Any help that anyone can give me would be greatly appreciated. > > Sincerely, > > M Walsh > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel. +44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHoDsJ4C5LeMEKA/QRAhIsAJ4lzIn0bBMjYIaZqNMz0gUm3c1vHgCePGW4 jYPFCjw1pBlMMp94mgRQOsc= =OoGL -----END PGP SIGNATURE----- From felipe.albrecht at gmail.com Thu Jan 24 00:47:37 2008 From: felipe.albrecht at gmail.com (Felipe Albrecht) Date: Thu, 24 Jan 2008 05:47:37 -0000 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> Message-ID: Hello, I think that it can be solved by a simple way: Implement (or just copy and cut) a pairwiseAlignment utilizing SymboList as parameters and do no creating a alignment, just the calculating it and returning the value. 
Another thing that is a bit stange for me, is the utilization of garbage collector direcly, that is: The field "scoreMatrix" is a class field, why at the end of pairwiseAlignment it is set to null and the garbage collector run? It is not better (and simpler) to use scoreMatrix as method variable? I'm annexing the class code with my changes that is doing well the (4^8) * (4^8) SymbolList pairwise alignments that I am needing :-) Thank you, Felipe Albrecht On Jan 23, 2008 6:50 AM, Mark Schreiber wrote: > Hi Felipe - > > I agree this is a barrier to ease of use. Even if Sequences are > required internally for some obscure reason there is no reason why > dummy Sequences cannot be made inside the aligner. These sequences > could be given names like 'query' and 'subject' or even 'seq1' and > 'seq2'. > > I will take a look at adding some methods. > > Best regards, > > - Mark > > On Jan 23, 2008 2:58 PM, Felipe Albrecht > wrote: > > Hello all, > > > > I have a simple question about pairwise alignment classes (SmithWaterman > and > > NeedlemanWunsch): > > Why it is necessary two Sequence for alignment and not two SymbolList? > > > > Example, I have a SymbolList collection to align between then, > > by this way I need to create some "dummies" Sequence for to do the > > alignment. > > > > Reading the source, I saw that the unique field that is exclusive to > > Sequence is the name, for the alignment output, > > but if I need only the alignment result, it is useless. > > > > It is not possible to override the pairwiseAlignment to accept > SymbolList or > > may be a new method that the parameters are 2 SymbolList and returns the > > alignment score? > > > > Thank you > > > > Felipe Albrecht > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: SmithWaterman.java Type: application/octet-stream Size: 17664 bytes Desc: not available URL:
From ayates at ebi.ac.uk Mon Jan 7 09:34:33 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 7 Jan 2008 09:34:33 +0000 Subject: [Biojava-dev] Error while reading byte data for creating a Trace. In-Reply-To: <2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu> References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk> <2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu> Message-ID: <04BE1C71-B7CF-4428-86C7-300E4283DAE8@ebi.ac.uk> Hi, As far as I am aware there isn't a problem with the current ABI parser however if you could send a code snippit of reading in the byte array & the stack trace of the index out of bounds exception that would be most helpful Andy On 5 Nov 2007, at 17:59, abhi232 at cc.gatech.edu wrote: > Hi all, > I am having a byte array which is having the data from an .ab1 > file.The > biojava library provides a class called as ABITrace which takes as > input > either a byte[] array , a file or a url.If i use the later > parameters (the > file or the url )the program works but if I pass the byte array to the > constructor I get java.lang.arrayIndexOutOfBound.Exception.Is there a > problem with the ABITrace class or how can I bypass this particular > error.
> I am printing the length of the byte array and it comes to > 144930...Can > that cause a problem in my code? > > Thanks in advance. > Abhinav > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From markjschreiber at gmail.com Mon Jan 7 09:43:14 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 7 Jan 2008 17:43:14 +0800 Subject: [Biojava-dev] JUnit Message-ID: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> Hi all - What do people think about adding the JUnit jar to the test directory of the biojava-live repository and make the appropriate changes to the ant classpath? This would make it easier for people to test the package directly from the ant build rather than having to specifically place junit on the system classpath. It would probably make it more likely that people run and contribute tests as well. If we add JUnit 4.1 then it will allow the creation of Unit tests by just annotating class methods. If there are no objections I will add it in the next few days. - Mark From ayates at ebi.ac.uk Mon Jan 7 09:46:56 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 7 Jan 2008 09:46:56 +0000 Subject: [Biojava-dev] JUnit In-Reply-To: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> References: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> Message-ID: <42D03EDC-6AEE-4FF3-A3BC-12B304AE91EE@ebi.ac.uk> Yeah I'm happy for that to happen. Just on a side note is Junit 4 compatible with Junit 3's tests? Otherwise will we have to maintain two sets of unit test directories depending on the age of the test? Andy On 7 Jan 2008, at 09:43, Mark Schreiber wrote: > Hi all - > > What do people think about adding the JUnit jar to the test directory > of the biojava-live repository and make the appropriate changes to the > ant classpath? 
This would make it easier for people to test the > package directly from the ant build rather than having to specifically > place junit on the system classpath. It would probably make it more > likely that people run and contribute tests as well. > > If we add JUnit 4.1 then it will allow the creation of Unit tests by > just annotating class methods. > > If there are no objections I will add it in the next few days. > > - Mark > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From markjschreiber at gmail.com Mon Jan 7 09:48:31 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 7 Jan 2008 17:48:31 +0800 Subject: [Biojava-dev] JUnit In-Reply-To: <42D03EDC-6AEE-4FF3-A3BC-12B304AE91EE@ebi.ac.uk> References: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> <42D03EDC-6AEE-4FF3-A3BC-12B304AE91EE@ebi.ac.uk> Message-ID: <93b45ca50801070148kc1fb1bel88b9c365fca4d27a@mail.gmail.com> From my preliminary tests it seems to be compatible. It seems that version 4.4 is out now. Allows all kinds of strange assertions, assumptions and theories. - Mark On Jan 7, 2008 5:46 PM, Andy Yates wrote: > Yeah I'm happy for that to happen. Just on a side note is Junit 4 > compatible with Junit 3's tests? Otherwise will we have to maintain > two sets of unit test directories depending on the age of the test? > > Andy > > > On 7 Jan 2008, at 09:43, Mark Schreiber wrote: > > > Hi all - > > > > What do people think about adding the JUnit jar to the test directory > > of the biojava-live repository and make the appropriate changes to the > > ant classpath? This would make it easier for people to test the > > package directly from the ant build rather than having to specifically > > place junit on the system classpath. It would probably make it more > > likely that people run and contribute tests as well.
> > > > If we add JUnit 4.1 then it will allow the creation of Unit tests by > > just annotating class methods. > > > > If there are no objections I will add it in the next few days. > > > > - Mark > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > From ayates at ebi.ac.uk Mon Jan 7 09:50:02 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 7 Jan 2008 09:50:02 +0000 Subject: [Biojava-dev] JUnit In-Reply-To: <93b45ca50801070148kc1fb1bel88b9c365fca4d27a@mail.gmail.com> References: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> <42D03EDC-6AEE-4FF3-A3BC-12B304AE91EE@ebi.ac.uk> <93b45ca50801070148kc1fb1bel88b9c365fca4d27a@mail.gmail.com> Message-ID: Sounds good to me :) Andy On 7 Jan 2008, at 09:48, Mark Schreiber wrote: >> From my preliminary tests it seems to be compatable. > > It seems that version 4.4 is out now. Allows all kinds of strange > assertions, assumptions and theorys. > > - Mark > > On Jan 7, 2008 5:46 PM, Andy Yates wrote: >> Yeah I'm happy for that to happen. Just on a side note is Junit 4 >> compatible with Junit 3's tests? Otherwise will we have to maintain >> two sets of unit test directories depending on the age of the test? >> >> Andy >> >> >> On 7 Jan 2008, at 09:43, Mark Schreiber wrote: >> >>> Hi all - >>> >>> What do people think about adding the JUnit jar to the test >>> directory >>> of the biojava-live repository and make the appropriate changes to >>> the >>> ant classpath? This would make it easier for people to test the >>> package directly from the ant build rather than having to >>> specifically >>> place junit on the system classpath. It would probably make it more >>> likely that people run and contribute tests as well. >>> >>> If we add JUnit 4.1 then it will allow the creation of Unit tests by >>> just annotating class methods. 
>>> >>> If there are no objections I will add it in the next few days. >>> >>> - Mark >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> From ap3 at sanger.ac.uk Mon Jan 7 10:08:42 2008 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Mon, 7 Jan 2008 10:08:42 +0000 Subject: [Biojava-dev] JUnit In-Reply-To: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> References: <93b45ca50801070143jccb9deeldf23a5da80de46e9@mail.gmail.com> Message-ID: <034C9BC2-8AA5-4220-BB34-C3D86E991200@sanger.ac.uk> > What do people think about adding the JUnit jar to the test directory > of the biojava-live repository and make the appropriate changes to the > ant classpath? This would make it easier for people to test the > I would suggest to move all the jar files where we have dependencies on into a common subdirectory. e.g something called "libs" or "jars" Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 ----------------------------------------------------------------------- -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From michaelgang at gmail.com Mon Jan 7 10:16:22 2008 From: michaelgang at gmail.com (Michael Gang) Date: Mon, 7 Jan 2008 12:16:22 +0200 Subject: [Biojava-dev] Fwd: bioperl like blastparser In-Reply-To: <6994d82b0801070050t5d9b513fhc53ff758554116ff@mail.gmail.com> References: <6994d82b0801070050t5d9b513fhc53ff758554116ff@mail.gmail.com> Message-ID: <6994d82b0801070216q1df26e72ic131592048100f3f@mail.gmail.com> Hi Andreas, You are correct. The junit.jar library was missing in my ant_home. 
Eclipse reported that it was running the tests, but did not run any. I have now corrected that and see that the tests are failing. I ran the program BlastEcho.java manually on the blast test files and on the ncbi blast. Judging by manual curation it worked well, but on wu_blast it did not parse the query length. The reason is that in wu_blast the query length line has just 8 spaces at the beginning instead of 9. So I corrected the line which identifies the query length at org.biojava.bio.program.sax.BlastSaxParser line 68 to: if (poLine.matches("^\\s+\\(\\d+\\sletters\\)\\s*$")) { Now it also works on wu_blast. It would now be a good idea to update the blast tests regarding the number of arguments and see if they still fail. Thanks in advance, Michael On Jan 6, 2008 2:41 PM, Andreas Prlic wrote: > Hi Michael, > > I just had a look at your patch for the query length. > Several of the unit tests are now failing at > org.biojava.bio.program.ssbind.SSBindCase.testResultGetAnnotation(SSBindCase.java:143) > > The problem is that most blast related unit tests extend the SSBindCase, > which expects a fixed number of attributes. With the new patch some of the > blast-flavors have the additional queryLength attribute. > > Could you have a look at the behaviour of the parser for some of the files > where the tests now fail? If you think the new behaviour of the > parser is correct, we can simply update the tests to accept the different > number of attributes. > > Thanks, > Andreas > > > -------------------------------------------------- > > Andreas Prlic Wellcome Trust Sanger Institute > Hinxton, Cambridge CB10 1SA, UK > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE.
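The pattern quoted in the message above can be tried on its own, outside the parser. What follows is only a minimal sketch under stated assumptions: the class QueryLengthLineDemo and the method isQueryLengthLine are hypothetical names that do not exist in BioJava, and the sample lines are made up for illustration rather than copied from real NCBI or WU-BLAST output. It shows that \s+ accepts both 8 and 9 leading spaces, which is the behaviour the message describes.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it is NOT part of BioJava.
class QueryLengthLineDemo {

    // Same pattern as quoted in the message: one or more leading whitespace
    // characters, "(<digits> letters)", then optional trailing whitespace.
    static final Pattern QUERY_LENGTH_LINE =
            Pattern.compile("^\\s+\\(\\d+\\sletters\\)\\s*$");

    // Returns true if the whole line matches the query-length pattern.
    static boolean isQueryLengthLine(String line) {
        return QUERY_LENGTH_LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Made-up sample lines (assumption: not taken from real BLAST output).
        String eightSpaces = "        (250 letters)";
        String nineSpaces = "         (250 letters)";
        String noIndent = "(250 letters)";

        System.out.println(isQueryLengthLine(eightSpaces)); // true
        System.out.println(isQueryLengthLine(nineSpaces));  // true
        System.out.println(isQueryLengthLine(noIndent));    // false
    }
}
```

Because \s+ needs at least one whitespace character, a line with no indentation at all is still rejected, while the exact amount of indentation no longer matters.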
> From holland at ebi.ac.uk Mon Jan 7 12:01:55 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Mon, 7 Jan 2008 12:01:55 -0000 (GMT) Subject: [Biojava-dev] Error while reading byte data for creating a Trace. In-Reply-To: <04BE1C71-B7CF-4428-86C7-300E4283DAE8@ebi.ac.uk> References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk> <2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu> <04BE1C71-B7CF-4428-86C7-300E4283DAE8@ebi.ac.uk> Message-ID: <50442.80.42.95.78.1199707315.squirrel@webmail.ebi.ac.uk> This problem was resolved back in November. For some reason during the last couple of weeks the BioJava mailing list has been sending out occasional duplicate copies of emails sent several months ago! This was one of them. cheers, Richard On Mon, January 7, 2008 9:34 am, Andy Yates wrote: > Hi, > > As far as I am aware there isn't a problem with the current ABI parser > however if you could send a code snippit of reading in the byte array > & the stack trace of the index out of bounds exception that would be > most helpful > > Andy > > On 5 Nov 2007, at 17:59, abhi232 at cc.gatech.edu wrote: > >> Hi all, >> I am having a byte array which is having the data from an .ab1 >> file.The >> biojava library provides a class called as ABITrace which takes as >> input >> either a byte[] array , a file or a url.If i use the later >> parameters (the >> file or the url )the program works but if I pass the byte array to the >> constructor I get java.lang.arrayIndexOutOfBound.Exception.Is there a >> problem with the ABITrace class or how can I bypass this particular >> error. >> I am printing the length of the byte array and it comes to >> 144930...Can >> that cause a problem in my code? >> >> Thanks in advance. 
>> Abhinav >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- Richard Holland BioMart (http://www.biomart.org/) EMBL-EBI Hinxton, Cambridgeshire CB10 1SD, UK From ayates at ebi.ac.uk Mon Jan 7 12:18:50 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 7 Jan 2008 12:18:50 +0000 Subject: [Biojava-dev] Error while reading byte data for creating a Trace. In-Reply-To: <50442.80.42.95.78.1199707315.squirrel@webmail.ebi.ac.uk> References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk> <2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu> <04BE1C71-B7CF-4428-86C7-300E4283DAE8@ebi.ac.uk> <50442.80.42.95.78.1199707315.squirrel@webmail.ebi.ac.uk> Message-ID: <065714BD-6D4F-4B5F-8AAE-E6C47C9405AB@ebi.ac.uk> Oh for ... :). Thought I'd seen this one before Andy On 7 Jan 2008, at 12:01, Richard Holland wrote: > This problem was resolved back in November. For some reason during the > last couple of weeks the BioJava mailing list has been sending out > occasional duplicate copies of emails sent several months ago! This > was > one of them. 
> > cheers, > Richard > > On Mon, January 7, 2008 9:34 am, Andy Yates wrote: >> Hi, >> >> As far as I am aware there isn't a problem with the current ABI >> parser >> however if you could send a code snippit of reading in the byte array >> & the stack trace of the index out of bounds exception that would be >> most helpful >> >> Andy >> >> On 5 Nov 2007, at 17:59, abhi232 at cc.gatech.edu wrote: >> >>> Hi all, >>> I am having a byte array which is having the data from an .ab1 >>> file.The >>> biojava library provides a class called as ABITrace which takes as >>> input >>> either a byte[] array , a file or a url.If i use the later >>> parameters (the >>> file or the url )the program works but if I pass the byte array to >>> the >>> constructor I get java.lang.arrayIndexOutOfBound.Exception.Is >>> there a >>> problem with the ABITrace class or how can I bypass this particular >>> error. >>> I am printing the length of the byte array and it comes to >>> 144930...Can >>> that cause a problem in my code? >>> >>> Thanks in advance. >>> Abhinav >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > -- > Richard Holland > BioMart (http://www.biomart.org/) > EMBL-EBI > Hinxton, Cambridgeshire CB10 1SD, UK From ap3 at sanger.ac.uk Mon Jan 7 21:54:21 2008 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Mon, 7 Jan 2008 21:54:21 +0000 (GMT) Subject: [Biojava-dev] bioperl like blastparser Message-ID: Hi Michael, thanks for your patch, I commited it to the new svn repository and updated the unit tests to now either take 4 or 5 args. 
Andreas -------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From heuermh at acm.org Tue Jan 8 00:36:04 2008 From: heuermh at acm.org (Michael Heuer) Date: Mon, 7 Jan 2008 19:36:04 -0500 (EST) Subject: [Biojava-dev] JUnit In-Reply-To: <034C9BC2-8AA5-4220-BB34-C3D86E991200@sanger.ac.uk> Message-ID: Andreas Prlic wrote: > > What do people think about adding the JUnit jar to the test directory > > of the biojava-live repository and make the appropriate changes to the > > ant classpath? This would make it easier for people to test the > > > > I would suggest to move all the jar files where we have dependencies > on into a common subdirectory. > e.g something called "libs" or "jars" Using maven would resolve all of these issues. Or alternatively, a maven build can create an ant build.xml that downloads its dependencies from the maven central repository http://maven.apache.org/plugins/maven-ant-plugin/ or there is Ivy for ant, which can be configured to use the maven central repository http://ant.apache.org/ivy/ The 'lib' directory doesn't really have a place any more. michael From markjschreiber at gmail.com Tue Jan 8 03:17:35 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 8 Jan 2008 11:17:35 +0800 Subject: [Biojava-dev] JUnit In-Reply-To: References: <034C9BC2-8AA5-4220-BB34-C3D86E991200@sanger.ac.uk> Message-ID: <93b45ca50801071917t1cef45epf8772b4370ef3f97@mail.gmail.com> Hi - I have added the junit jar and modified the build.xml I will leave the decision about a lib directory etc for some more debate. 
- Mark On Jan 8, 2008 8:36 AM, Michael Heuer wrote: > Andreas Prlic wrote: > > > > What do people think about adding the JUnit jar to the test directory > > > of the biojava-live repository and make the appropriate changes to the > > > ant classpath? This would make it easier for people to test the > > > > > > > I would suggest to move all the jar files where we have dependencies > > on into a common subdirectory. > > e.g something called "libs" or "jars" > > Using maven would resolve all of these issues. > > Or alternatively, a maven build can create an ant build.xml that downloads > its dependencies from the maven central repository > > http://maven.apache.org/plugins/maven-ant-plugin/ > > or there is Ivy for ant, which can be configured to use the maven central > repository > > http://ant.apache.org/ivy/ > > The 'lib' directory doesn't really have a place any more. > > michael > > From heuermh at acm.org Tue Jan 8 04:55:42 2008 From: heuermh at acm.org (Michael Heuer) Date: Mon, 7 Jan 2008 23:55:42 -0500 (EST) Subject: [Biojava-dev] JUnit In-Reply-To: <93b45ca50801071917t1cef45epf8772b4370ef3f97@mail.gmail.com> Message-ID: Mark Schreiber wrote: > I will leave the decision about a lib directory etc for some more debate. Now that we have a subversion repository in place, I would be happy to create a maven-based build out on branch for consideration at some point. Ideally this would happen after refactoring/cleanup/purge so I have less work to do. ;) michael From michaelgang at gmail.com Tue Jan 8 08:23:56 2008 From: michaelgang at gmail.com (Michael Gang) Date: Tue, 8 Jan 2008 10:23:56 +0200 Subject: [Biojava-dev] read fasta file Message-ID: <6994d82b0801080023y3cdcc005g57b08c6566c37445@mail.gmail.com> Dear All, I want to read a fasta file of dna (the accessions are internal to our company and may not be like the convention), make manipulations on it and write it to another file. 
When i take the example from the book "Biojava in Anger" it works fine, but I get warnings that the SeqIOTools type is deprecated. When using the RichSequence.IOTools package I have problems that when writing the fasta it changes the fasta header (it adds the lcl: prefix). I want that the fasta header will be in the output file like in the input file. Will the SeqIOTools type supported further ? If not, is there another way to solve the problem ? Thanks in advance, Michael From holland at ebi.ac.uk Tue Jan 8 08:51:32 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Tue, 08 Jan 2008 08:51:32 +0000 Subject: [Biojava-dev] read fasta file In-Reply-To: <6994d82b0801080023y3cdcc005g57b08c6566c37445@mail.gmail.com> References: <6994d82b0801080023y3cdcc005g57b08c6566c37445@mail.gmail.com> Message-ID: <47833994.7090608@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 SeqIOTools is deprecated - this means that it _may_ get dropped in a future release and so you can't rely on it being present in any future release. RichSequence.IOTools follows the FASTA format exactly, which requires a namespace prefix in the header, and it will change the existing header if it does not already meet the FASTA standard. There is currently no way to stop it from doing that, although you might want to raise a bug report so that it goes on our list of things to change. You can do that here: http://bugzilla.open-bio.org/enter_bug.cgi?product=BioJava cheers, Richard Michael Gang wrote: > Dear All, > > I want to read a fasta file of dna (the accessions are internal to our > company and may not be like the convention), make manipulations on it > and write it to another file. > When i take the example from the book "Biojava in Anger" it works > fine, but I get warnings that the SeqIOTools type is deprecated. > When using the RichSequence.IOTools package I have problems that when > writing the fasta it changes the fasta header (it adds the lcl: > prefix). 
> I want that the fasta header will be in the output file like in the input file. > Will the SeqIOTools type supported further ? > If not, is there another way to solve the problem ? > > Thanks in advance, > Michael > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel. +44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHgzmT4C5LeMEKA/QRAoF8AJ9SLAMGvm7SpByOyfL1/7tUZ9NbZgCgjeTq FjmCDFlMygy68q1zkbpwX2o= =bTSb -----END PGP SIGNATURE----- From bugzilla-daemon at portal.open-bio.org Tue Jan 8 15:00:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 8 Jan 2008 10:00:58 -0500 Subject: [Biojava-dev] [Bug 2432] New: non conventional fasta header && RichSequence.IOTools Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2432 Summary: non conventional fasta header && RichSequence.IOTools Product: BioJava Version: unspecified Platform: PC OS/Version: Windows XP Status: NEW Severity: normal Priority: P2 Component: seq.io AssignedTo: biojava-dev at biojava.org ReportedBy: michaelgang at gmail.com When reading a fasta file with non conventional header (for example company intern accessions) and writing it with RichSequence.IOTools the fasta header get changed. With deprecated SeqIOTools it works -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
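
Until the writer offers a way to control the header, one possible workaround for the case described in this bug is to copy the FASTA text directly and leave the header lines alone. The following is a minimal plain-Java sketch under that assumption: it does not use BioJava at all, the class and method names (FastaHeaderPassThrough, copy, copyToString) are hypothetical, and the accession in the example is made up. It treats every line starting with '>' as a header and passes every other non-empty line through a caller-supplied edit function.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of BioJava: copies FASTA records while
// leaving every header line exactly as it was read.
class FastaHeaderPassThrough {

    // Header lines (starting with '>') are written unchanged. Other non-empty
    // lines are treated as sequence data and passed through editSequenceLine.
    // Empty lines are dropped.
    static void copy(BufferedReader in, Writer out, UnaryOperator<String> editSequenceLine)
            throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                out.write(line);
            } else {
                out.write(editSequenceLine.apply(line));
            }
            out.write('\n');
        }
    }

    // Convenience wrapper for in-memory text.
    static String copyToString(String fasta, UnaryOperator<String> editSequenceLine) {
        StringWriter out = new StringWriter();
        try {
            copy(new BufferedReader(new StringReader(fasta)), out, editSequenceLine);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "ACME_0001" is an invented in-house accession used only for this example.
        String input = ">ACME_0001 internal clone\nacgt\nttga\n";
        System.out.print(copyToString(input, s -> s.toUpperCase()));
    }
}
```

This only helps when the manipulation can be done line by line on the raw text; anything that needs BioJava sequence objects would still go through the library's own reader and writer.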
From minhduc.cao at gmail.com Wed Jan 9 00:57:43 2008
From: minhduc.cao at gmail.com (Minh Duc, Cao)
Date: Wed, 9 Jan 2008 11:57:43 +1100
Subject: [Biojava-dev] Problem with read RichFormat file from an applet
Message-ID:

Hi,

I used IOTools.readFastaDNA(in,null) to read a Fasta file and, for a stand-alone application, it works perfectly. However, when the code is called from an applet, the following exception is thrown:

Exception in thread "Thread-8" java.lang.ExceptionInInitializerError
    at org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java:1813)
    at org.biojava.bio.seq.SimpleFeatureHolder.(SimpleFeatureHolder.java:54)
    at org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature(RichFeature.java:167)
    at org.biojavax.bio.seq.io.RichSeqIOAdapter.(RichSeqIOAdapter.java:61)
    at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:100)
    at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:81)
    at org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder(SimpleRichSequenceBuilderFactory.java:68)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
    at org.biojavax.bio.seq.io.RichStreamReader.nextSequence(RichStreamReader.java:92)
    at dnaPlatform.function.ReadFormatFileFunction.guessFormat(ReadFormatFileFunction.java:134)
    at dnaPlatform.gui.RunFunction.run(MainPanel.java:929)
Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission createClassLoader)
    at java.security.AccessControlContext.checkPermission(Unknown Source)
    at java.security.AccessController.checkPermission(Unknown Source)
    at java.lang.SecurityManager.checkPermission(Unknown Source)
    at java.lang.SecurityManager.checkCreateClassLoader(Unknown Source)
    at java.lang.ClassLoader.(Unknown Source)
    at org.biojava.utils.bytecode.GeneratedClassLoader.(GeneratedClassLoader.java:29)
    at org.biojava.utils.walker.WalkerFactory.(WalkerFactory.java:68)
    at org.biojava.utils.walker.WalkerFactory.getInstance(WalkerFactory.java:51)
    at org.biojava.utils.walker.WalkerFactory.getInstance(WalkerFactory.java:58)
    at org.biojava.bio.seq.FeatureFilter$OnlyChildren.(FeatureFilter.java:1270)
    ... 11 more

Note that the applet is signed and can read files from the client hard disk if another method is used.

Does anyone have an idea how I can go about fixing this problem?

Thank you very much

Minh

From markjschreiber at gmail.com Wed Jan 9 08:30:45 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 9 Jan 2008 16:30:45 +0800
Subject: [Biojava-dev] read fasta file
In-Reply-To: <47833994.7090608@ebi.ac.uk>
References: <6994d82b0801080023y3cdcc005g57b08c6566c37445@mail.gmail.com> <47833994.7090608@ebi.ac.uk>
Message-ID: <93b45ca50801090030t6ffd9907ieb1082fd20c5b713@mail.gmail.com>

Along these lines I have plans to add a way to format the Fasta header in the RichSequence.IOTools so that the content of it can be customised. Currently it follows the NCBI model and tries to add everything it can. I would be interested in proposals for a template mechanism.

- Mark

On Jan 8, 2008 4:51 PM, Richard Holland wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> SeqIOTools is deprecated - this means that it _may_ get dropped in a
> future release and so you can't rely on it being present in any future
> release.
>
> RichSequence.IOTools follows the FASTA format exactly, which requires a
> namespace prefix in the header, and it will change the existing header
> if it does not already meet the FASTA standard. There is currently no
> way to stop it from doing that, although you might want to raise a bug
> report so that it goes on our list of things to change.
You can do that > here: http://bugzilla.open-bio.org/enter_bug.cgi?product=BioJava > > cheers, > Richard > > > Michael Gang wrote: > > Dear All, > > > > I want to read a fasta file of dna (the accessions are internal to our > > company and may not be like the convention), make manipulations on it > > and write it to another file. > > When i take the example from the book "Biojava in Anger" it works > > fine, but I get warnings that the SeqIOTools type is deprecated. > > When using the RichSequence.IOTools package I have problems that when > > writing the fasta it changes the fasta header (it adds the lcl: > > prefix). > > I want that the fasta header will be in the output file like in the input file. > > Will the SeqIOTools type supported further ? > > If not, is there another way to solve the problem ? > > > > Thanks in advance, > > Michael > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > - -- > Richard Holland (BioMart) > EMBL EBI, Wellcome Trust Genome Campus, > Hinxton, Cambridgeshire CB10 1SD, UK > Tel. 
+44 (0)1223 494416 > > http://www.biomart.org/ > http://www.biojava.org/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFHgzmT4C5LeMEKA/QRAoF8AJ9SLAMGvm7SpByOyfL1/7tUZ9NbZgCgjeTq > FjmCDFlMygy68q1zkbpwX2o= > =bTSb > -----END PGP SIGNATURE----- > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From holland at ebi.ac.uk Wed Jan 9 08:38:12 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Wed, 09 Jan 2008 08:38:12 +0000 Subject: [Biojava-dev] Problem with read RichFormat file from an applet In-Reply-To: References: Message-ID: <478487F4.3090209@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello. This is the root of your problem: Caused by: java.security.AccessControlException: access denied ( java.lang.RuntimePermission createClassLoader) at org.biojava.utils.bytecode.GeneratedClassLoader.( GeneratedClassLoader.java:29) at org.biojava.utils.walker.WalkerFactory.(WalkerFactory.java:68) The applet runtime environment is not allowing BioJava to create a custom class loader. It's not to do with disk access at all unfortunately. I don't know of a solution myself as I've not done much work with applets. Does anyone else on this list have any suggestions? cheers, Richard Minh Duc, Cao wrote: > Hi, > > I used IOTools.readFastaDNA(in,null) to read Fasta file and, for a stand > alone application, it works perfectly. 
However, when the code is called from > an applet, the following exception is thrown > > Exception in thread "Thread-8" java.lang.ExceptionInInitializerError > at org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java:1813) > at org.biojava.bio.seq.SimpleFeatureHolder.( > SimpleFeatureHolder.java:54) > at org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature( > RichFeature.java:167) > at org.biojavax.bio.seq.io.RichSeqIOAdapter.(RichSeqIOAdapter.java > :61) > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.( > SimpleRichSequenceBuilder.java:100) > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.( > SimpleRichSequenceBuilder.java:81) > at > org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder > (SimpleRichSequenceBuilderFactory.java:68) > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > RichStreamReader.java:109) > at org.biojavax.bio.seq.io.RichStreamReader.nextSequence( > RichStreamReader.java:92) > at dnaPlatform.function.ReadFormatFileFunction.guessFormat( > ReadFormatFileFunction.java:134) > at dnaPlatform.gui.RunFunction.run(MainPanel.java:929) > Caused by: java.security.AccessControlException: access denied ( > java.lang.RuntimePermission createClassLoader) > at java.security.AccessControlContext.checkPermission(Unknown Source) > at java.security.AccessController.checkPermission(Unknown Source) > at java.lang.SecurityManager.checkPermission(Unknown Source) > at java.lang.SecurityManager.checkCreateClassLoader(Unknown Source) > at java.lang.ClassLoader.(Unknown Source) > at org.biojava.utils.bytecode.GeneratedClassLoader.( > GeneratedClassLoader.java:29) > at org.biojava.utils.walker.WalkerFactory.(WalkerFactory.java:68) > at org.biojava.utils.walker.WalkerFactory.getInstance(WalkerFactory.java > :51) > at org.biojava.utils.walker.WalkerFactory.getInstance(WalkerFactory.java > :58) > at org.biojava.bio.seq.FeatureFilter$OnlyChildren.( > FeatureFilter.java:1270) > ... 
11 more
>
> It is noted that the applet is signed and can read files from client
> harddisk if other method is used.
>
> Do anyone have an idea how can I go about to fix this problem?
>
> Thank you very much
>
> Minh
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHhIfz4C5LeMEKA/QRAhZqAJ9k36tFYC7wdBt6eScgCn5MK9uVZwCeIVHU
R0e4dCpmpjJnHOrfjfw0wYc=
=WayD
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com Wed Jan 9 08:50:11 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 9 Jan 2008 16:50:11 +0800
Subject: [Biojava-dev] Problem with read RichFormat file from an applet
In-Reply-To: <478487F4.3090209@ebi.ac.uk>
References: <478487F4.3090209@ebi.ac.uk>
Message-ID: <93b45ca50801090050j40455c2bid1af46c277a62582@mail.gmail.com>

Consulting a good book on the Java security model may reveal a way to modify the policy to allow this.

However, I think you should give serious consideration to why you would want to use an applet in any context. The technology has major limitations and has long since been superseded, either by servlets or other technologies for server-side work, or by Web Start for client-side apps distributed from a server.

- Mark

On Jan 9, 2008 4:38 PM, Richard Holland wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello.
> > This is the root of your problem: > > Caused by: java.security.AccessControlException: access denied ( > java.lang.RuntimePermission createClassLoader) > at org.biojava.utils.bytecode.GeneratedClassLoader.( > GeneratedClassLoader.java:29) > at org.biojava.utils.walker.WalkerFactory.(WalkerFactory.java:68) > > The applet runtime environment is not allowing BioJava to create a > custom class loader. It's not to do with disk access at all unfortunately. > > I don't know of a solution myself as I've not done much work with applets. > > Does anyone else on this list have any suggestions? > > cheers, > Richard > > > > Minh Duc, Cao wrote: > > Hi, > > > > I used IOTools.readFastaDNA(in,null) to read Fasta file and, for a stand > > alone application, it works perfectly. However, when the code is called from > > an applet, the following exception is thrown > > > > Exception in thread "Thread-8" java.lang.ExceptionInInitializerError > > at org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java:1813) > > at org.biojava.bio.seq.SimpleFeatureHolder.( > > SimpleFeatureHolder.java:54) > > at org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature( > > RichFeature.java:167) > > at org.biojavax.bio.seq.io.RichSeqIOAdapter.(RichSeqIOAdapter.java > > :61) > > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.( > > SimpleRichSequenceBuilder.java:100) > > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.( > > SimpleRichSequenceBuilder.java:81) > > at > > org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder > > (SimpleRichSequenceBuilderFactory.java:68) > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > at org.biojavax.bio.seq.io.RichStreamReader.nextSequence( > > RichStreamReader.java:92) > > at dnaPlatform.function.ReadFormatFileFunction.guessFormat( > > ReadFormatFileFunction.java:134) > > at dnaPlatform.gui.RunFunction.run(MainPanel.java:929) > > Caused by: java.security.AccessControlException: 
access denied ( > > java.lang.RuntimePermission createClassLoader) > > at java.security.AccessControlContext.checkPermission(Unknown Source) > > at java.security.AccessController.checkPermission(Unknown Source) > > at java.lang.SecurityManager.checkPermission(Unknown Source) > > at java.lang.SecurityManager.checkCreateClassLoader(Unknown Source) > > at java.lang.ClassLoader.(Unknown Source) > > at org.biojava.utils.bytecode.GeneratedClassLoader.( > > GeneratedClassLoader.java:29) > > at org.biojava.utils.walker.WalkerFactory.(WalkerFactory.java:68) > > at org.biojava.utils.walker.WalkerFactory.getInstance(WalkerFactory.java > > :51) > > at org.biojava.utils.walker.WalkerFactory.getInstance(WalkerFactory.java > > :58) > > at org.biojava.bio.seq.FeatureFilter$OnlyChildren.( > > FeatureFilter.java:1270) > > ... 11 more > > > > It is noted that the applet is signed and can read files from client > > harddisk if other method is used. > > > > Do anyone have an idea how can I go about to fix this problem? > > > > Thank you very much > > > > Minh > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > - -- > Richard Holland (BioMart) > EMBL EBI, Wellcome Trust Genome Campus, > Hinxton, Cambridgeshire CB10 1SD, UK > Tel. 
+44 (0)1223 494416 > > http://www.biomart.org/ > http://www.biojava.org/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFHhIfz4C5LeMEKA/QRAhZqAJ9k36tFYC7wdBt6eScgCn5MK9uVZwCeIVHU > R0e4dCpmpjJnHOrfjfw0wYc= > =WayD > -----END PGP SIGNATURE----- > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From jcope at cableone.net Wed Jan 9 15:59:16 2008 From: jcope at cableone.net (Jeff Cope) Date: Wed, 9 Jan 2008 08:59:16 -0700 Subject: [Biojava-dev] BioJava Development In-Reply-To: References: Message-ID: <000801c852d8$966eb370$6402a8c0@roadrunner> Hi, My name is Jeff Cope, and I'm currently working on a bioinformatics project for a professor at BSU. What we have so far is mostly in java, but with the data calculations taking place using functionality found in the BioPython library (Molecular Weight, Instability Index, Isoelectric Point, Aromaticity, GRAVY, etc...). Anywho, currently we are only looking at protein sequences, and thought that we could help you out on the BioJava project by seeing if that functionality could be added into your library instead... So I guess my question is, now that I'm signed up on the developers mail list, what would I need to do to get started (assuming you want my help), and what kind of programming ground rules do you have... 
I feel pretty comfortable in Java code, and if you would like to see an example of my source code, and some of the work I've done so far, you can find it here: Current project: http://trac.boisestate.edu/protcalc/ Source Code: http://trac.boisestate.edu/protcalc/src/ API docs: http://trac.boisestate.edu/protcalc/docs/ Thanks, Jeff Cope -----Original Message----- From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of biojava-dev-request at lists.open-bio.org Sent: Tuesday, January 08, 2008 1:52 AM To: biojava-dev at lists.open-bio.org Subject: biojava-dev Digest, Vol 59, Issue 2 Send biojava-dev mailing list submissions to biojava-dev at lists.open-bio.org To subscribe or unsubscribe via the World Wide Web, visit http://lists.open-bio.org/mailman/listinfo/biojava-dev or, via email, send a message with subject or body 'help' to biojava-dev-request at lists.open-bio.org You can reach the person managing the list at biojava-dev-owner at lists.open-bio.org When replying, please edit your Subject line so it is more specific than "Re: Contents of biojava-dev digest..." Today's Topics: 1. Fwd: bioperl like blastparser (Michael Gang) 2. Re: Error while reading byte data for creating a Trace. (Richard Holland) 3. Re: Error while reading byte data for creating a Trace. (Andy Yates) 4. Re: bioperl like blastparser (Andreas Prlic) 5. Re: JUnit (Michael Heuer) 6. Re: JUnit (Mark Schreiber) 7. Re: JUnit (Michael Heuer) 8. read fasta file (Michael Gang) 9. Re: read fasta file (Richard Holland) ---------------------------------------------------------------------- Message: 1 Date: Mon, 7 Jan 2008 12:16:22 +0200 From: "Michael Gang" Subject: [Biojava-dev] Fwd: bioperl like blastparser To: biojava-dev at biojava.org Message-ID: <6994d82b0801070216q1df26e72ic131592048100f3f at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Hi Andreas, You are correct. The junit.jar library was missing in my ant_home. 
Eclipse wrote that it was running the tests, but did not run any. Now I corrected it and see that tests are failing. I ran the program BlastEcho.java manually on the blast test files and on the ncbi blast. Judging after manually curation it worked good but at wu_blast id did not parse the query length. The reason is that in wu_blast the query length line has just 8 spaces at the beginning instead of 9. So I corrected the line which identifies the querylength at org.biojava.bio.program.sax.BlastSaxParser line 68 to: if (poLine.matches("^\\s+\\(\\d+\\sletters\\)\\s*$")) { Now it works also on wu_blast. It would be now a good idea to update the blast tests regarding the number of arguments and see if the fail still. Thanks in advance, Michael On Jan 6, 2008 2:41 PM, Andreas Prlic wrote: > Hi Michael, > > I just had a look at your patch for the query length. > Several of the unit tests are now failing at > org.biojava.bio.program.ssbind.SSBindCase.testResultGetAnnotation(SSBindCase .java:143) > > The problem is that most blast related unit tests extend the SSBindCase, > which expects a fixed number of attributes. With the new patch some of the > blast-flavors have the additional queryLength attribute. > > Could you have a look at the behaviour of the parser for some of the files > where the tests now fail? If you think the new behaviour of the > parser is correct, we can simply update the tests to accept the different > number of attributes. > > Thanks, > Andreas > > > -------------------------------------------------- > > Andreas Prlic Wellcome Trust Sanger Institute > Hinxton, Cambridge CB10 1SA, UK > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. 
> ------------------------------ Message: 2 Date: Mon, 7 Jan 2008 12:01:55 -0000 (GMT) From: "Richard Holland" Subject: Re: [Biojava-dev] Error while reading byte data for creating a Trace. To: "Andy Yates" Cc: biojava-l at biojava.org, biojava-dev at biojava.org, abhi232 at cc.gatech.edu Message-ID: <50442.80.42.95.78.1199707315.squirrel at webmail.ebi.ac.uk> Content-Type: text/plain;charset=iso-8859-1 This problem was resolved back in November. For some reason during the last couple of weeks the BioJava mailing list has been sending out occasional duplicate copies of emails sent several months ago! This was one of them. cheers, Richard On Mon, January 7, 2008 9:34 am, Andy Yates wrote: > Hi, > > As far as I am aware there isn't a problem with the current ABI parser > however if you could send a code snippit of reading in the byte array > & the stack trace of the index out of bounds exception that would be > most helpful > > Andy > > On 5 Nov 2007, at 17:59, abhi232 at cc.gatech.edu wrote: > >> Hi all, >> I am having a byte array which is having the data from an .ab1 >> file.The >> biojava library provides a class called as ABITrace which takes as >> input >> either a byte[] array , a file or a url.If i use the later >> parameters (the >> file or the url )the program works but if I pass the byte array to the >> constructor I get java.lang.arrayIndexOutOfBound.Exception.Is there a >> problem with the ABITrace class or how can I bypass this particular >> error. >> I am printing the length of the byte array and it comes to >> 144930...Can >> that cause a problem in my code? >> >> Thanks in advance. 
>> Abhinav >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- Richard Holland BioMart (http://www.biomart.org/) EMBL-EBI Hinxton, Cambridgeshire CB10 1SD, UK ------------------------------ Message: 3 Date: Mon, 7 Jan 2008 12:18:50 +0000 From: Andy Yates Subject: Re: [Biojava-dev] Error while reading byte data for creating a Trace. To: "Richard Holland" Cc: biojava-l at biojava.org, biojava-dev at biojava.org, abhi232 at cc.gatech.edu Message-ID: <065714BD-6D4F-4B5F-8AAE-E6C47C9405AB at ebi.ac.uk> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Oh for ... :). Thought I'd seen this one before Andy On 7 Jan 2008, at 12:01, Richard Holland wrote: > This problem was resolved back in November. For some reason during the > last couple of weeks the BioJava mailing list has been sending out > occasional duplicate copies of emails sent several months ago! This > was > one of them. 
> > cheers, > Richard > > On Mon, January 7, 2008 9:34 am, Andy Yates wrote: >> Hi, >> >> As far as I am aware there isn't a problem with the current ABI >> parser >> however if you could send a code snippit of reading in the byte array >> & the stack trace of the index out of bounds exception that would be >> most helpful >> >> Andy >> >> On 5 Nov 2007, at 17:59, abhi232 at cc.gatech.edu wrote: >> >>> Hi all, >>> I am having a byte array which is having the data from an .ab1 >>> file.The >>> biojava library provides a class called as ABITrace which takes as >>> input >>> either a byte[] array , a file or a url.If i use the later >>> parameters (the >>> file or the url )the program works but if I pass the byte array to >>> the >>> constructor I get java.lang.arrayIndexOutOfBound.Exception.Is >>> there a >>> problem with the ABITrace class or how can I bypass this particular >>> error. >>> I am printing the length of the byte array and it comes to >>> 144930...Can >>> that cause a problem in my code? >>> >>> Thanks in advance. >>> Abhinav >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > -- > Richard Holland > BioMart (http://www.biomart.org/) > EMBL-EBI > Hinxton, Cambridgeshire CB10 1SD, UK ------------------------------ Message: 4 Date: Mon, 7 Jan 2008 21:54:21 +0000 (GMT) From: Andreas Prlic Subject: Re: [Biojava-dev] bioperl like blastparser To: michaelgang at gmail.com Cc: biojava-dev at biojava.org Message-ID: Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Hi Michael, thanks for your patch, I commited it to the new svn repository and updated the unit tests to now either take 4 or 5 args. 
Andreas -------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ------------------------------ Message: 5 Date: Mon, 7 Jan 2008 19:36:04 -0500 (EST) From: Michael Heuer Subject: Re: [Biojava-dev] JUnit To: Andreas Prlic Cc: biojava-dev at biojava.org Message-ID: Content-Type: TEXT/PLAIN; charset=US-ASCII Andreas Prlic wrote: > > What do people think about adding the JUnit jar to the test directory > > of the biojava-live repository and make the appropriate changes to the > > ant classpath? This would make it easier for people to test the > > > > I would suggest to move all the jar files where we have dependencies > on into a common subdirectory. > e.g something called "libs" or "jars" Using maven would resolve all of these issues. Or alternatively, a maven build can create an ant build.xml that downloads its dependencies from the maven central repository http://maven.apache.org/plugins/maven-ant-plugin/ or there is Ivy for ant, which can be configured to use the maven central repository http://ant.apache.org/ivy/ The 'lib' directory doesn't really have a place any more. michael ------------------------------ Message: 6 Date: Tue, 8 Jan 2008 11:17:35 +0800 From: "Mark Schreiber" Subject: Re: [Biojava-dev] JUnit To: "Michael Heuer" Cc: biojava-dev at biojava.org Message-ID: <93b45ca50801071917t1cef45epf8772b4370ef3f97 at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Hi - I have added the junit jar and modified the build.xml. I will leave the decision about a lib directory etc. for some more debate.
- Mark On Jan 8, 2008 8:36 AM, Michael Heuer wrote: > Andreas Prlic wrote: > > > > What do people think about adding the JUnit jar to the test directory > > > of the biojava-live repository and make the appropriate changes to the > > > ant classpath? This would make it easier for people to test the > > > > > > > I would suggest to move all the jar files where we have dependencies > > on into a common subdirectory. > > e.g something called "libs" or "jars" > > Using maven would resolve all of these issues. > > Or alternatively, a maven build can create an ant build.xml that downloads > its dependencies from the maven central repository > > http://maven.apache.org/plugins/maven-ant-plugin/ > > or there is Ivy for ant, which can be configured to use the maven central > repository > > http://ant.apache.org/ivy/ > > The 'lib' directory doesn't really have a place any more. > > michael > > ------------------------------ Message: 7 Date: Mon, 7 Jan 2008 23:55:42 -0500 (EST) From: Michael Heuer Subject: Re: [Biojava-dev] JUnit To: Mark Schreiber Cc: biojava-dev at biojava.org Message-ID: Content-Type: TEXT/PLAIN; charset=US-ASCII Mark Schreiber wrote: > I will leave the decision about a lib directory etc for some more debate. Now that we have a subversion repository in place, I would be happy to create a maven-based build out on branch for consideration at some point. Ideally this would happen after refactoring/cleanup/purge so I have less work to do. ;) michael ------------------------------ Message: 8 Date: Tue, 8 Jan 2008 10:23:56 +0200 From: "Michael Gang" Subject: [Biojava-dev] read fasta file To: biojava-dev at biojava.org Message-ID: <6994d82b0801080023y3cdcc005g57b08c6566c37445 at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Dear All, I want to read a fasta file of dna (the accessions are internal to our company and may not be like the convention), make manipulations on it and write it to another file. 
When I take the example from the book "Biojava in Anger" it works fine, but I get warnings that the SeqIOTools type is deprecated. When using the RichSequence.IOTools package I have the problem that, when writing the fasta, it changes the fasta header (it adds the lcl: prefix). I want the fasta header in the output file to be the same as in the input file. Will the SeqIOTools type be supported in future? If not, is there another way to solve the problem? Thanks in advance, Michael ------------------------------ Message: 9 Date: Tue, 08 Jan 2008 08:51:32 +0000 From: Richard Holland Subject: Re: [Biojava-dev] read fasta file To: Michael Gang Cc: biojava-dev at biojava.org Message-ID: <47833994.7090608 at ebi.ac.uk> Content-Type: text/plain; charset=ISO-8859-1 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 SeqIOTools is deprecated - this means that it _may_ get dropped in a future release and so you can't rely on it being present in any future release. RichSequence.IOTools follows the FASTA format exactly, which requires a namespace prefix in the header, and it will change the existing header if it does not already meet the FASTA standard. There is currently no way to stop it from doing that, although you might want to raise a bug report so that it goes on our list of things to change. You can do that here: http://bugzilla.open-bio.org/enter_bug.cgi?product=BioJava cheers, Richard Michael Gang wrote: > Dear All, > > I want to read a fasta file of dna (the accessions are internal to our > company and may not be like the convention), make manipulations on it > and write it to another file. > When i take the example from the book "Biojava in Anger" it works > fine, but I get warnings that the SeqIOTools type is deprecated. > When using the RichSequence.IOTools package I have problems that when > writing the fasta it changes the fasta header (it adds the lcl: > prefix). > I want that the fasta header will be in the output file like in the input file.
> Will the SeqIOTools type supported further ? > If not, is there another way to solve the problem ? > > Thanks in advance, > Michael > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel. +44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHgzmT4C5LeMEKA/QRAoF8AJ9SLAMGvm7SpByOyfL1/7tUZ9NbZgCgjeTq FjmCDFlMygy68q1zkbpwX2o= =bTSb -----END PGP SIGNATURE----- ------------------------------ _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev End of biojava-dev Digest, Vol 59, Issue 2 ****************************************** From holland at ebi.ac.uk Wed Jan 9 17:09:13 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Wed, 09 Jan 2008 17:09:13 +0000 Subject: [Biojava-dev] BioJava Development In-Reply-To: <000801c852d8$966eb370$6402a8c0@roadrunner> References: <000801c852d8$966eb370$6402a8c0@roadrunner> Message-ID: <4784FFB9.1050402@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Jeff. Thanks for volunteering! Sounds like you've got a useful set of tools which we would definitely appreciate as new BioJava features. We're currently planning a Big Reorganisation for a new BioJava 3 from the ground-up. Details will be published some time in February I expect, if not then early March. 
See this Wiki for things that are currently being considered (most will make it into the plan, some may not, but I won't know until I've written it up and identified the conflicting areas): http://www.biojava.org/wiki/BioJava3_Proposal (also see the associated Discussion page for further comments) If I were you, I'd hang on until the final plan is published. It will contain everything you need to know on how to write modules for the new BioJava 3. However, if you're in a rush to get it into the current BioJava 2 release, then take a look round the JavaDocs to see what is present and what is not. You'll soon get a good idea of how things are organised and what features are absent. We also have a bugzilla page with some unresolved bugs - always a good starting point to learn how a system works and where the opportunities for development are! http://bugzilla.open-bio.org/buglist.cgi?product=BioJava&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED cheers, Richard Jeff Cope wrote: > Hi, > > My name is Jeff Cope, and I'm currently working on a bioinformatics > project for a professor at BSU. What we have so far is mostly in java, but > with the data calculations taking place using functionality found in the > BioPython library (Molecular Weight, Instability Index, Isoelectric Point, > Aromaticity, GRAVY, etc...). Anywho, currently we are only looking at > protein sequences, and thought that we could help you out on the BioJava > project by seeing if that functionality could be added into your library > instead... > > So I guess my question is, now that I'm signed up on the developers > mail list, what would I need to do to get started (assuming you want my > help), and what kind of programming ground rules do you have...
> > I feel pretty comfortable in Java code, and if you would like to see > an example of my source code, and some of the work I've done so far, you can > find it here: > Current project: > http://trac.boisestate.edu/protcalc/ > Source Code: > http://trac.boisestate.edu/protcalc/src/ > API docs: > http://trac.boisestate.edu/protcalc/docs/ > > > Thanks, > Jeff Cope > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel.
+44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHhP+44C5LeMEKA/QRAroBAJ9oU/by7joNIkdpkOoEtPFzcP+6ZwCfcVYV YpEnZK4o2READMXOsaE9oMo= =QZDP -----END PGP SIGNATURE----- From minhduc.cao at gmail.com Wed Jan 9 19:47:28 2008 From: minhduc.cao at gmail.com (Minh Duc, Cao) Date: Thu, 10 Jan 2008 06:47:28 +1100 Subject: [Biojava-dev] Problem with read RichFormat file from an applet In-Reply-To: <93b45ca50801090050j40455c2bid1af46c277a62582@mail.gmail.com> References: <478487F4.3090209@ebi.ac.uk> <93b45ca50801090050j40455c2bid1af46c277a62582@mail.gmail.com> Message-ID: Hi, I figured out the problem. Once the biojava jar file is signed, the applet can read files without any problems. Many thanks to you both. Also thank Mark for your suggestions, I will certainly try WebStart out. Cheers Minh On Jan 9, 2008 7:50 PM, Mark Schreiber wrote: > Consulting a good book on the java security model may reveal a way > that you can modify the policy to allow this. > > However, I think you should give serious consideration to why you > would want to use an applet in any context. The technology has major > limitations and has been long since superceeded by either severlet or > other technologies for server side stuff or webstart for client side > apps distributed from a server. > > - Mark > > On Jan 9, 2008 4:38 PM, Richard Holland wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Hello. > > > > This is the root of your problem: > > > > Caused by: java.security.AccessControlException: access denied ( > > java.lang.RuntimePermission createClassLoader) > > at org.biojava.utils.bytecode.GeneratedClassLoader.( > > GeneratedClassLoader.java:29) > > at org.biojava.utils.walker.WalkerFactory.(WalkerFactory.java > :68) > > > > The applet runtime environment is not allowing BioJava to create a > > custom class loader. 
It's not to do with disk access at all > unfortunately. > > > > I don't know of a solution myself as I've not done much work with > applets. > > > > Does anyone else on this list have any suggestions? > > > > cheers, > > Richard > > > > > > > > Minh Duc, Cao wrote: > > > Hi, > > > > > > I used IOTools.readFastaDNA(in,null) to read Fasta file and, for a > stand > > > alone application, it works perfectly. However, when the code is > called from > > > an applet, the following exception is thrown > > > > > > Exception in thread "Thread-8" java.lang.ExceptionInInitializerError > > > at org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java > :1813) > > > at org.biojava.bio.seq.SimpleFeatureHolder.( > > > SimpleFeatureHolder.java:54) > > > at org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature( > > > RichFeature.java:167) > > > at org.biojavax.bio.seq.io.RichSeqIOAdapter.( > RichSeqIOAdapter.java > > > :61) > > > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.( > > > SimpleRichSequenceBuilder.java:100) > > > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.( > > > SimpleRichSequenceBuilder.java:81) > > > at > > > > org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder > > > (SimpleRichSequenceBuilderFactory.java:68) > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > > RichStreamReader.java:109) > > > at org.biojavax.bio.seq.io.RichStreamReader.nextSequence( > > > RichStreamReader.java:92) > > > at dnaPlatform.function.ReadFormatFileFunction.guessFormat( > > > ReadFormatFileFunction.java:134) > > > at dnaPlatform.gui.RunFunction.run(MainPanel.java:929) > > > Caused by: java.security.AccessControlException: access denied ( > > > java.lang.RuntimePermission createClassLoader) > > > at java.security.AccessControlContext.checkPermission(Unknown > Source) > > > at java.security.AccessController.checkPermission(Unknown Source) > > > at java.lang.SecurityManager.checkPermission(Unknown Source) > > > at 
java.lang.SecurityManager.checkCreateClassLoader(Unknown > Source) > > > at java.lang.ClassLoader.(Unknown Source) > > > at org.biojava.utils.bytecode.GeneratedClassLoader.( > > > GeneratedClassLoader.java:29) > > > at org.biojava.utils.walker.WalkerFactory.( > WalkerFactory.java:68) > > > at org.biojava.utils.walker.WalkerFactory.getInstance( > WalkerFactory.java > > > :51) > > > at org.biojava.utils.walker.WalkerFactory.getInstance( > WalkerFactory.java > > > :58) > > > at org.biojava.bio.seq.FeatureFilter$OnlyChildren.( > > > FeatureFilter.java:1270) > > > ... 11 more > > > > > > It is noted that the applet is signed and can read files from client > > > harddisk if other method is used. > > > > > > Do anyone have an idea how can I go about to fix this problem? > > > > > > Thank you very much > > > > > > Minh > > > _______________________________________________ > > > biojava-dev mailing list > > > biojava-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > - -- > > Richard Holland (BioMart) > > EMBL EBI, Wellcome Trust Genome Campus, > > Hinxton, Cambridgeshire CB10 1SD, UK > > Tel. 
+44 (0)1223 494416 > > > > http://www.biomart.org/ > > http://www.biojava.org/ > > -----BEGIN PGP SIGNATURE----- > > Version: GnuPG v1.4.2.2 (GNU/Linux) > > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > > > iD8DBQFHhIfz4C5LeMEKA/QRAhZqAJ9k36tFYC7wdBt6eScgCn5MK9uVZwCeIVHU > > R0e4dCpmpjJnHOrfjfw0wYc= > > =WayD > > -----END PGP SIGNATURE----- > > > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > From markjschreiber at gmail.com Thu Jan 10 07:04:00 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 10 Jan 2008 15:04:00 +0800 Subject: [Biojava-dev] BioJava Development In-Reply-To: <4784FFB9.1050402@ebi.ac.uk> References: <000801c852d8$966eb370$6402a8c0@roadrunner> <4784FFB9.1050402@ebi.ac.uk> Message-ID: <93b45ca50801092304p6c97d4adve40f1f83959b2c32@mail.gmail.com> Hi Jeff - I think the type of functionality you describe would be best placed in the proteomic package where there is already some stuff for calculating MW and PI. Probably the reorganisation for BJ3 won't mean much change for your code as these types of algorithms can usually be readily ported to any new system. - Mark On Jan 10, 2008 1:09 AM, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi Jeff. > > Thanks for volunteering! Sounds like you've got a useful set of tools > which we would definitely appreciate as new BioJava features. > > We're currently planning a Big Reorganisation for a new BioJava 3 from > the ground-up. Details will be published some time in February I expect, > if not then early March. 
See this Wiki for things that are currently > being considered (most will make it into the plan, some may not, but I > won't know until I've written it up and identified the conflicting areas): > > http://www.biojava.org/wiki/BioJava3_Proposal > > (also see the associated Discussion page for further comments) > > If I were you, I'd hang on until the final plan is published. It will > contain everything you need to know on how to write modules for the new > BioJava 3. > > However, if you're in a rush to get it into the current BioJava 2 > release, then take a look round the JavaDocs to see what is present and > what is not. You'll soon get a good idea of how things are organised and > what features are absent. We also have a bugzilla page with some > unresolved bugs - always a good starting point to learn how a system > works and where the opportunities for development are! > http://bugzilla.open-bio.org/buglist.cgi?product=BioJava&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED > > cheers, > Richard > > > Jeff Cope wrote: > > Hi, > > > > My name is Jeff Cope, and I'm currently working on a bioinformatics > > project for a professor at BSU. What we have so far is mostly in java, but > > with the data calculations taking place using functionality found in the > > BioPython library (Molecular Weight, Instability Index, Isoelectric Point, > > Aromaticity, GRAVY, etc...). Anywho, currently we are only looking at > > protein sequences, and thought that we could help you out on the BioJava > > project by seeing if that functionality could be added into your library > > instead... > > > > So I guess my question is, now that I'm signed up on the developers > > mail list, what would I need to do to get started (assuming you want my > > help), and what kind of programming ground rules do you have...
> > > > I feel pretty comfortable in Java code, and if you would like to see > > an example of my source code, and some of the work I've done so far, you can > > find it here: > > Current project: > > http://trac.boisestate.edu/protcalc/ > > Source Code: > > http://trac.boisestate.edu/protcalc/src/ > > API docs: > > http://trac.boisestate.edu/protcalc/docs/ > > > > > > Thanks, > > Jeff Cope > > > - -- > Richard Holland (BioMart) > EMBL EBI, Wellcome Trust Genome Campus, > Hinxton, Cambridgeshire CB10 1SD, UK > Tel.
+44 (0)1223 494416 > > http://www.biomart.org/ > http://www.biojava.org/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFHhP+44C5LeMEKA/QRAroBAJ9oU/by7joNIkdpkOoEtPFzcP+6ZwCfcVYV > YpEnZK4o2READMXOsaE9oMo= > =QZDP > > -----END PGP SIGNATURE----- > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From michaelgang at gmail.com Thu Jan 10 12:44:49 2008 From: michaelgang at gmail.com (Michael Gang) Date: Thu, 10 Jan 2008 14:44:49 +0200 Subject: [Biojava-dev] problem with blast parser Message-ID: <6994d82b0801100444u653d412ara835691fe316ae2a@mail.gmail.com> Hi All, I've observed the following pronlem with the blast parser. When parsing a blast with more than one query, it skips a part of the queries. When I wanted to understand what the problem is I got to the following conclusion.(There are good chances that I am wrong) When the BlastSaxParser in the function hitsSectionReached calls the line: oHits.parse(oContents,poLine,"Database:");. It just get's back when it get to the line :Database= but this is in the middle of the next query (the query began with the line "BLASTX 2.2.16 [Mar-25-2007]". I also tested it with the blastecho program Did someone observed the same problem. Can someone help me in this issue ? Thanks in Advance, Michael From holland at ebi.ac.uk Thu Jan 10 12:50:21 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Thu, 10 Jan 2008 12:50:21 +0000 Subject: [Biojava-dev] problem with blast parser In-Reply-To: <6994d82b0801100444u653d412ara835691fe316ae2a@mail.gmail.com> References: <6994d82b0801100444u653d412ara835691fe316ae2a@mail.gmail.com> Message-ID: <4786148D.5010003@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello. This is related to your previous email about it not being able to read all data (e.g. query length). 
I believe Andreas Prlic (copied on this) is looking into this, and some other people might be as well. One of them will get back to you. cheers, Richard Michael Gang wrote: > Hi All, > I've observed the following pronlem with the blast parser. > When parsing a blast with more than one query, it skips a part of the queries. > When I wanted to understand what the problem is I got to the following > conclusion.(There are good chances that I am wrong) > When the BlastSaxParser in the function hitsSectionReached calls the > line: oHits.parse(oContents,poLine,"Database:");. > It just get's back when it get to the line :Database= but this is in > the middle of the next query (the query began with the line "BLASTX > 2.2.16 [Mar-25-2007]". > I also tested it with the blastecho program > Did someone observed the same problem. > Can someone help me in this issue ? > > Thanks in Advance, > Michael > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel. 
+44 (0)1223 494416
http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHhhSM4C5LeMEKA/QRAvYuAJ9r2Tkx7DzSmgAZ6sfLEnFewmSfNgCfWpvf
ZL5W/VHtWS6vDZe00Yc1MoY=
=4Qbs
-----END PGP SIGNATURE-----

From ap3 at sanger.ac.uk Sun Jan 13 13:44:42 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Sun, 13 Jan 2008 13:44:42 +0000
Subject: [Biojava-dev] anonymous svn
Message-ID:

Hi,

it seems the anonymous svn access is making progress, but I need to do some admin work on the svn repository in the next few hours:

The decision from open-bio was to move the svn repository data store from Berkeley DB to FSFS, which will make the replication of the developer repository onto the server that will provide the anonymous access run much more smoothly.

Checkouts and commits will therefore not work for the next 1 or 2 hours, but should be fine again afterwards.

>> http://svnbook.red-bean.com/en/1.4/svn-book.html#svn.reposadmin.basics.backends
>>
>> http://subversion.tigris.org/faq.html#bdb-fsfs-convert

Andreas

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From bugzilla-daemon at portal.open-bio.org Mon Jan 14 03:31:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 13 Jan 2008 22:31:53 -0500
Subject: [Biojava-dev] [Bug 2434] New: bio.seq.io.UniProtFormat error?
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2434

Summary: bio.seq.io.UniProtFormat error?
Product: BioJava Version: 1.5 Platform: PC OS/Version: Windows XP Status: NEW Severity: normal Priority: P2 Component: seq.io AssignedTo: biojava-dev at biojava.org ReportedBy: lisujun at gmail.com Exception in thread "main" org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) at org.biojavax.bio.seq.io.RichStreamReader.nextSequence(RichStreamReader.java:92) at DeleteHighAbundance.getDescription(DeleteHighAbundance.java:41) at DeleteHighAbundance.main(DeleteHighAbundance.java:47) Caused by: org.biojava.bio.seq.io.ParseException: A Exception Has Occurred During Parsing. Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/ Format_object=org.biojavax.bio.seq.io.UniProtFormat Accession=null Id= Comments=Bad ID line Parse_block=ID IPI00000001.1 IPI; PRT; 577 AA. Stack trace follows .... at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:286) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) ... 3 more ======== when i used the RichSequence.IOTools.readUniprot to read the IPI DAT(Uniprot format), this error happened. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From ap3 at sanger.ac.uk Mon Jan 14 17:26:30 2008 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Mon, 14 Jan 2008 17:26:30 +0000 Subject: [Biojava-dev] biojava svn migration complete Message-ID: <1B7EE4F0-142D-47BC-8143-677627AAC1AC@sanger.ac.uk> Hi! The BioJava SVN migration has been completed. Thanks a lot to everyone who has made contributions to this!! 
The new anonymous checkout of BioJava is now possible via

    svn co svn://code.open-bio.org/biojava/biojava-live/trunk biojava-live

Developers can obtain a checkout from

    svn co svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk/ ./biojava-live

and it is possible to browse the repository online at

    http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk

Also the automated builds have been updated:

    http://www.spice-3d.org/cruise/

See http://biojava.org/wiki/CVS_to_SVN_Migration for more details.

Andreas

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From jcope at cableone.net Tue Jan 15 16:27:38 2008
From: jcope at cableone.net (Jeff Cope)
Date: Tue, 15 Jan 2008 09:27:38 -0700
Subject: [Biojava-dev] BioJava Development
In-Reply-To: <93b45ca50801092304p6c97d4adve40f1f83959b2c32@mail.gmail.com>
References: <000801c852d8$966eb370$6402a8c0@roadrunner> <4784FFB9.1050402@ebi.ac.uk> <93b45ca50801092304p6c97d4adve40f1f83959b2c32@mail.gmail.com>
Message-ID: <000001c85793$8b8c83f0$6402a8c0@roadrunner>

Hi Richard and Mark,

Thanks for the quick response, and for the help in pointing out where the new classes should go. Here are the classes I wanted to propose and where I thought they should be. Due to time constraints, this would probably be something that I could put into BioJava 2, and I would be willing to port it to BioJava 3 when the time comes.
As Mark had suggested it appears that the org.biojava.bio.proteomics package would be the ideal area for the added classes, so I would be interpreting functionality for most of the classes from the BioPython library. AromaticityCalc.java - Returns the sum of the percentage of the AA sequence that is made up of the AA's Tyrosine, Tryptophan, and Phenylalanine. GRAVYCalc.java (Grand Average of Hydropathy) - Returns the average sum of hydropathy of the AA sequence InstabilityIndexCalc.java - Returns an estimated stability for a protein sequence (<= 40 is considered stable) AACompositionCalc.java - Breaks down protein sequences, and returns the percent that individual AA's make up the sequence. The functionality for Aromaticity, GRAVY, and Instability Index would all come from the biopython library (biopython-1.43/Bio/SeqUtils/ProtParam.py), and AAComposition is something that we worked together that was useful for our application. Most of the classes are pretty simple for the most part, I think the most difficult part would be figuring out how the strings of characters are translated to protein sequences in BioJava. Anyway, Let me know if this sounds like it would be useful... Thanks, Jeff -----Original Message----- From: Mark Schreiber [mailto:markjschreiber at gmail.com] Sent: Thursday, January 10, 2008 12:04 AM To: Richard Holland Cc: Jeff Cope; biojava-dev at lists.open-bio.org Subject: Re: [Biojava-dev] BioJava Development Hi Jeff - I think the type of functionality you describe would be best placed in the proteomic package where there is already some stuff for calculating MW and PI. Probably the reorganisation for BJ3 won't mean much change for your code as these types of algorithms can usually be readily ported to any new system. - Mark On Jan 10, 2008 1:09 AM, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi Jeff. > > Thanks for volunteering! 
Sounds like you've got a useful set of tools > which we would definitely appreciate as new BioJava features. > > We're currently planning a Big Reorganisation for a new BioJava 3 from > the ground-up. Details will be published some time in February I expect, > if not then early March. See this Wiki for things that are currently > being considered (most will make it into the plan, some may not, but I > won't know until I've written it up and identified the conflicting areas): > > http://www.biojava.org/wiki/BioJava3_Proposal > > (also see the associated Discussion page for further comments) > > If I were you, I'd hang on until the final plan is published. It will > contain everything you need to know on how to write modules for the new > BioJava 3. > > However, if you're in a rush to get it into the current BioJava 2 > release, then take a look round the JavaDocs to see what is present and > what is not. You'll soon get a good idea of how things are organised and > what features are absent. We also have a bugzilla page with some > unresolved bugs - always a good starting point to learn how a system > works and where the opportunities for development are! > http://bugzilla.open-bio.org/buglist.cgi?product=BioJava&bug_status=NEW&bug_ status=ASSIGNED&bug_status=REOPENED > > cheers, > Richard > > > Jeff Cope wrote: > > Hi, > > > > My name is Jeff Cope, and I'm currently working on a bioinformatics > > project for a professor at BSU. What we have so far is mostly in java, but > > with the data calculations taking place using functionality found in the > > BioPython library (Molecular Weight, Instability Index, Isoelectric Point, > > Aromaticity, GRAVY, etc...). Anywho, currently we are only looking at > > protein sequences, and thought that we could help you out on the BioJava > > project by seeing if that functionality could be added into your library > > instead... 
> > > > So I guess my question is, now that I'm signed up on the developers > > mail list, what would I need to do to get started (assuming you want my > > help), and what kind of programming ground rules do you have... > > > > I feel pretty comfortable in Java code, and if you would like to see > > an example of my source code, and some of the work I've done so far, you can > > find it here: > > Current project: > > http://trac.boisestate.edu/protcalc/ > > Source Code: > > http://trac.boisestate.edu/protcalc/src/ > > API docs: > > http://trac.boisestate.edu/protcalc/docs/ > > > > > > Thanks, > > Jeff Cope > > > > -----Original Message----- > > From: biojava-dev-bounces at lists.open-bio.org > > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of > > biojava-dev-request at lists.open-bio.org > > Sent: Tuesday, January 08, 2008 1:52 AM > > To: biojava-dev at lists.open-bio.org > > Subject: biojava-dev Digest, Vol 59, Issue 2 > > > > Send biojava-dev mailing list submissions to > > biojava-dev at lists.open-bio.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > or, via email, send a message with subject or body 'help' to > > biojava-dev-request at lists.open-bio.org > > > > You can reach the person managing the list at > > biojava-dev-owner at lists.open-bio.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of biojava-dev digest..." > > > > > > Today's Topics: > > > > 1. Fwd: bioperl like blastparser (Michael Gang) > > 2. Re: Error while reading byte data for creating a Trace. > > (Richard Holland) > > 3. Re: Error while reading byte data for creating a Trace. > > (Andy Yates) > > 4. Re: bioperl like blastparser (Andreas Prlic) > > 5. Re: JUnit (Michael Heuer) > > 6. Re: JUnit (Mark Schreiber) > > 7. Re: JUnit (Michael Heuer) > > 8. read fasta file (Michael Gang) > > 9. 
Re: read fasta file (Richard Holland) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Mon, 7 Jan 2008 12:16:22 +0200 > > From: "Michael Gang" > > Subject: [Biojava-dev] Fwd: bioperl like blastparser > > To: biojava-dev at biojava.org > > Message-ID: > > <6994d82b0801070216q1df26e72ic131592048100f3f at mail.gmail.com> > > Content-Type: text/plain; charset=ISO-8859-1 > > > > Hi Andreas, > > > > You are correct. > > The junit.jar library was missing in my ant_home. > > Eclipse wrote that it was running the tests, but did not run any. > > Now I corrected it and see that tests are failing. > > I ran the program BlastEcho.java manually on the blast test files and > > on the ncbi blast. > > Judging after manually curation it worked good but at wu_blast id did > > not parse the query length. > > The reason is that in wu_blast the query length line has just 8 spaces > > at the beginning instead of 9. > > So I corrected the line which identifies the querylength at > > org.biojava.bio.program.sax.BlastSaxParser > > line 68 to: if (poLine.matches("^\\s+\\(\\d+\\sletters\\)\\s*$")) { > > > > Now it works also on wu_blast. > > It would be now a good idea to update the blast tests regarding the > > number of arguments and see if the fail still. > > > > Thanks in advance, > > Michael > > > > > > > > On Jan 6, 2008 2:41 PM, Andreas Prlic wrote: > >> Hi Michael, > >> > >> I just had a look at your patch for the query length. > >> Several of the unit tests are now failing at > >> > > org.biojava.bio.program.ssbind.SSBindCase.testResultGetAnnotation(SSBindCase > > .java:143) > >> The problem is that most blast related unit tests extend the SSBindCase, > >> which expects a fixed number of attributes. With the new patch some of the > >> blast-flavors have the additional queryLength attribute. > >> > >> Could you have a look at the behaviour of the parser for some of the files > >> where the tests now fail? 
If you think the new behaviour of the > >> parser is correct, we can simply update the tests to accept the different > >> number of attributes. > >> > >> Thanks, > >> Andreas > >> > >> > >> -------------------------------------------------- > >> > >> Andreas Prlic Wellcome Trust Sanger Institute > >> Hinxton, Cambridge CB10 1SA, UK > >> > >> > >> > >> > >> -- > >> The Wellcome Trust Sanger Institute is operated by Genome Research > >> Limited, a charity registered in England with number 1021457 and a > >> company registered in England with number 2742969, whose registered > >> office is 215 Euston Road, London, NW1 2BE. > >> > > > > > > ------------------------------ > > > > Message: 2 > > Date: Mon, 7 Jan 2008 12:01:55 -0000 (GMT) > > From: "Richard Holland" > > Subject: Re: [Biojava-dev] Error while reading byte data for creating > > a Trace. > > To: "Andy Yates" > > Cc: biojava-l at biojava.org, biojava-dev at biojava.org, > > abhi232 at cc.gatech.edu > > Message-ID: <50442.80.42.95.78.1199707315.squirrel at webmail.ebi.ac.uk> > > Content-Type: text/plain;charset=iso-8859-1 > > > > This problem was resolved back in November. For some reason during the > > last couple of weeks the BioJava mailing list has been sending out > > occasional duplicate copies of emails sent several months ago! This was > > one of them. 
> > > > cheers, > > Richard > > > > On Mon, January 7, 2008 9:34 am, Andy Yates wrote: > >> Hi, > >> > >> As far as I am aware there isn't a problem with the current ABI parser > >> however if you could send a code snippit of reading in the byte array > >> & the stack trace of the index out of bounds exception that would be > >> most helpful > >> > >> Andy > >> > >> On 5 Nov 2007, at 17:59, abhi232 at cc.gatech.edu wrote: > >> > >>> Hi all, > >>> I am having a byte array which is having the data from an .ab1 > >>> file.The > >>> biojava library provides a class called as ABITrace which takes as > >>> input > >>> either a byte[] array , a file or a url.If i use the later > >>> parameters (the > >>> file or the url )the program works but if I pass the byte array to the > >>> constructor I get java.lang.arrayIndexOutOfBound.Exception.Is there a > >>> problem with the ABITrace class or how can I bypass this particular > >>> error. > >>> I am printing the length of the byte array and it comes to > >>> 144930...Can > >>> that cause a problem in my code? > >>> > >>> Thanks in advance. > >>> Abhinav > >>> _______________________________________________ > >>> biojava-dev mailing list > >>> biojava-dev at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> _______________________________________________ > >> biojava-dev mailing list > >> biojava-dev at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > > > > > > - -- > Richard Holland (BioMart) > EMBL EBI, Wellcome Trust Genome Campus, > Hinxton, Cambridgeshire CB10 1SD, UK > Tel. 
+44 (0)1223 494416 > > http://www.biomart.org/ > http://www.biojava.org/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFHhP+44C5LeMEKA/QRAroBAJ9oU/by7joNIkdpkOoEtPFzcP+6ZwCfcVYV > YpEnZK4o2READMXOsaE9oMo= > =QZDP > > -----END PGP SIGNATURE----- > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From bugzilla-daemon at portal.open-bio.org Wed Jan 16 15:16:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jan 2008 10:16:30 -0500 Subject: [Biojava-dev] [Bug 2435] New: Mistake in createRecord( ) of GFF3Parser Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2435 Summary: Mistake in createRecord( ) of GFF3Parser Product: BioJava Version: live (CVS source) Platform: PC OS/Version: Linux Status: NEW Severity: major Priority: P3 Component: bio AssignedTo: biojava-dev at biojava.org ReportedBy: pudimat at gmail.com CC: pudimat at gmail.com When setting the fields in a new GFF3Record, the source field is set twice, however the second time it is set to the GFF type value. Thus, a record has its type in the source field, and the type field has value "any". See line: 256 in GFF3Parser -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 

From bugzilla-daemon at portal.open-bio.org Wed Jan 16 17:07:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jan 2008 12:07:36 -0500
Subject: [Biojava-dev] [Bug 2435] Mistake in createRecord( ) of GFF3Parser
In-Reply-To:
Message-ID: <200801161707.m0GH7a8C005347@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2435

------- Comment #1 from pudimat at gmail.com 2008-01-16 12:07 EST -------
Another error in line 344 (method parseAttribute() ): when splitting the key-value-pair of an attribute, the separating '=' is the first symbol of the attribute value.

Reason:

    attValList = attVal.substring(spaceIndex).trim();

must be changed to

    attValList = attVal.substring(spaceIndex+1).trim();

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Jan 17 12:36:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 17 Jan 2008 07:36:06 -0500
Subject: [Biojava-dev] [Bug 2435] Mistake in createRecord( ) of GFF3Parser
In-Reply-To:
Message-ID: <200801171236.m0HCa6eE030734@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2435

holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from holland at ebi.ac.uk 2008-01-17 07:36 EST -------
I have fixed this in the new subversion repository.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
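
The off-by-one reported in comment #1 of bug 2435 can be illustrated in isolation. The following is only a minimal sketch: AttributeSplitSketch and its split() method are hypothetical names made up for this illustration and are not the actual GFF3Parser code, which is not shown in this thread. It assumes the separator is found with indexOf('=').

```java
// Hypothetical helper, NOT the real GFF3Parser: it only illustrates the
// off-by-one described in comment #1 of bug 2435.
class AttributeSplitSketch {

    /** Splits "key=value" into {key, value}; the separator belongs to neither part. */
    static String[] split(String attVal) {
        int sepIndex = attVal.indexOf('=');
        if (sepIndex < 0) {
            // No separator at all: treat the whole string as a key with an empty value.
            return new String[] { attVal.trim(), "" };
        }
        String key = attVal.substring(0, sepIndex).trim();
        // substring(sepIndex) would start AT the separator and yield "=gene1";
        // starting one character later drops the '='.
        String value = attVal.substring(sepIndex + 1).trim();
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        String[] kv = split("ID=gene1");
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```

String.substring(int) includes the character at the given index, which is why the unpatched call left the '=' at the front of the value.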

From bugzilla-daemon at portal.open-bio.org Sun Jan 20 07:17:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jan 2008 02:17:42 -0500
Subject: [Biojava-dev] [Bug 2360] saving of ProfileHmm cause NullPointerException
In-Reply-To:
Message-ID: <200801200717.m0K7Hg0H009027@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2360

mark.schreiber at novartis.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from mark.schreiber at novartis.com 2008-01-20 02:17 EST -------
Bug was caused by serialization of an untrained distribution with no weights.
Now fixed with Unit test

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sun Jan 20 07:26:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jan 2008 02:26:00 -0500
Subject: [Biojava-dev] [Bug 2371] ChromatogramFactory.create fails on Windows
In-Reply-To:
Message-ID: <200801200726.m0K7Q013009324@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2371

mark.schreiber at novartis.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME

------- Comment #1 from mark.schreiber at novartis.com 2008-01-20 02:25 EST -------
This works using biojava 1.6 RC-1 on Windows Vista. I can't replicate the error using your chromatogram.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sun Jan 20 08:28:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jan 2008 03:28:33 -0500
Subject: [Biojava-dev] [Bug 2164] Restriction Mapper - Thread (or dual core cpu) problem
In-Reply-To:
Message-ID: <200801200828.m0K8SXrV011671@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2164

------- Comment #10 from mark.schreiber at novartis.com 2008-01-20 03:28 EST -------
As suggested, I have added back the synchronized blocks, as they are certainly needed even though this doesn't fully solve the problem. Interestingly, on Windows Vista on a dual core CPU a race condition develops and never seems to resolve (not even a stack trace)! Notably this was on a short sequence, not one likely to be a PackedSymbolList as mentioned below.

I wonder if there is a problem with the SimpleThreadPool. Maybe a switch to a normal Java thread pool might be better?

(In reply to comment #9)
> Here are the comments at the top of org.biojava.bio.symbol.PackedSymbolList,
> and I quote:
> "WARNING: these variables constitute an opportunity
> for things to go wrong when doing multithreaded access
> via symbolAt(). Keep SymbolAt() synchronized so they
> don't get changed during a lookup! Naaasssty."

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
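
The "normal Java thread pool" floated in comment #10 could look roughly like the sketch below, which uses java.util.concurrent (Java 5+). Everything BioJava-specific is left out on purpose: ExecutorPoolSketch and countAll() are hypothetical names, and plain strings stand in for sequences and recognition sites. It does not show how RestrictionMapper would actually be wired to such a pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a pooled annotation job; no BioJava classes involved.
class ExecutorPoolSketch {

    /** Counts (possibly overlapping) occurrences of a motif in each sequence, one task per sequence. */
    static List<Integer> countAll(List<String> seqs, final String motif) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (final String seq : seqs) {
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        int count = 0;
                        for (int i = seq.indexOf(motif); i >= 0; i = seq.indexOf(motif, i + 1)) {
                            count++;
                        }
                        return count;
                    }
                }));
            }
            List<Integer> results = new ArrayList<Integer>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks until that task has finished
            }
            return results;
        } catch (Exception e) {
            // InterruptedException or ExecutionException from Future.get()
            throw new RuntimeException(e);
        } finally {
            // Worker threads are non-daemon by default, so a pool that is never
            // shut down keeps the JVM alive after main() returns.
            pool.shutdown();
        }
    }
}
```

The shutdown() call in the finally block is the point of the sketch: it guarantees that the pool's threads are released even if a task fails.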

From bugzilla-daemon at portal.open-bio.org Mon Jan 21 09:47:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jan 2008 04:47:37 -0500
Subject: [Biojava-dev] [Bug 2164] Restriction Mapper - Thread (or dual core cpu) problem
In-Reply-To:
Message-ID: <200801210947.m0L9lbY5028968@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2164

------- Comment #11 from andyyatz at gmail.com 2008-01-21 04:47 EST -------
If we're looking at Java5+ features here then maybe something like:

http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/locks/ReadWriteLock.html

This is a far superior solution to synchronized blocks, as it distinguishes between reading something and altering it. The locks can be held for as short or as long a time as required; we only need to make sure that the guarded code runs in a try {} finally {} block so that we never leave a resource locked.

(In reply to comment #10)
> As suggested I have added back the synchronized blocks as they are certainly
> needed even though this doesn't fully solve the problem. Interestingly on
> windows vista on a dual core CPU a race condition develops and never seems to
> resolve (not even a stack trace)! Notably this was on a short sequence not one
> likely to be a PackedSymbolList as mentioned below.
>
> I wonder if there is a problem with the SimpleThreadPool. Maybe a switch to a
> normal Java thread pool might be better?
>
>

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
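
The try {} finally {} pattern described in comment #11 can be sketched as follows. GuardedSymbols is a hypothetical class invented for this illustration, backed by a StringBuilder rather than any BioJava SymbolList; only ReadWriteLock and ReentrantReadWriteLock are real java.util.concurrent.locks types.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical container, NOT a BioJava class: many readers may hold the read
// lock at once, while a writer gets exclusive access.
class GuardedSymbols {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final StringBuilder symbols = new StringBuilder();

    /** Zero-based lookup under the shared read lock. */
    char get(int index) {
        lock.readLock().lock();
        try {
            return symbols.charAt(index);
        } finally {
            lock.readLock().unlock(); // always released, even if charAt throws
        }
    }

    /** Current number of symbols, also read under the shared lock. */
    int length() {
        lock.readLock().lock();
        try {
            return symbols.length();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Mutation under the exclusive write lock. */
    void append(String more) {
        lock.writeLock().lock();
        try {
            symbols.append(more);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

One caveat: a read lock only helps when the read path does not modify shared state. If, as the PackedSymbolList warning quoted in comment #9 suggests, a lookup itself updates internal variables, that lookup would need the write lock (or the mutable state would have to be made thread-local) for this scheme to be safe.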

From bugzilla-daemon at portal.open-bio.org Mon Jan 21 19:15:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jan 2008 14:15:24 -0500
Subject: [Biojava-dev] [Bug 2164] Restriction Mapper - Thread (or dual core cpu) problem
In-Reply-To:
Message-ID: <200801211915.m0LJFOcP031316@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2164

------- Comment #12 from gwaldon at geneinfinity.org 2008-01-21 14:15 EST -------
I have seen a similar problem (at least producing similar log) and it was solved by adding pool.stopThreads() at the end of your code:

    SequenceIterator iter = SeqIOTools.readFastaDNA(br);
    SimpleThreadPool pool = new SimpleThreadPool();
    RestrictionMapper mapper = new RestrictionMapper(pool);
    mapper.addEnzyme(RestrictionEnzymeManager.getEnzyme("MseI"));
    mapper.addEnzyme(RestrictionEnzymeManager.getEnzyme("HpaII"));
    mapper.addEnzyme(RestrictionEnzymeManager.getEnzyme("AluI"));
    Sequence seq;
    while(iter.hasNext()){
        seq = iter.nextSequence();
        mapper.annotate(seq);
    }
    pool.stopThreads();

Hope it helps.
- George

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From ap3 at sanger.ac.uk Tue Jan 22 08:25:49 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 22 Jan 2008 08:25:49 +0000
Subject: [Biojava-dev] biojava looking for maintainers
Message-ID: <20765D52-2C0E-40C6-A769-7D13CF1DB489@sanger.ac.uk>

Hi,

BioJava is a widely used Java library that provides standard APIs, parsers, and solutions for common bioinformatics problems. It is used in a number of applications and referenced in many scientific publications.
See here for an overview of these: http://biojava.org/wiki/BioJava:BioJavaInside

In order to continue our first class efforts to serve the community and to further the quality of our source code we are looking for motivated individuals who want to claim responsibility for some of the core libraries and take over maintenance of these.

Please see here for a list of modules for which we are currently looking for maintainers: http://biojava.org/wiki/Maintainers_wanted

If you want to become a BioJava maintainer, please post to the biojava-dev mailing list.

As always - happy biojavaing,
Andreas

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From felipe.albrecht at gmail.com Wed Jan 23 06:58:04 2008
From: felipe.albrecht at gmail.com (Felipe Albrecht)
Date: Wed, 23 Jan 2008 04:58:04 -0200
Subject: [Biojava-dev] Pairwise Alignment methods
Message-ID:

Hello all,

I have a simple question about the pairwise alignment classes (SmithWaterman and NeedlemanWunsch): why are two Sequence objects required for an alignment rather than two SymbolLists?

For example, I have a collection of SymbolLists to align against each other, so I have to create "dummy" Sequences just to do the alignment.

Reading the source, I saw that the only field exclusive to Sequence is the name, which is used for the alignment output, but if I only need the alignment result it is useless.

Would it be possible to overload pairwiseAlignment to accept SymbolLists, or perhaps to add a new method that takes two SymbolLists and returns the alignment score?

Thank you Felipe Albrecht From markjschreiber at gmail.com Wed Jan 23 08:50:05 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 23 Jan 2008 16:50:05 +0800 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: References: Message-ID: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> Hi Felipe - I agree this is a barrier to ease of use. Even if Sequences are required internally for some obscure reason there is no reason why dummy Sequences cannot be made inside the aligner. These sequences could be given names like 'query' and 'subject' or even 'seq1' and 'seq2'. I will take a look at adding some methods. Best regards, - Mark On Jan 23, 2008 2:58 PM, Felipe Albrecht wrote: > Hello all, > > I have a simple question about pairwise alignment classes (SmithWaterman and > NeedlemanWunsch): > Why it is necessary two Sequence for alignment and not two SymbolList? > > Example, I have a SymbolList collection to align between then, > by this way I need to create some "dummies" Sequence for to do the > alignment. > > Reading the source, I saw that the unique field that is exclusive to > Sequence is the name, for the alignment output, > but if I need only the alignment result, it is useless. > > It is not possible to override the pairwiseAlignment to accept SymbolList or > may be a new method that the parameters are 2 SymbolList and returns the > alignment score? > > Thank you > > Felipe Albrecht > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From dankoc at gmail.com Wed Jan 23 21:56:19 2008 From: dankoc at gmail.com (Charles Danko) Date: Wed, 23 Jan 2008 16:56:19 -0500 Subject: [Biojava-dev] Direct access to public genome databases Message-ID: <8adccabf0801231356o16c51b55s43b3637459277f66@mail.gmail.com> Hello, Direct access to public genome databases (i.e. a class to import sequence, annotations, etc. 
and create the applicable biojava object) would be a very useful addition to BioJava. The Ensj project doesn't look like it has been updated since official support was dropped. Are there any plans to work these features into BioJava? Have I missed features that already exist? Depending on the amount of time required, I may be willing to contribute to such an endeavor -- particularly for the purpose of importing sequence. I have quite a bit of experience working with java, but not much in a collaborative environment. Best, Charles From heuermh at acm.org Thu Jan 24 01:06:53 2008 From: heuermh at acm.org (Michael Heuer) Date: Wed, 23 Jan 2008 20:06:53 -0500 (EST) Subject: [Biojava-dev] Direct access to public genome databases In-Reply-To: <8adccabf0801231356o16c51b55s43b3637459277f66@mail.gmail.com> Message-ID: Charles Danko wrote: > Hello, > > Direct access to public genome databases (i.e. a class to import > sequence, annotations, etc. and create the applicable biojava object) > would be a very useful addition to BioJava. The Ensj project doesn't > look like it has been updated since official support was dropped. Are > there any plans to work these features into BioJava? Have I missed > features that already exist? > > Depending on the amount of time required, I may be willing to > contribute to such an endeavor -- particularly for the purpose of > importing sequence. I have quite a bit of experience working with > java, but not much in a collaborative environment. What sort of client API would you have in mind? I'm not a Taverna expert, but it seems to me that access to third-party data resources is already well covered with the web services available there. > http://taverna.sourceforge.net/ > http://www.mygrid.org.uk/wiki/Mygrid/BiologicalWebServices Or simply call the web services directly. 
michael

From markjschreiber at gmail.com Thu Jan 24 04:00:27 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 23 Jan 2008 23:00:27 -0500
Subject: [Biojava-dev] Direct access to public genome databases
In-Reply-To:
References: <8adccabf0801231356o16c51b55s43b3637459277f66@mail.gmail.com>
Message-ID: <93b45ca50801232000p1cfd83c0o47f7b714abfd2d01@mail.gmail.com>

Hi -

From personal experience, you can access data from NCBI and KEGG using their web service APIs and (depending on the result) use the biojava parsers to return biojava objects.

The question is, should this activity be wrapped into biojava (i.e. should the web service stuff be inside biojava)?

Pros: Happens behind the scenes, users don't need to know about WS, possible uniform interface to several sources

Cons: Lots more dependencies on WS jars (take a look at JAX-WS for example) and WS client jars

I'm interested in hearing more pros and cons from other people. This is timely given the upcoming web services meet-up in Tokyo.

Best regards,

- Mark

On Jan 23, 2008 8:06 PM, Michael Heuer wrote: > Charles Danko wrote: > > > Hello, > > > > Direct access to public genome databases (i.e. a class to import > > sequence, annotations, etc. and create the applicable biojava object) > > would be a very useful addition to BioJava. The Ensj project doesn't > > look like it has been updated since official support was dropped. Are > > there any plans to work these features into BioJava? Have I missed > > features that already exist? > > > > Depending on the amount of time required, I may be willing to > > contribute to such an endeavor -- particularly for the purpose of > > importing sequence. I have quite a bit of experience working with > > java, but not much in a collaborative environment. > > What sort of client API would you have in mind? > > > I'm not a Taverna expert, but it seems to me that access to third-party > data resources is already well covered with the web services available > there.
> > > http://taverna.sourceforge.net/ > > http://www.mygrid.org.uk/wiki/Mygrid/BiologicalWebServices > > Or simply call the web services directly. > > michael > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From markjschreiber at gmail.com Thu Jan 24 07:50:32 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 24 Jan 2008 15:50:32 +0800 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> Message-ID: <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> Hi Felipe - Thanks for the input on this. As a general rule the GC should never be called from code. Generally this degrades performance of the JVM. Unless there is a very good reason I will remove this. Probably you are right a method parameter may work better. - Mark On Jan 24, 2008 1:47 PM, Felipe Albrecht wrote: > Hello, > > I think that it can be solved by a simple way: > Implement (or just copy and cut) a pairwiseAlignment utilizing SymboList as > parameters and do no creating a alignment, just the calculating it and > returning the value. > > Another thing that is a bit stange for me, is the utilization of garbage > collector direcly, that is: The field "scoreMatrix" is a class field, why at > the end of pairwiseAlignment it is set to null and the garbage collector > run? It is not better (and simpler) to use scoreMatrix as method variable? > > I'm annexing the class code with my changes that is doing well the (4^8) * > (4^8) SymbolList pairwise alignments that I am needing :-) > > Thank you, > > Felipe Albrecht > > > > > > On Jan 23, 2008 6:50 AM, Mark Schreiber wrote: > > Hi Felipe - > > > > I agree this is a barrier to ease of use. 
Even if Sequences are > > required internally for some obscure reason there is no reason why > > dummy Sequences cannot be made inside the aligner. These sequences > > could be given names like 'query' and 'subject' or even 'seq1' and > > 'seq2'. > > > > I will take a look at adding some methods. > > > > Best regards, > > > > - Mark > > > > > > > > > > On Jan 23, 2008 2:58 PM, Felipe Albrecht > wrote: > > > Hello all, > > > > > > I have a simple question about pairwise alignment classes (SmithWaterman > and > > > NeedlemanWunsch): > > > Why it is necessary two Sequence for alignment and not two SymbolList? > > > > > > Example, I have a SymbolList collection to align between then, > > > by this way I need to create some "dummies" Sequence for to do the > > > alignment. > > > > > > Reading the source, I saw that the unique field that is exclusive to > > > Sequence is the name, for the alignment output, > > > but if I need only the alignment result, it is useless. > > > > > > It is not possible to override the pairwiseAlignment to accept > SymbolList or > > > may be a new method that the parameters are 2 SymbolList and returns the > > > alignment score? > > > > > > Thank you > > > > > > Felipe Albrecht > > > _______________________________________________ > > > biojava-dev mailing list > > > biojava-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > From ap3 at sanger.ac.uk Thu Jan 24 08:40:39 2008 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Thu, 24 Jan 2008 08:40:39 +0000 (GMT) Subject: [Biojava-dev] Direct access to public genome databases In-Reply-To: References: Message-ID: Hi, >> Direct access to public genome databases (i.e. a class to import >> sequence, annotations, etc. and create the applicable biojava object) >> would be a very useful addition to BioJava. The Ensj project doesn't >> look like it has been updated since official support was dropped. 
Are >> there any plans to work these features into BioJava? Have I missed >> features that already exist? >> >> Depending on the amount of time required, I may be willing to >> contribute to such an endeavor -- particularly for the purpose of >> importing sequence. I have quite a bit of experience working with >> java, but not much in a collaborative environment. >

Ensembl provides access to more and more of its data via DAS, the Distributed Annotation System. DAS is a RESTful protocol to access data from distributed sites over the internet.

http://www.ensembl.org/info/using/external_data/das/ensembl_das.html

It is quite heavily used; for a list of available DAS services, see
http://www.dasregistry.org

Andreas

From bugzilla-daemon at portal.open-bio.org Thu Jan 24 12:22:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 24 Jan 2008 07:22:41 -0500
Subject: [Biojava-dev] [Bug 2164] Restriction Mapper - Thread (or dual core cpu) problem
In-Reply-To:
Message-ID: <200801241222.m0OCMfZ2026239@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2164

------- Comment #13 from mark.schreiber at novartis.com 2008-01-24 07:22 EST -------
Using the stopThread() method prevents the race condition. The other option is to make the threads daemon threads, which no longer keep the JVM alive once all other (non-daemon) threads have finished. I have tested this on a dual-core machine with Windows Vista on a 1 million base pair sequence and there is no problem.

Unless someone finds that this bug persists on other operating systems, I will close the bug. Please use the current version of biojava from SVN for testing.
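For readers unfamiliar with the second option mentioned in the comment above, here is a minimal sketch of starting a worker as a daemon thread in plain Java. It is not the RestrictionMapper code: the class name DaemonWorkerSketch, the method startDaemon and the thread name are hypothetical, and only the standard Thread.setDaemon call is taken from the JDK.

```java
// Hypothetical helper, not part of BioJava: starts a task on a daemon thread.
final class DaemonWorkerSketch {

    private DaemonWorkerSketch() {
        // static helper only
    }

    // Starts the given task on a daemon thread and returns the started thread.
    static Thread startDaemon(Runnable task) {
        // The thread name is made up for illustration.
        Thread worker = new Thread(task, "mapper-worker-sketch");
        // The daemon flag must be set before start(); changing it afterwards
        // throws IllegalThreadStateException.
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

A daemon thread needs no explicit stop call because it does not keep the JVM alive, but it may be cut off in the middle of its work when the last non-daemon thread exits. That trade-off may be one reason to prefer an explicit stopThread() call where the work has to finish cleanly.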
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From felipe.albrecht at gmail.com Thu Jan 24 13:05:39 2008 From: felipe.albrecht at gmail.com (Felipe Albrecht) Date: Thu, 24 Jan 2008 11:05:39 -0200 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> Message-ID: If you prefer, I can send a diff and should I do the same thing in SequenceAlignment and NeedlemanWunsch classes? Thank you, Felipe Albrecht On Jan 24, 2008 5:50 AM, Mark Schreiber wrote: > Hi Felipe - > > Thanks for the input on this. As a general rule the GC should never be > called from code. Generally this degrades performance of the JVM. > Unless there is a very good reason I will remove this. Probably you > are right a method parameter may work better. > > - Mark > > On Jan 24, 2008 1:47 PM, Felipe Albrecht > wrote: > > Hello, > > > > I think that it can be solved by a simple way: > > Implement (or just copy and cut) a pairwiseAlignment utilizing SymboList > as > > parameters and do no creating a alignment, just the calculating it and > > returning the value. > > > > Another thing that is a bit stange for me, is the utilization of garbage > > collector direcly, that is: The field "scoreMatrix" is a class field, > why at > > the end of pairwiseAlignment it is set to null and the garbage collector > > run? It is not better (and simpler) to use scoreMatrix as method > variable? 
> > > > I'm annexing the class code with my changes that is doing well the (4^8) > * > > (4^8) SymbolList pairwise alignments that I am needing :-) > > > > Thank you, > > > > Felipe Albrecht > > > > > > > > > > > > On Jan 23, 2008 6:50 AM, Mark Schreiber > wrote: > > > Hi Felipe - > > > > > > I agree this is a barrier to ease of use. Even if Sequences are > > > required internally for some obscure reason there is no reason why > > > dummy Sequences cannot be made inside the aligner. These sequences > > > could be given names like 'query' and 'subject' or even 'seq1' and > > > 'seq2'. > > > > > > I will take a look at adding some methods. > > > > > > Best regards, > > > > > > - Mark > > > > > > > > > > > > > > > On Jan 23, 2008 2:58 PM, Felipe Albrecht > > wrote: > > > > Hello all, > > > > > > > > I have a simple question about pairwise alignment classes > (SmithWaterman > > and > > > > NeedlemanWunsch): > > > > Why it is necessary two Sequence for alignment and not two > SymbolList? > > > > > > > > Example, I have a SymbolList collection to align between then, > > > > by this way I need to create some "dummies" Sequence for to do the > > > > alignment. > > > > > > > > Reading the source, I saw that the unique field that is exclusive to > > > > Sequence is the name, for the alignment output, > > > > but if I need only the alignment result, it is useless. > > > > > > > > It is not possible to override the pairwiseAlignment to accept > > SymbolList or > > > > may be a new method that the parameters are 2 SymbolList and returns > the > > > > alignment score? 
> > > > > > > > Thank you > > > > > > > > Felipe Albrecht > > > > _______________________________________________ > > > > biojava-dev mailing list > > > > biojava-dev at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > > > > > > From markjschreiber at gmail.com Thu Jan 24 13:35:43 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 24 Jan 2008 21:35:43 +0800 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> Message-ID: <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com> Hi - I have just commited changes that let you use SymbolLists in all parts of the NW and SW SequenceAlignment objects. As you suggested I made the matrix a method local variable. I also removed calls to the garbage collector. This can be checked out from SVN. - Mark On Jan 24, 2008 9:05 PM, Felipe Albrecht wrote: > If you prefer, I can send a diff and should I do the same thing in > SequenceAlignment and NeedlemanWunsch classes? > > Thank you, > > Felipe Albrecht > > > > On Jan 24, 2008 5:50 AM, Mark Schreiber < markjschreiber at gmail.com> wrote: > > Hi Felipe - > > > > Thanks for the input on this. As a general rule the GC should never be > > called from code. Generally this degrades performance of the JVM. > > Unless there is a very good reason I will remove this. Probably you > > are right a method parameter may work better. > > > > - Mark > > > > On Jan 24, 2008 1:47 PM, Felipe Albrecht > wrote: > > > Hello, > > > > > > > > > > > > I think that it can be solved by a simple way: > > > Implement (or just copy and cut) a pairwiseAlignment utilizing SymboList > as > > > parameters and do no creating a alignment, just the calculating it and > > > returning the value. 
> > > > > > Another thing that is a bit stange for me, is the utilization of garbage > > > collector direcly, that is: The field "scoreMatrix" is a class field, > why at > > > the end of pairwiseAlignment it is set to null and the garbage collector > > > run? It is not better (and simpler) to use scoreMatrix as method > variable? > > > > > > I'm annexing the class code with my changes that is doing well the (4^8) > * > > > (4^8) SymbolList pairwise alignments that I am needing :-) > > > > > > Thank you, > > > > > > Felipe Albrecht > > > > > > > > > > > > > > > > > > On Jan 23, 2008 6:50 AM, Mark Schreiber > wrote: > > > > Hi Felipe - > > > > > > > > I agree this is a barrier to ease of use. Even if Sequences are > > > > required internally for some obscure reason there is no reason why > > > > dummy Sequences cannot be made inside the aligner. These sequences > > > > could be given names like 'query' and 'subject' or even 'seq1' and > > > > 'seq2'. > > > > > > > > I will take a look at adding some methods. > > > > > > > > Best regards, > > > > > > > > - Mark > > > > > > > > > > > > > > > > > > > > On Jan 23, 2008 2:58 PM, Felipe Albrecht > > > wrote: > > > > > Hello all, > > > > > > > > > > I have a simple question about pairwise alignment classes > (SmithWaterman > > > and > > > > > NeedlemanWunsch): > > > > > Why it is necessary two Sequence for alignment and not two > SymbolList? > > > > > > > > > > Example, I have a SymbolList collection to align between then, > > > > > by this way I need to create some "dummies" Sequence for to do the > > > > > alignment. > > > > > > > > > > Reading the source, I saw that the unique field that is exclusive to > > > > > Sequence is the name, for the alignment output, > > > > > but if I need only the alignment result, it is useless. 
> > > > > > > > > > It is not possible to override the pairwiseAlignment to accept > > > SymbolList or > > > > > may be a new method that the parameters are 2 SymbolList and returns > the > > > > > alignment score? > > > > > > > > > > Thank you > > > > > > > > > > Felipe Albrecht > > > > > _______________________________________________ > > > > > biojava-dev mailing list > > > > > biojava-dev at lists.open-bio.org > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > > > > > > > > > > > > >

From felipe.albrecht at gmail.com Thu Jan 24 19:40:48 2008
From: felipe.albrecht at gmail.com (Felipe Albrecht)
Date: Thu, 24 Jan 2008 17:40:48 -0200
Subject: [Biojava-dev] Pairwise Alignment methods
In-Reply-To: <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com>
References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com>
Message-ID:

Hello,

I saw the commit, and I don't think this solution is the best one. It creates two Sequence objects internally, yet the programmer will probably not use the other alignment information, only the score.

So I think that, given two SymbolList objects, the method should just perform the alignment and return the score, as I did. If the programmer wants the "visual alignment", he should create the SimpleSequence objects externally; the method should not have to do it.

IMHO, one [serious] problem in biojava is memory consumption: it has no "lightweight" classes or methods that do things quickly. For that reason, it may be a good choice to have a method that simply returns the alignment score and skips the other steps, such as backtracking. Another thing: the cost of "instanceof" is high.
Thank you, Felipe Albrecht On Jan 24, 2008 11:35 AM, Mark Schreiber wrote: > Hi - > > I have just commited changes that let you use SymbolLists in all parts > of the NW and SW SequenceAlignment objects. > > As you suggested I made the matrix a method local variable. I also > removed calls to the garbage collector. > > This can be checked out from SVN. > > - Mark > > On Jan 24, 2008 9:05 PM, Felipe Albrecht > wrote: > > If you prefer, I can send a diff and should I do the same thing in > > SequenceAlignment and NeedlemanWunsch classes? > > > > Thank you, > > > > Felipe Albrecht > > > > > > > > On Jan 24, 2008 5:50 AM, Mark Schreiber < markjschreiber at gmail.com> > wrote: > > > Hi Felipe - > > > > > > Thanks for the input on this. As a general rule the GC should never be > > > called from code. Generally this degrades performance of the JVM. > > > Unless there is a very good reason I will remove this. Probably you > > > are right a method parameter may work better. > > > > > > - Mark > > > > > > On Jan 24, 2008 1:47 PM, Felipe Albrecht > > wrote: > > > > Hello, > > > > > > > > > > > > > > > > > I think that it can be solved by a simple way: > > > > Implement (or just copy and cut) a pairwiseAlignment utilizing > SymboList > > as > > > > parameters and do no creating a alignment, just the calculating it > and > > > > returning the value. > > > > > > > > Another thing that is a bit stange for me, is the utilization of > garbage > > > > collector direcly, that is: The field "scoreMatrix" is a class > field, > > why at > > > > the end of pairwiseAlignment it is set to null and the garbage > collector > > > > run? It is not better (and simpler) to use scoreMatrix as method > > variable? 
> > > > > > > > I'm annexing the class code with my changes that is doing well the > (4^8) > > * > > > > (4^8) SymbolList pairwise alignments that I am needing :-) > > > > > > > > Thank you, > > > > > > > > Felipe Albrecht > > > > > > > > > > > > > > > > > > > > > > > > On Jan 23, 2008 6:50 AM, Mark Schreiber > > wrote: > > > > > Hi Felipe - > > > > > > > > > > I agree this is a barrier to ease of use. Even if Sequences are > > > > > required internally for some obscure reason there is no reason why > > > > > dummy Sequences cannot be made inside the aligner. These > sequences > > > > > could be given names like 'query' and 'subject' or even 'seq1' and > > > > > 'seq2'. > > > > > > > > > > I will take a look at adding some methods. > > > > > > > > > > Best regards, > > > > > > > > > > - Mark > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 23, 2008 2:58 PM, Felipe Albrecht < > felipe.albrecht at gmail.com> > > > > wrote: > > > > > > Hello all, > > > > > > > > > > > > I have a simple question about pairwise alignment classes > > (SmithWaterman > > > > and > > > > > > NeedlemanWunsch): > > > > > > Why it is necessary two Sequence for alignment and not two > > SymbolList? > > > > > > > > > > > > Example, I have a SymbolList collection to align between then, > > > > > > by this way I need to create some "dummies" Sequence for to do > the > > > > > > alignment. > > > > > > > > > > > > Reading the source, I saw that the unique field that is > exclusive to > > > > > > Sequence is the name, for the alignment output, > > > > > > but if I need only the alignment result, it is useless. > > > > > > > > > > > > It is not possible to override the pairwiseAlignment to accept > > > > SymbolList or > > > > > > may be a new method that the parameters are 2 SymbolList and > returns > > the > > > > > > alignment score? 
> > > > > > > > > > > > Thank you > > > > > > > > > > > > Felipe Albrecht > > > > > > _______________________________________________ > > > > > > biojava-dev mailing list > > > > > > biojava-dev at lists.open-bio.org > > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > > > > > > > > > > > > > > > > > > > > >

From dankoc at gmail.com Thu Jan 24 20:17:38 2008
From: dankoc at gmail.com (Charles Danko)
Date: Thu, 24 Jan 2008 15:17:38 -0500
Subject: [Biojava-dev] Direct access to public genome databases
In-Reply-To:
References:
Message-ID: <8adccabf0801241217gfa82d47r2728b08c6bcde862@mail.gmail.com>

DAS looks just wonderful, and I am very glad to be made aware of it; it seems like a much better solution than my initial, highly naive reaction (accessing public SQL connections).

As I understand it, the easiest way to access DAS services in Java is via an API such as JAX-WS? Jumping into the Javadocs, and looking over a JAX-WS tutorial that I found here:
http://java.sun.com/webservices/docs/2.0/tutorial/doc/
it looks like there is a lot to this. In this sense, a BioJava class that takes care of much of the connecting, data transfer, and parsing would be a welcome convenience for users not already familiar with this API (like me :). Even for those who are well-versed in all of this, such a class would allow DB access with a lot less code. Given how frequently most of us access these public databases, this seems like a worthy goal to me!!

Since it's so easy to understand the basics of constructing a DAS request URL, even something as simple as a class that takes a pre-formed URL (in the constructor) and acts as an iterator over whatever information is in the result would be very useful.

Best,
Charles

On Jan 24, 2008 3:40 AM, Andreas Prlic wrote: > Hi, > > >> Direct access to public genome databases (i.e. a class to import > >> sequence, annotations, etc.
and create the applicable biojava object) > >> would be a very useful addition to BioJava. The Ensj project doesn't > >> look like it has been updated since official support was dropped. Are > >> there any plans to work these features into BioJava? Have I missed > >> features that already exist? > >> > >> Depending on the amount of time required, I may be willing to > >> contribute to such an endeavor -- particularly for the purpose of > >> importing sequence. I have quite a bit of experience working with > >> java, but not much in a collaborative environment. > > > > Ensembl provides access to more and more of its data via DAS, > the Distributed Annotation System. DAS is a RESTful protocol to > access data from distributed sites over the internet. > > http://www.ensembl.org/info/using/external_data/das/ensembl_das.html > > it is quite heavily used and to see a list of available DAS services > see > http://www.dasregistry.org > > Andreas > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > From markjschreiber at gmail.com Fri Jan 25 01:26:41 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Fri, 25 Jan 2008 09:26:41 +0800 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com> Message-ID: <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com> Hi Felipe - I agree your method is more efficient but I think it violates the SequenceAlignment interface which would cause compatibility problems. I also wonder what should happen if a user calls the getAlignment() method if you have only calculated a score. 
instanceof is potentially expensive, but it is nothing compared to actually performing the SmithWaterman.

Biojava is somewhat memory heavy, but this is largely because it is object oriented. Certainly something in C would be lighter and faster, but the whole point in using Java is the relative benefits of object oriented design. While ultra optimized algorithms were once a major feature of bioinformatics, this is becoming less necessary as standard desktops are now equivalent to the super computers of 5 years ago.

I actually find the SW and NW to be reasonably fast. This is because all the heavy lifting is done in loops that the JVM presumably compiles and executes natively.

- Mark

On Jan 25, 2008 3:40 AM, Felipe Albrecht wrote: > Hello, > > I saw the commit and I think that this solution is not the better. > I think it because you are creating internally two Sequence and probably the > programmer will not use others alignment information, he will use only the > score. > > Because it, I think that if you have 2 SymbolList, just do the alignment and > return the score, as I did.Otherwise, If the programmer want the "visual > alignment", he should create externally the SimpleSequences, it is, not the > method must do it. > > IMHO, one [serious] problem in biojava is the memory consumption, it have > not "lightweight" classes or methods that do the things quickly. Because it, > may be is a good choice to have a method that simply gives the alignment > score, and not do the others things, like backtracking. Another think, the > cost of the "instanceof" is high. > > Thank you, > > Felipe Albrecht > > > > > On Jan 24, 2008 11:35 AM, Mark Schreiber wrote: > > Hi - > > > > I have just commited changes that let you use SymbolLists in all parts > > of the NW and SW SequenceAlignment objects. > > > > As you suggested I made the matrix a method local variable. I also > > removed calls to the garbage collector. > > > > This can be checked out from SVN.
> > > > - Mark > > > > > > > > > > On Jan 24, 2008 9:05 PM, Felipe Albrecht > wrote: > > > If you prefer, I can send a diff and should I do the same thing in > > > SequenceAlignment and NeedlemanWunsch classes? > > > > > > Thank you, > > > > > > Felipe Albrecht > > > > > > > > > > > > On Jan 24, 2008 5:50 AM, Mark Schreiber < markjschreiber at gmail.com> > wrote: > > > > Hi Felipe - > > > > > > > > Thanks for the input on this. As a general rule the GC should never be > > > > called from code. Generally this degrades performance of the JVM. > > > > Unless there is a very good reason I will remove this. Probably you > > > > are right a method parameter may work better. > > > > > > > > - Mark > > > > > > > > On Jan 24, 2008 1:47 PM, Felipe Albrecht > > > wrote: > > > > > Hello, > > > > > > > > > > > > > > > > > > > > > > I think that it can be solved by a simple way: > > > > > Implement (or just copy and cut) a pairwiseAlignment utilizing > SymboList > > > as > > > > > parameters and do no creating a alignment, just the calculating it > and > > > > > returning the value. > > > > > > > > > > Another thing that is a bit stange for me, is the utilization of > garbage > > > > > collector direcly, that is: The field "scoreMatrix" is a class > field, > > > why at > > > > > the end of pairwiseAlignment it is set to null and the garbage > collector > > > > > run? It is not better (and simpler) to use scoreMatrix as method > > > variable? > > > > > > > > > > I'm annexing the class code with my changes that is doing well the > (4^8) > > > * > > > > > (4^8) SymbolList pairwise alignments that I am needing :-) > > > > > > > > > > Thank you, > > > > > > > > > > Felipe Albrecht > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 23, 2008 6:50 AM, Mark Schreiber < markjschreiber at gmail.com > > > > > wrote: > > > > > > Hi Felipe - > > > > > > > > > > > > I agree this is a barrier to ease of use. 
Even if Sequences are > > > > > > required internally for some obscure reason there is no reason why > > > > > > dummy Sequences cannot be made inside the aligner. These > sequences > > > > > > could be given names like 'query' and 'subject' or even 'seq1' and > > > > > > 'seq2'. > > > > > > > > > > > > I will take a look at adding some methods. > > > > > > > > > > > > Best regards, > > > > > > > > > > > > - Mark > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 23, 2008 2:58 PM, Felipe Albrecht > > > > > > wrote: > > > > > > > Hello all, > > > > > > > > > > > > > > I have a simple question about pairwise alignment classes > > > (SmithWaterman > > > > > and > > > > > > > NeedlemanWunsch): > > > > > > > Why it is necessary two Sequence for alignment and not two > > > SymbolList? > > > > > > > > > > > > > > Example, I have a SymbolList collection to align between then, > > > > > > > by this way I need to create some "dummies" Sequence for to do > the > > > > > > > alignment. > > > > > > > > > > > > > > Reading the source, I saw that the unique field that is > exclusive to > > > > > > > Sequence is the name, for the alignment output, > > > > > > > but if I need only the alignment result, it is useless. > > > > > > > > > > > > > > It is not possible to override the pairwiseAlignment to accept > > > > > SymbolList or > > > > > > > may be a new method that the parameters are 2 SymbolList and > returns > > > the > > > > > > > alignment score? 
> > > > > > > > > > > > > > Thank you > > > > > > > > > > > > > > Felipe Albrecht > > > > > > > _______________________________________________ > > > > > > > biojava-dev mailing list > > > > > > > biojava-dev at lists.open-bio.org > > > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >

From felipe.albrecht at gmail.com Fri Jan 25 02:06:53 2008
From: felipe.albrecht at gmail.com (Felipe Albrecht)
Date: Fri, 25 Jan 2008 00:06:53 -0200
Subject: [Biojava-dev] Pairwise Alignment methods
In-Reply-To: <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com>
References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com> <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com> <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com>
Message-ID:

Hi,

Would it not be possible to add to the SequenceAlignment interface something like:
"double doAlignmentAndGetTheScore(SymbolList symbolList1, SymbolList symbolList2)"?
Okay, the name is horrible, but you know what it means.

> While ultra optimized algorithms where once a major
> feature of bioinformatics this is becoming less necessary as standard
> desktops are now equivalent to the super computers of 5 years ago.

Okay, but do not forget that bioinformatics data sizes are growing faster than computer processing power and main memory capacity. What I'm trying to say is that the current methods are fast [and light] enough to do 1, 10, 100 or 1000 pairwise alignments, but not 10k, 100k or, in my case, 65k * 65k.

Really, I don't see a problem with having optimized functions for specific operations, in the spirit of the Unix philosophy: "write small programs for specific things; for big things, join them" (something like that :-) ).
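To make the idea of a score-only method concrete, here is a minimal sketch of a local (Smith-Waterman style) alignment score computed with only two rows of the matrix and no backtracking. It is an illustration under stated assumptions, not the BioJava implementation and not the code attached earlier in this thread: it works on plain character sequences rather than SymbolList, uses a simple match/mismatch score with a linear gap penalty instead of a substitution matrix, and the class and method names are hypothetical.

```java
// Hypothetical sketch, not BioJava code: score-only local alignment.
final class ScoreOnlyLocalAlignmentSketch {

    private ScoreOnlyLocalAlignmentSketch() {
        // static helper only
    }

    // Returns the best local alignment score of a against b.
    // match is added for equal characters, mismatch for unequal ones,
    // and gap for every gap position (mismatch and gap are normally negative).
    static int score(CharSequence a, CharSequence b, int match, int mismatch, int gap) {
        // Only two rows are kept, so memory grows with b.length(), not a.length() * b.length().
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = 0;
            for (int j = 1; j <= b.length(); j++) {
                int diag = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = prev[j] + gap;
                int left = curr[j - 1] + gap;
                // Local alignment: a cell never drops below zero.
                int cell = Math.max(0, Math.max(diag, Math.max(up, left)));
                curr[j] = cell;
                if (cell > best) {
                    best = cell;
                }
            }
            // Swap rows; the stale values in the new curr row are overwritten before being read.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return best;
    }
}
```

With match = 2, mismatch = -1 and gap = -2, score("ACGT", "ACGT", ...) gives 8 and score("AAAA", "TTTT", ...) gives 0. Because no traceback information is stored, a caller who also wants the alignment itself would still need the full-matrix method, which relates to the getAlignment() concern raised in the reply quoted above.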
Thank you,

Felipe Albrecht

From markjschreiber at gmail.com Fri Jan 25 03:43:30 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 25 Jan 2008 11:43:30 +0800
Subject: [Biojava-dev] Pairwise Alignment methods
In-Reply-To:
References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com>
 <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com>
 <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com>
 <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com>
Message-ID: <93b45ca50801241943l10e634fal3d10bfc739af5a1d@mail.gmail.com>

On Jan 25, 2008 10:06 AM, Felipe Albrecht wrote:
> Hi,
>
> is not possible to add into the SequenceAlignment interface something like:
> "double doAlignmentAndGetTheScore(SymbolList symbolList1, SymbolList
> symbolList2)"?
> Okay, the name is horrible, but you know what it means.

You could but you would break backwards compatibility with anyone who
implemented this interface previously. We have sometimes done this in
biojava but we would need to make sure it will break no ones code.
Another option would be to extend the interface with another that adds
this method (not very tidy, I know).

> > While ultra optimized algorithms where once a major
> > feature of bioinformatics this is becoming less necessary as standard
> > desktops are now equivalent to the super computers of 5 years ago.
>
> Okay, but do not forget that the bioinformatics data size is growing faster
> than the computer processing and main memory capacity.
>
> What im trying to say is that the actual methods are fast [and light] enough
> for do 1, 10, 100, 1000 pairwise alignments, but not for 10k, 100k or in my
> case, 65k * 65k.

One could also argue that Smith-Waterman is not ideal for large
sequences. I think it is O(NM) or something.

> Really, I dont see problems of having optimized functions for specifics
> operations, as unix phylosofies: "do small programs for specifics things,
> for big things join then" (Something like it :-) ).

Yes, this would be an argument for a workflow or service-oriented
architecture built from multiple interoperable biojava sub-projects.
This is obviously not what biojava is. Indeed, biojava is not even an
application; you just use it to build applications. Maybe for your use
case you could use biojava to handle the I/O and then do the more
efficient SW using your own code. BioJava is a collection of objects
that are (somewhat) related and interoperable. It doesn't mean you
have to use biojava throughout your application.

- Mark

From felipe.albrecht at gmail.com Fri Jan 25 04:25:23 2008
From: felipe.albrecht at gmail.com (Felipe Albrecht)
Date: Fri, 25 Jan 2008 02:25:23 -0200
Subject: [Biojava-dev] Pairwise Alignment methods
In-Reply-To: <93b45ca50801241943l10e634fal3d10bfc739af5a1d@mail.gmail.com>
References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com>
 <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com>
 <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com>
 <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com>
 <93b45ca50801241943l10e634fal3d10bfc739af5a1d@mail.gmail.com>
Message-ID:

Hello again :-)

On Jan 25, 2008 1:43 AM, Mark Schreiber wrote:
> On Jan 25, 2008 10:06 AM, Felipe Albrecht wrote:
> > Hi,
> >
> > is not possible to add into the SequenceAlignment interface something like:
> > "double doAlignmentAndGetTheScore(SymbolList symbolList1, SymbolList
> > symbolList2)"?
> > Okay, the name is horrible, but you know what it means.
>
> You could but you would break backwards compatibility with anyone who
> implemented this interface previously. We have sometimes done this in
> biojava but we would need to make sure it will break no ones code.
> Another option would be to extend the interface with another that adds
> this method (not very tidy I know).

If I'm correct, SequenceAlignment is an abstract class, so we can define
the method there with an empty implementation, and SmithWaterman and the
other classes can implement it. Anyone who has already implemented
SequenceAlignment will not see anything different.

> > > While ultra optimized algorithms where once a major
> > > feature of bioinformatics this is becoming less necessary as standard
> > > desktops are now equivalent to the super computers of 5 years ago.
> >
> > Okay, but do not forget that the bioinformatics data size is growing faster
> > than the computer processing and main memory capacity.
> >
> > What im trying to say is that the actual methods are fast [and light] enough
> > for do 1, 10, 100, 1000 pairwise alignments, but not for 10k, 100k or in my
> > case, 65k * 65k.
>
> One could also argue that Smith Waterman is not ideal for large
> sequences. I think it is o(NM) or something.

I'm not comparing two sequences of 65k bases each; I'm aligning 65k
short sequences against each other.

> > Really, I dont see problems of having optimized functions for specifics
> > operations, as unix phylosofies: "do small programs for specifics things,
> > for big things join then" (Something like it :-) ).
>
> Yes, this would be an argument for workflow or service oriented
> architecture built from multiple inter operable biojava sub-projects.
> This is obviously not what biojava is. Indeed biojava is not even an
> application you just use it to build applications. Maybe for your use
> case you could use biojava to handle the I/O and then do the more
> efficient SW using your own code. BioJava is a collection of objects
> that are (somewhat) related and interoperable. It doesn't mean you
> have to use biojava throughout your application.
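The abstract-class suggestion made earlier in this message can be illustrated with a small sketch: a method added to an abstract base class with a default body does not force existing subclasses to change. This is not the BioJava source. `AbstractAligner`, `alignmentScore`, `LegacyAligner` and `FastAligner` are hypothetical names, `String` stands in for `SymbolList`, and the default body throws an exception rather than being empty, which is only one possible choice.

```java
/** Hypothetical base class standing in for an abstract aligner. */
abstract class AbstractAligner {
    /** Existing contract that every subclass already implements. */
    abstract String align(String query, String subject);

    /**
     * Newly added method. Because it has a body, subclasses written
     * before it existed still compile and run unchanged.
     */
    double alignmentScore(String query, String subject) {
        throw new UnsupportedOperationException("score-only alignment not implemented");
    }
}

/** Written before alignmentScore was added; needs no changes. */
class LegacyAligner extends AbstractAligner {
    @Override
    String align(String query, String subject) {
        return query + "\n" + subject;
    }
}

/** Opts in to the new method by overriding it. */
class FastAligner extends AbstractAligner {
    @Override
    String align(String query, String subject) {
        return query + "\n" + subject;
    }

    @Override
    double alignmentScore(String query, String subject) {
        // Toy score for the sketch: count of positions where the characters agree.
        int n = Math.min(query.length(), subject.length());
        int same = 0;
        for (int i = 0; i < n; i++) {
            if (query.charAt(i) == subject.charAt(i)) {
                same++;
            }
        }
        return same;
    }
}

class AbstractAlignerDemo {
    public static void main(String[] args) {
        System.out.println(new FastAligner().alignmentScore("ACGT", "ACGA"));
        try {
            new LegacyAligner().alignmentScore("ACGT", "ACGA");
        } catch (UnsupportedOperationException e) {
            System.out.println("legacy aligner: " + e.getMessage());
        }
    }
}
```

Adding the same method to a plain interface would, at the time of this thread, have required every implementer to add it, which is the compatibility concern raised in the quoted reply.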
Okay, now I understand: biojava is not a library for bioinformatics
applications, but for interconnecting bioinformatics applications. So
biojava, as it currently stands, is not appropriate for the application
that I am developing. I will develop some "optimized" classes and functions
for my own use, and when they are ready I will announce them on this
mailing list and ask whether you want to merge them into biojava. If the
biojava team needs somebody to improve some biojava functions, especially
sequences and sequence I/O, you can ask me.

Thank you

Felipe Albrecht

From markjschreiber at gmail.com Fri Jan 25 05:40:20 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 25 Jan 2008 13:40:20 +0800
Subject: [Biojava-dev] Pairwise Alignment methods
In-Reply-To:
References: <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com>
 <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com>
 <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com>
 <93b45ca50801241943l10e634fal3d10bfc739af5a1d@mail.gmail.com>
Message-ID: <93b45ca50801242140j299b91e7td9fb295a7393ced9@mail.gmail.com>

On Jan 25, 2008 12:25 PM, Felipe Albrecht wrote:
> Hello again :-)
>
> On Jan 25, 2008 1:43 AM, Mark Schreiber wrote:
> > On Jan 25, 2008 10:06 AM, Felipe Albrecht wrote:
> > > Hi,
> > >
> > > is not possible to add into the SequenceAlignment interface something like:
> > > "double doAlignmentAndGetTheScore(SymbolList symbolList1, SymbolList
> > > symbolList2)"?
> > > Okay, the name is horrible, but you know what it means.
>
> If I'm correct, the SequenceAlignment is an abstract class, so, we can
> define there with an empty implementation, and SmithWaterman and others
> classes implements it. Anyone that implemented SequenceAlignment will not
> see anything different.

OK in that case adding the method would be OK, even desirable.
Probably this would be the best way to merge in your code.

> Okay, now I understood, biojava is not a library for bioinformatics
> applications, but for interconnect bioinformatics applications.
> So, biojava

Actually it is a library for bioinformatics that you use to build
bioinformatics applications. It is possibly not as loosely coupled as
you might like for your purpose. It is definitely not as loosely
coupled as the Unix collection of executables or an SOA system. Due to
heavy use of interfaces and abstract classes there is some possibility
for custom code. For example, you can recode the SmithWaterman object to
be optimal for your needs and then create an application where you use
your class in place of the normal biojava SmithWaterman.

> in the actual way is not appropriate for the application that I am
> developing. I will develop some "optimized" classes and functions for my use
> and when it will be ready I will announce in this mailing list and ask if
> want to merge in biojava. If biojava team needs somebody to improve some
> biojava functions, specially sequences and sequences IO, can ask me.

Code improvements and optimizations are always welcome, especially if
current interfaces can be preserved (that way the end user gets the
improvement without having to change their code). I always advise
potential optimizers to use a profiler because it is sometimes hard to
predict how the JVM will behave. For example, JIT compilation may mean
that parts of the code that are theoretically CPU intensive are not
the CPU bottleneck once the JVM compiles them.

- Mark

> Thank you
>
> Felipe Albrecht
>
> > - Mark
> >
> > > On Jan 24, 2008 11:26 PM, Mark Schreiber wrote:
> > > > Hi Felipe -
> > > >
> > > > I agree your method is more efficient but I think it violates the
> > > > SequenceAlignment interface which would cause compatibility problems.
> > > > I also wonder what should happen if a user calls the getAlignment()
> > > > method if you have only calculated a score.
> > > >
> > > > instanceof is potentially expensive but it is nothing compared to
> > > > actually performing the SmithWaterman.
> > > > > > > > Biojava is somewhat memory heavy but this is largely because it is > > > > object oriented. Certainly something in C would be lighter and faster > > > > but the whole point in using Java is the relative benefits of object > > > > oriented design. While ultra optimized algorithms where once a major > > > > feature of bioinformatics this is becoming less necessary as standard > > > > desktops are now equivalent to the super computers of 5 years ago. > > > > > > > > I actually find the SW and NW to be reasonably fast. This is because > > > > all the heavy lifting is done in loops that the JVM presumably > > > > compiles and executes natively. > > > > > > > > - Mark > > > > > > > > > > > > > > > > > > > > > > > > On Jan 25, 2008 3:40 AM, Felipe Albrecht > > > wrote: > > > > > Hello, > > > > > > > > > > I saw the commit and I think that this solution is not the better. > > > > > I think it because you are creating internally two Sequence and > probably > > > the > > > > > programmer will not use others alignment information, he will use > only > > > the > > > > > score. > > > > > > > > > > Because it, I think that if you have 2 SymbolList, just do the > alignment > > > and > > > > > return the score, as I did.Otherwise, If the programmer want the > "visual > > > > > alignment", he should create externally the SimpleSequences, it is, > not > > > the > > > > > method must do it. > > > > > > > > > > IMHO, one [serious] problem in biojava is the memory consumption, it > > > have > > > > > not "lightweight" classes or methods that do the things quickly. > Because > > > it, > > > > > may be is a good choice to have a method that simply gives the > alignment > > > > > score, and not do the others things, like backtracking. Another > think, > > > the > > > > > cost of the "instanceof" is high. 
> > > > > > > > > > Thank you, > > > > > > > > > > Felipe Albrecht > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 24, 2008 11:35 AM, Mark Schreiber < markjschreiber at gmail.com > > > > > wrote: > > > > > > Hi - > > > > > > > > > > > > I have just commited changes that let you use SymbolLists in all > parts > > > > > > of the NW and SW SequenceAlignment objects. > > > > > > > > > > > > As you suggested I made the matrix a method local variable. I also > > > > > > removed calls to the garbage collector. > > > > > > > > > > > > This can be checked out from SVN. > > > > > > > > > > > > - Mark > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 24, 2008 9:05 PM, Felipe Albrecht < > felipe.albrecht at gmail.com > > > > > > wrote: > > > > > > > If you prefer, I can send a diff and should I do the same thing > in > > > > > > > SequenceAlignment and NeedlemanWunsch classes? > > > > > > > > > > > > > > Thank you, > > > > > > > > > > > > > > Felipe Albrecht > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 24, 2008 5:50 AM, Mark Schreiber < > markjschreiber at gmail.com > > > > > > wrote: > > > > > > > > Hi Felipe - > > > > > > > > > > > > > > > > Thanks for the input on this. As a general rule the GC should > > > never be > > > > > > > > called from code. Generally this degrades performance of the > JVM. > > > > > > > > Unless there is a very good reason I will remove this. > Probably > > > you > > > > > > > > are right a method parameter may work better. 
> > > > > > > > > > > > > > > > - Mark > > > > > > > > > > > > > > > > On Jan 24, 2008 1:47 PM, Felipe Albrecht > > > > > > > > > > wrote: > > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think that it can be solved by a simple way: > > > > > > > > > Implement (or just copy and cut) a pairwiseAlignment > utilizing > > > > > SymboList > > > > > > > as > > > > > > > > > parameters and do no creating a alignment, just the > calculating > > > it > > > > > and > > > > > > > > > returning the value. > > > > > > > > > > > > > > > > > > Another thing that is a bit stange for me, is the > utilization of > > > > > garbage > > > > > > > > > collector direcly, that is: The field "scoreMatrix" is a > class > > > > > field, > > > > > > > why at > > > > > > > > > the end of pairwiseAlignment it is set to null and the > garbage > > > > > collector > > > > > > > > > run? It is not better (and simpler) to use scoreMatrix as > method > > > > > > > variable? > > > > > > > > > > > > > > > > > > I'm annexing the class code with my changes that is doing > well > > > the > > > > > (4^8) > > > > > > > * > > > > > > > > > (4^8) SymbolList pairwise alignments that I am needing :-) > > > > > > > > > > > > > > > > > > Thank you, > > > > > > > > > > > > > > > > > > Felipe Albrecht > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 23, 2008 6:50 AM, Mark Schreiber < > > > markjschreiber at gmail.com > > > > > > > > > > > > > wrote: > > > > > > > > > > Hi Felipe - > > > > > > > > > > > > > > > > > > > > I agree this is a barrier to ease of use. Even if > Sequences > > > are > > > > > > > > > > required internally for some obscure reason there is no > reason > > > why > > > > > > > > > > dummy Sequences cannot be made inside the aligner. 
These > > > > > sequences > > > > > > > > > > could be given names like 'query' and 'subject' or even > 'seq1' > > > and > > > > > > > > > > 'seq2'. > > > > > > > > > > > > > > > > > > > > I will take a look at adding some methods. > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > > > > > > > > > > - Mark > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Jan 23, 2008 2:58 PM, Felipe Albrecht > > > > > < felipe.albrecht at gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > Hello all, > > > > > > > > > > > > > > > > > > > > > > I have a simple question about pairwise alignment > classes > > > > > > > (SmithWaterman > > > > > > > > > and > > > > > > > > > > > NeedlemanWunsch): > > > > > > > > > > > Why it is necessary two Sequence for alignment and not > two > > > > > > > SymbolList? > > > > > > > > > > > > > > > > > > > > > > Example, I have a SymbolList collection to align between > > > then, > > > > > > > > > > > by this way I need to create some "dummies" Sequence > for to > > > do > > > > > the > > > > > > > > > > > alignment. > > > > > > > > > > > > > > > > > > > > > > Reading the source, I saw that the unique field that is > > > > > exclusive to > > > > > > > > > > > Sequence is the name, for the alignment output, > > > > > > > > > > > but if I need only the alignment result, it is useless. > > > > > > > > > > > > > > > > > > > > > > It is not possible to override the pairwiseAlignment to > > > accept > > > > > > > > > SymbolList or > > > > > > > > > > > may be a new method that the parameters are 2 SymbolList > and > > > > > returns > > > > > > > the > > > > > > > > > > > alignment score? 
> > > > > > > > > > > > > > > > > > > > > > Thank you > > > > > > > > > > > > > > > > > > > > > > Felipe Albrecht > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > biojava-dev mailing list > > > > > > > > > > > biojava-dev at lists.open-bio.org > > > > > > > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From felipe.albrecht at gmail.com Fri Jan 25 06:39:45 2008 From: felipe.albrecht at gmail.com (Felipe Albrecht) Date: Fri, 25 Jan 2008 04:39:45 -0200 Subject: [Biojava-dev] Pairwise Alignment methods In-Reply-To: <93b45ca50801242140j299b91e7td9fb295a7393ced9@mail.gmail.com> References: <93b45ca50801232350g5996fe1bp1bb6f5c4b92bc734@mail.gmail.com> <93b45ca50801240535v15688c02n17c8bac9d4c962a7@mail.gmail.com> <93b45ca50801241726h25c44621s42c9d89c85de5dc4@mail.gmail.com> <93b45ca50801241943l10e634fal3d10bfc739af5a1d@mail.gmail.com> <93b45ca50801242140j299b91e7td9fb295a7393ced9@mail.gmail.com> Message-ID: Okay, I agree with what you said. I was looking the SequenceAlignment source and I realize a strange thing. At formatOutput method, the editDistance is multiplied by -1, if you use a NeedlemanWunsch pairwiseAlignment method, the editDistance is returned without any multiplication. That is, the score/editDistance of formatOutput is different from there that is given by NeedlemanWunsch pairwiseAlignment. What is the correct? 
Thank you again

Felipe Albrecht

From ap3 at sanger.ac.uk Fri Jan 25 09:49:22 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Fri, 25 Jan 2008 09:49:22 +0000
Subject: [Biojava-dev] Direct access to public genome databases
In-Reply-To: <8adccabf0801241217gfa82d47r2728b08c6bcde862@mail.gmail.com>
References: <8adccabf0801241217gfa82d47r2728b08c6bcde862@mail.gmail.com>
Message-ID: <80544F81-E3DE-4212-8C62-2CA4865E4758@sanger.ac.uk>

> DAS looks just wonderful, and I am very glad to be made aware of it -
> it seems like a much better solution than my initial, highly naive
> reaction (accessing public SQL connections).

Don't think that this is naive. There is also ensembldb.ensembl.org
which is a public mysql server if you prefer sql....

> As I understand it, the easiest way to access DAS services in Java is
> via an API such as JAX-WS? Jumping into the Javadocs, and looking
> over a JAX-WS tutorial that I found here:
> http://java.sun.com/webservices/docs/2.0/tutorial/doc/ it looks there
> is a lot to this.

I have not tried JAX-WS yet, so I can not comment on that, but if you
want a library that makes it easier to talk to DAS servers, you can use my
DAS client library at: http://www.spice-3d.org/dasobert/

Andreas

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------

From ayates at ebi.ac.uk Fri Jan 25 09:58:37 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 25 Jan 2008 09:58:37 +0000
Subject: [Biojava-dev] Direct access to public genome databases
In-Reply-To: <80544F81-E3DE-4212-8C62-2CA4865E4758@sanger.ac.uk>
References: <8adccabf0801241217gfa82d47r2728b08c6bcde862@mail.gmail.com> <80544F81-E3DE-4212-8C62-2CA4865E4758@sanger.ac.uk>
Message-ID: <4799B2CD.3010603@ebi.ac.uk>

Andreas Prlic wrote:
>> DAS looks just wonderful, and I am very glad to be made aware of it -
>> it seems like a much better solution than my initial, highly naive
>> reaction (accessing public SQL connections).
>
> Don't think that this is naive. There is also ensembldb.ensembl.org
> which is a public mysql server if you prefer sql....

I have to agree. There's a lot to be said for using a database directly
(in fact my group does just that). But it is potentially a more fragile
solution than using a public API.

>> As I understand it, the easiest way to access DAS services in Java is
>> via an API such as JAX-WS? Jumping into the Javadocs, and looking
>> over a JAX-WS tutorial that I found here:
>> http://java.sun.com/webservices/docs/2.0/tutorial/doc/ it looks there
>> is a lot to this.
>
> I have not tried JAX-WS yet, so I can not comment on that, but if you
> want a library that makes it easier to talk to DAS servers, you can use my
> DAS client library at: http://www.spice-3d.org/dasobert/

I would stay away from JAX-WS. It's a web services framework which is
more for SOAP access than anything else. Even when using it for SOAP
remoting I've found it less than intuitive (maybe later versions have
gotten better about this, especially with the Java 6 compiler API hopefully
removing any need for explicit annotation processing). Chances are it
would be easier to use JAX-WS when the REST service support comes in, but
until then I would say stay away from it.

Andreas' library is a far superior solution :)

Andy

From bugzilla-daemon at portal.open-bio.org Fri Jan 25 12:28:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 25 Jan 2008 07:28:08 -0500
Subject: [Biojava-dev] [Bug 2432] non conventional fasta header && RichSequence.IOTools
In-Reply-To:
Message-ID: <200801251228.m0PCS8A5003219@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2432

mark.schreiber at novartis.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from mark.schreiber at novartis.com 2008-01-25 07:28 EST -------
I have added a class called FastaHeader. This class lets you specify which
fields you want to see in the fasta header output.
There are overloaded writeFasta methods in RichSequence.IOTools that let you
use this class easily. For example the following test program reads in a fasta
file from Genbank with a full header and outputs it with only the accession
number and description after the '>'

package io;

import java.io.BufferedReader;
import java.io.FileReader;

import org.biojavax.bio.seq.RichSequenceIterator;
import org.biojavax.bio.seq.io.FastaHeader;

import static org.biojavax.bio.seq.RichSequence.IOTools;

/**
 * @author Mark
 */
public class WriteFasta {
    public static void main(String[] args) throws Exception {
        BufferedReader br = new BufferedReader(
                new FileReader("files/dna.fasta"));
        RichSequenceIterator iter = IOTools.readFastaDNA(br, null);

        //IOTools.writeFasta(System.out, iter, null);

        // Choose which fields appear after the '>'; any field not set
        // here keeps its default.
        FastaHeader header = new FastaHeader();
        header.setShowDescription(true);
        header.setShowIdentifier(false);
        header.setShowNamespace(false);
        header.setShowName(false);
        header.setShowVersion(false);

        IOTools.writeFasta(System.out, iter, null, header);
    }
}

From miwalsh125 at gmail.com Tue Jan 29 15:22:07 2008
From: miwalsh125 at gmail.com (michael walsh)
Date: Tue, 29 Jan 2008 10:22:07 -0500
Subject: [Biojava-dev] Genbank Feature extraction question.
Message-ID:

BioJava,

I am writing a program that extracts features from a Genbank file using
BioJava. The program needs to extract the feature locations from the file.
This is easy enough to do, but the location information returned by the
program does not specify whether or not the location is on the complementary
strand of DNA.

This is some of my code:

FeatureHolder fltrHold = seq.filter(codeFltr); // codeFltr is a filter that retrieves coding features only.
// iterate over the Features in fltrHold
for (Iterator i = fltrHold.features(); i.hasNext(); ) {
    Feature f = (Feature) i.next();
    System.out.println(f.getLocation().toString());
}

An example of the output of the print command is:
join:[32775..32948,31801..32052]

However, the entry in the Genbank file reads:
complement(join(31801..32052,32775..32948))

Is there any way to get my program to output the fact that a feature's
location is on the complementary strand of DNA?

Any help that anyone can give me would be greatly appreciated.

Sincerely,

M Walsh

From holland at ebi.ac.uk Wed Jan 30 08:53:29 2008
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 30 Jan 2008 08:53:29 +0000
Subject: [Biojava-dev] Genbank Feature extraction question.
In-Reply-To:
References:
Message-ID: <47A03B09.1030905@ebi.ac.uk>

Hello.

If you are using the BioJavaX parsers as opposed to the older deprecated
ones, then yes, the information is there.

Your parser will be returning instances of RichSequence (there will be a
nextRichSequence or getRichSequence method depending on the way in which
you are using the parser and/or iterator). Features on a RichSequence
are all instances of RichFeature. Locations for RichFeatures are all
instances of RichLocation. If you cast appropriately (or locate and find
the getRich* equivalents of the get* methods you are already using on the
non-Rich interfaces) then you will find that RichLocation does have a
getStrand method which will give you the information you need.

cheers,
Richard

-- 
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/

From felipe.albrecht at gmail.com Thu Jan 24 05:47:37 2008
From: felipe.albrecht at gmail.com (Felipe Albrecht)
Date: Thu, 24 Jan 2008 05:47:37 -0000
Subject: [Biojava-dev] Pairwise Alignment methods
In-Reply-To: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com>
References: <93b45ca50801230050t582784b9l732e08311406cc29@mail.gmail.com>
Message-ID:

Hello,

I think that it can be solved in a simple way:
Implement (or just copy and paste) a pairwiseAlignment that takes SymbolLists
as parameters and does not create an alignment, just calculates the score and
returns the value.
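
To illustrate the score-only idea described above, here is a minimal sketch
in plain Java. It is not the attached SmithWaterman.java and it does not use
any BioJava API: it works on plain character sequences instead of SymbolLists,
and the class name, method name, and linear gap scoring are assumptions made
only for this illustration. Because no traceback is kept, two rows of the
matrix are enough, and both are method-local so nothing needs to be nulled or
garbage collected by hand.

```java
/**
 * Hypothetical, self-contained sketch of a score-only local alignment
 * (Smith-Waterman recurrence with a linear gap penalty). Illustration only.
 */
final class ScoreOnlyLocalAlignment {

    private ScoreOnlyLocalAlignment() {
    }

    /**
     * Returns the best local alignment score of a against b.
     * match is added for equal characters; mismatch and gap are
     * expected to be negative and are added as penalties.
     */
    static int score(CharSequence a, CharSequence b,
                     int match, int mismatch, int gap) {
        // Only the previous and the current row are kept, because no
        // traceback (and therefore no alignment output) is produced.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        int best = 0;

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = 0;
            for (int j = 1; j <= b.length(); j++) {
                int diag = prev[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = prev[j] + gap;
                int left = curr[j - 1] + gap;
                // A local alignment may restart anywhere, so cells never go below 0.
                int cell = Math.max(0, Math.max(diag, Math.max(up, left)));
                curr[j] = cell;
                if (cell > best) {
                    best = cell;
                }
            }
            // Swap the rows instead of allocating a new array per row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(score("ACGT", "TTACGTT", 2, -1, -2));
    }
}
```

Memory use in this sketch grows with the length of the second sequence only,
which is the kind of saving a score-only method is meant to give when many
short sequences are compared and the alignment itself is never displayed.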

Another thing that is a bit strange to me is the direct use of the garbage
collector. That is: the field "scoreMatrix" is a class field, so why is it set
to null at the end of pairwiseAlignment and the garbage collector run? Isn't
it better (and simpler) to use scoreMatrix as a method variable?

I'm attaching the class code with my changes, which is handling well the
(4^8) * (4^8) SymbolList pairwise alignments that I need :-)

Thank you,

Felipe Albrecht

On Jan 23, 2008 6:50 AM, Mark Schreiber wrote:

> Hi Felipe -
>
> I agree this is a barrier to ease of use. Even if Sequences are
> required internally for some obscure reason there is no reason why
> dummy Sequences cannot be made inside the aligner. These sequences
> could be given names like 'query' and 'subject' or even 'seq1' and
> 'seq2'.
>
> I will take a look at adding some methods.
>
> Best regards,
>
> - Mark
>
> On Jan 23, 2008 2:58 PM, Felipe Albrecht wrote:
> > Hello all,
> >
> > I have a simple question about pairwise alignment classes (SmithWaterman and
> > NeedlemanWunsch):
> > Why it is necessary two Sequence for alignment and not two SymbolList?
> >
> > Example, I have a SymbolList collection to align between then,
> > by this way I need to create some "dummies" Sequence for to do the
> > alignment.
> >
> > Reading the source, I saw that the unique field that is exclusive to
> > Sequence is the name, for the alignment output,
> > but if I need only the alignment result, it is useless.
> >
> > It is not possible to override the pairwiseAlignment to accept SymbolList or
> > may be a new method that the parameters are 2 SymbolList and returns the
> > alignment score?
> >
> > Thank you
> >
> > Felipe Albrecht

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SmithWaterman.java
Type: application/octet-stream
Size: 17664 bytes
Desc: not available
URL: