From andreas at sdsc.edu Sun Apr 1 13:54:30 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 1 Apr 2012 10:54:30 -0700
Subject: [Biojava-l] GSoC Application Discussion and Help - Porting BLAST to Java
In-Reply-To:
References:
Message-ID:

Hi Dhruv,

We are quite flexible regarding the projects and what we are really
looking for are sound projects and motivated students. As such our
project suggestions are quite open. We will interact with accepted
students from remote, so a certain degree of self-sufficiency will be
required from the side of the student.

If you already see tons of problems coming up during your initial
assessment of the project, perhaps focus your proposal on something
smaller and more achievable. There are quite a number of interesting
algorithms out there and it does not have to be one of the ones
suggested by us.

Andreas

On Sat, Mar 31, 2012 at 1:46 PM, Dhruv Sharma wrote:
> Hi,
>
> I am Dhruv Sharma, a senior undergraduate student pursuing B.E.(Hons.)
> Computer Science at BITS, Pilani, India.
>
> I am very much interested in 'porting BLAST algorithm to Java' as a GSoC
> 2012 project. I am proficient and primarily work using Java and C. Also, I
> have past experience of working in C++ before migrating to Java. However, I
> am new to GSoC and haven't used version control in the past.
>
> My recent project was based on developing a web application in Java for
> posting data to remote CS-BLAST web service with FASTA sequence, parse and
> auto-filter its results using the release date from RCSB PDB and download
> the PDB files.
>
> Since, the project aims at converting the legacy C/C++ code to Java,
> already suggested approaches on the Bio-Java page and my observations are:-
>
> 1) Using C++ to Java converters for 100% conversion.
> I have tried converting the ncbi-blast-2.2.26 source code using a few
> freely available converters but all of them either crashed or failed to
> convert even after I resolved certain header file dependency issues that
> emerged. Most failures occurred at function calls to non-standard C++
> libraries.
>
> 2) Using JNI as an alternative solution. JNI programming would be a
> tedious task and would anyway require understanding of the purpose of
> underlying C++ code. Hence, it has little advantage over rewriting the
> equivalent Java code. A significant advantage can be seen when there is no
> efficient Java alternative of the C++ code. However, platform dependence
> would still exist.
>
> According to my understanding of the problem, a hybrid approach can be
> taken up which includes using code converters for simpler files, manual
> coding for tricky areas and using JNI for typical C++ code involving
> non-standard libraries. But, I am still not clear about my exact course of
> action.
>
> Can you please tell me if my analysis of the problem is correct? Please
> also comment on the feasibility of my suggested approach and please make
> any suggestions as they would help me in improving my application draft
> that I would soon be sharing for review.
>
> As BLAST is a collection of programs, so, keeping in mind the length of
> code to be ported, can we work on certain selectively critical programs in
> it from the GSoC's perspective?
>
> Thanks.
>
> --
> *Dhruv Sharma*
> *Student
> B.E.(Hons.)
> Computer Science
> BITS, Pilani
> *
> *India*
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From sharma.dhrv at gmail.com Sun Apr 1 17:32:31 2012
From: sharma.dhrv at gmail.com (Dhruv Sharma)
Date: Mon, 2 Apr 2012 03:02:31 +0530
Subject: [Biojava-l] Suggestion for porting GHMM library
Message-ID:

Hi Andreas,

In response to our last discussion, I would like to suggest porting
General Hidden Markov Model (GHMM) library (http://ghmm.org/) from C to
Java.

The library is licensed under LGPL and is currently available as RC1
version. The code is not very big and it is very much possible to port 100%
code to Java which would make it not only efficient in comparison to use of
converters or JNI but also make it platform independent.

Would it be possible to add this library to BioJava?

If yes, I would surely like to work on it.
--
*Dhruv Sharma*
*Student
B.E.(Hons.) Computer Science
BITS, Pilani*
*India*

From andreas at sdsc.edu Sun Apr 1 20:15:38 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 1 Apr 2012 17:15:38 -0700
Subject: [Biojava-l] Suggestion for porting GHMM library
In-Reply-To:
References:
Message-ID:

That could work in terms of license and would be an interesting
feature to have. I am still slightly concerned that the scale of the
project might be too big and it might be difficult to accomplish this
during the limited time of the project.

Andreas

From HWillis at scripps.edu Sun Apr 1 23:30:46 2012
From: HWillis at scripps.edu (Scooter Willis)
Date: Sun, 1 Apr 2012 23:30:46 -0400
Subject: [Biojava-l] Suggestion for porting GHMM library
In-Reply-To:
Message-ID:

I think the HMMER implementation should be viewed as the source code of
interest. When they went from HMMER2 to HMMER3 there were significant
changes in what answer you get.

From sharma.dhrv at gmail.com Mon Apr 2 01:21:38 2012
From: sharma.dhrv at gmail.com (Dhruv Sharma)
Date: Mon, 2 Apr 2012 10:51:38 +0530
Subject: [Biojava-l] Suggestion for porting GHMM library
In-Reply-To:
References:
Message-ID:

Then I think I'll stick to Hmmer3 implementation as far as GSoC is
concerned. I hope the licensing issues are sorted out soon.

Thanks!

--
*Dhruv Sharma*
*Student
B.E.(Hons.) Computer Science
BITS, Pilani*
*India*

From andreas at sdsc.edu Mon Apr 2 09:43:51 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 2 Apr 2012 06:43:51 -0700
Subject: [Biojava-l] Suggestion for porting GHMM library
In-Reply-To:
References:
Message-ID:

I wanted to wait and see whether we actually get strong project
proposals before contacting the HMMER folks. At this stage I have seen a
lot of interest, but not a single proposal has been submitted for this.

Andreas

From heuermh at gmail.com Mon Apr 2 15:37:12 2012
From: heuermh at gmail.com (Michael Heuer)
Date: Mon, 2 Apr 2012 14:37:12 -0500
Subject: [Biojava-l] BioJava Legacy 1.8.2 released
Message-ID:

BioJava Legacy 1.8.2 has been released and is available from

http://biojava.org/wiki/BioJava:Download_1.8.2

as well as from the BioJava maven repository at

http://www.biojava.org/download/maven/

BioJava Legacy 1.8.2 adds several new features and bug fixes:

 - Added jdk 1.5+ generics to biojavax module
 - Improvements to Locations to support circularity
 - Fixes for Date formatting
 - Added streaming and low-level parsers to FASTQ package to greatly
   improve performance
 - Added FastqTools class for converting FASTQ-formatted sequences into
   biojava-legacy SymbolLists, Sequences, and PhredSequences

This release would not have been possible without contributions from
numerous people, thanks to all for their support!

About BioJava:

BioJava is a mature open-source project that provides a framework for
processing of biological data. BioJava contains powerful analysis and
statistical routines, tools for parsing common file formats, and
packages for manipulating sequences and 3D structures. It enables rapid
bioinformatics application development in the Java programming language.

About BioJava Legacy:

BioJava Legacy is a continuation of the version 1.x releases of BioJava.
The most recent release of BioJava 3 is version 3.0.3.

   michael

From andreas at sdsc.edu Wed Apr 4 12:32:00 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 4 Apr 2012 09:32:00 -0700
Subject: [Biojava-l] [Biojava-dev] Port an Algorithm to Java
In-Reply-To:
References:
Message-ID:

I recommend a very good read: Effective Java by Joshua Bloch.
Also, there are a couple of good online articles on the topic of immutable objects.

Andreas

On Wed, Apr 4, 2012 at 8:06 AM, Dragos-Bogdan Sima wrote:
> Hello,
>
> I have an important question. What would be the best method to treat
> constant objects in Java?
> I am thinking of writing an immutable interface that provides an API for just the
> const methods. Then, if I return or pass objects of the immutable type, the
> degree of safety would be the same as in C++.
>
> Thank you,
> Dragos.

From arthur.oviedo at epfl.ch Thu Apr 5 10:39:12 2012
From: arthur.oviedo at epfl.ch (Arthur Oviedo)
Date: Thu, 5 Apr 2012 16:39:12 +0200
Subject: [Biojava-l] Interested in the "cloudization" of BioJava

Hello BioJava again,

After giving some thought to the possible ways to apply cloudization to modules in BioJava, I have identified some possibilities:

1) The first one, and the one I find most interesting, would be to introduce the map-reduce framework to speed up the pairwise alignment step in the creation of the multiple sequence alignment. I see that BioJava implements the CLUSTAL algorithm, and I have some experience with MSA programs; it is known that the pairwise alignment is the most demanding part of this algorithm as the number of sequences increases. This map-reduce version of all-to-all sequence alignment could also be reused in the future if other progressive alignment algorithms are implemented (maybe T-Coffee or others).

2) If the input files are big enough, it could be interesting to parse these files on a distributed infrastructure to speed up the process. In this case the map-reduce framework would parallelize the process by splitting the input file into several chunks and parsing the sequences in each chunk.
3) Another idea would be a "hadoopified" version of BLAST, in which the input file is also split and, for each sequence in a chunk, the node performs a local BLAST query. Since BioJava does not yet implement BLAST (which I see is another GSoC project), this idea would require writing a wrapper that executes the NCBI BLAST program and then joins the results.

Thanks for your feedback, which I am hoping to receive before I submit my proposal.

Best regards!

On Fri, Mar 30, 2012 at 6:35 PM, Arthur Oviedo wrote:
> Hello,
> My name is Arthur, and I'm a master's student in computer science at EPFL
> (École Polytechnique Fédérale de Lausanne).
> I have worked on different projects that are somewhat related to BioJava and
> cloud environments.
> While I was a research assistant, I worked (briefly) on a project
> called UnaCloud (
> http://sistemas.uniandes.edu.co/~unacloud/dokuwiki/doku.php?id=recursos:documentacion)
> which provides an opportunistic grid/cloud infrastructure for running
> scientific experiments, and we have used it to help bioinformaticians with
> their different jobs, such as huge BLAST queries, HMMER jobs, etc.
> As part of my assistant work at the same university, I developed a cool
> system called UnaCloud MSA, which integrates some existing and newly developed
> tools to analyze multiple sequence alignments. It even uses the BioJava
> library to perform some verification of the sequences. All of this is
> also done using the UnaCloud infrastructure. This work is still in
> development and in preparation for publication.
> http://unacloudmsa.uniandes.edu.co
> Currently, I'm working on a class project (an implementation of a
> subset of the functionality of a database management system) using the Hadoop
> (map-reduce) framework.
> All of the mentioned projects have been implemented in Java, so I suppose
> that I meet the Java expertise requirement.
> I would like to know more about this project, and also the rough
> dates when the Google Summer of Code will be held (to prepare my
> schedule).
> Thanks and best regards,
> Arthur Oviedo

From andreas at sdsc.edu Thu Apr 5 11:49:08 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 5 Apr 2012 08:49:08 -0700
Subject: [Biojava-l] Interested in the "cloudization" of BioJava

Hi Arthur,

> 1) The first one and the one I find most interesting can be to try to
> introduce the map-reduce framework to help speed up the pairwise
> alignment in the creation of the multiple sequence alignment.

That would be a possible application.

> 2) If the input files are big enough, it can be interesting to perform the
> parsing of these files using a distributed infrastructure to speed up
> the process,

I am not sure I have encountered such large files yet. Do you have an example?

> 3) Another idea can be to try to have a hadoopified version of BLAST, in which
> the input file can also be split and then, for each sequence in a chunk,
> the node would perform a local BLAST query.

I agree, another possible application...

What frameworks did you think about using?

Andreas

From andreas at sdsc.edu Thu Apr 5 11:57:36 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 5 Apr 2012 08:57:36 -0700
Subject: [Biojava-l] [Biojava-dev] Port an Algorithm to Java

Hi Dragos,

It contains a good list of technical issues that might come up. Can you also add a section about what additional benefits could be gained if this is done in Java?

Andreas

On Wed, Apr 4, 2012 at 6:20 PM, Dragos-Bogdan Sima wrote:
> I have submitted a draft application.
> Could you provide me some feedback?
>
> http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/dbsima/1

From to.petr at gmail.com Thu Apr 5 18:43:59 2012
From: to.petr at gmail.com (P. Troshin)
Date: Thu, 5 Apr 2012 23:43:59 +0100
Subject: [Biojava-l] [Biojava-dev] Port an Algorithm to Java

>> I have an important question. What would be the best method to treat
>> constant objects in Java?

Make a class (or classes) and define your constants there, then import them statically. Make your constants public, static and final. You may want to implement some of the constants as enums (http://docs.oracle.com/javase/1.5.0/docs/guide/language/enums.html).

Good luck with your project.

Regards,
Peter

From mictadlo at gmail.com Fri Apr 6 05:17:33 2012
From: mictadlo at gmail.com (Mic)
Date: Fri, 6 Apr 2012 19:17:33 +1000
Subject: [Biojava-l] Interested in the "cloudization" of BioJava

I have never tried it out myself. In the next 3 years some GPUs will have 20 000 cores. Looking at this benchmark, http://jogamp.org/jocl/www/, there is a huge difference between GPU and CPU code. With Aparapi (http://blogs.amd.com/developer/2011/09/14/i-dont-always-write-gpu-code-in-java-but-when-i-do-i-like-to-use-aparapi/) you write everything in Java.

On Fri, Apr 6, 2012 at 3:28 PM, Andreas Prlic wrote:
> Do you have any experience with that?
> I don't have first-hand experience, but from what I know, GPU programming
> is very much hardware-specific and not as nicely platform-independent as
> Java...
>
> Andreas
>
> On Thu, Apr 5, 2012 at 6:37 PM, Mic wrote:
> > Maybe also include OpenCL, in order to be able to run it on a GPU.
>
> --
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> Senior Scientist, RCSB PDB Protein Data Bank
> University of California, San Diego
> (+1) 858.246.0526
> -----------------------------------------------------------------------

From rbuels at gmail.com Mon Apr 9 10:57:45 2012
From: rbuels at gmail.com (Robert Buels)
Date: Mon, 09 Apr 2012 10:57:45 -0400
Subject: [Biojava-l] Google Summer of Code mentors
Message-ID: <4F82F8E9.40401@gmail.com>

Hi all,

Reminder: if you want to help mentor Google Summer of Code students to work on your Bio* project, you have to do four things:

1. Make sure you have enough time to actually help a student over the summer

2. Sign up as a mentor for the Open Bioinformatics Foundation at http://www.google-melange.com/gsoc/homepage/google/gsoc2012

3. Join the OBF Google Summer of Code mailing lists at: http://lists.open-bio.org/mailman/listinfo/gsoc and http://lists.open-bio.org/mailman/listinfo/gsoc-mentors

4. After your request to be a mentor is accepted by me, log into the GSoC web interface at http://www.google-melange.com (the same web application you used to sign up) and help look at and evaluate this year's student proposals.

Robert Buels
2012 OBF GSoC Org. Admin.

From andreas at sdsc.edu Mon Apr 9 17:50:14 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 9 Apr 2012 14:50:14 -0700
Subject: [Biojava-l] [Biojava-dev] Port an Algorithm to Java

Sorry for the slow response, I am mostly offline this week.

I don't know that manual. There is also an O'Reilly book on BLAST if you are interested in reading up more.

Andreas

On Sat, Apr 7, 2012 at 1:09 PM, Dragos-Bogdan Sima wrote:
> Hello,
>
> I found this manual: The Developer's Guide to BLAST by Jason Papadopoulos.
> Is it a good read?
>
> Thank you.
> Dragos.

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From p.j.a.cock at googlemail.com Mon Apr 23 18:55:55 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 23 Apr 2012 23:55:55 +0100
Subject: [Biojava-l] [Biojava-dev] GSoC 2012

Hello Dragos-Bogdan,

On Mon, Apr 23, 2012 at 9:33 PM, Dragos-Bogdan Sima wrote:
> Hello everyone,
>
> How come there was no BioJava project accepted this year,
> and only BioRuby and BioPython?
> Was the interest greater in those languages, or were the
> proposals simply better?

There should be an official email from the OBF representative today (or tomorrow in my time zone) announcing the results, but I imagine as one of the applicants you've been contacted via Google already.

So for now I'll briefly confirm that, yes, the number of applications by Bio* project varied, but the student proposals are ranked on merit regardless of which Bio* project they are for. See:

http://www.open-bio.org/wiki/Google_Summer_of_Code_Application_Evaluation

This is linked to from the main OBF GSoC page,

http://www.open-bio.org/wiki/Google_Summer_of_Code

Regards,

Peter

From rbuels at gmail.com Mon Apr 23 19:49:10 2012
From: rbuels at gmail.com (Robert Buels)
Date: Mon, 23 Apr 2012 19:49:10 -0400
Subject: [Biojava-l] Announcing OBF Google Summer of Code Accepted Students
Message-ID: <4F95EA76.4030004@gmail.com>

Hello all,

I'm very pleased and excited to announce that the Open Bioinformatics Foundation has selected 5 very capable students to work on OBF projects this summer as part of the Google Summer of Code program.
The accepted students, their projects, and their mentors (in alphabetical order):

Wibowo Arindrarto
SearchIO Implementation in Biopython
mentored by Peter Cock

Lenna Peterson
Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
mentored by Brad Chapman

Marjan Povolni
The worlds fastest parallelized GFF3/GTF parser in D, and an interfacing biogem plugin for Ruby
mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

Artem Tarasov
Fast parallelized GFF3/GTF parser in C++, with Ruby FFI bindings
mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

Clayton Wheeler
Multiple Alignment Format parser for BioRuby
mentored by Francesco Strozzi and Raoul Bonnal

As in every year, we received many great applications and ideas. However, funding and mentor resources are limited, and we were not able to accept as many as we would have liked. Our deepest thanks to all the students who applied: we sincerely appreciate the time and effort you put into your applications, and hope you will still consider being a part of the OBF's open source projects, even without Google funding. I speak for myself and all of the mentors who read and scored applications when I say that we were truly honored by the number and quality of the applications we received.

For the accepted students: congratulations! You have risen to the top of a very competitive application process. Now it's time to "put your money where your mouth is", as the saying goes. Let's get out there and write some great code this summer!

Best regards,

Rob

----
Robert Buels
OBF GSoC 2012 Administrator

From andreas at sdsc.edu Mon Apr 23 22:43:49 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 23 Apr 2012 19:43:49 -0700
Subject: [Biojava-l] gsoc update

Hi,

As you have probably read by now, this year's OBF students for the Google Summer of Code are going to other Bio* projects, and none for BioJava.
This has to do with several factors:

- Overall applications were down by 53% this year
- None of the BioJava-related proposals scored high enough

In order to help students prepare stronger proposals for next year, I believe we should try to prepare things differently:

- More concrete project topics from our side. Our approach of providing open topics and letting students fill in the details did not work well this year.
- Less challenging topics; some of our topics were perhaps too difficult.
- Overall, we should try to have more mentors comment on the list and help students prepare good project plans.
- More marketing, so we can spread the word to more students.

I want to thank all students and potential mentors who invested time in this. Even if we did not succeed this time, I do hope we all learnt something in the process and can find a way to work together on BioJava beyond the scope of GSoC.

Andreas

From sharma.dhrv at gmail.com Sun Apr 1 21:32:31 2012
From: sharma.dhrv at gmail.com (Dhruv Sharma)
Date: Mon, 2 Apr 2012 03:02:31 +0530
Subject: [Biojava-l] Suggestion for porting GHMM library

Hi Andreas,

In response to our last discussion, I would like to suggest porting the General Hidden Markov Model (GHMM) library (http://ghmm.org/) from C to Java.

The library is licensed under the LGPL and is currently available as an RC1 version. The code is not very big, and it should be possible to port 100% of the code to Java, which would not only be efficient compared with using converters or JNI but would also make it platform independent.

Would it be possible to add this library to BioJava?

If yes, I would surely like to work on it.

--
Dhruv Sharma
Student
B.E.(Hons.) Computer Science
BITS, Pilani
India

From andreas at sdsc.edu Mon Apr 2 00:15:38 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 1 Apr 2012 17:15:38 -0700
Subject: [Biojava-l] Suggestion for porting GHMM library

That could work in terms of license and would be an interesting feature to have. I am still slightly concerned that the scale of the project might be too big and that it might be difficult to accomplish this during the limited time of the project.

Andreas

From HWillis at scripps.edu Mon Apr 2 03:30:46 2012
From: HWillis at scripps.edu (Scooter Willis)
Date: Sun, 1 Apr 2012 23:30:46 -0400
Subject: [Biojava-l] Suggestion for porting GHMM library

I think the HMMER implementation should be viewed as the source code of interest. When they went from HMMER2 to HMMER3, there were significant changes in the answers you get.

From sharma.dhrv at gmail.com Mon Apr 2 05:21:38 2012
From: sharma.dhrv at gmail.com (Dhruv Sharma)
Date: Mon, 2 Apr 2012 10:51:38 +0530
Subject: [Biojava-l] Suggestion for porting GHMM library

Then I think I'll stick to the HMMER3 implementation as far as GSoC is concerned. I hope the licensing issues are sorted out soon.

Thanks!

--
Dhruv Sharma
Student
B.E.(Hons.) Computer Science
BITS, Pilani
India

On Mon, Apr 2, 2012 at 9:00 AM, Scooter Willis wrote:
> I think HMMER implementation should be viewed as the source code of
> interest. When they went from HMMER2 to HMMER3 significant changes in what
> answer you get.
>
> On 4/1/12 8:15 PM, "Andreas Prlic" wrote:
>
> >That could work in terms of license and would be an interesting
> >feature to have. I am still slightly concerned that the scale of the
> >project might be too big and it might be difficult to accomplish this
> >during the limited time of the project.
> >
> >Andreas
> >
> >On Sun, Apr 1, 2012 at 2:32 PM, Dhruv Sharma wrote:
> >> Hi Andreas,
> >>
> >> In response to our last discussion, I would like to suggest porting General
> >> Hidden Markov Model (GHMM) library (http://ghmm.org/) from C to Java.
> >>
> >> The library is licensed under LGPL and is currently available as RC1
> >> version. The code is not very big and it is very much possible to port 100%
> >> code to Java which would make it not only efficient in comparison to use of
> >> converters or JNI but also make it platform independent.
> >>
> >> Would it be possible to add this library to BioJava?
> >>
> >> If yes, I would surely like to work on it.
> >> > >> > >> > >> > >> On Sun, Apr 1, 2012 at 11:24 PM, Andreas Prlic > wrote: > >>> > >>> Hi Dhruv, > >>> > >>> We are quite flexible regarding the projects and what we are really > >>> looking for are sound projects and motivated students. As such our > >>> project suggestions are quite open. We will interact with accepted > >>> students from remote, so a certain degree of self-sufficiency will be > >>> required from the side of the student. > >>> > >>> If you already see tons of problems coming up during your initial > >>> assessment of the project, perhaps focus your proposal on something > >>> smaller and more achievable. There are quite a number of interesting > >>> algorithms out there and it does not have to be one of the ones > >>> suggested by us. > >>> > >>> Andreas > >>> > >>> > >>> > >>> On Sat, Mar 31, 2012 at 1:46 PM, Dhruv Sharma > >>> wrote: > >>> > Hi, > >>> > > >>> > I am Dhruv Sharma, a senior undergraduate student pursuing > >>>B.E.(Hons.) > >>> > Computer Science at BITS, Pilani, India. > >>> > > >>> > I am very much interested in 'porting BLAST algorithm to Java' as a > >>>GSoC > >>> > 2012 project. I am proficient and primarily work using Java and C. > >>>Also, > >>> > I > >>> > have past experience of working in C++ before migrating to Java. > >>> > However, I > >>> > am new to GSoC and haven't used version control in the past. > >>> > > >>> > My recent project was based on developing a web application in Java > >>>for > >>> > posting data to remote CS-BLAST web > >>> > service with > >>> > FASTA sequence, parse and auto-filter its results using the release > >>>date > >>> > from RCSB PDB and download > the > >>> > PDB > >>> > files. > >>> > > >>> > Since, the project aims at converting the legacy C/C++ code to Java, > >>> > already suggested approaches on the Bio-Java page and my observations > >>> > are:- > >>> > > >>> > 1) Using C++ to Java converters for 100% conversion. 
I have tried > >>> > converting the ncbi-blast-2.2.26 source code using a few freely > >>> > available > >>> > converters but all of them either crashed or failed to convert even > >>> > after I > >>> > resolved certain header file dependency issues that emerged. Most > >>> > failures > >>> > occurred at function calls to non-standard C++ libraries. > >>> > > >>> > 2) Using JNI as an alternative solution. JNI programming would be a > >>> > tedious task and would anyway require understanding of the purpose of > >>> > underlying C++ code. Hence,has little advantage over rewriting the > >>> > equivalent Java code. A significant advantage can be seen when there > >>>is > >>> > no > >>> > efficient Java alternative of the C++ code. However, platform > >>>dependence > >>> > would still exist. > >>> > > >>> > According to my understanding of the problem, a hybrid approach can > >>>be > >>> > taken up which includes using code converters for simpler files, > >>>manual > >>> > coding for tricky areas and using JNI for typical C++ code involving > >>> > non-standard libraries. But, I am still not clear about my exact > >>>course > >>> > of > >>> > action. > >>> > > >>> > Can you please tell me if my analysis of the problem is correct? > >>>Please > >>> > also comment on the feasibility of my suggested approach and please > >>>make > >>> > any suggestions as they would help me in improving my application > >>>draft > >>> > that I would soon be sharing for review. > >>> > > >>> > As BLAST is a collection of programs, so, keeping in mind the length > >>>of > >>> > code to be ported, can we work on certain selectively critical > >>>programs > >>> > in > >>> > it from the GSoC's perspective? > >>> > > >>> > > >>> > Thanks. > >>> > > >>> > -- > >>> > *Dhruv Sharma* > >>> > *Student > >>> > B.E.(Hons.) 
From andreas at sdsc.edu Mon Apr 2 13:43:51 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 2 Apr 2012 06:43:51 -0700
Subject: [Biojava-l] Suggestion for porting GHMM library
In-Reply-To: 
References: 
Message-ID: 

I wanted to wait and see whether we actually get strong project proposals before contacting the HMMER folks. At this stage I have seen a lot of interest, but not a single proposal has been submitted for this.

Andreas

On Sun, Apr 1, 2012 at 10:21 PM, Dhruv Sharma wrote:
> Then I think I'll stick to Hmmer3 implementation as far as GSoC is
> concerned. I hope the licensing issues are sorted out soon.
>
> Thanks!

From heuermh at gmail.com Mon Apr 2 19:37:12 2012
From: heuermh at gmail.com (Michael Heuer)
Date: Mon, 2 Apr 2012 14:37:12 -0500
Subject: [Biojava-l] BioJava Legacy 1.8.2 released
Message-ID: 

BioJava Legacy 1.8.2 has been released and is available from

http://biojava.org/wiki/BioJava:Download_1.8.2

as well as from the BioJava maven repository at

http://www.biojava.org/download/maven/
BioJava Legacy 1.8.2 adds several new features and bug fixes:

- Added JDK 1.5+ generics to the biojavax module
- Improvements to Locations to support circularity
- Fixes for Date formatting
- Added streaming and low-level parsers to the FASTQ package to greatly improve performance
- Added FastqTools class for converting FASTQ-formatted sequences into biojava-legacy SymbolLists, Sequences, and PhredSequences

This release would not have been possible without contributions from numerous people; thanks to all for their support!

About BioJava:

BioJava is a mature open-source project that provides a framework for processing of biological data. BioJava contains powerful analysis and statistical routines, tools for parsing common file formats, and packages for manipulating sequences and 3D structures. It enables rapid bioinformatics application development in the Java programming language.

About BioJava Legacy:

BioJava Legacy is a continuation of the version 1.x releases of BioJava. The most recent release of BioJava 3 is version 3.0.3.

   michael

From andreas at sdsc.edu Wed Apr 4 16:32:00 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 4 Apr 2012 09:32:00 -0700
Subject: [Biojava-l] [Biojava-dev] Port an Algorithm to Java
In-Reply-To: 
References: 
Message-ID: 

I recommend a very good read: Effective Java by Joshua Bloch.

There are also a couple of good online articles on the topic of immutable objects.

Andreas

On Wed, Apr 4, 2012 at 8:06 AM, Dragos-Bogdan Sima wrote:
> Hello,
>
> I have an important question. What would be the best method to treat
> constant objects in Java?
> I am thinking of writing an Immutable interface that provides an API for
> just the const methods. Then if I return or pass objects of type
> Immutable, the degree of safety would be the same as in C++.
>
> Thank you,
> Dragos.
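
The read-only interface idea in the question above, combined with the immutable-object approach Andreas points to, could look roughly like the sketch below. This is only an illustration: the names `ReadOnlyScores`, `Scores` and `ReadOnlyViewDemo` are hypothetical and come from neither BioJava nor the thread. Note that a read-only interface on its own is weaker than C++ `const`, since a caller can cast back to the concrete class; making the class itself immutable (final fields, defensive copy, no setters) closes that gap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical read-only view: exposes only the "const" methods.
interface ReadOnlyScores {
    int size();
    int get(int index);
}

// Immutable implementation: final class, final field, defensive copy, no setters.
final class Scores implements ReadOnlyScores {
    private final List<Integer> values;

    Scores(List<Integer> values) {
        // Copy the input so later changes made by the caller cannot leak in.
        this.values = Collections.unmodifiableList(new ArrayList<>(values));
    }

    @Override
    public int size() {
        return values.size();
    }

    @Override
    public int get(int index) {
        return values.get(index);
    }

    // A "modification" returns a new object and leaves this one untouched.
    Scores with(int extra) {
        List<Integer> copy = new ArrayList<>(values);
        copy.add(extra);
        return new Scores(copy);
    }
}

class ReadOnlyViewDemo {
    public static void main(String[] args) {
        List<Integer> raw = new ArrayList<>();
        raw.add(3);
        Scores scores = new Scores(raw);
        raw.add(99);                      // does not affect scores
        ReadOnlyScores view = scores;     // callers that only read get the narrow type
        Scores longer = scores.with(7);   // new object; scores is unchanged
        System.out.println(view.size() + " " + longer.size()); // prints "1 2"
    }
}
```

Passing `ReadOnlyScores` around documents intent, while the immutability of `Scores` is what actually guarantees that nobody can change the object after construction.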
From arthur.oviedo at epfl.ch Thu Apr 5 14:39:12 2012
From: arthur.oviedo at epfl.ch (Arthur Oviedo)
Date: Thu, 5 Apr 2012 16:39:12 +0200
Subject: [Biojava-l] Interested in the "cloudization" of BioJava
In-Reply-To: 
References: 
Message-ID: 

Hello BioJava again,

After giving some thought to the possible ways to apply cloudization to modules in BioJava, I have identified some possibilities:

1) The first one, and the one I find most interesting, would be to introduce the map-reduce framework to speed up the pairwise alignment step in the creation of a multiple sequence alignment. I see that BioJava implements the CLUSTAL algorithm, and I have some experience with MSA programs; it is known that the pairwise alignment is the most demanding part of this algorithm when the number of sequences increases. This map-reduce all-to-all sequence alignment could also be used in the future if other progressive alignment algorithms are implemented (maybe T-Coffee or others).

2) If the input files are big enough, it could be interesting to parse these files on a distributed infrastructure to speed up the process. In this case the map-reduce framework would parallelize the process by splitting the input file into several chunks and parsing the sequences in each chunk.

3) Another idea would be a Hadoop-based version of BLAST, in which the input file is also split and, for each sequence in a chunk, the node performs a local BLAST query. Since BioJava does not yet implement BLAST (which I see is another GSoC project), this idea would require a wrapper that executes the NCBI BLAST program and then joins the results.

Thanks for your feedback, which I am hoping for in order to submit my proposal.

Best regards!
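
Idea 1 above rests on the fact that every pair of sequences can be scored independently. A minimal sketch of that split, using a plain Java thread pool rather than Hadoop, might look as follows. Everything here is an assumption made for illustration: `AllPairsSketch`, `scoreAllPairs` and `toyScore` are hypothetical names, and `toyScore` is a placeholder that counts identical positions, not a real pairwise alignment. On a cluster, each per-row task would correspond to a map task and the merge loop to the reduce step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AllPairsSketch {

    // Placeholder score (NOT a real alignment): identical characters at the same position.
    static int toyScore(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int same = 0;
        for (int k = 0; k < n; k++) {
            if (a.charAt(k) == b.charAt(k)) {
                same++;
            }
        }
        return same;
    }

    // Scores every pair (i, j) with i < j in parallel and returns a symmetric matrix.
    static int[][] scoreAllPairs(final List<String> seqs, int threads) {
        final int n = seqs.size();
        int[][] scores = new int[n][n];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // "Map": one independent task per row i, covering the pairs (i, j) with j > i.
            List<Future<int[]>> rows = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int row = i;
                Callable<int[]> task = () -> {
                    int[] out = new int[n];
                    for (int j = row + 1; j < n; j++) {
                        out[j] = toyScore(seqs.get(row), seqs.get(j));
                    }
                    return out;
                };
                rows.add(pool.submit(task));
            }
            // "Reduce": merge the per-row results into one symmetric matrix.
            for (int i = 0; i < n; i++) {
                int[] out;
                try {
                    out = rows.get(i).get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("pairwise task failed", e);
                }
                for (int j = i + 1; j < n; j++) {
                    scores[i][j] = out[j];
                    scores[j][i] = out[j];
                }
            }
        } finally {
            pool.shutdown();
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> seqs = new ArrayList<>();
        seqs.add("ACGT");
        seqs.add("ACGA");
        seqs.add("TCGA");
        int[][] s = scoreAllPairs(seqs, 2);
        System.out.println(s[0][1] + " " + s[0][2] + " " + s[1][2]); // prints "3 2 3"
    }
}
```

The number of pairs grows as n(n-1)/2, which is why this step dominates for large inputs and why it is a natural candidate for distribution.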
On Fri, Mar 30, 2012 at 6:35 PM, Arthur Oviedo wrote:
> Hello,
> My name is Arthur, and I'm a master's student in computer science at EPFL
> (École Polytechnique Fédérale de Lausanne).
> I have worked on different projects that are somewhat related to BioJava
> and cloud environments.
> While I was a research assistant, I worked (briefly) on a project
> called UnaCloud (
> http://sistemas.uniandes.edu.co/~unacloud/dokuwiki/doku.php?id=recursos:documentacion),
> which provides an opportunistic grid/cloud infrastructure for running
> scientific experiments, and we have used it to help bioinformaticians with
> their different jobs, such as huge BLAST queries, HMMER jobs, etc.
> As part of my assistant work at the same university, I developed a cool
> system called UnaCloud MSA, which integrates some existing and newly
> developed tools to analyze multiple sequence alignments. It even uses the
> BioJava library to perform some verification of the sequences. All of
> this is also done using the UnaCloud infrastructure. This work is still in
> development and in preparation for publication.
> http://unacloudmsa.uniandes.edu.co
> Currently, I'm working on a class project (an implementation of a
> subset of the functionality of a database management system) using the
> Hadoop (map-reduce) framework.
> All of the mentioned projects have been implemented in Java, so I suppose
> that I meet the Java expertise requirement.
> I would like to know more about this project and also to know the rough
> dates when the Google Summer of Code will be held (to prepare my
> schedule).
> Thanks and best regards,
> Arthur Oviedo

From andreas at sdsc.edu Thu Apr 5 15:49:08 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 5 Apr 2012 08:49:08 -0700
Subject: [Biojava-l] Interested in the "cloudization" of BioJava
In-Reply-To: 
References: 
Message-ID: 

Hi Arthur,

> 1) The first one and the one i find most interesting can be to try to
> introduce the map-reduce framework to help to speed-up the pairwise
> alignment in the creation of the multiple sequence alignment.

That would be a possible application.

> 2) If the input files are big enough, it can be interesting to perform the
> parsing on this files while using a distributed infrastructure to speedup
> the process,

I am not sure I have encountered such large files yet. Do you have an example?

> 3) Another idea can be to try to have a hadoopify version of blast, in which
> the input file also can be split and then for each sequence in a chunk,
> the node would perform a local blast query.

I agree, another possible application...

What frameworks did you think about using?

Andreas

From andreas at sdsc.edu Thu Apr 5 15:57:36 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 5 Apr 2012 08:57:36 -0700
Subject: [Biojava-l] [Biojava-dev] Port an Algorithm to Java
In-Reply-To: 
References: 
Message-ID: 

Hi Dragos,

It contains a good list of technical issues that might come up. Can you also add a section about what additional benefits could be gained if this is done in Java?

Andreas

On Wed, Apr 4, 2012 at 6:20 PM, Dragos-Bogdan Sima wrote:
> I have submitted a draft application.
> Could you provide me some feedback?
>
> http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/dbsima/1

From to.petr at gmail.com Thu Apr 5 22:43:59 2012
From: to.petr at gmail.com (P. Troshin)
Date: Thu, 5 Apr 2012 23:43:59 +0100
Subject: [Biojava-l] [Biojava-dev] Port an Algorithm to Java
In-Reply-To: 
References: 
Message-ID: 

>> I have an important question.
>> What would be the best method to treat
>> constant objects in Java?

Make a class (or classes) and define your constants there, then import them statically. Make your constants public, static and final. You may want to implement some of the constants as enums (http://docs.oracle.com/javase/1.5.0/docs/guide/language/enums.html).

Good luck with your project.

Regards,
Peter

From mictadlo at gmail.com Fri Apr 6 09:17:33 2012
From: mictadlo at gmail.com (Mic)
Date: Fri, 6 Apr 2012 19:17:33 +1000
Subject: [Biojava-l] Interested in the "cloudization" of BioJava
In-Reply-To: 
References: 
Message-ID: 

I have never tried it out myself. In the next 3 years some GPUs will have 20 000 cores. Looking at this benchmark, http://jogamp.org/jocl/www/, there is a huge difference between GPU and CPU code. With Aparapi (http://blogs.amd.com/developer/2011/09/14/i-dont-always-write-gpu-code-in-java-but-when-i-do-i-like-to-use-aparapi/) you write everything in Java.

On Fri, Apr 6, 2012 at 3:28 PM, Andreas Prlic wrote:
> Do you have any experience with that? I don't have first-hand experience,
> but from what I know GPU programming is very much hardware specific and
> not as nicely platform independent as Java...
>
> Andreas
>
> On Thu, Apr 5, 2012 at 6:37 PM, Mic wrote:
> > Maybe also include OpenCL in order to be able to run it on GPUs.
>
> --
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> Senior Scientist, RCSB PDB Protein Data Bank
> University of California, San Diego
> (+1) 858.246.0526
> -----------------------------------------------------------------------

From rbuels at gmail.com Mon Apr 9 14:57:45 2012
From: rbuels at gmail.com (Robert Buels)
Date: Mon, 09 Apr 2012 10:57:45 -0400
Subject: [Biojava-l] Google Summer of Code mentors
Message-ID: <4F82F8E9.40401@gmail.com>

Hi all,

Reminder: if you want to help mentor Google Summer of Code students to work on your Bio* project, you have to do four things:

1. Make sure you have enough time to actually help a student over the summer

2. Sign up as a mentor for the Open Bioinformatics Foundation at http://www.google-melange.com/gsoc/homepage/google/gsoc2012

3. Join the OBF Google Summer of Code mailing lists at:
   http://lists.open-bio.org/mailman/listinfo/gsoc and
   http://lists.open-bio.org/mailman/listinfo/gsoc-mentors

4. After your request to be a mentor is accepted by me, log into the GSoC web interface at http://www.google-melange.com (the same web application you used to sign up) and help look at and evaluate this year's student proposals.

Robert Buels
2012 OBF GSoC Org. Admin.

From andreas at sdsc.edu Mon Apr 9 21:50:14 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 9 Apr 2012 14:50:14 -0700
Subject: [Biojava-l] [Biojava-dev] Port an Algorithm to Java
In-Reply-To: 
References: 
Message-ID: 

Sorry for the slow response, I am mostly offline this week. I don't know that manual. There is also an O'Reilly book on BLAST if you are interested in reading up more.

Andreas

On Sat, Apr 7, 2012 at 1:09 PM, Dragos-Bogdan Sima wrote:
> Hello,
>
> I found this manual: The Developer's Guide to BLAST by Jason Papadopoulos.
> Is it a good read?
>
> Thank you.
> Dragos.

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From p.j.a.cock at googlemail.com Mon Apr 23 22:55:55 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 23 Apr 2012 23:55:55 +0100
Subject: [Biojava-l] [Biojava-dev] GSoC 2012
In-Reply-To: 
References: 
Message-ID: 

Hello Dragos-Bogdan,

On Mon, Apr 23, 2012 at 9:33 PM, Dragos-Bogdan Sima wrote:
> Hello everyone,
>
> How come there was no BioJava project accepted this year,
> and only BioRuby and BioPython?
> Was the interest greater in those languages, or were the
> proposals simply better?

There should be an official email from the OBF representative today (or tomorrow in my time zone) announcing the results - but I imagine as one of the applicants you've been contacted via Google already.

So for now I'll briefly confirm that, yes, the number of applications by Bio* project varied, but the student proposals are ranked on merit regardless of which Bio* project they are for. See:
http://www.open-bio.org/wiki/Google_Summer_of_Code_Application_Evaluation

This is linked to from the main OBF GSoC page,
http://www.open-bio.org/wiki/Google_Summer_of_Code

Regards,

Peter

From rbuels at gmail.com Mon Apr 23 23:49:10 2012
From: rbuels at gmail.com (Robert Buels)
Date: Mon, 23 Apr 2012 19:49:10 -0400
Subject: [Biojava-l] Announcing OBF Google Summer of Code Accepted Students
Message-ID: <4F95EA76.4030004@gmail.com>

Hello all,

I'm very pleased and excited to announce that the Open Bioinformatics Foundation has selected 5 very capable students to work on OBF projects this summer as part of the Google Summer of Code program.

The accepted students, their projects, and their mentors (in alphabetical order):

Wibowo Arindrarto
   SearchIO Implementation in Biopython
   mentored by Peter Cock

Lenna Peterson
   Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
   mentored by Brad Chapman

Marjan Povolni
   The worlds fastest parallelized GFF3/GTF parser in D, and an interfacing biogem plugin for Ruby
   mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

Artem Tarasov
   Fast parallelized GFF3/GTF parser in C++, with Ruby FFI bindings
   mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal

Clayton Wheeler
   Multiple Alignment Format parser for BioRuby
   mentored by Francesco Strozzi and Raoul Bonnal

As in every year, we received many great applications and ideas. However, funding and mentor resources are limited, and we were not able to accept as many as we would have liked.
Our deepest thanks to all the students who applied: we sincerely appreciate the time and effort you put into your applications, and hope you will still consider being a part of the OBF's open source projects, even without Google funding. I speak for myself and all of the mentors who read and scored applications when I say that we were truly honored by the number and quality of the applications we received.

For the accepted students: congratulations! You have risen to the top of a very competitive application process. Now it's time to "put your money where your mouth is", as the saying goes. Let's get out there and write some great code this summer!

Best regards,

Rob

----
Robert Buels
OBF GSoC 2012 Administrator

From andreas at sdsc.edu Tue Apr 24 02:43:49 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 23 Apr 2012 19:43:49 -0700
Subject: [Biojava-l] gsoc update
Message-ID: 

Hi,

As you have probably read by now, this year's OBF students for the Google Summer of Code are going to other Bio* projects, and none to BioJava. This has to do with several factors:

- Overall applications were down by 53% this year
- None of the BioJava-related proposals scored high enough

In order to help students prepare stronger proposals for next year, I believe we should try to prepare things differently:

- More concrete project topics from our side. Our approach of providing open topics and letting students fill in the details did not work well this year.
- Less challenging topics; some of our topics were perhaps too difficult.
- Overall we should try to have more mentors comment on the list and help students prepare good project plans.
- More marketing so we can spread the word to more students.

I want to thank all students and potential mentors who invested time in this. Even if we did not succeed this time, I do hope we all learnt something in the process and can find a way to work together on BioJava beyond the scope of GSoC.

Andreas