From andreas at sdsc.edu Wed Mar 2 00:12:19 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 1 Mar 2011 21:12:19 -0800
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To: <4D6BBCB7.3010203@dundee.ac.uk>
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk>
Message-ID:

Hi Peter,

we still don't know whether we will have support from Google again this
year. Once we have confirmation we will use the wiki site again for
hosting pages related to GSoC. However, we should do this project in
any case...

Andreas

On Mon, Feb 28, 2011 at 7:18 AM, Peter Troshin wrote:
>>>> What other functionality would you
>>>> like to see that is currently not there?
>
> I think that the methods below would be a good starting point; the
> Google Summer of Code student can then propose something else that
> he/she would fancy implementing.
>
>   Molecular weight
>   Extinction coefficient
>   Instability index
>   Aliphatic index
>   Grand Average of Hydropathy
>   Isoelectric point
>   Number of amino acids in the protein (His, Met, Cys)
>
> I know BioJava projects were managed under the Open Bioinformatics
> Foundation (OBF) during last year's GSoC. Is there a page for this
> year's GSoC ideas somewhere?
>
> Regards,
> Peter
>
> On 25/02/2011 05:12, Andreas Prlic wrote:
>> Great, seems we have an agreement that we want to improve
>> functionality for this. How complex is this going to be? From quickly
>> checking the 1.8 source it looks like just a few classes need to
>> be converted, which should not be too painful. What other
>> functionality would you like to see that is currently not there?
>>
>> Andreas
>>
>> On Thu, Feb 24, 2011 at 8:08 PM, Scooter Willis wrote:
>>> We put in some basics regarding modeling amino acid properties in the
>>> core module but really didn't have any pressing use cases to drive the
>>> API beyond calculating the mass of a peptide. We currently have
>>> getMolecularWeight() as a method in AbstractCompound but never added a
>>> getSequenceMolecularWeight() to AbstractSequence. It would be great to
>>> get the attributes/features of amino acids properly modeled in core
>>> and, where reasonable, to add useful summary methods at higher levels.
>>> You should be able to query the mass of a peptide and have it valid for
>>> an amino acid with a PTM, which means the amino acid needs to support
>>> being modified in a flexible manner. I spent the last year+
>>> developing a software suite for peptide detection in MS data for
>>> deuterium exchange where automated PTM detection was important. It
>>> would be great to get some focused attention on the core to make sure
>>> we can model nucleotides and amino acids with a chemistry-friendly API.
>>>
>>> Thanks
>>>
>>> Scooter
>>>
>>> On Thu, Feb 24, 2011 at 2:15 PM, George Waldon wrote:
>>>> Hello Peter & Andreas
>>>>
>>>> I did indeed do some work on these methods, mostly fixing and adding
>>>> the ExPASy algorithm that was kindly provided to me. I think it makes
>>>> a lot of sense to port all physico-chemical property calculations
>>>> related to amino acids and polypeptides to bj3, as suggested by
>>>> Andreas, and I definitely support the effort. We could smoothly
>>>> deprecate the bj1 package when this is done. Let me know how I could
>>>> help.
>>>>
>>>> Thanks
>>>> George
>>>>
>>>> Quoting Peter Troshin:
>>>>
>>>>> Hi Andreas,
>>>>>
>>>>> In fact I'd be happy to help with the development of the tools for
>>>>> simple physico-chemical properties calculation for peptides. We
>>>>> could port George's code (assuming he is happy with this) from
>>>>> BioJava 1.8 but we can also provide a few other methods. A couple of
>>>>> projects in the lab where I work would have benefited from having
>>>>> these calculations readily available.
>>>>>
>>>>> I was thinking about participating in the Google Summer of Code
>>>>> (GSoC) this year as a mentor, and I think this would be an easy
>>>>> project for a student. What do you think about this?
>>>>>
>>>>> Thank you for your prompt reply.
>>>>>
>>>>> Regards,
>>>>> Peter
>>>>>
>>>>> On 24/02/2011 16:54, Andreas Prlic wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> if you get a copy of biojava 1.8, it is still there. However, I
>>>>>> would like to port this to biojava 3 as well. George, do you want
>>>>>> to help me with that, since you are one of the authors of this
>>>>>> package? The basic support for chemistry in BioJava 3 is a bit
>>>>>> better... (e.g. Element class)
>>>>>>
>>>>>> Andreas
>>>>>>
>>>>>> On Thu, Feb 24, 2011 at 7:33 AM, Peter Troshin wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've noticed that BioJava up to about version 1.7 had an
>>>>>>> org.biojava.bio.proteomics package, which had methods for
>>>>>>> isoelectric point and molecular weight calculations for peptides.
>>>>>>> I could not find this package in the BioJava 3.0.1 API. I'd like
>>>>>>> to use these methods and wonder if there are any equivalent
>>>>>>> methods available in the latest version of BioJava?
>>>>>>>
>>>>>>> Thank you for your help,
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Peter
>>>>>>>
>>>>>>> Dr Peter Troshin
>>>>>>> Bioinformatics Software Developer
>>>>>>> Phone: +44 (0)1382 388589
>>>>>>> Fax: +44 (0)1382 385764
>>>>>>> The Barton Group
>>>>>>> College of Life Sciences
>>>>>>> Medical Sciences Institute
>>>>>>> University of Dundee
>>>>>>> Dundee
>>>>>>> DD1 5EH
>>>>>>> UK
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From p.v.troshin at dundee.ac.uk Wed Mar 2 12:06:34 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Wed, 02 Mar 2011 17:06:34 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk>
Message-ID: <4D6E791A.1000907@dundee.ac.uk>

My apologies, the link to the timeline should have been this:
http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2011/faqs#timeline

I have to say that Google has been pretty consistent with the dates,
so the date for the announcement is still the same (18th of March).

Peter

>>>> Hi Andreas,
>>>>
>>>> We don't have long to wait before we know whether Google supports
>>>> OBF this year or not. According to the GSoC timeline
>>>> (http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs#timeline)
>>>> on the 18th of March they will publish the list of organisations
>>>> they will support. Let's wait and see.
>>>>
>>>> Kind regards,
>>>> Peter

On 02/03/2011 05:12, Andreas Prlic wrote:
> Hi Peter,
>
> we still don't know whether we will have support from Google again this
> year. Once we have confirmation we will use the wiki site again for
> hosting pages related to GSoC. However, we should do this project in
> any case...
>
> Andreas

From p.v.troshin at dundee.ac.uk Wed Mar 2 12:00:16 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Wed, 02 Mar 2011 17:00:16 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk>
Message-ID: <4D6E77A0.4060104@dundee.ac.uk>

Hi Andreas,

We don't have long to wait before we know whether Google supports OBF
this year or not. According to the GSoC timeline
(http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs#timeline)
on the 18th of March they will publish the list of organisations they
will support. Let's wait and see.

Kind regards,
Peter

On 02/03/2011 05:12, Andreas Prlic wrote:
> Hi Peter,
>
> we still don't know whether we will have support from Google again this
> year. Once we have confirmation we will use the wiki site again for
> hosting pages related to GSoC. However, we should do this project in
> any case...
>
> Andreas

From bart.mesuere at ugent.be Fri Mar 4 02:37:15 2011
From: bart.mesuere at ugent.be (Bart Mesuere)
Date: Fri, 4 Mar 2011 08:37:15 +0100
Subject: [Biojava-l] read DBLINK field from genbank file
Message-ID:

Hi,

I'm trying to read some genbank files using the 1.7 legacy code.
Everything works fine, but I'm having trouble extracting the project ID
located in the DBLINK field (in bold in the example):

LOCUS       NC_009926  374161 bp  DNA  circular  BCT  25-JUL-2008
DEFINITION  Acaryochloris marina MBIC11017 plasmid pREB1, complete sequence.
ACCESSION   NC_009926
VERSION     NC_009926.1  GI:158339488
*DBLINK      Project: 58167*

I already inspected the SimpleRichSequence object with a debugger but
couldn't find anything useful. Is it possible to read this field using
biojava 1.7?

Kind regards,
Bart Mesuere

From andreas at sdsc.edu Fri Mar 4 11:38:10 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 4 Mar 2011 08:38:10 -0800
Subject: [Biojava-l] [Biojava-dev] Bioinformatics Open Source Conference (BOSC 2011)--Call for Abstracts
In-Reply-To: <20110304123756.GE27839@sobchak>
References: <20110304123756.GE27839@sobchak>
Message-ID:

Does anybody want to submit the BioJava abstract for the BOSC meeting
this year?

- I will be in Vienna and can help; however, I will be attending the
3D-SIG meeting, which is at the same time...

Andreas

On Fri, Mar 4, 2011 at 4:37 AM, Brad Chapman wrote:
> We invite you to submit an abstract to BOSC 2011! Please forward this
> message as appropriate, and forgive multiple postings.
>
> Call for Abstracts for the 12th Annual Bioinformatics Open Source
> Conference (BOSC 2011)
> An ISMB 2011 Special Interest Group (SIG)
>
> Dates: July 15-16, 2011
> Location: Vienna, Austria
> Web site: http://www.open-bio.org/wiki/BOSC_2011
> Email: bosc at open-bio.org
> BOSC announcements mailing list: http://lists.open-bio.org/mailman/listinfo/bosc-announce
>
> Important Dates:
> April 18, 2011: Deadline for submitting abstracts to BOSC 2011
> May 9, 2011: Notifications of accepted abstracts emailed to corresponding authors
> July 13-14, 2011: Codefest 2011 programming session (see http://www.open-bio.org/wiki/Codefest_2011 for details)
> July 15-16, 2011: BOSC 2011
> July 17-19, 2011: ISMB 2011
>
> The Bioinformatics Open Source Conference (BOSC) is sponsored by the
> Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated
> to promoting the practice and philosophy of Open Source software
> development within the biological research community. To be considered
> for acceptance, software systems representing the central topic in a
> presentation submitted to BOSC must be licensed with a recognized Open
> Source License, and be freely available for download in source code
> form.
>
> We invite you to submit abstracts for talks and posters. Sessions include:
> - Approaches to parallel processing
> - Cloud-based approaches to improving software and data accessibility
> - The Semantic Web in open source bioinformatics
> - Data visualization
> - Tools for next-generation sequencing
> - Other Open Source software
>
> In addition to the above sessions, there will be a panel discussion
> about "Meeting the challenges of inter-institutional collaboration". We
> are also working to arrange a joint session with one of the other ISMB
> SIGs.
>
> Thanks to generous sponsorship from Eagle Genomics and an anonymous
> donor, we are pleased to announce a competition for three Student Travel
> Awards for BOSC 2011. Each winner will be awarded $250 to defray the
> costs of travel to BOSC 2011.
>
> For instructions on submitting your abstract, please visit
> http://www.open-bio.org/wiki/BOSC_2011#Abstract_Submission_Information
>
> BOSC 2011 Organizing Committee:
> Nomi Harris and Peter Rice (co-chairs); Brad Chapman, Peter Cock,
> Erwin Frise, Darin London, Ron Taylor
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From holland at eaglegenomics.com Fri Mar 4 11:43:44 2011
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 4 Mar 2011 16:43:44 +0000
Subject: [Biojava-l] [Biojava-dev] Bioinformatics Open Source Conference (BOSC 2011)--Call for Abstracts
In-Reply-To:
References: <20110304123756.GE27839@sobchak>
Message-ID:

I will be attending BOSC and chairing a session. It is probably best if
I do not present, though, as I am no longer directly involved with
BioJava.

On 4 Mar 2011, at 16:38, Andreas Prlic wrote:

> Does anybody want to submit the BioJava abstract for the BOSC meeting
> this year?
>
> - I will be in Vienna and can help; however, I will be attending the
> 3D-SIG meeting, which is at the same time...
>
> Andreas

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From gwaldon at geneinfinity.org Fri Mar 4 12:16:25 2011
From: gwaldon at geneinfinity.org (George Waldon)
Date: Fri, 04 Mar 2011 11:16:25 -0600
Subject: [Biojava-l] read DBLINK field from genbank file
In-Reply-To:
References:
Message-ID: <20110304111625.100050dk9yj9m6ck@gator1273.hostgator.com>

Hi Bart,

DBLINK is a recent addition to the GenBank format and unfortunately the
bj1 parser does not read it. You can check for yourself in
org.biojavax.bio.seq.io.GenbankFormat.

Regards,
George

Quoting Bart Mesuere:

> Hi,
>
> I'm trying to read some genbank files using the 1.7 legacy code.
> Everything works fine, but I'm having trouble extracting the project ID
> located in the DBLINK field (in bold in the example):
>
> LOCUS       NC_009926  374161 bp  DNA  circular  BCT  25-JUL-2008
> DEFINITION  Acaryochloris marina MBIC11017 plasmid pREB1, complete sequence.
> ACCESSION   NC_009926
> VERSION     NC_009926.1  GI:158339488
> *DBLINK      Project: 58167*
>
> I already inspected the SimpleRichSequence object with a debugger but
> couldn't find anything useful. Is it possible to read this field using
> biojava 1.7?
>
> Kind regards,
> Bart Mesuere

From hlapp at drycafe.net Fri Mar 4 18:26:25 2011
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Fri, 4 Mar 2011 18:26:25 -0500
Subject: [Biojava-l] Informatics job opportunity at NESCent
Message-ID: <1878F27F-000D-4C80-B9EA-A83F7887828F@drycafe.net>

(Apologies if you receive multiple copies, and also if you are not
interested in job opportunities. In my defense, quite a few people on
Bio* lists might qualify for (let alone enjoy) the position. And if you
know someone who might be interested please forward.)

===================================================
User Interface Design and Web Application Developer
===================================================

The National Evolutionary Synthesis Center (NESCent) seeks a creative
and enthusiastic individual to design user interfaces and web
applications for scientific applications. The incumbent will work as
part of a small informatics team in close collaboration with domain
scientists.

NESCent is an NSF-funded center dedicated to cross-disciplinary
research in evolutionary science. Our informatics team works closely
with visiting and resident scientists to support their custom software
and database development needs. All NESCent software products are
open-source, and the Center has a number of initiatives to actively
promote collaborative development of community software resources
(informatics.nescent.org). Above all, we are enthusiastic about our
work, about the mission of the Center, and about the contribution of
informatics to that mission.

Job description:

The incumbent will design and develop user interfaces and web
applications for databases and other software tools for sponsored
scientists and staff. The job responsibilities include all stages of
the software development process, including requirements gathering,
design, implementation, release packaging and documentation, as part of
a small team (typically 2-3 individuals) following project management
best practices. We expect the incumbent to present their work at
conferences and contribute to publications with scientific
collaborators; interact regularly with visiting and resident
scientists, other members of the informatics team and Center staff; and
generally serve as an expert resource for Center personnel. The
position provides opportunities for professional development. Most
informatics staff work at our Durham NC offices, located adjacent to
Duke University, but we do support a wide range of technologies for
virtual communication with off-site staff and collaborators.

Required Qualifications:

* Demonstrated success collaborating with clients on custom software
  solutions
* Experience with various stages of the software development cycle
* Expertise in development and testing of user interface designs
* Excellent communication skills, both virtual and face-to-face
* A four-year college degree in Computer Science, Bioinformatics or a
  related field

Preferred Qualifications:

* M.S. or Ph.D. in Computer Science, Bioinformatics or related field
  along with demonstrated interest in science, particularly biology
* Expertise in rapid application development and respective programming
  technologies and languages (e.g., modern scripting languages and
  web-application frameworks such as Python/Django, Ruby/Ruby-on-Rails,
  and Perl/Catalyst), fluency in Java programming, and prior experience
  in relational database programming (PostgreSQL or MySQL)
* Expertise in dynamic and interactive web technologies (JavaScript,
  CGI), web service (SOAP, REST, XML, JSON) and semantic web
  technologies
* Experience with open-source, collaborative software development,
  software usability design and assessment
* Expertise in graphic design, data visualization and/or scientific
  data integration

How to apply:

Please send cover letter, resume and contact information for three
references to Dr. Karen Cranston, Training Coordinator and
Bioinformatics Project Manager (karen.cranston at nescent.org). Review
of applications will begin March 21, 2011. Informal inquiries or
requests for additional information may be directed to Dr. Cranston by
email or phone (+1-919-613-2275).

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From andreas at sdsc.edu Sat Mar 5 16:56:40 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 5 Mar 2011 13:56:40 -0800
Subject: [Biojava-l] biojava wiki
Message-ID:

Hi,

In order to prevent our wiki from being spammed, a new plugin is now
being used to block bots. Let us know if you notice any problems
when signing up or logging into your accounts...

Andreas

From jayunit100 at gmail.com Sun Mar 6 20:54:58 2011
From: jayunit100 at gmail.com (Jay Vyas)
Date: Sun, 6 Mar 2011 20:54:58 -0500
Subject: [Biojava-l] DBREF parsing exception...
Message-ID:

Hi guys:

It looks like the PDB parser for biojava is tripping up on the DBREFs for
PDB ID 3O62. It's a bit of a problem for me because a bunch of exception
stack traces are getting streamed to the screen, making it difficult for me
to debug my code.

Is there a way to disable the reading of DBREF lines, or alternatively, is
there a way to fix the exception?

badly formatted line ... DBREF  3O62 A    1   135  UNP    P84233   H32_XENLA        2    136
java.lang.StringIndexOutOfBoundsException: String index out of range: 68
    at java.lang.String.substring(String.java:1934)
    at org.biojava.bio.structure.io.PDBFileParser.pdb_DBREF_Handler(PDBFileParser.java:1979)
    at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2413)

--
Jay Vyas
MMSB/UCHC

From andreas at sdsc.edu Mon Mar 7 00:15:06 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 6 Mar 2011 21:15:06 -0800
Subject: [Biojava-l] DBREF parsing exception...
In-Reply-To:
References:
Message-ID:

Hi Jay,

are you using the version from SVN or a particular release? I think this is
already fixed in SVN...

Andreas

On Sun, Mar 6, 2011 at 5:54 PM, Jay Vyas wrote:
> Hi guys:
>
> It looks like the PDB parser for biojava is tripping up on the DBREFs for
> PDB ID 3O62. It's a bit of a problem for me because a bunch of exception
> stack traces are getting streamed to the screen, making it difficult for
> me to debug my code.
>
> Is there a way to disable the reading of DBREF lines, or alternatively, is
> there a way to fix the exception?
>
> badly formatted line ... DBREF  3O62 A    1   135  UNP    P84233   H32_XENLA        2    136
> java.lang.StringIndexOutOfBoundsException: String index out of range: 68
>     at java.lang.String.substring(String.java:1934)
>     at org.biojava.bio.structure.io.PDBFileParser.pdb_DBREF_Handler(PDBFileParser.java:1979)
>     at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2413)
>
> --
> Jay Vyas
> MMSB/UCHC

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From rmb32 at cornell.edu Mon Mar 7 11:37:29 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 07 Mar 2011 11:37:29 -0500
Subject: [Biojava-l] Google Summer of Code project ideas
Message-ID: <4D7509C9.5090008@cornell.edu>

Hi all,

I'm going to be OBF project admin again this year for Google Summer of
Code. OBF's application is due later this week, and we need to update our
project ideas on the OBF wiki page and on each project's individual wiki
pages.

So, for each of the OBF projects that wants to do GSoC again this year,
please:

a.) Update the list of project ideas on your project's GSoC page
    (BioPython, BioPerl, BioRuby, etc). Add new ones, remove ones that have
    already been done or are no longer relevant, etc.

b.) Update the list of project ideas on the main OBF GSoC page
    (http://www.open-bio.org/wiki/Google_Summer_of_Code) to match.

c.) Let me know via email that you have done so and it's ready for Google
    to peruse.

Please have the updates done, if possible, by this Friday (March 11).

The number and quality of the project ideas are part of the evaluation
process for whether OBF is accepted as a Summer of Code organization again
this year, so let's come up with some good ones.
:-)

Rob

----
Robert Buels
(prospective) 2011 OBF GSoC Organization Admin

From jayunit100 at gmail.com Mon Mar 7 13:11:04 2011
From: jayunit100 at gmail.com (Jay Vyas)
Date: Mon, 7 Mar 2011 13:11:04 -0500
Subject: [Biojava-l] New Maven Pom issue
Message-ID:

Hi guys:

I tried to update my biojava pom, and it looks like I missed something.

In particular, the following biojava-core classes are missing:

import org.biojava.bio.BioException;
import org.biojava.bio.proteomics.IsoelectricPointCalc;
import org.biojava.bio.proteomics.MassCalc;
import org.biojava.bio.seq.ProteinTools;

Here's the way my new pom looks. I think there is a problem with
biojava-core?

<dependency>
    <groupId>org.biojava</groupId>
    <artifactId>biojava3-alignment</artifactId>
    <version>3.0.1</version>
    <scope>compile</scope>
</dependency>
<dependency>
    <groupId>org.biojava</groupId>
    <artifactId>biojava3-core</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.biojava</groupId>
    <artifactId>biojava3-protmod</artifactId>
    <version>3.0.1</version>
    <scope>compile</scope>
</dependency>

From willishf at ufl.edu Mon Mar 7 13:37:26 2011
From: willishf at ufl.edu (Scooter Willis)
Date: Mon, 7 Mar 2011 13:37:26 -0500
Subject: [Biojava-l] New Maven Pom issue
In-Reply-To:
References:
Message-ID:

I think those are all imports from biojava 1.X. We tried to use
org.biojava3 for packages where code is in biojava3.

On Mon, Mar 7, 2011 at 1:11 PM, Jay Vyas wrote:
> Hi guys:
>
> I tried to update my biojava pom, and it looks like I missed something.
>
> In particular, the following biojava-core classes are missing:
>
> import org.biojava.bio.BioException;
> import org.biojava.bio.proteomics.IsoelectricPointCalc;
> import org.biojava.bio.proteomics.MassCalc;
> import org.biojava.bio.seq.ProteinTools;
>
> Here's the way my new pom looks. I think there is a problem with
> biojava-core?
>
> <dependency>
>     <groupId>org.biojava</groupId>
>     <artifactId>biojava3-alignment</artifactId>
>     <version>3.0.1</version>
>     <scope>compile</scope>
> </dependency>
> <dependency>
>     <groupId>org.biojava</groupId>
>     <artifactId>biojava3-core</artifactId>
>     <version>3.0.1</version>
> </dependency>
> <dependency>
>     <groupId>org.biojava</groupId>
>     <artifactId>biojava3-protmod</artifactId>
>     <version>3.0.1</version>
>     <scope>compile</scope>
> </dependency>

From p.v.troshin at dundee.ac.uk Tue Mar 8 06:15:36 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Tue, 08 Mar 2011 11:15:36 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk>
Message-ID: <4D760FD8.2010002@dundee.ac.uk>

Hi guys,

Following the invitation from Robert, I have now registered this idea on
the GSoC page for BioJava:

http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals

I wonder if any of you would fancy co-mentoring a student? It would be good
to have someone with up-to-date knowledge of BioJava to ensure that all the
appropriate data structures are used. My own knowledge of BioJava is a bit
rusty.

Kind regards,
Peter

On 02/03/2011 05:12, Andreas Prlic wrote:
> Hi Peter,
>
> we still don't know yet if we will have support from Google again this
> year. Once we have a confirmation we will use the wiki site again for
> hosting pages related to GSoC. However we should do this project in
> any case...
>
> Andreas

From willishf at ufl.edu Tue Mar 8 06:44:59 2011
From: willishf at ufl.edu (Scooter Willis)
Date: Tue, 8 Mar 2011 06:44:59 -0500
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To: <4D760FD8.2010002@dundee.ac.uk>
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk>
Message-ID:

Peter

Happy to co-mentor and make sure everything gets integrated properly into
either core or another module.

Thanks

Scooter

On Tue, Mar 8, 2011 at 6:15 AM, Peter Troshin wrote:
> Hi guys,
>
> Following the invitation from Robert, I have now registered this idea on
> the GSoC page for BioJava:
>
> http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals
>
> I wonder if any of you would fancy co-mentoring a student? It would be
> good to have someone with up-to-date knowledge of BioJava to ensure that
> all the appropriate data structures are used. My own knowledge of BioJava
> is a bit rusty.
>
> Kind regards,
> Peter

From p.v.troshin at dundee.ac.uk Tue Mar 8 08:08:22 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Tue, 08 Mar 2011 13:08:22 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk>
Message-ID: <4D762A46.3090204@dundee.ac.uk>

Hi Scooter,

Great! Please feel free to update the proposal page accordingly!

http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals

Regards,
Peter

On 08/03/2011 11:44, Scooter Willis wrote:
> Peter
>
> Happy to co-mentor and make sure everything gets integrated properly
> into either core or another module.
>
> Thanks
>
> Scooter

From uchathuranga at gmail.com Tue Mar 8 08:31:55 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Tue, 8 Mar 2011 19:01:55 +0530
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To: <4D762A46.3090204@dundee.ac.uk>
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk>
Message-ID:

Hi Peter,

On Tue, Mar 8, 2011 at 6:38 PM, Peter Troshin wrote:
> Hi Scooter,
>
> Great! Please feel free to update the proposal page accordingly!
>
> http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals
>
> Regards,
> Peter

From uchathuranga at gmail.com Tue Mar 8 08:42:32 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Tue, 8 Mar 2011 19:12:32 +0530
Subject: [Biojava-l] Isoelectric point and molecular
 weight calculations with BioJava
In-Reply-To: <4D762A46.3090204@dundee.ac.uk>
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk>
 <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com>
 <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk>
 <4D762A46.3090204@dundee.ac.uk>
Message-ID:

Hi Peter,

I am a student from the University of Moratuwa, reading for an engineering
degree in Computer Science and Engineering, and I am planning to participate
in this year's GSoC. I went through your project idea and it sounds like a
perfect idea for me, as I have done a bioinformatics course for my degree.

What areas should I study apart from the ones you mention in the idea?

Regards
Udana

On Tue, Mar 8, 2011 at 6:38 PM, Peter Troshin wrote:

> Hi Scooter,
>
> Great! Please feel free to update the proposal page accordingly!
>
> http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals
>
> Regards,
> Peter

From p.v.troshin at dundee.ac.uk Tue Mar 8 09:21:18 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Tue, 08 Mar 2011 14:21:18 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk>
 <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com>
 <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk>
 <4D762A46.3090204@dundee.ac.uk>
Message-ID: <4D763B5E.5040700@dundee.ac.uk>

Dear M. Rehan,

I am happy to hear that you'd like to take my idea forward and I wish you
the best of luck with your GSoC application. However, please bear in mind
that:

1) OBF may not be accepted as a mentor organisation this year.
2) My idea may not be funded even if the OBF is accepted as a mentor
   organisation.
3) You as a student may not be accepted by Google (you have to make an
   application to them on your own).
4) You may not be the best candidate for the project.
5) I have no say in most of the above.

I will be happy to have you as a student once we get to this stage, but I
feel that right now any request for supervision is a little premature.

You can find out how to apply to GSoC here:
http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/faqs

Best of luck,
Peter

On 08/03/2011 13:54, M. Rehan Shaukat wrote:
> Dear Peter,
>
> After participating in Google SoC 2010 last year I am looking forward
> to participate again in SoC 2011. I was following this thread
> regarding the "Amino acids physico-chemical properties calculation"
> idea and I also read the details of GSoC page for this project idea.
> This idea sounds very interesting to me and also matches my interests
> and experience (Optimisation, Multi-threading, High Performance
> Computing). I have passion for work and contribute in open source
> projects. I am linked with Medical Research Council (Harwell, UK) and
> contributing to Europhenome (An open source system for handling large
> datasets and analysing as well as annotating mouse phenotyping data)
> and EUMODIC projects in collaboration with HMGU, Germany; ICS, France;
> and the Sanger Institute, UK.
>
> During my Masters thesis, I worked on a project: "Using Cell
> Processors to Speed up Phylogenetic Inference" that aimed on
> optimising a compute-intensive Bioinformatics application on Cell
> Broadband Engine using IBM Cell Broadband Engine SDK and MPI. I have
> over 4 years of research+industrial software development experience. I
> have worked on different programming languages (mainly: Java, C/C++,
> PHP, XML) and variety of tools and frameworks (J2EE, JUnit, Hibernate,
> Spring, JMS, JMX, CORBA, RMI, Eclipse, Netbeans, SVN, CVS and more).
>
> I am interested in working on this project under your supervision. I
> have plenty of similar experience and would be grateful for your kind
> supervision.
>
> Please find my CV attached.
>
> Thank you & Best Regards,
>
> Muhammad Rehan Shaukat
> Bioinformatician
> Medical Research Council, Harwell
> Mammalian Genetics Unit
> Harwell Science and Innovation Campus
> Oxfordshire
> OX11 0RD
> www.har.mrc.ac.uk

From p.v.troshin at dundee.ac.uk Tue Mar 8 09:01:01 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Tue, 08 Mar 2011 14:01:01 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk>
 <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com>
 <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk>
 <4D762A46.3090204@dundee.ac.uk>
Message-ID: <4D76369D.8060403@dundee.ac.uk>

Hi Udana,

I'd suggest looking at the BioJava 1.8 code for isoelectric point and
molecular weight calculations for peptides, as this is something you would
need to port to BioJava 3. Study the BioJava 3 API; in particular, examine
the data structures for representing peptides. For more ideas, look at the
http://www.expasy.org/tools/ web site.

Hope that helps. Good luck with your application!
Regards,
Peter

On 08/03/2011 13:42, udana chathuranga wrote:
> Hi Peter,
>
> I am a student from university of moratuwa reading for my engineering
> degree in Computer Science Engineering and I am planning to
> participate in this year GSoC.I went through your project idea and
> sound like it perfect idea for me as I have done a bioinformatics
> course for my degree.
>
> What are the areas that I have study apart from the one you have
> mention in the idea?.
>
> Regards
> Udana
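For orientation on the porting task discussed above, here is a minimal sketch of an average molecular weight calculation for a peptide given as a one-letter string. It is not BioJava code: the class and method names (`PeptideMassSketch`, `molecularWeight`) are hypothetical, it ignores PTMs entirely, and the residue masses are the commonly tabulated average values, which should be checked against the BioJava 1.8 source or ExPASy before being relied on.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; not part of BioJava, all names are hypothetical. */
class PeptideMassSketch {

    /** Average mass of one water molecule (Da), added once for the free N- and C-termini. */
    static final double WATER = 18.01524;

    /** Average residue masses in Da (amino acid minus water). Verify before real use. */
    static final Map<Character, Double> RESIDUE_MASS = new HashMap<>();
    static {
        RESIDUE_MASS.put('A', 71.0788);  RESIDUE_MASS.put('R', 156.1875);
        RESIDUE_MASS.put('N', 114.1038); RESIDUE_MASS.put('D', 115.0886);
        RESIDUE_MASS.put('C', 103.1388); RESIDUE_MASS.put('E', 129.1155);
        RESIDUE_MASS.put('Q', 128.1307); RESIDUE_MASS.put('G', 57.0519);
        RESIDUE_MASS.put('H', 137.1411); RESIDUE_MASS.put('I', 113.1594);
        RESIDUE_MASS.put('L', 113.1594); RESIDUE_MASS.put('K', 128.1741);
        RESIDUE_MASS.put('M', 131.1926); RESIDUE_MASS.put('F', 147.1766);
        RESIDUE_MASS.put('P', 97.1167);  RESIDUE_MASS.put('S', 87.0782);
        RESIDUE_MASS.put('T', 101.1051); RESIDUE_MASS.put('W', 186.2132);
        RESIDUE_MASS.put('Y', 163.1760); RESIDUE_MASS.put('V', 99.1326);
    }

    /** Sums the residue masses of the peptide and adds one water for the termini. */
    static double molecularWeight(String peptide) {
        if (peptide == null || peptide.isEmpty()) {
            throw new IllegalArgumentException("Peptide must not be empty");
        }
        double sum = WATER;
        for (char c : peptide.toUpperCase(Locale.ROOT).toCharArray()) {
            Double mass = RESIDUE_MASS.get(c);
            if (mass == null) {
                throw new IllegalArgumentException("Unknown residue: " + c);
            }
            sum += mass;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Glycine alone: one residue mass plus one water.
        System.out.printf(Locale.ROOT, "%.4f%n", molecularWeight("G"));
    }
}
```

A real port would presumably take the masses from the compound objects themselves (for example the getMolecularWeight() method mentioned earlier in the thread) rather than from a hard-coded table, so that modified residues can be supported.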
From fuyu12345 at gmail.com Wed Mar 9 04:53:19 2011
From: fuyu12345 at gmail.com (Richard Fu)
Date: Wed, 9 Mar 2011 17:53:19 +0800
Subject: [Biojava-l] About translation to Chinese
In-Reply-To:
References:
Message-ID:

As a Chinese student interested in BioJava, I noticed that there is no
Chinese version of the BioJava 3.0 documentation. I would therefore like to
translate some of the javadoc or the cookbook. Which part is the most
essential and should be given priority? Maybe I can help popularize BioJava
among more Chinese scientists and students who are enthusiastic about it.
From flf.mib at gmail.com Wed Mar 9 15:12:46 2011
From: flf.mib at gmail.com (François Le Fevre)
Date: Wed, 09 Mar 2011 21:12:46 +0100
Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature
Message-ID: <4D77DF3E.4000609@gmail.com>

Dear all,

I would like to know whether two proteins that have the same amino acid
sequence should be equal. I was expecting that two Sequence objects with
exactly the same string representation would be considered the same, but
this does not seem to be the case. Is this normal?

Thanks for your help.

Sequence protein1 = MKRISTTITTTITITTGNGAG hash=801818331
Sequence protein2 = MKRISTTITTTITITTGNGAG hash=700804192

protein1.equals(protein2) returns false.

Francois

--
----------------------
Francois LE FEVRE

From ayates at ebi.ac.uk Wed Mar 9 15:27:39 2011
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 9 Mar 2011 20:27:39 +0000
Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature
In-Reply-To: <4D77DF3E.4000609@gmail.com>
References: <4D77DF3E.4000609@gmail.com>
Message-ID: <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk>

Hi Francois,

Neither Compounds nor Sequences have overridden equals() and hashCode()
methods, which is why you're seeing the current behaviour.

Andy

On 9 Mar 2011, at 20:12, François Le Fevre wrote:

> Dear all,
>
> I would like to know if 2 proteins that have the same sequence of aminoacid should be equals?
> I was wandering that 2 Sequence that have exactly the same string signature should be the same.
> But it seems to be not the case.
> Is it normal?
>
> Thank for your help.
>
>
> Sequence protein1 = MKRISTTITTTITITTGNGAG hash=801818331
> Sequence protein2 = MKRISTTITTTITITTGNGAG hash=700804192
>
> protein1.equals(protein2) return false.
>
> Francois
>
> --
> ----------------------
> Francois LE FEVRE

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From andreas at sdsc.edu Wed Mar 9 20:02:11 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 9 Mar 2011 17:02:11 -0800
Subject: [Biojava-l] About translation to Chinese
In-Reply-To:
References:
Message-ID:

Hi Richard,

great that you want to work on the translation. I don't think there is
any part that is more important than others, so ideally you would
cover it all... The BioJava 1.7 (legacy) cookbook has been translated
into several languages, including simplified Chinese. Probably the
best maintained translation is the one by Sylvain Foisy into French.

Thanks for volunteering,

Andreas

On Wed, Mar 9, 2011 at 1:53 AM, Richard Fu wrote:
> As a Chinese student interested in biojava, I noticed that there is not a
> Chinese version of documentation of biojava 3.0. Thus I would like to
> translate some of the javadoc or cookbook. Which part is the essential part
> and should be set priority to? Maybe I can help popularize biojava among
> more Chinese scientists and students who are enthusiastic about it.

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From andreas at sdsc.edu Wed Mar 9 20:04:55 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 9 Mar 2011 17:04:55 -0800
Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature
In-Reply-To: <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk>
References: <4D77DF3E.4000609@gmail.com> <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk>
Message-ID:

Hi François,

you could try to compare the string representation of the sequences...

Andreas

2011/3/9 Andy Yates :
> Hi Francois,
>
> Neither Compounds nor Sequences have an over-ridden equals() & hashcode() method which is why you're seeing the current behaviour.
>
> Andy
>
> On 9 Mar 2011, at 20:12, François Le Fevre wrote:
>
>> Dear all,
>>
>> I would like to know if 2 proteins that have the same sequence of aminoacid should be equals?
>> I was wandering that 2 Sequence that have exactly the same string signature should be the same.
>> But it seems to be not the case.
>> Is it normal?
>>
>> Thank for your help.
>>
>>
>> Sequence protein1 = MKRISTTITTTITITTGNGAG hash=801818331
>> Sequence protein2 = MKRISTTITTTITITTGNGAG hash=700804192
>>
>> protein1.equals(protein2) return false.
>>
>> Francois
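The string-comparison workaround suggested in this thread can be sketched as follows. This is a self-contained illustration, not BioJava code: `SimpleSequence` is a hypothetical stand-in for a sequence class that inherits identity-based `equals()` from `Object`, its `getSequenceAsString()` accessor is assumed here to mirror what the real sequence classes offer, and `SequenceKey` is an invented wrapper for use in hash-based collections.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a sequence type that does not override equals()/hashCode(). */
class SimpleSequence {
    private final String residues;

    SimpleSequence(String residues) {
        this.residues = residues;
    }

    /** Assumed accessor name; returns the residues as a plain string. */
    String getSequenceAsString() {
        return residues;
    }
}

/** Hypothetical wrapper that gives a sequence value semantics based on its residue string. */
final class SequenceKey {
    private final String residues;

    SequenceKey(SimpleSequence sequence) {
        this.residues = sequence.getSequenceAsString();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SequenceKey
                && residues.equals(((SequenceKey) other).residues);
    }

    @Override
    public int hashCode() {
        return residues.hashCode();
    }
}

class SequenceEqualityDemo {

    /** Compares two sequences by content rather than by object identity. */
    static boolean sameResidues(SimpleSequence a, SimpleSequence b) {
        return a.getSequenceAsString().equals(b.getSequenceAsString());
    }

    public static void main(String[] args) {
        SimpleSequence protein1 = new SimpleSequence("MKRISTTITTTITITTGNGAG");
        SimpleSequence protein2 = new SimpleSequence("MKRISTTITTTITITTGNGAG");

        System.out.println(protein1.equals(protein2));        // false: Object.equals compares identity
        System.out.println(sameResidues(protein1, protein2)); // true: same residue string

        Set<SequenceKey> unique = new HashSet<>();
        unique.add(new SequenceKey(protein1));
        unique.add(new SequenceKey(protein2));
        System.out.println(unique.size());                    // 1: both keys hash and compare equal
    }
}
```

Comparing strings treats two sequences as equal whenever their residues match, regardless of accession, compound set, or annotations; whether that is the right notion of equality depends on the use case.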
http://www.ensemblgenomes.org/
>

From andreas.draeger at uni-tuebingen.de Thu Mar 10 01:42:57 2011
From: andreas.draeger at uni-tuebingen.de (Andreas Draeger)
Date: Thu, 10 Mar 2011 07:42:57 +0100
Subject: [Biojava-l] About translation to Chinese
In-Reply-To:
References:
Message-ID: <4D7872F1.6050706@uni-tuebingen.de>

> great that you want to work on the translation. I don't think there is
> any part that is more important than others, so ideally you would
> cover it all... The BioJava 1.7 (legacy) cookbook has been translated
> into several languages, including simplified Chinese. Probably the
> best maintained translation is the one by Sylvain Foisy into French.

Hi,

Just one comment on that. Wouldn't it be of more benefit to extract
all String messages from BioJava, such as warnings or error messages,
gather these in XML files and provide a Chinese version? This could
be a good starting point, so that others may volunteer and contribute
an XML file for their language. In this way, messages from BioJava
could be available in multiple languages. The structure of these XML
files would consist of key-value pairs: a key to access the String and
then the actual String. With the help of ResourceBundle it would be
very easy to use.

Cheers
Andreas

--
Dipl.-Bioinform.
Andreas Dräger
University of Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany

Phone: +49-7071-29-78982
Fax: +49-7071-29-5091

From ayates at ebi.ac.uk Thu Mar 10 04:17:45 2011
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 10 Mar 2011 09:17:45 +0000
Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature
In-Reply-To:
References: <4D77DF3E.4000609@gmail.com> <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk>
Message-ID: <9D4702B0-F592-4E79-B466-78A1C89F15FB@ebi.ac.uk>

I cannot remember the reason why we decided not to include equality for these objects. It's not an unreasonable thing to want though. Assuming I have some time soon I can have a look into implementing it on AbstractCompound, AbstractSequence & the backing stores, but it will be some time away. If anyone else wants to give it a shot ... :)

Andy

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From flf.mib at gmail.com Thu Mar 10 07:22:24 2011
From: flf.mib at gmail.com (Francois Le Fevre)
Date: Thu, 10 Mar 2011 13:22:24 +0100
Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature
In-Reply-To:
References: <4D77DF3E.4000609@gmail.com> <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk> <9D4702B0-F592-4E79-B466-78A1C89F15FB@ebi.ac.uk>
Message-ID:

That would be great. But for me, equals should mean sequence identity only, not features.
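
The workaround suggested in this thread (compare the string representation of the sequences) can be sketched in plain Java. This is only an illustration under stated assumptions, not BioJava code: `SequenceIdentityDemo`, `SequenceKey`, `sameResidues` and `countUnique` are hypothetical names, and the residue strings stand in for whatever string a sequence object reports. Features and annotations are deliberately ignored, so equality here means residue identity only.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: value-based identity for sequences, keyed on the residue string only.
class SequenceIdentityDemo {

    /** Hypothetical helper: two keys are equal when their residue strings are equal. */
    static final class SequenceKey {
        private final String residues;

        SequenceKey(String residues) {
            if (residues == null) {
                throw new IllegalArgumentException("residues must not be null");
            }
            this.residues = residues;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof SequenceKey)) {
                return false;
            }
            return residues.equals(((SequenceKey) other).residues);
        }

        // Must agree with equals(): equal residue strings give equal hash codes.
        @Override
        public int hashCode() {
            return residues.hashCode();
        }
    }

    /** True when both residue strings are identical, character for character. */
    static boolean sameResidues(String a, String b) {
        return new SequenceKey(a).equals(new SequenceKey(b));
    }

    /** Number of distinct residue strings among the arguments. */
    static int countUnique(String... residueStrings) {
        Set<SequenceKey> unique = new HashSet<>();
        for (String residues : residueStrings) {
            unique.add(new SequenceKey(residues));
        }
        return unique.size();
    }

    public static void main(String[] args) {
        String protein1 = "MKRISTTITTTITITTGNGAG";
        // A distinct String object with the same residues, mimicking two separate sequence objects.
        String protein2 = new String("MKRISTTITTTITITTGNGAG");

        System.out.println(sameResidues(protein1, protein2));         // prints true
        System.out.println(countUnique(protein1, protein2, "MKRIS")); // prints 2
    }
}
```

Wrapping the residue string in a small key class keeps equals() and hashCode() consistent with each other, so the keys can be used in a HashSet or as HashMap keys even while the sequence objects themselves fall back on object identity.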

From ayates at ebi.ac.uk Thu Mar 10 07:47:32 2011
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 10 Mar 2011 12:47:32 +0000
Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature
In-Reply-To:
References: <4D77DF3E.4000609@gmail.com> <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk> <9D4702B0-F592-4E79-B466-78A1C89F15FB@ebi.ac.uk>
Message-ID:

This is where the subject becomes murky: any code written for equals() & hashCode() will probably have to take features into account where present. However, Sequence compound identity would still be available from another method, but this will require an extension of the Sequence interface.

Andy

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From rmb32 at cornell.edu Thu Mar 10 12:13:31 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 10 Mar 2011 12:13:31 -0500
Subject: [Biojava-l] update Google Summer of Code project ideas
Message-ID: <4D7906BB.3030006@cornell.edu>

Hi all,

Please make sure the BioJava information is up to date for 2011 on both the OBF and BioJava wikis.
The current page looks pretty good, just be aware that Google will be looking at it soon to evaluate whether OBF will be accepted again this year to GSoC.

OBF wiki page: http://www.open-bio.org/wiki/Google_Summer_of_Code

BioJava wiki: http://biojava.org/wiki/Google_Summer_of_Code

Rob

----
Robert Buels
(prospective) 2011 OBF GSoC Organization Admin

From ayates at ebi.ac.uk Fri Mar 11 17:48:34 2011
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 11 Mar 2011 22:48:34 +0000
Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature
In-Reply-To:
References: <4D77DF3E.4000609@gmail.com> <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk> <9D4702B0-F592-4E79-B466-78A1C89F15FB@ebi.ac.uk>
Message-ID: <5FC99578-CB7E-4659-968E-01AC8A60FA3E@ebi.ac.uk>

Hi Francois,

So I've been thinking about this & if we add this to a small set of objects (compounds & compound sets) we can get sequence equality working. This will be done as part of the SequenceMixin class & we can do case sensitive & insensitive versions. We can also do some tricks WRT length and compound sets to reject a pair of sequences without the need to iterate through the sequence. The code will look like

  SequenceMixin.sequenceEquality(dnaOne, dnaTwo);

or

  SequenceMixin.sequenceEqualityIgnoreCase(dnaOne, dnaTwo);

Don't forget you can also use checksums like md5 & sha1 to calculate a value which should be very unlikely to clash (projects like InterPro use this technique to cache results against a very quick lookup). You can do this like:

  MessageDigest m = MessageDigest.getInstance("MD5");
  for(Compound c: seq) {
    m.update(c.getShortName().getBytes());
  }
  BigInteger i = new BigInteger(1,m.digest());
  String md5checksum = String.format("%1$032X", i);

HTH

Andy

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From rmb32 at cornell.edu Fri Mar 18 15:25:10 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 18 Mar 2011 15:25:10 -0400
Subject: [Biojava-l] Google Summer of Code is *ON* for OBF projects!
Message-ID: <4D83B196.2090403@cornell.edu>

Hi all,

Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code!
GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see the GSoC 2011 FAQ at http://bit.ly/hpoz8W

Student applications are due April 8, 2011 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas and whom to contact about applying.

For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on the OBF wiki and on the relevant project's GSoC wiki page.

Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code!

Rob Buels
OBF GSoC 2011 Administrator

From andreas at sdsc.edu Fri Mar 18 16:52:57 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 18 Mar 2011 13:52:57 -0700
Subject: [Biojava-l] Problem with Multiple Sequence Alignment in BioJava
In-Reply-To:
References:
Message-ID:

Hi Udana,

sounds like forester.jar is missing from your classpath....

Andreas

On Thu, Feb 10, 2011 at 9:01 AM, udana chathuranga wrote:
> hi all,
>
> While going through the BioJava cookbook, as I was interested in this
> project, I tried the example on the page
> http://biojava.org/wiki/BioJava:CookBook3:MSA and got a class-not-found
> exception for the line "Profile profile
> = Alignments.getMultipleSequenceAlignment(lst);".
>
> Error Message:
>
> Exception in thread "main" java.lang.NoClassDefFoundError: org/forester/phylogenyinference/DistanceMatrix
>    at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:176)
>    at CookbookMSA.multipleSequenceAlignment(CookbookMSA.java:29)
>    at CookbookMSA.main(CookbookMSA.java:18)
> Caused by: java.lang.ClassNotFoundException: org.forester.phylogenyinference.DistanceMatrix
>    at java.net.URLClassLoader$1.run(Unknown Source)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at java.net.URLClassLoader.findClass(Unknown Source)
>    at java.lang.ClassLoader.loadClass(Unknown Source)
>    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>    at java.lang.ClassLoader.loadClass(Unknown Source)
>    at java.lang.ClassLoader.loadClassInternal(Unknown Source)
>
> Is this a known issue, or am I doing something wrong in the code?
> Please help me with this. I have attached the Java source file that I tried.
>
> Thanks
> Regards
> udana.

From uchathuranga at gmail.com Sat Mar 19 04:17:26 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Sat, 19 Mar 2011 13:47:26 +0530
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To: <4D76369D.8060403@dundee.ac.uk>
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk> <4D76369D.8060403@dundee.ac.uk>
Message-ID:

hi Peter,

First of all, congratulations to OBF on being selected as a mentoring organization for GSoC 2011. I was a little busy over the last few days because of my final-year semester-one exam, which is why I couldn't reply to your mail. Thanks for the guidance in your reply. I have already studied the BioJava 3 API and looked into the tools on the web site you sent.
I'd like to know what I should put in my proposal to make it a good one, apart from the points listed in the *When you apply* section of http://www.open-bio.org/wiki/Google_Summer_of_Code. Any hints about that?

Thanks
Regards
udana

From shameerainfo at gmail.com Sat Mar 19 13:22:22 2011
From: shameerainfo at gmail.com (Shameera Rathnayaka)
Date: Sat, 19 Mar 2011 23:22:22 +0600
Subject: [Biojava-l] GSOC 2011 - contributing to 'Amino acids physico-chemical properties calculation project'
Message-ID:

hi,

I'm Shameera Rathnayaka, a third-year undergraduate at the Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka. I am interested in implementing the "Amino acids physico-chemical properties calculation" project as my GSoC 2011 project. I have previous experience of working with Java, MySQL and algorithms from my university projects. As my most recent project I developed a visual navigation plugin (XVisualNavigator) for OpenOffice 3.2.1 using Java. I am currently working as an intern at WSO2, which is an open source middleware development company.

As a starting point for the project, I would like to know about the equations needed for these calculations, how I can get them, and more about the project.

--
Shameera Rathnayaka
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa.
Sri Lanka.
T.P. 0719221454

From paiualex12 at gmail.com Tue Mar 22 16:38:47 2011
From: paiualex12 at gmail.com (Alexandru Paiu)
Date: Tue, 22 Mar 2011 22:38:47 +0200
Subject: [Biojava-l] Help! Gsoc Project
Message-ID:

Full Name : Alexandru Paiu
Country : Romania
E-mail : paiualex12 at gmail.com or paiualex12 at yahoo.com
Phone Number : 40733924684

Hi. My name is Alexandru Paiu, I'm a third-year student at University Politehnica of Bucharest, Romania, and I really want to participate in GSoC. I don't have any professional experience in programming because I've never worked in a company.
So I think that this is my chance to gain real professional experience, and I want to practice everything I've learned in all these years of study. I was accepted for two summer Java programming internships at two companies, but they don't offer any salary, so it would be rather hard to work in another town (they are in Bucharest, and I live in Constanta). So I'd really like to be accepted into the GSoC program to gain professional experience, work from home and make some money.

After searching the organization list and the project list I found the Open Bioinformatics Foundation. It is the only organization that attracted my attention, because I consider that the projects proposed by BioJava suit me. I want to apply to the Amino acids physico-chemical properties calculation project, because it will be implemented in Java and, in particular, it uses threads. I've never applied for a GSoC project before, so I need some help. I read that I must specify in my application how I would improve the project and how I would go about it, but I really don't understand what exactly I must do, so I don't have my own ideas for now. I really need some information about the project.

I started studying Java programming about one and a half years ago, on my own from Java books, from the internet (http://download.oracle.com/javase/tutorial/java/index.html) and at school. I've implemented a lot of school projects in Java, and projects from Java books. Three of my projects that I am proud of are: a LAN chat client-server using TCP/IP, a peer-to-peer LAN chat using UDP multicasting, and administering a database from a Java applet with J/Connector and MySQL (you can see the tables, insert/delete/update selected rows in the selected table, and produce some reports). I used threads in the LAN chat projects and in Java game projects like Brick or Ricochet.
I really hope that this is the right mailing list for BioJava and that I will get an answer soon. I haven't found an IRC channel for this project.

Thank you very much for your time

Alexandru Paiu

From p.v.troshin at dundee.ac.uk Wed Mar 23 07:56:35 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Wed, 23 Mar 2011 11:56:35 +0000
Subject: [Biojava-l] Help! Gsoc Project
In-Reply-To:
References:
Message-ID: <4D89DFF3.3040305@dundee.ac.uk>

Hello Alexandru,

Welcome to the BioJava list! It is good to have you interested in my project.

>>> I really need some information about the Project

The ideas behind the project are really simple and most of the methods are not new, so you have the benefit of being able to compare the results of your implementation with others. To get a better understanding of what the project involves, I'd suggest you read more about each of the calculations involved. Google is your friend here. There are plenty of other methods which would be good to add to BioJava; to get a better idea of what those might be, I'd suggest you read through the tools section of the ExPASy web site, http://expasy.org/tools/ . Please read the documentation for each method and the associated papers to get a better idea of what they do. I hope this will get you started. I would also suggest paying special attention to writing the project plan, as through this you can demonstrate your understanding of programming and of the task at hand.

Kind regards,
Peter

From andreas at sdsc.edu Wed Mar 23 21:02:59 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 23 Mar 2011 18:02:59 -0700
Subject: [Biojava-l] Google Summer of Code 2011
Message-ID:

Hi,

As you have probably already heard, the Open Bioinformatics Foundation has been accepted as a mentoring organization for this year's Google Summer of Code. This means we will be able to offer mentoring through BioJava again this year. Accepted students will get a stipend of $5,000 from Google. Participation is possible from most countries in the world, as long as you are eligible to work in the country in which you'll reside throughout the duration of the program.

If you are interested in working on a BioJava related project, now is the time to start preparing and discussing your proposals. Last year we had many applications for the projects proposed by mentors. If you want to distinguish your application, I recommend proposing your own project. Don't forget to discuss any proposal with us before you submit it. We will try to provide feedback and match you with a suitable mentor.

Also see http://biojava.org/wiki/Google_Summer_of_Code and Google's FAQs: http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/faqs

The student application deadline is April 8th. Google will announce which proposals got accepted on April 25th.

Andreas

From p.v.troshin at dundee.ac.uk Fri Mar 25 09:15:07 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Fri, 25 Mar 2011 13:15:07 +0000
Subject: [Biojava-l] coding exercise for amino acids physico-chemical properties calculation GSoC project idea
Message-ID: <4D8C955B.5000203@dundee.ac.uk>

Dear prospective GSoC students,

There has been considerable interest in the project. To help select the best candidate I decided to set a short coding exercise: http://biojava.org/wiki/Short_coding_exercise

The exercise is simple and should not take much of your time. What's more, you have complete freedom in devising a solution! Although it's not required, I think it gives you an opportunity to show your skills, so I'd recommend you have a go at it; besides, it should be fun.

Happy coding!
Peter

From p.v.troshin at dundee.ac.uk Fri Mar 25 09:40:43 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Fri, 25 Mar 2011 13:40:43 +0000
Subject: [Biojava-l] GSOC 2011 - contributing to 'Amino acids physico-chemical properties calculation project'
In-Reply-To:
References: <4D872DD2.3030903@dundee.ac.uk> <4D88954D.1050304@dundee.ac.uk>
Message-ID: <4D8C9B5B.5040201@dundee.ac.uk>

Dear Shameera,

>>> As I felt it requires some knowledge about BioSQL, is that a good point to start my next step?

I am sorry, but I do not see how BioSQL relates to this project. Perhaps more explanation from your side would help me to see the connection.

>>> I have some problems with checking out and installing BioJava3

Me too. I was unable to check out BioJava Live from svn://code.open-bio.org/biojava/biojava-live/trunk due to the following error:

"A socket operation was attempted to an unreachable host.
svn: Can't connect to host 'code.open-bio.org': A socket operation was attempted to an unreachable host."

However, a read-only git mirror, http://svn.github.com/biojava/biojava.git, seems to work fine; at least I was able to check out the project.
So I'd suggest you try that.

>>> 2. I downloaded the biojava3 (v3.0.1) tar.gz and extracted it. As the "Biojava : started" page says, I want to set the CLASSPATH variable as

Have you downloaded the biojava-all package from http://biojava.org/download/bj3.0.1/biojava3.0.1-all.tar.gz or was this something else? I cannot really comment on that one, as I have never tried this. I am copying this message to the BioJava list; let's see if someone else is aware of any problems with this package.

I hope this helps.

Regards,
Peter

On 25/03/2011 04:30, Shameera Rathnayaka wrote:
> Sorry for the delay,
>
> I went through some sources and got a basic understanding of the
> project, and I referred to some links in the cookbook as well. As I felt it
> requires some knowledge about BioSQL, is that a good point to start
> my next step?
>
> I have some problems with checking out and installing BioJava3:
>
> 1. I logged in using the OpenID login and tried to check out via Developer
> SVN, but it asks for a password three times and then shows "svn:
> Network connection closed unexpectedly". I then used a USB modem but it
> gives the same result.
>
> 2. I downloaded the biojava3 (v3.0.1) tar.gz and extracted it. As the
> "Biojava : started" page says, I want to set the CLASSPATH variable as
> "export CLASSPATH=/home/thomas/biojava-live.jar:/home/thomas/bytecode.jar:/home/thomas/commons-cli.jar:/home/thomas/commons-collections-2.1.jar:/home/thomas/commons-dbcp-1.1.jar:/home/thomas/commons-pool-1.1.jar:."
> (changed according to my directory) in UNIX Bourne-type
> shells, but I couldn't find those .jar files in my directory.
>
> Thanks
>
>
> On Tue, Mar 22, 2011 at 6:25 PM, Peter Troshin wrote:
>
> Hi Shameera,
>
> >>> Yes as you said, equations are not the case. But i needed to
> know will the equations be provided or not :)
>
> No one asks you to devise the equations yourself; all you need is
> to be able to read the paper and get the equation out of it.
> Please note that this is YOUR project and a mentor is there to
> help but not to provide everything for you. I think the equations
> are not hard to find; besides, if I just give you the equations you
> will not understand them thoroughly. As I said before, you need to
> read the papers, and by the source I meant the papers, not the
> source code. If this is something that sounds too hard for you,
> then you may want to reconsider your decision to contribute to the
> BioJava project. BioJava is not only about Java but about biology
> as well. The ability and will to find the information you need is
> crucial for success in any project.
> In addition, there are plenty of implementations of the
> methods that you have set out to implement. My advice to you would be to
> Google for molecular weight, extinction coefficient, instability
> index etc. I am sure you will find plenty of information on these
> topics.
>
> To get you started I am giving you the link to the software that
> calculates half of the properties you need. There are formulas and
> plenty of documentation: http://expasy.org/tools/protparam-doc.html
>
> But please come back with slightly more specific questions next time.
>
> P.S. I do not envisage any GUI for the library you want
> to develop; however, as a student you are free to propose this if
> you think it would be of benefit to the project.
>
> Regards,
> Peter
>
>
> On 21/03/2011 17:32, Shameera Rathnayaka wrote:
>>
>>
>> Yes, as you said, equations are not the case. But I needed to know
>> whether the equations will be provided or not :)
>> I would also like to see those equations. Could you please
>> help me to find them?
>> Then I could figure out how to apply multi-threading
>> to speed up the process.
>>
>> I think I also need to develop a GUI in addition to the methods,
>> don't I?
>>
>> > understanding how these things work
>>
>> Yes, I am now referring to the BioJava source code.
But I am a little
>> confused about how to get started and where the start should
>> be.
>>
>>
>> On Mon, Mar 21, 2011 at 4:52 PM, Peter Troshin wrote:
>>
>> Hi Shameera,
>>
>> >>> As a starting point to the project i would like to know about the
>> >>> Equations which are needed to these calculations, how can i get
>> >>> those and more about the project.
>>
>> It is good that you are interested in the project.
>> Did you try to find the formulas yourself? I will be happy to
>> help you if you cannot find them, but really
>> this should not be hard. Also, I believe that you are much
>> better off reading the sources and
>> understanding how these things work than just having a formula.
>>
>> Regards,
>> Peter
>>
>> --
>> Shameera Rathnayaka
>> Undergraduate
>> Department of Computer Science and Engineering
>> University of Moratuwa.
>> Sri Lanka.
>> T.P. 0719221454
>
> --
> Shameera Rathnayaka
> Undergraduate
> Department of Computer Science and Engineering
> University of Moratuwa.
> Sri Lanka.
> T.P. 0719221454

From p.v.troshin at dundee.ac.uk Fri Mar 25 10:45:34 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Fri, 25 Mar 2011 14:45:34 +0000
Subject: [Biojava-l] Fwd: [Biojava-dev] Google Summer of Code 2011
Message-ID: <4D8CAA8E.10109@dundee.ac.uk>

Dear Nirmal,

Thanks for your interest in GSoC and the project! My name is Peter and I am the mentor for the project you are interested in. Andreas has kindly sent your email to me.

>>> I participated in GSoC 2010 for Apache Derby (RDBMS in Java) project and successfully finished the project. I have recently finished a course module on Bio Informatics and have a basic understanding about few algorithms (Nussinov, Profile HMM, Needleman-Wunsch etc.), which made me interested in this area of computer science.

It is very good to have someone with prior experience in GSoC, Java and bioinformatics; this should help you to develop a compelling proposal.

>>> I would like to contribute to Bio-Java in this summer, would you please direct me to relevant sources which I should start reading on and also possible guidelines would be highly appreciated.

The description of the idea, the selection criteria and the coding exercise can be found here: http://biojava.org/wiki/Google_Summer_of_Code

The guidelines for GSoC students are available from http://open-bio.org/wiki/Google_Summer_of_Code

The ideas behind the project are really simple and most of the methods are not new, so you have the benefit of being able to compare the results of your implementation with others. To get a better understanding of what the project involves, I'd suggest you read more about each of the calculations involved.
Google is your friend here.
There are plenty of other methods which would be good to add to
BioJava. To get a better idea of what those might be, I'd suggest you
read through the tools section of the ExPASy web site,
http://expasy.org/tools/ . As you studied Bioinformatics you should
have no difficulty understanding these.

I hope I addressed your questions.

Regards,
Peter

-------- Original Message --------
Subject: Fwd: [Biojava-dev] Google Summer of Code 2011
Date: Thu, 24 Mar 2011 20:26:35 -0700
From: Andreas Prlic
To: Peter Troshin

Hi Peter,

do you want to reply to this one? Although he is addressing me, his
question is about your project.

Thanks,
Andreas

---------- Forwarded message ----------
From: Nirmal Fernando
Date: Wed, Mar 23, 2011 at 7:44 PM
Subject: Re: [Biojava-dev] Google Summer of Code 2011
To: Andreas Prlic
Cc: Biojava , biojava-dev

On Thu, Mar 24, 2011 at 6:32 AM, Andreas Prlic wrote:

> Hi,
>
> As you probably already have heard, the Open Bioinformatics
> Foundation has been accepted as a mentoring organization for this
> year's Google Summer of Code. This means we will be able to offer
> mentoring through BioJava again this year. Accepted students will get
> a stipend of 5,000$ from Google. Participation is possible from most
> countries in the world, as long as you are eligible to work in the
> country in which you'll reside throughout the duration of the
> program.
>
> If you are interested in working on a BioJava related project, now
> is the time to start preparing and discussing your proposals. Last
> year we had many applications for the projects proposed by mentors.
> If you want to distinguish your application I recommend to propose
> your own project. Don't forget to discuss any proposal with us before
> you submit them. We will try to provide feedback and match you with
> a suitable Mentor.
>
> Also see http://biojava.org/wiki/Google_Summer_of_Code and Google's
> FAQs:
> http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/faqs
>
>
> The student application deadline is April 8th. Google will announce
> which proposals got accepted on April 25th.
>
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

Hi Andreas,

I'm an undergraduate at the Department of Computer Science and
Engineering, University of Moratuwa, Sri Lanka, and I'm hoping to have
an exciting summer with GSoC 2011.

I participated in GSoC 2010 for the Apache Derby (RDBMS in Java)
project and successfully finished the project. This is a sample of the
work (final output) which I did for Derby last summer
(http://nirmalfdo.blogspot.com/p/my-work-at-gsoc-2010.html).
You can find my profile and recommendations at LinkedIn
(http://www.linkedin.com/profile/view?id=54105394&trk=tab_pro).

I have recently finished a course module on Bioinformatics and have a
basic understanding of a few algorithms (Nussinov, Profile HMM,
Needleman-Wunsch etc.), which made me interested in this area of
computer science.

While looking at your ideas page, "Amino acids physico-chemical
properties calculation" interested me most, since it involves
implementation of algorithms. My sound Java knowledge and experience
with concurrent programming make me more comfortable with it.

I would like to contribute to BioJava this summer. Would you please
direct me to relevant sources which I should start reading? Any
guidelines would also be highly appreciated.

Thanks.

--
Best Regards,
Nirmal

C.S.Nirmal J. Fernando
Department of Computer Science & Engineering,
Faculty of Engineering,
University of Moratuwa,
Sri Lanka.
Blog: http://nirmalfdo.blogspot.com/

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From andreas at sdsc.edu Fri Mar 25 11:41:34 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 25 Mar 2011 08:41:34 -0700
Subject: [Biojava-l] GSOC 2011 - contributing to 'Amino acids physico-chemical properties calculation project'
In-Reply-To: <4D8C9B5B.5040201@dundee.ac.uk>
References: <4D872DD2.3030903@dundee.ac.uk> <4D88954D.1050304@dundee.ac.uk> <4D8C9B5B.5040201@dundee.ac.uk>
Message-ID:

I removed the checkout instructions for the broken anonymous SVN
repository. The recommended way to check out the code is now via
github, using either SVN or GIT.

Andreas

On Fri, Mar 25, 2011 at 6:40 AM, Peter Troshin wrote:
> Dear Shameera,
>
>>>>As i felt it is required some knowledge about BioSQL so is it a good
>>>> point to start my next step????
>
> I am sorry but I do not see how BioSQL relates to this project. Perhaps more
> explanation from your side would have helped me to see the connection.
>
>>>>I have some problem with checkout and installing BioJava3
>
> Me too, I was unable to checkout the BioJava Live from
> svn://code.open-bio.org/biojava/biojava-live/trunk due to the following
> error
>
> " A socket operation was attempted to an unreachable host.
> svn: Can't connect to host 'code.open-bio.org': A socket operation was
> attempted to an unreachable host. "
>
> However, a git read-only mirror http://svn.github.com/biojava/biojava.git
> seems to work fine, at least I was able to checkout the project.
> So I'd suggest you to try that.
>
>>>>2. And i downloaded biojava3(v3.0.1) tar.gz and extracted it, as in the
>>>> Biojava : started says i want to set the CLASSPATH variable as
>
> Have you downloaded biojava-all package from
> http://biojava.org/download/bj3.0.1/biojava3.0.1-all.tar.gz or was this
> something else?
> I cannot really comment on that one, as I have never tried
> this. I am copying this message to the BioJava list, let's see if someone else
> is aware of any problems with this package.
>
>
> I hope this helps.
>
> Regards,
> Peter
>
>
> On 25/03/2011 04:30, Shameera Rathnayaka wrote:
>>
>> Sorry for the delay,
>>
>>
>> I went through some sources and got a basic understanding of the project,
>> and I referred some links in the cookbook also.As i felt it is required some
>> knowledge about BioSQL so is it a good point to start my next step????
>>
>> I have some problem with checkout and installing BioJava3
>>
>> 1. I logged in using openid log in and tried to checkout via Developer
>> SVN, but it is asking password for three times and then shows "svn: Network
>> connection closed unexpectedly" then i used a USB modem but it gives same
>> result.
>>
>> 2. And i downloaded biojava3(v3.0.1) tar.gz and extracted it, as in the
>> Biojava : started says i want to set the CLASSPATH variable as
>> "export CLASSPATH=/home/thomas/biojava-live.jar:/home/thomas/bytecode.jar:/home/thomas/commons-cli.jar:/home/thomas/commons-collections-2.1.jar:/home/thomas/commons-dbcp-1.1.jar:/home/thomas/commons-pool-1.1.jar:."
>> change according to my directory, in UNIX Bourne-type shells,
>> but i couldn't find those .jar files in my directory.
>>
>> Thanks
>>
>>
>> On Tue, Mar 22, 2011 at 6:25 PM, Peter Troshin wrote:
>>
>>    Hi Shameera,
>>
>>    >>>Yes as you said, equations are not the case. But i needed to
>>    know will the equations be provided or not :)
>>
>>    No one asks you to devise the equations yourself all you need is
>>    to be able to read the paper and get the equation out of it.
>>    Please note that this is YOUR project and a mentor is there to
>>    help but not to provide everything for you. I think the equations
>>    are not hard to find besides if I just give you the equations you
>>    will not understand them thoroughly. As I said before you need to
>>    read the papers and by the source I meant the papers not the
>>    source code. If this is something that sounds too hard for you
>>    then, you may want to reconsider your decision contributing to
>>    BioJava project. BioJava is not only about Java but about Biology
>>    as well. Ability and will to find the information you need is a
>>    crucial for the success in any project.
>>    In addition to it there are plenty of implementations for the
>>    method that you set to implement. My advice to you would be to
>>    Google for molecular weight, extinction coefficient, instability
>>    index etc. I am sure you will find plenty of information on these
>>    topics.
>>
>>    To get you started I am giving you the link to the software that
>>    calculates half of the properties you need. There are formulas and
>>    plenty of documentation http://expasy.org/tools/protparam-doc.html
>>
>>    But please next time come back with a little more specific questions.
>>
>>    P.S. I do not envisage any GUI interfaces for the library you want
>>    to develop however, as a student you are free to propose this if
>>    you think this would be of benefit to the project.
>>
>>    Regards,
>>    Peter
>>
>>
>>    On 21/03/2011 17:32, Shameera Rathnayaka wrote:
>>>
>>>
>>>    Yes as you said, equations are not the case. But i needed to know
>>>    will the equations be provided or not :)
>>>    And also i'm willing to see those equations. Could you please
>>>    help me to find out those equations,
>>>    then i could figure it out how to apply multi-threading
>>>    technology to speed up the process.
>>>
>>>    As i think i need to also develop a GUI except to the methods
>>>    isn't it???
>>>
>>>
>>>    understanding how these things work>
>>>
>>>    Yes now im referring to biojava source code . But im in a little
>>>    bit confusion about how to get a start and where the start should
>>>    be?
>>>
>>>
>>>    --
>>>    Shameera Rathnayaka
>>>    Undergraduate
>>>    Department of Computer Science and Engineering
>>>    University of Moratuwa.
>>>    Sri Lanka.
>>>    T.P. 0719221454
>>
>>
>> --
>> Shameera Rathnayaka
>> Undergraduate
>> Department of Computer Science and Engineering
>> University of Moratuwa.
>> Sri Lanka.
>> T.P. 0719221454
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From paiualex12 at gmail.com Fri Mar 25 14:06:37 2011
From: paiualex12 at gmail.com (Alexandru Paiu)
Date: Fri, 25 Mar 2011 20:06:37 +0200
Subject: [Biojava-l] Gsoc Amino acids physico-chemical properties calculation
Message-ID:

Hi Peter

It's me again, Paiu Alexandru from Romania.

I've been studying amino acids for 2 days, but I didn't find anything
in Romanian about the properties I have to implement for this project.
I found only material about amino acids in general. I went to the
university library to find books about amino acids but I wasn't so
lucky. I'm really stuck on getting some information about those
properties.

I found formulas and some information about every method on Wikipedia,
but I really can't understand exactly what each formula is trying to
do. I'd need some concrete examples for every method I have to
implement. There are too many abbreviations that I don't understand. I
tried to get some help from some friends who are studying pharmacy, but
they couldn't help me.
I studied that tool, http://expasy.org/tools/, but I haven't figured
out yet exactly how those methods work (what the inputs for each method
are and how they obtain those outputs).

I've started to work on the goals in the selection criteria. I've
finished the first 2, and now I'm working on using threads.
I'll use threads to process multiple lines from the input file. Each
thread will take one line at a time, split it into the 2 strings
separated by '\t', and apply StringOverlapFinder to the 2 strings. In
StringOverlapFinder I look for the last 5 characters of String 1 in
String 2. If they aren't found, I print String 1. If an overlap is
found, there are more cases to take care of. Some examples:

Ex 1: x = "abcdefghijklm" and y = "hijklmnopqrst" (your example);
the output is abcdefg
Ex 2: x = "asdcsadvsasevenfiveseven" and y = "sevenfivesevenasdasdas";
the output is asdcsadvsa
Ex 3: x = "ascdaeseven" and y = "bseven"; the output is x

After these 3 examples I've found the "right" implementation. To
explain it I'll take Ex 2, because I think it's the most complex.
The last 5 chars of x (seven) are found in y at index 0. I take the
substring of y starting at index 0 and ending at the index of the
first match, which is 0 too. With an empty substring the program could
stop here, and the output would be asdcsadvsasevenfive. But the real
output should be asdcsadvsa.
So, after finding that this is a possible output, I should look for a
second match in y. It is found at index 9 (the second seven). I again
take the substring of y that starts at index 0 and ends at index 8
(9-1), which is sevenfive. The length of this substring is 9. Now I
take another substring, but this time from x. It starts at index
(x.length()-5-9) and ends at index (x.length()-5). In this case the two
substrings are equal and the program will write the correct output,
which is asdcsadvsa. But before that, it should search for another
possible match; in this case there isn't one.
In Ex 3 the string "seven" is found in y. The substring taken from y,
which is "b", is compared with the substring from x, which is "e".
They aren't equal, so there is no overlap. The program will look for
another occurrence of "seven" in y, but there isn't one, so the output
will be x, because no overlap was found.
I hope you understand my ideas. I'll send you the jar when it's
finished.

Is there a deadline for the selection criteria?
That's it for today.

Best Regards,
Alex

From paiualex12 at gmail.com Fri Mar 25 14:23:41 2011
From: paiualex12 at gmail.com (Alexandru Paiu)
Date: Fri, 25 Mar 2011 20:23:41 +0200
Subject: [Biojava-l] Question for Peter
Message-ID:

Hi again!

Is there a way to talk in real time? I noticed that there isn't any IRC
channel listed for BioJava. Can I find you somewhere on an IRC channel?
In Romania we are at GMT+2, like Cairo (Egypt). Where are you from,
what is your GMT offset, and on what channel can I find you?

That's all
Thanks
Alex

From sunilthomas13 at gmail.com Sat Mar 26 10:17:50 2011
From: sunilthomas13 at gmail.com (Sunil Thomas)
Date: Sat, 26 Mar 2011 19:47:50 +0530
Subject: [Biojava-l] GSoC 2011
Message-ID:

Hello.

I'm Sunil Thomas. I'm interested in the project proposal '*Amino acids
physico-chemical properties calculation*'. I'm a proficient Java
programmer, and Java has been my main coding language for about six
years now. I also happen to have experience in multi-threaded Java
applications.
I have implemented some reasonably complex algorithms, such as
efficient QR factorization of a matrix (although this was in C++).
Mainly, I have a passion for creating the fastest running and most
efficient code for any algorithm.
Therefore I feel a Java implementation of standard algorithms (as
described in the ideas page) is right up my alley.
I have already started working on the coding exercise given in the wiki.
I would like to know if there is a deadline for the coding exercise
(like, does it count after the official application period opens up?),
as I have some exams in my college right now till next week.
Hoping to hear more from the mentors.

Regards,

Sunil Thomas
Dept. of Electrical/Electronics Engineering
BITS Pilani, Goa Campus
INDIA

From andreas at sdsc.edu Sat Mar 26 12:43:48 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 26 Mar 2011 09:43:48 -0700
Subject: [Biojava-l] GSoC 2011
In-Reply-To:
References:
Message-ID:

Hi Sunil,

> I have already started working on the coding exercise given in the wiki.
> I would like to know if there is a deadline for the coding exercise(like,
> does it count after the official application period opens up?), as i have

The main criterion for ranking student applications will be the
proposals. As such I would put most of my energy into that. Last year
we did coding tests with some of the top ranking proposals as an
additional criterion for the ranking that gets submitted to Google,
but that was only after all the proposals had been submitted. At this
point I would really emphasize the need for a good project plan. GSoC
applications are highly competitive and only a very well thought
through proposal will have a chance.

Andreas

From uchathuranga at gmail.com Sat Mar 26 23:56:28 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Sun, 27 Mar 2011 09:26:28 +0530
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To: <4D872F38.20305@dundee.ac.uk>
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk> <4D76369D.8060403@dundee.ac.uk> <4D872F38.20305@dundee.ac.uk>
Message-ID:

Hi Peter,

Thanks for the guidance, Peter. As you mentioned, I have to pay more
attention to the project plan.
I have started working on the coding exercise that you have posed at
http://biojava.org/wiki/Short_coding_exercise . Is there a deadline for
this, or do we have to submit it together with the proposal?

I have some concerns regarding BioJava 1.8 and BioJava 3, so I am
planning to start a new thread on the mailing list. I hope you will
help me there too.

Thanks
Regards
udana

From shameerainfo at gmail.com Mon Mar 28 05:56:32 2011
From: shameerainfo at gmail.com (Shameera Rathnayaka)
Date: Mon, 28 Mar 2011 15:26:32 +0530
Subject: [Biojava-l] GSOC 2011 - contributing to 'Amino acids physico-chemical properties calculation project'
In-Reply-To:
References: <4D872DD2.3030903@dundee.ac.uk> <4D88954D.1050304@dundee.ac.uk> <4D8C9B5B.5040201@dundee.ac.uk>
Message-ID:

Hi Peter and Andreas,

I was able to check out the source using GIT.

I have already implemented the short coding exercise for the 'Amino
acids physico-chemical properties calculation project', but I noticed
there is no link given for submitting the assignment. I need to know
where I should submit it, or do I have to submit it with my project
proposal?

Thanks in advance!

--
Shameera Rathnayaka
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa.
Sri Lanka.
T.P. 0719221454

From p.v.troshin at dundee.ac.uk Mon Mar 28 05:47:43 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Mon, 28 Mar 2011 10:47:43 +0100
Subject: [Biojava-l] Gsoc Amino acids physico-chemical properties calculation
In-Reply-To:
References:
Message-ID: <4D90593F.8080008@dundee.ac.uk>

Dear Alex,

The BioJava projects all have some elements of Chemistry and/or
Molecular Biology in them, so some understanding of these areas is
necessary. If you do not possess it, then it is going to be hard to
produce a competitive project plan, and you may be better off applying
to pure Java projects. Completing the coding exercise will not be
enough to get through!

Kind regards,
Peter

From p.v.troshin at dundee.ac.uk Mon Mar 28 06:01:35 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Mon, 28 Mar 2011 11:01:35 +0100
Subject: [Biojava-l] GSoC 2011
In-Reply-To:
References:
Message-ID: <4D905C7F.6090908@dundee.ac.uk>

Hi Sunil,

Thanks for the interest. I think your experience with Java and passion
for algorithm development should help you on this project.
As for the coding exercise, I can only agree with Andreas: the project
plan is the first thing you should worry about. You are going to need
to complete a coding exercise at later stages of the selection process.

Good luck with your application.

Peter

From p.v.troshin at dundee.ac.uk Mon Mar 28 06:21:27 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Mon, 28 Mar 2011 11:21:27 +0100
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk> <4D76369D.8060403@dundee.ac.uk> <4D872F38.20305@dundee.ac.uk>
Message-ID: <4D906127.7020003@dundee.ac.uk>

Hi Udana,

You can submit your coding exercise shortly after the proposal. So, it
would be very good to have your coding exercise no later than the 10th
of April. Mentors do not have much time to evaluate proposals either,
so the exercise must be available quickly after the proposal. We may be
able to accommodate later submissions, but I would not guarantee that.
>>>I have some concerns regarding the BioJava 1.8 and BioJava 3 so I am >>>planning to start a new thread in the mailing list.Hope you will help me >>>there too. Good idea. Taking into account that people on the list have other jobs to do it is best to give them some time to look at your questions, and not expect immediate answer. However, you may get an immediate answer too. That's the nature of the list; you just need to be lucky (:-)) Regards, Peter On 27/03/2011 04:56, udana chathuranga wrote: > hi Peter, > > Thanks for the guide peter.As you mentioned I have to pay more > attention to the project plan. > > I have started working on the coding exercise that you have posed in > http://biojava.org/wiki/Short_coding_exercise.Is there a deadline to > this or Do we have to submit it together with the proposal? > > I have some concerns regarding the BioJava 1.8 and BioJava 3 so I am > planning to start a new thread in the mailing list.Hope you will help > me there too. > > > Thanks > Regards > udana From p.v.troshin at dundee.ac.uk Mon Mar 28 06:37:39 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Mon, 28 Mar 2011 11:37:39 +0100 Subject: [Biojava-l] GSOC 2011 - contributing to 'Amino acids physico-chemical properties calculation project' In-Reply-To: References: <4D872DD2.3030903@dundee.ac.uk> <4D88954D.1050304@dundee.ac.uk> <4D8C9B5B.5040201@dundee.ac.uk> Message-ID: <4D9064F3.5090303@dundee.ac.uk> Hi Shameera, You need to email the Jar file to me or to gsocexercise at gmail.com The exercise page is now updated too - http://biojava.org/wiki/Short_coding_exercise#Submission Thanks for the question! Regards, Peter On 28/03/2011 10:56, Shameera Rathnayaka wrote: > Hi Peter and Andreas, > > I was able to check out the source using GIT. > > I have already implemented the short coding exercise for 'Amino acids > physico-chemical properties calculation project' but as i noticed > there is not any link given to submit the assignment. 
I need to know > that where should i submit it , or do i have to submit it with my > project proposal? > > Thanks in advance! > > > > > > > > > > > On Fri, Mar 25, 2011 at 9:11 PM, Andreas Prlic > wrote: > > I removed the checkout instructions from the broken anonymus SVN > repository. The recommended way to check out the code is now via > github, either using SVN or GIT... > > Andreas > > On Fri, Mar 25, 2011 at 6:40 AM, Peter Troshin > > wrote: > > Dear Shameera, > > > >>>>As i felt it is required some knowledge about BioSQL so is it > a good > >>>> point to start my next step???? > > > > I am sorry but I do not see how BioSQL relates to this project. > Perhaps more > > explanation from your side would have helped me to see the > connection. > > > >>>>I have some problem with checkout and installing BioJava3 > > > > Me too, I was unable to checkout the BioJava Live from > > svn://code.open-bio.org/biojava/biojava-live/trunk > due to the > following > > error > > > > " A socket operation was attempted to an unreachable host. > > svn: Can't connect to host 'code.open-bio.org > ': A socket operation was > > attempted to an unreachable host. " > > > > However, a git read-only mirror > http://svn.github.com/biojava/biojava.git > > seems to work fine, at least I was able to checkout the project. > > So I'd suggest you to try that. > > > >>>>2. And i downloaded biojava3(v3.0.1) tar.gz and extracted it, > as in the > >>>> Biojava : started says i want to set the CLASSPATH variable as > > > > Have you downloaded biojava-all package from > > http://biojava.org/download/bj3.0.1/biojava3.0.1-all.tar.gz or > was this > > something else? I cannot really comment on that one, as I have > never tried > > this. I am coping this message to the BioJava list let?s see if > someone else > > is aware of any problems with this package. > > > > > > I hope this helps. 
> > > > Regards, > > Peter > > > > > > On 25/03/2011 04 :30, Shameera > Rathnayaka wrote: > >> > >> Sorry for the delay, > >> > >> > >> I went through some sources and got a basic understanding of > the project, > >> and I referred some links in the cookbook also.As i felt it is > required some > >> knowledge about BioSQL so is it a good point to start my next > step???? > >> > >> I have some problem with checkout and installing BioJava3 > >> > >> 1. I logged in using openid log in and tried to checkout via > Developer > >> SVN, but it is asking password for three times and then shows > "svn: Network > >> connection closed unexpectedly" then i used a USB modem but it > gives same > >> result. > >> > >> 2. And i downloaded biojava3(v3.0.1) tar.gz and extracted it, > as in the > >> Biojava : started says i want to set the CLASSPATH variable as > "export > >> CLASSPATH=/home/thomas/ > >> > >> > biojava-live.jar:/home/thomas/bytecode.jar:/home/thomas/commons-cli.jar:/home/thomas/commons-collections- > >> > 2.1.jar:/home/thomas/commons-dbcp-1.1.jar:/home/thomas/commons-pool-1.1.jar:." > >> change according to my directory, in UNIX > Bourne-type shells, > >> but i couldn't find those .jar files in my directory. > >> > >> Thanks > >> > >> > >> On Tue, Mar 22, 2011 at 6:25 PM, Peter Troshin > > >> >> wrote: > >> > >> Hi Shameera, > >> > >> >>>Yes as you said, equations are not the case. But i needed to > >> know will the equations be provided or not :) > >> > >> No one asks you to devise the equations yourself all you need is > >> to be able to read the paper and get the equation out of it. > >> Please note that this is YOUR project and a mentor is there to > >> help but not to provide everything for you. I think the > equations > >> are not hard to find besides if I just give you the > equations you > >> will not understand them thoroughly. As I said before you > need to > >> read the papers and by the source I meant the papers not the > >> source code. 
If this is something that sounds too hard for you > >> then, you may want to reconsider your decision contributing to > >> BioJava project. BioJava is not only about Java but about > Biology > >> as well. Ability and will to find the information you need is a > >> crucial for the success in any project. > >> In addition to it there are plenty of implementations for the > >> method that you set to implement. My advice to you would be to > >> Google for molecular weight, extinction coefficient, instability > >> index etc. I am sure you will find plenty of information on > these > >> topics. > >> > >> To get you started I am giving you the link to the software that > >> calculates half of the properties you need. There are > formulas and > >> plenty of documentation > http://expasy.org/tools/protparam-doc.html > >> > >> But please next time come back with a little more specific > questions. > >> > >> P.S. I do not envisage any GUI interfaces for the library > you want > >> to develop however, as a student you are free to propose this if > >> you think this would be of benefit to the project. > >> > >> Regards, > >> Peter > >> > >> > >> On 21/03/2011 17 > :32, Shameera Rathnayaka > >> wrote: > >>> > >>> > >>> Yes as you said, equations are not the case. But i needed > to know > >>> will the equations be provided or not :) > >>> And also i'm willing to see those equations. Could you please > >>> help me to find out those equations, > >>> then i could figure it out how to apply multi-threading > >>> technology to speed up the process. > >>> > >>> As i think i need to also develop a GUI except to the methods > >>> isn't it??? > >>> > >>> > >>> sources and > >>> understanding how these things work> > >>> > >>> Yes now im referring to biojava source code . But im in a > little > >>> bit confusion about how to get a start and where the start > should > >>> be? 
> >>>
> >>> On Mon, Mar 21, 2011 at 4:52 PM, Peter Troshin wrote:
> >>>
> >>> Hi Shameera,
> >>>
> >>> >>> As a starting point to the project i would like to know about the
> >>> >>> Equations which are needed to these calculations, how can i get
> >>> >>> those and more about the project.
> >>>
> >>> It is good that you are interested in the project.
> >>> Did you try to find the formula yourself? I will be happy to
> >>> help you if you cannot find them, but really
> >>> this must not be hard. Also I believe that you are much
> >>> better off reading the sources and
> >>> understanding how these things work than just have a formula.
> >>>
> >>> Regards,
> >>> Peter
> >>>
> >>> On 19/03/2011 17:22, Shameera Rathnayaka wrote:
> >>> > hi,
> >>> >
> >>> > Im Shameera Rathnayaka, a third year Undergraduate of Department of
> >>> > Computer Science and Engineering University of Moratuwa ,Sri Lanka.I
> >>> > am interested in implementing "Amino acids physico-chemical
> >>> > properties calculation" project as my GSOC 2011 project.I have
> >>> > previous experience in working with Java, MySql and Algorithms from
> >>> > my university projects. As my most recent project i have developed a
> >>> > visual navigation (XVisualNavigator)
> >>> > plugin for openoffice 3.2.1 using java. I am Currently working as an intern
> >>> > at WSO2 which is an open source
> >>> > middle-ware development company.
> >>> > >>> > >>> > >>> > > >>> > >>> > >>> > >>> > As a starting point to the project i would like to > >>> know about > >>> > >>> the > >>> > >>> > >>> > >>> > Equations which are needed to these calculations, > >>> how can i > >>> > >>> get those > >>> > >>> > >>> > >>> > and more about the project. > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> -- Shameera Rathnayaka > >>> Undergraduate > >>> Department of Computer Science and Engineering > >>> University of Moratuwa. > >>> Sri Lanka. > >>> T.P. 0719221454 > >> > >> > >> > >> > >> -- > >> Shameera Rathnayaka > >> Undergraduate > >> Department of Computer Science and Engineering > >> University of Moratuwa. > >> Sri Lanka. > >> T.P. 0719221454 > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > > > -- > Shameera Rathnayaka > Undergraduate > Department of Computer Science and Engineering > University of Moratuwa. > Sri Lanka. > T.P. 0719221454 From uchathuranga at gmail.com Mon Mar 28 08:44:03 2011 From: uchathuranga at gmail.com (udana chathuranga) Date: Mon, 28 Mar 2011 18:14:03 +0530 Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava - Conserns Message-ID: Hi all, I am starting a new thread regarding some concerns about BioJava 1.8 and it's implementation of isoelectric point calculation and calculation the mass of peptides.Also about mapping these to BioJava 3.0 1. Package org.biojava.bio.proteomics - Class_IsoelectricPointCalc In the method "getIsoelectricPoint(SymbolList peptide)" they have used a SymbolList type as the parameter ,if we are going to port to BioJava3.0 ,What are the possible use of parameter instead of SymbolList in BioJava1.8? SymbolList interface in org.biojava.bio.symbol not helpful in understanding the use of SymbolList in Isoelectric Point calculation. 2. 
Package org.biojava.bio.proteomics - Class_MassCalc
In the "calcTermMass" method an extra H is added if MH_PLUS is true. I am
a little confused about why we have to add an extra H when calculating
the term mass, and what is the importance of the MH_PLUS property?

If there is sample code or a demo showing how to use these classes and
methods, can anyone guide me to it?

Thanks
Regards
Udana

From uchathuranga at gmail.com Mon Mar 28 09:00:25 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Mon, 28 Mar 2011 18:30:25 +0530
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To: <4D906127.7020003@dundee.ac.uk>
References: <4D667A55.5040404@dundee.ac.uk>
	<4D6698E7.3080202@dundee.ac.uk>
	<20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com>
	<4D6BBCB7.3010203@dundee.ac.uk>
	<4D760FD8.2010002@dundee.ac.uk>
	<4D762A46.3090204@dundee.ac.uk>
	<4D76369D.8060403@dundee.ac.uk>
	<4D872F38.20305@dundee.ac.uk>
	<4D906127.7020003@dundee.ac.uk>
Message-ID:

Hi Peter,

Thanks Peter, I have already looked into the updated short coding
exercise page http://biojava.org/wiki/Short_coding_exercise#Submission.

In my proposal, can I mention the use of the existing methods to calculate
molecular weight and isoelectric point in BioJava1.8? Can I add more
related methods to my proposal? One more question regarding the proposal:
what will be the input to these methods, a file containing protein
sequences or a string of the protein?
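On the MH_PLUS question raised earlier in this thread, a minimal sketch
may help. It assumes MH_PLUS follows the usual mass-spectrometry
convention: the instrument observes the singly protonated ion [M+H]+, so
one extra hydrogen (strictly, a proton) is added to the neutral mass M.
The class and method names below are hypothetical and are not the
BioJava 1.8 MassCalc API. The residue masses are standard monoisotopic
values, and the proton mass used for the extra H may differ slightly from
whatever constant BioJava uses. The sequence is taken as a plain string,
which is only one of the input options asked about here.

```java
import java.util.HashMap;
import java.util.Map;

public class PeptideMassSketch {

    // Standard monoisotopic residue masses in Da (amino acid minus one water).
    private static final Map<Character, Double> RESIDUE = new HashMap<>();
    static {
        RESIDUE.put('G', 57.02146);  RESIDUE.put('A', 71.03711);
        RESIDUE.put('S', 87.03203);  RESIDUE.put('P', 97.05276);
        RESIDUE.put('V', 99.06841);  RESIDUE.put('T', 101.04768);
        RESIDUE.put('C', 103.00919); RESIDUE.put('L', 113.08406);
        RESIDUE.put('I', 113.08406); RESIDUE.put('N', 114.04293);
        RESIDUE.put('D', 115.02694); RESIDUE.put('Q', 128.05858);
        RESIDUE.put('K', 128.09496); RESIDUE.put('E', 129.04259);
        RESIDUE.put('M', 131.04049); RESIDUE.put('H', 137.05891);
        RESIDUE.put('F', 147.06841); RESIDUE.put('R', 156.10111);
        RESIDUE.put('Y', 163.06333); RESIDUE.put('W', 186.07931);
    }

    // Terminal groups: H on the N-terminus plus OH on the C-terminus (one water).
    static final double WATER = 18.010565;
    // Assumption: the "extra H" of an MH+ mass is modelled as one proton.
    static final double PROTON = 1.007276;

    /** Neutral monoisotopic mass M, or the [M+H]+ mass when mhPlus is true. */
    public static double monoisotopicMass(String peptide, boolean mhPlus) {
        double mass = WATER;
        for (char c : peptide.toUpperCase().toCharArray()) {
            Double residue = RESIDUE.get(c);
            if (residue == null) {
                throw new IllegalArgumentException("Unknown residue: " + c);
            }
            mass += residue;
        }
        return mhPlus ? mass + PROTON : mass;
    }

    public static void main(String[] args) {
        System.out.println("M     = " + monoisotopicMass("GA", false));
        System.out.println("[M+H]+ = " + monoisotopicMass("GA", true));
    }
}
```

For the dipeptide GA this gives roughly 146.069 Da for the neutral
molecule and 147.076 Da for [M+H]+.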
Thanks Regards Udana From p.v.troshin at dundee.ac.uk Mon Mar 28 10:30:04 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Mon, 28 Mar 2011 15:30:04 +0100 Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava In-Reply-To: References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk> <4D76369D.8060403@dundee.ac.uk> <4D872F38.20305@dundee.ac.uk> <4D906127.7020003@dundee.ac.uk> Message-ID: <4D909B6C.7050506@dundee.ac.uk> >>>>In my proposal can I mentioned the use of existing methods to calculate Molecular weight and Isoelectric pointin BioJava1.8? Yes, good idea. >>>Can I add more related methods to my proposal? You definitely should. This is YOUR project, so you decide what to add/remove. If your ideas are sound and compelling you will have a better chance in the competition! >>>What will be the input to these methods is it file containing protein sequences or string of the protein? You should put in the proposal what you think would be the most appropriate method to handle the sequences. This will show how well you understand the task. Whatever you decide will have to be discussed further when it comes to implementation. I hope that helps. Regards, Peter On 28/03/2011 14:00, udana chathuranga wrote: > Hi Peter, > > Thanks Peter, I have already looked in to the updated short coding > exercise page http://biojava.org/wiki/Short_coding_exercise#Submission. > > In my proposal can I mentioned the use of existing methods to > calculate Molecular weight and Isoelectric pointin BioJava1.8? Can I > add more related methods to my proposal? One more question regarding > the proposal. What will be the input to these methods is it file > containing protein sequences or string of the protein?. 
>
>
>
> Thanks
> Regards
> Udana
>

From Wim.DeSmet at UGent.be Mon Mar 28 10:46:33 2011
From: Wim.DeSmet at UGent.be (Wim De Smet)
Date: Mon, 28 Mar 2011 16:46:33 +0200
Subject: [Biojava-l] aligning sequences with ambiguous bases
Message-ID: <4D909F49.1060605@UGent.be>

Hi

(sorry if you get 2 copies, I sent this to -request by mistake)

Apologies if this has come up before, a quick search didn't turn
anything up. I'm attempting to do a pairwise alignment between two DNA
sequences using biojava 3. When I try to construct a DNASequence from a
string that contains an ambiguous base though (in this case 'y'), I get
the following stacktrace.

Exception in thread "main" org.biojava3.core.exceptions.CompoundNotFoundError: Compound not found for: Cannot find compound for: y
	at org.biojava3.core.sequence.storage.ArrayListSequenceReader.setContents(ArrayListSequenceReader.java:196)
	at org.biojava3.core.sequence.template.AbstractSequence.<init>(AbstractSequence.java:88)
	at org.biojava3.core.sequence.DNASequence.<init>(DNASequence.java:64)

Should I attempt to mask them somehow? What's the best way to deal with
these?

cheers
Wim

--
Wim De Smet
http://www.straininfo.net/

From khalil.elmazouari at gmail.com Mon Mar 28 12:07:15 2011
From: khalil.elmazouari at gmail.com (Khalil El Mazouari)
Date: Mon, 28 Mar 2011 18:07:15 +0200
Subject: [Biojava-l] RichSequence.IOTools performance
Message-ID: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com>

Hi,

I am developing a sequence annotation app. It should handle ~ 100.000
sequences per run.

When profiling the app (with 10.000 seq), the total execution time was
~ 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbank!!

How could one improve the RichSequence.IOTools performance?

Thanks.
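One low-effort thing to check before touching any format code is whether
the stream handed to the writer is buffered. The sketch below does not
call BioJava at all: writeRecord is a hypothetical stand-in for a
formatter that issues several small writes per record, and CountingSink is
an illustrative helper that counts how many write calls reach the
underlying destination. Whether small unbuffered writes are what dominates
the writeGenbank time in this profile is an assumption that only profiling
can confirm.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferedWriteSketch {

    /** Illustrative sink that counts the write calls and bytes that actually reach it. */
    public static final class CountingSink extends OutputStream {
        public long calls = 0;
        public long bytes = 0;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /** Hypothetical stand-in for a flat-file formatter: three small writes per record. */
    static void writeRecord(OutputStream out, int id) throws IOException {
        out.write(("LOCUS       SEQ" + id + "\n").getBytes(StandardCharsets.US_ASCII));
        out.write("ORIGIN\n".getBytes(StandardCharsets.US_ASCII));
        out.write("//\n".getBytes(StandardCharsets.US_ASCII));
    }

    /** Writes the given number of records, optionally through a 64 KB buffer. */
    public static CountingSink run(int records, boolean buffered) {
        CountingSink sink = new CountingSink();
        OutputStream out = buffered ? new BufferedOutputStream(sink, 64 * 1024) : sink;
        try {
            for (int i = 0; i < records; i++) {
                writeRecord(out, i);
            }
            out.flush(); // push any bytes still held in the buffer to the sink
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink;
    }

    public static void main(String[] args) {
        CountingSink plain = run(10000, false);
        CountingSink buf = run(10000, true);
        System.out.println("unbuffered: " + plain.calls + " calls, " + plain.bytes + " bytes");
        System.out.println("buffered:   " + buf.calls + " calls, " + buf.bytes + " bytes");
    }
}
```

Both runs deliver the same bytes; the unbuffered run reaches the sink once
per small write, while the buffered run reaches it only when the buffer
fills or is flushed. If the real destination is a file or socket, that
difference is where any saving would come from.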
khalil From andreas at sdsc.edu Mon Mar 28 12:08:28 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 28 Mar 2011 09:08:28 -0700 Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava - Conserns In-Reply-To: References: Message-ID: Hi Udana, > 1. Package org.biojava.bio.proteomics - Class_IsoelectricPointCalc > In the method "getIsoelectricPoint(SymbolList peptide)" they have used a > SymbolList type as the parameter ,if we are going to port to BioJava3.0 > ,What are the possible use of parameter instead of SymbolList in BioJava1.8? The counterpart in the 3.x series would be to use directly the sequence interface or pass in a string representation of a sequence. > 2. Package org.biojava.bio.proteomics - Class_MassCalc > In "calcTermMass" method they have added extra H if MH_PLUS is true.I am > little bit confuse why we have to add a extra H when calculating term mass? > and What is the importance of MH_PLUS prpperty? not sure, somebody else might be able to say more about this > If there are sample codes or demos how to use these classes and method, can > anyone guide me to those? Did you see the cookbook ? http://biojava.org/wiki/BioJava:CookBook1.7#Proteomics Andreas From paiualex12 at gmail.com Mon Mar 28 12:17:35 2011 From: paiualex12 at gmail.com (Alexandru Paiu) Date: Mon, 28 Mar 2011 19:17:35 +0300 Subject: [Biojava-l] Final Application (Paiu Alexandru ) (added project plan) Message-ID: *1. **1.Your complete contact information*, including full name, physical address, preferred email address, and telephone number, plus other pertinent contact information such as IRC handles, etc. Full Name : Paiu Alexandru Address : Country Romania , city Constanta , Bld. Aurel Vlaicu , Nr. 41 , Bl. Pc1 sc. B , Et 6 , Apt. 46 E-mail : paiualex12 at google.com or paiualex12 at yahoo.com Telephone number : 40733924684 *2. **2.Why you are interested in the p*roject you are proposing and are well-suited to undertake it. 
This project suits me perfectly, because interested students should have
a general knowledge of core Java programming and knowledge of
multi-threaded programming. I started learning Java one and a half years
ago, and I have used a lot of threads in applications and projects. This
is the only project I am applying for, because I haven't found a more
interesting project than this one.

*3. A summary of your programming* experience and skills.

I've done a lot of mini-projects and applications for school and for
myself. I've made projects like:

a) Lanchat Client-Server using TCP/IP: I wrote two applications, one for
the client and one for the server. I used a JApplet for the client, with
Swing elements. I used threads, especially in the server-side
application, and sockets.

b) Lanchat Peer-to-Peer using UDP and multicasting. I wrote only one
application, for the client. I used threads and multicast sockets.

c) A project for administering a database, using a JApplet with
Connector/J and MySQL. It has two applications, one for clients and one
for the administrator.

*4. Programs or projects you have previously* authored or contributed to
available as open-source, including, if applicable, any past Summer of
Code involvement.

I haven't worked on any open-source project yet and I have no past
experience with GSoC; this is the first time I have applied for an
open-source project. I haven't worked for a company either.

*5. A project plan for the project* you are proposing, even if your
proposed project is directly based on one of the proposed project ideas
for member projects.

I wish to apply to the project called *Amino acids physico-chemical
properties calculation*. I've been thinking for some time about a
possible implementation and I settled on a single one (the one I think
is best). I will use two main classes.
One will represent an atom of a substance (for example He, H, O, etc.)
and will have parameters like atomic weight, name, abbreviation and
valence. I'll use the second class for constructing amino acids from this
class, so the second class will extend the class of atoms. For example,
say I have to instantiate a molecule of H2O (water). I will have a
constructor with a string parameter that will build the substance. Let's
say that the second class is called Aminoacids and the first one Atoms,
and that I choose H2O from a combo box (it's only an example). Then I
send the string "H2O1" to the Aminoacids class to instantiate an
Aminoacids object. The constructor will evaluate the string char by char.
If it finds one or two letters, that means I have to instantiate an atom
for that letter or letters. If it finds a number, that number is the
multiplier of the atom before it. So the Aminoacids class will have a
private Object[] array holding numbers and Atoms objects. For H2O the
array will look like this: array[0] = atom of H (hydrogen), array[1] = 2,
array[2] = O (oxygen), array[3] = 1. All the known substances will be in
a file called atoms.txt with atomic mass, name, abbreviation etc. The
Atoms class will have a method to add new atoms to the list. Calculating
the molecular weight is then very simple. We already have
array = {H, 2, O, 1}, and the atoms carry their atomic weight as a
parameter, so all we have to do is:

Mol. Weight = H.weight*2 + O.weight*1

The plan for implementing:
- May 20 - June 20: implementing the two classes and the first two methods
- June 20 - July 20: implementing the rest of the methods
- July 20 until the end: final retouching, documentation for end users,
  and 1 method proposed by me

*6. Any obligations, vacations, or plans* for the summer that may require
scheduling during the GSoC work period.

I will have school final exams from 20 May to 20 June.
So I won't be able to work at maximum capacity. That's all.

*7. PS*

I hope you've got my short coding exercise program (I received some kind
of error when sending a mail with an attachment).

thanks

From p.v.troshin at dundee.ac.uk Fri Mar 25 09:01:17 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Fri, 25 Mar 2011 13:01:17 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk>
Message-ID: <4D8C921D.7040200@dundee.ac.uk>

An HTML attachment was scrubbed...
URL:

From paiualex12 at gmail.com Sun Mar 27 13:59:13 2011
From: paiualex12 at gmail.com (Alexandru Paiu)
Date: Sun, 27 Mar 2011 20:59:13 +0300
Subject: [Biojava-l] Alexandru Paiu(Coding exercise)
Message-ID:

I've attached to this mail my implementation of the Short Coding Exercise

Thanks
Alexandru Paiu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Paiu Alexandru.rar
Type: application/rar
Size: 22439 bytes
Desc: not available
URL:

From uchathuranga at gmail.com Mon Mar 28 12:51:33 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Mon, 28 Mar 2011 22:21:33 +0530
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava - Conserns
In-Reply-To:
References:
Message-ID:

Hi Andreas,

Thanks for the help. Yes, I will look into the CookBook1.7.

Regards
Udana

From holland at eaglegenomics.com Mon Mar 28 12:15:09 2011
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 28 Mar 2011 17:15:09 +0100
Subject: [Biojava-l] RichSequence.IOTools performance
In-Reply-To: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com>
References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com>
Message-ID: <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com>

I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good!
However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too. On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote: > Hi, > > I am developing a sequence annotation app. It should handle ? 100.000 sequence per run. > > When profiling the app (with 10.000 seq), the total execution time was ? 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbak!! > > How one could improve the RichSequence.IOTools performance? > > Thanks. > > khalil > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From khalil.elmazouari at gmail.com Mon Mar 28 13:11:58 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Mon, 28 Mar 2011 19:11:58 +0200 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> Message-ID: Sequences objects are all in-memory. I agree, 10000 seq in ? 20 sec is not bad. However, scientists will processes 100,000 seqs in each run, and IO is a real bottleneck. So, I am trying, as far as I can, to fine tune the app. Regards, khalil On 28 Mar 2011, at 18:15, Richard Holland wrote: > I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good! However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. 
Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too. > > > On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote: > >> Hi, >> >> I am developing a sequence annotation app. It should handle ? 100.000 sequence per run. >> >> When profiling the app (with 10.000 seq), the total execution time was ? 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbak!! >> >> How one could improve the RichSequence.IOTools performance? >> >> Thanks. >> >> khalil >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > From holland at eaglegenomics.com Mon Mar 28 13:23:44 2011 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 28 Mar 2011 18:23:44 +0100 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> Message-ID: <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> In which case you've got little option but to rewrite the GenbankFormat module to use NIO or other alternative methods for writing files. However before you do that I suggest you investigate the recent BioJava3 developments to see if they've already done anything in this area - Andy Yates is your man there. On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote: > Sequences objects are all in-memory. > I agree, 10000 seq in ? 20 sec is not bad. However, scientists will processes 100,000 seqs in each run, and IO is a real bottleneck. So, I am trying, as far as I can, to fine tune the app. 
> > Regards, > > khalil > > On 28 Mar 2011, at 18:15, Richard Holland wrote: > >> I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good! However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too. >> >> >> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote: >> >>> Hi, >>> >>> I am developing a sequence annotation app. It should handle ? 100.000 sequence per run. >>> >>> When profiling the app (with 10.000 seq), the total execution time was ? 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbak!! >>> >>> How one could improve the RichSequence.IOTools performance? >>> >>> Thanks. >>> >>> khalil >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From khalil.elmazouari at gmail.com Mon Mar 28 14:11:17 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Mon, 28 Mar 2011 20:11:17 +0200 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> Message-ID: <475BBAC9-2448-495F-85D1-D7BFFA2900D8@gmail.com> Hi, I just did some tests with FastaWriterHelper.writeSequence vs RichSequence.IOTools.writeFasta. Result was not expected!! 
FastaWriterHelper.writeSequence: 37% of the execution time. RichSequence.IOTools.writeFasta: 11.6% of the execution time. In this test, 9733 protein seq were annotated and 6599 seq were written out to multiple FASTA files. In my case at least, biojavax is more performant than Biojava3. Regards, Khalil PS: I AM NOT BENCHMARKING BIOJAVAX VS BIOJAVA3!!! On 28 Mar 2011, at 18:53, Scooter Willis wrote: > Khalil > > Biojava3 has significant speed improvements as a complete rewrite of > BioJava 1.7 but does not contain the same features and functions. You > can do some testing with BioJava3 to see if you find the increased > performance and add port/write the code for the file formats you need > to support. > > Thanks > > Scooter > > On Mon, Mar 28, 2011 at 12:07 PM, Khalil El Mazouari > wrote: >> Hi, >> >> I am developing a sequence annotation app. It should handle ~100.000 sequence per run. >> >> When profiling the app (with 10.000 seq), the total execution time was ~20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbank!! >> >> How one could improve the RichSequence.IOTools performance? >> >> Thanks. >> >> khalil >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> From p.v.troshin at dundee.ac.uk Mon Mar 28 15:44:06 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Mon, 28 Mar 2011 20:44:06 +0100 Subject: [Biojava-l] Final Application (Paiu Alexandru ) (added project plan) In-Reply-To: References: Message-ID: <4D90E506.9090305@dundee.ac.uk> Hi Alex, You should not send your proposal directly to me or OBF. Here is what Google said about it: http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/faqs#directly Here is the GSoC program http://www.google-melange.com/gsoc/program/home/google/gsoc2011 Please do not forget to send your proposal to Google! Regards, Peter On 28/03/2011 17:17, Alexandru Paiu wrote: > *1.
**1.Your complete contact information*, including full name, > physical address, preferred email address, and telephone number, plus other > pertinent contact information such as IRC handles, etc. > > > > Full Name : Paiu Alexandru > > Address : Country Romania , city Constanta , Bld. Aurel Vlaicu , Nr. 41 , > Bl. Pc1 sc. B , Et 6 , Apt. 46 > > E-mail : paiualex12 at google.com or paiualex12 at yahoo.com > > Telephone number : 40733924684 > > > > *2. **2.Why you are interested in the p*roject you are proposing and > are well-suited to undertake it. > > > > This project suits me perfectly , because the interested students should > have a general knowledge of core Java programming, knowledge of > multi-threaded programming . I?ve started learning Java for 1 and a half > years , and I used a lot of Threads in applications and projects . > > This is the only project that I apply , because I haven?t found a more > interesting project than this one . > > > > *3. **3.A summary of your programming *experience and skills. > > > > I?ve did a lot of miniproject and applications for school and for me . I?ve > made projects like : > > a) Lanchat Client-Server using TCP/IP ? I wrote two applications : one > for the client and one for the server . I used an JApplet for the client > with Swing elements . I?ve used Threads especially in the server sider > application , and sockets > > b) Lanchat Peer-to-Peer using UDP and multicasting . I wrote only a > application for the client . I used Threads and multicast sockets . > > c) A project for administrating a database , using a JApplet with > Connector/J and MySql . It has to applications , one for clients and for the > administrator . > > > *4. **4.Programs or projects you have previously* authored or > contributed to available as open-source, including, if applicable, any past > Summer of Code involvement. 
> > > > I haven't worked yet for any open-source project and I haven't any past experience > with GSoC , and it's the first time I apply for an open-source project . I > haven't worked for a company either . > > > > *5. **5. A project plan for the project* you are proposing, even if > your proposed project is directly based on one of the proposed project ideas > for member projects. > > > > I wish to apply to the project called *Amino Acids physico-chemical > properties calculation .* > > I've been thinking for some time of a possible implementation and I > stopped at a single one (that I think is the best) . > > > > I will use two main classes . One that will represent an atom of a substance > ( for example He , H , O , etc ) , that will have params like : atom weight > , name , abbreviation , valence . I'll use the second class for > constructing amino-acids from this class . So , the second class will extend > the class of atoms . So for example I have to initiate a molecule of H2O > (water) . I will have a constructor with a string param , that will build > the substance . For example , let's say that the second class is called > Aminoacids , and the first one Atoms . > > Let's say I choose from a Combo box H2O ( it's only an example) . Then I send > the string "H2O1" to the aminoacids class , to initiate an object of > aminoacid . That constructor will be evaluated char by char . If it finds > a char or two chars that means that I have to initiate an atom of that char > or chars . If it finds a number , then that means that it's the multiplier > of the atom before it . > > So the class aminoacids will have a private Object [] array , in which will > be numbers and objects called atoms . > > So for H2O the array will look like this : array[0] = atom of H (Hydrogen) , > array[1]=2 , array[2]=O (Oxygen) , array[3]=1 . > > All the known substances will be in a file called atoms.txt with atom mass , > name , abbreviation etc .
The atoms class will have a method to add new > atoms to the list . > > > > And for calculating the molecular weight the algorithm is very simple . We > already have array={H,2,O,1} , and the atoms will have as params the atoms > weight so all we have to do is just : > > Mol. Weight=H.weight*2+O.weight*1 > > > > The plan for implementing : > > > > - May 20 - June 20 : implementing the two classes and the first two > methods > > - June 20 - 20 July : implementing the rest of the methods > > - 20 July - until the final : final retouching , documentation for > end users , and 1 method proposed by me > > - > > *6. **6.Any obligations, vacations, or plans* for the summer that may > require scheduling during the GSoC work period. > > > > I will have School final exams during 20 May - 20 June . So I won't be able > to work at maximum capacity . That's all . I > > > *7. PS * > > I hope you've got my short coding exercise program ( I received a kind of > error for sending a mail with attachment) > > thanks > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From ayates at ebi.ac.uk Mon Mar 28 17:39:54 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 28 Mar 2011 22:39:54 +0100 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> Message-ID: <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> Dang Rich :). At the moment we've not done anything WRT Genbank outputting but would accept anything to help us out with this. As for the performance difference between BJ3 & BJ what happens if you use the writer objects directly with a BufferedOutputStream writer? Have you got any profiling results?
It would be very interesting to see where we've lost the performance ... Andy On 28 Mar 2011, at 18:23, Richard Holland wrote: > In which case you've got little option but to rewrite the GenbankFormat module to use NIO or other alternative methods for writing files. However before you do that I suggest you investigate the recent BioJava3 developments to see if they've already done anything in this area - Andy Yates is your man there. > > On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote: > >> Sequences objects are all in-memory. >> I agree, 10000 seq in ? 20 sec is not bad. However, scientists will processes 100,000 seqs in each run, and IO is a real bottleneck. So, I am trying, as far as I can, to fine tune the app. >> >> Regards, >> >> khalil >> >> On 28 Mar 2011, at 18:15, Richard Holland wrote: >> >>> I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good! However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too. >>> >>> >>> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote: >>> >>>> Hi, >>>> >>>> I am developing a sequence annotation app. It should handle ? 100.000 sequence per run. >>>> >>>> When profiling the app (with 10.000 seq), the total execution time was ? 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbak!! >>>> >>>> How one could improve the RichSequence.IOTools performance? >>>> >>>> Thanks. 
>>>> >>>> khalil >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >> > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From chapman at cs.wisc.edu Tue Mar 29 04:06:48 2011 From: chapman at cs.wisc.edu (Mark Chapman) Date: Tue, 29 Mar 2011 03:06:48 -0500 Subject: [Biojava-l] aligning sequences with ambiguous bases In-Reply-To: <4D909F49.1060605@UGent.be> References: <4D909F49.1060605@UGent.be> Message-ID: <4D919318.8000602@cs.wisc.edu> Hi Wim, The use of ambiguous nucleotides requires you to use the AmbiguityDNACompoundSet when you create your DNASequence, which means any: new DNASequence() changes to: new DNASequence(, AmbiguityDNACompoundSet.getDNACompoundSet()) I hope that helps, Mark On 3/28/2011 9:46 AM, Wim De Smet wrote: > Hi > > (sorry if you get 2 copies, I sent this to -request by mistake) > > Apologies if this has come up before, a quick search didn't turn anything up. > > I'm attempting to do a pairwise alignment between two DNA sequences using > biojava 3. When I try to construct a DNASequence from a string that contains an > ambiguous base though (in this case 'y'), I get the following stacktrace. 
> > Exception in thread "main" org.biojava3.core.exceptions.CompoundNotFoundError: > Compound not found for: Cannot find compound for: y > at > org.biojava3.core.sequence.storage.ArrayListSequenceReader.setContents(ArrayListSequenceReader.java:196) > > at > org.biojava3.core.sequence.template.AbstractSequence.<init>(AbstractSequence.java:88) > > at org.biojava3.core.sequence.DNASequence.<init>(DNASequence.java:64) > > Should I attempt to mask them somehow? What's the best way to deal with these? > > cheers > Wim From Wim.DeSmet at UGent.be Tue Mar 29 04:22:16 2011 From: Wim.DeSmet at UGent.be (Wim De Smet) Date: Tue, 29 Mar 2011 10:22:16 +0200 Subject: [Biojava-l] aligning sequences with ambiguous bases In-Reply-To: <4D909F49.1060605@UGent.be> References: <4D909F49.1060605@UGent.be> Message-ID: <4D9196B8.6020409@UGent.be> I believe I figured it out. The constructor of DNASequence can take a CompoundSet and passing an AmbiguityDNACompoundSet in there seems to work. There's not a lot of documentation in the javadoc, but it seems to give the behaviour I want. cheers Wim On 28-03-11 16:46, Wim De Smet wrote: > Hi > > (sorry if you get 2 copies, I sent this to -request by mistake) > > Apologies if this has come up before, a quick search didn't turn > anything up. > > I'm attempting to do a pairwise alignment between two DNA sequences > using biojava 3. When I try to construct a DNASequence from a string > that contains an ambiguous base though (in this case 'y'), I get the > following stacktrace. > > Exception in thread "main" > org.biojava3.core.exceptions.CompoundNotFoundError: Compound not found > for: Cannot find compound for: y > at > org.biojava3.core.sequence.storage.ArrayListSequenceReader.setContents(ArrayListSequenceReader.java:196) > > at > org.biojava3.core.sequence.template.AbstractSequence.<init>(AbstractSequence.java:88) > > at org.biojava3.core.sequence.DNASequence.<init>(DNASequence.java:64) > > Should I attempt to mask them somehow? What's the best way to deal with > these?
> > cheers > Wim -- Wim De Smet http://www.straininfo.net/ From chapman at cs.wisc.edu Tue Mar 29 04:45:57 2011 From: chapman at cs.wisc.edu (Mark Chapman) Date: Tue, 29 Mar 2011 03:45:57 -0500 Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava - Conserns In-Reply-To: References: Message-ID: <4D919C45.4090102@cs.wisc.edu> Hi Udana, >> 2. Package org.biojava.bio.proteomics - Class_MassCalc >> In "calcTermMass" method they have added extra H if MH_PLUS is true.I am >> little bit confuse why we have to add a extra H when calculating term mass? >> and What is the importance of MH_PLUS prpperty? > > not sure, somebody else might be able to say more about this > Common mass spectrometry techniques use a proton transfer to ionize proteins (or fragments) for analysis. This results in peaks in the mass spectra at the mass of MH+ for each molecule M. -Mark From chapman at cs.wisc.edu Tue Mar 29 05:35:48 2011 From: chapman at cs.wisc.edu (Mark Chapman) Date: Tue, 29 Mar 2011 04:35:48 -0500 Subject: [Biojava-l] question regarding MSA In-Reply-To: References: Message-ID: <4D91A7F4.6010306@cs.wisc.edu> Hi Bo, A starting point for formatted output with conservation symbols is already implemented for pairwise alignments. You can try it out by removing one of the protein ID's on line 16 of the cookbook code and replacing line 30 with: System.out.println(profile.toString(Profile.StringFormat.CLUSTALW)); The code that would need updating for multiple alignments is in SimpleProfile.printConservation and around the call to it in the toString helper method. -Mark On 2/24/2011 2:06 AM, Andreas Prlic wrote: > Hi Bo Li, > > The printing method currently does not add those characters to the > display of the aligned sequences. If you need it you would have to > patch the printing method... > > Andreas > > On Tue, Feb 22, 2011 at 11:16 PM, Bo Li wrote: >> Hi, >> >> Sorry for the bothering. 
I tried the MSA feature by following the link: >> >> http://www.biojava.org/wiki/BioJava:CookBook3:MSA >> >> However, I can't see the symbols like ".", ":", and "*" like I can see from >> the output ClustalW. >> >> So is there a way for users to obtain such information in the output from >> MSA? >> >> Thanks, >> Bo Li >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From Wim.DeSmet at UGent.be Tue Mar 29 05:59:18 2011 From: Wim.DeSmet at UGent.be (Wim De Smet) Date: Tue, 29 Mar 2011 11:59:18 +0200 Subject: [Biojava-l] aligning sequences with ambiguous bases In-Reply-To: <4D919318.8000602@cs.wisc.edu> References: <4D909F49.1060605@UGent.be> <4D919318.8000602@cs.wisc.edu> Message-ID: <4D91AD76.8060305@UGent.be> Hi Mark Thanks! I guess I should have pressed my Get Mail button before sending my second message. Good to know I chose the "correct" solution. cheers Wim On 29-03-11 10:06, Mark Chapman wrote: > Hi Wim, > > The use of ambiguous nucleotides requires you to use the > AmbiguityDNACompoundSet when you create your DNASequence, which means any: > > new DNASequence() > > changes to: > > new DNASequence(, AmbiguityDNACompoundSet.getDNACompoundSet()) > > I hope that helps, > Mark > > > On 3/28/2011 9:46 AM, Wim De Smet wrote: >> Hi >> >> (sorry if you get 2 copies, I sent this to -request by mistake) >> >> Apologies if this has come up before, a quick search didn't turn >> anything up. >> >> I'm attempting to do a pairwise alignment between two DNA sequences using >> biojava 3. When I try to construct a DNASequence from a string that >> contains an >> ambiguous base though (in this case 'y'), I get the following stacktrace. 
>> >> Exception in thread "main" >> org.biojava3.core.exceptions.CompoundNotFoundError: >> Compound not found for: Cannot find compound for: y >> at >> org.biojava3.core.sequence.storage.ArrayListSequenceReader.setContents(ArrayListSequenceReader.java:196) >> >> >> at >> org.biojava3.core.sequence.template.AbstractSequence.<init>(AbstractSequence.java:88) >> >> >> at org.biojava3.core.sequence.DNASequence.<init>(DNASequence.java:64) >> >> Should I attempt to mask them somehow? What's the best way to deal >> with these? >> >> cheers >> Wim > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Wim De Smet http://www.straininfo.net/ From khalil.elmazouari at gmail.com Tue Mar 29 10:41:13 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Tue, 29 Mar 2011 16:41:13 +0200 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> Message-ID: <7AACD1B8-6215-4682-9BE6-6646BC6C66CC@gmail.com> Hi, using NIO, the app performance improved considerably. The app was tested with 6599 annotated GenBank seq. 1. RichSequence.IOTools.writeGenbank(myFileOutputStream, mySeq, null): 57% of app exec time. 2. writing mySeq -> byteArrayOutputStream -> byteBuffer -> fileChannel (code below): 31% of exec time. ByteArrayOutputStream baos = new ByteArrayOutputStream(); RichSequence.IOTools.writeGenbank(baos, mySeq, null); ByteBuffer buf = ByteBuffer.wrap(baos.toByteArray()); fileChannel.write(buf); Any suggestions on how to improve the performance (further ;-)) are welcome. Regards, khalil On 28 Mar 2011, at 23:39, Andy Yates wrote: > Dang Rich :).
> > At the moment we've not done anything WRT Genbank outputting but would accept anything to help us out with this. > > As for the performance difference between BJ3 & BJ what happens if you use the writer objects directly with a BufferedOutputStream writer? Have you got any profiling results? It would be very interesting to see where we've lost the performance ... > > Andy > > On 28 Mar 2011, at 18:23, Richard Holland wrote: > >> In which case you've got little option but to rewrite the GenbankFormat module to use NIO or other alternative methods for writing files. However before you do that I suggest you investigate the recent BioJava3 developments to see if they've already done anything in this area - Andy Yates is your man there. >> >> On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote: >> >>> Sequences objects are all in-memory. >>> I agree, 10000 seq in ? 20 sec is not bad. However, scientists will processes 100,000 seqs in each run, and IO is a real bottleneck. So, I am trying, as far as I can, to fine tune the app. >>> >>> Regards, >>> >>> khalil >>> >>> On 28 Mar 2011, at 18:15, Richard Holland wrote: >>> >>>> I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good! However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too. >>>> >>>> >>>> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote: >>>> >>>>> Hi, >>>>> >>>>> I am developing a sequence annotation app. It should handle ? 100.000 sequence per run. >>>>> >>>>> When profiling the app (with 10.000 seq), the total execution time was ? 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbak!! >>>>> >>>>> How one could improve the RichSequence.IOTools performance? 
>>>>> >>>>> Thanks. >>>>> >>>>> khalil >>>>> _______________________________________________ >>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>>> -- >>>> Richard Holland, BSc MBCS >>>> Operations and Delivery Director, Eagle Genomics Ltd >>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>>> http://www.eaglegenomics.com/ >>>> >>> >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > From khalil.elmazouari at gmail.com Tue Mar 29 17:47:37 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Tue, 29 Mar 2011 23:47:37 +0200 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> Message-ID: <0BAEA5F0-B298-417D-9416-207321E1AC2D@gmail.com> Hi I am using the NetBeans profiler. The total exec time was ~20s (MacBook Pro i7, 4GB, SSD) for ~10.000 seq. By writing the RichSequence object to ByteArrayOutputStream -> FileChannel, where appropriate, the total exec time dropped to 7s. Huge improvement for the app I am developing. The app will be used to analyze ~100,000 sequences per run. Regards, khalil On 29 Mar 2011, at 22:13, Scooter Willis wrote: > Instead of percentage metrics can you get the time before and after the write execution for comparison without profiling. What profiler are you using? > > >> On Mar 28, 2011 5:39 PM, "Andy Yates" wrote: >> >> Dang Rich :).
>> >> At the moment we've not done anything WRT Genbank outputting but would accept anything to help us out with this. >> >> As for the performance difference between BJ3 & BJ what happens if you use the writer objects directly with a BufferedOutputStream writer? Have you got any profiling results? It would be very interesting to see where we've lost the performance ... >> >> Andy >> >> On 28 Mar 2011, at 18:23, Richard Holland wrote: >> >> > In which case you've got little option but to r... >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open... >> > From uchathuranga at gmail.com Tue Mar 29 23:05:11 2011 From: uchathuranga at gmail.com (udana chathuranga) Date: Wed, 30 Mar 2011 08:35:11 +0530 Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava - Conserns In-Reply-To: <4D919C45.4090102@cs.wisc.edu> References: <4D919C45.4090102@cs.wisc.edu> Message-ID: Hi Mark, Thanks Mark. Really appreciate your help, If I have any more questions, I will post to this thread. Thanks, Regards, Udana. From rmb32 at cornell.edu Tue Mar 29 17:20:41 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 29 Mar 2011 14:20:41 -0700 Subject: [Biojava-l] Announcing OBF Summer of Code - please forward! Message-ID: <4D924D29.3020707@cornell.edu> Hi all, Here's an advertising-ready announcement for OBF's Summer of Code, thanks to Christian Zmasek and Hilmar Lapp for their excellent writing. Student applications are due April 8! Please spread it widely, we need to reach lots of students with it! 
Rob Buels OBF GSoC 2011 Admin ============================================================ *** Please disseminate widely at your local institutions *** *** including posting to message and job boards, so that *** *** we reach as many students as possible. *** ============================================================ OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2011 Applications due 19:00 UTC, April 8, 2011. http://www.open-bio.org/wiki/Google_Summer_of_Code The Open Bioinformatics Foundation Summer of Code program provides a unique opportunity for undergraduate, master's, and PhD students to obtain hands-on experience writing and extending open-source software for bioinformatics under the mentorship of experienced developers from around the world. The program is the participation of the Open Bioinformatics Foundation (OBF) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/). Students successfully completing the 3-month program receive a $5,000 USD stipend, and may work entirely from their home or home institution. Participation is open to students from any country in the world except countries subject to US trade restrictions. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project. The Open Bioinformatics Foundation is particularly seeking students interested in both bioinformatics (computational biology) and software development. Some initial project ideas are listed on the website. These range from Galaxy phylogenetics pipeline development in Biopython to lightweight sequence objects and lazy parsing in BioPerl, a DAS Server for large files on local filesystems, and mapping Java libraries to Perl/Ruby/Python using Biolib+SWIG+JNI. All project ideas are flexible and many can be adjusted in scope to match the skills of the student.
We also welcome and encourage students proposing their own project ideas; historically some of the most successful Summer of Code projects are ones proposed by the students themselves. TO APPLY: Apply online at the Google Summer of Code website (http://socghop.appspot.com/), where you will also find GSoC program rules and eligibility requirements. The 12-day application period for students runs from Monday, March 28 through Friday, April 8th, 2011. INQUIRIES: We strongly encourage all interested students to get in touch with us with their ideas as early on as possible. See the OBF GSoC page for contact details. 2011 OBF Summer of Code: http://www.open-bio.org/wiki/Google_Summer_of_Code Google Summer of Code FAQ: http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/faqs From paiualex12 at gmail.com Wed Mar 30 08:50:08 2011 From: paiualex12 at gmail.com (Alexandru Paiu) Date: Wed, 30 Mar 2011 15:50:08 +0300 Subject: [Biojava-l] Short Coding exercise version 2 ( improved version ) (Alexandru Paiu) Message-ID: -------------- next part -------------- A non-text attachment was scrubbed... Name: runmev2.rar Type: application/rar Size: 29223 bytes Desc: not available URL: From ayates at ebi.ac.uk Thu Mar 31 03:57:33 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 31 Mar 2011 08:57:33 +0100 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> <0BAEA5F0-B298-417D-9416-207321E1AC2D@gmail.com> Message-ID: <1D956153-515B-4E0B-AC95-428426A31BEE@ebi.ac.uk> Makes a lot of sense. There's no way of knowing if a stream is buffered unless the top level object given was an instance of BufferedOutputStream. Does this mean that by some fluke we could buffer a buffered stream? TBH I'm more glad that we've got the speed back :). 
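[Editorial sketch] On the question of whether a buffered stream could end up buffered twice: the following is a minimal Java illustration, not the actual FastaWriter code. The helper name ensureBuffered is hypothetical. It assumes the check is a plain instanceof test on the top-level object, which means a BufferedOutputStream hidden inside another wrapper (here a DataOutputStream) is not detected and gets a second buffer. That is harmless for correctness but adds one extra copy.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.OutputStream;

public class BufferCheck {

    // Hypothetical helper (not BioJava API): reuse the stream when the
    // top-level object is already buffered, otherwise wrap it.
    public static OutputStream ensureBuffered(OutputStream out) {
        if (out instanceof BufferedOutputStream) {
            return out;
        }
        return new BufferedOutputStream(out);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        // Case 1: the caller passes a buffered stream, so it is reused as-is.
        OutputStream alreadyBuffered = new BufferedOutputStream(sink);
        System.out.println(ensureBuffered(alreadyBuffered) == alreadyBuffered);

        // Case 2: the buffer is hidden behind another wrapper, so the
        // instanceof check misses it and a second buffer is added.
        OutputStream hidden = new DataOutputStream(new BufferedOutputStream(sink));
        System.out.println(ensureBuffered(hidden) != hidden);
    }
}
```

Both checks print true, so double buffering can only happen when the caller's buffer sits below some other wrapper.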
Andy On 30 Mar 2011, at 20:38, Scooter Willis wrote: > Khalil > > For BioJava3 FastaWriter was simply using an OutputStream where its > use was wrapped by FastaWriterHelper which was not using a > BufferedOutputStream. I made changes to FastaWriter to check if the > OutputStream is an instance of BufferedOutputStream and if not create > one locally and the close when returning. The writing of 10,000 > sequences or 4.5MB of data went from 15 seconds to .6 seconds. I > checked in the code change if you wanted to test using your code. > > Thanks > > Scooter > > On Tue, Mar 29, 2011 at 5:47 PM, Khalil El Mazouari > wrote: >> Hi >> I am using netbeans profiler. >> The total exec time was ? 20s (macbook pro i7, 4GB, SSD) for ? 10.000 seq. >> By writing the RichSequence object to ByteArrayOutputStream -> FileChannel, >> where appropriate, the total exec time dropped to 7s. Huge improvement, for >> the app I am developing. The app will be used to analyze ? 100,000 sequence >> per run. >> Regards, >> khalil >> >> On 29 Mar 2011, at 22:13, Scooter Willis wrote: >> >> Instead of percentage metrics can you get the time before and after the >> write execution for comparison without profiling. What profiler are you >> using? >> >> On Mar 28, 2011 5:39 PM, "Andy Yates" wrote: >> >> Dang Rich :). >> >> At the moment we've not done anything WRT Genbank outputting but would >> accept anything to help us out with this. >> >> As for the performance difference between BJ3 & BJ what happens if you use >> the writer objects directly with a BufferedOutputStream writer? Have you got >> any profiling results? It would be very interesting to see where we've lost >> the performance ... >> >> Andy >> >> On 28 Mar 2011, at 18:23, Richard Holland wrote: >> >>> In which case you've got little option but to r... 
>> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open... >> >> -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From ayates at ebi.ac.uk Thu Mar 31 07:01:33 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 31 Mar 2011 12:01:33 +0100 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> <0BAEA5F0-B298-417D-9416-207321E1AC2D@gmail.com> <1D956153-515B-4E0B-AC95-428426A31BEE@ebi.ac.uk> Message-ID: Won't that close down the underlying stream which was given in the first place? Not sure if anyone would notice it TBH but it could look odd that the level responsible for creating the original (file) stream isn't responsible for closing it Andy On 31 Mar 2011, at 11:59, Scooter Willis wrote: > Andy > > I check if OutputStream is an instance of BufferedOutputStream. If it is, I don't do anything. If not, I create a local BufferedOutputStream, use it, then close it and return. > > Scooter > > >> On Mar 31, 2011 3:57 AM, "Andy Yates" wrote: >> >> Makes a lot of sense. There's no way of knowing if a stream is buffered unless the top level object given was an instance of BufferedOutputStream. Does this mean that by some fluke we could buffer a buffered stream? >> >> TBH I'm more glad that we've got the speed back :). >> >> Andy >> >> On 30 Mar 2011, at 20:38, Scooter Willis wrote: >> >> > Khalil >> > >> > For BioJava3 FastaWriter was simply ...
>> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1... >> > -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From holland at eaglegenomics.com Thu Mar 31 07:14:25 2011 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 31 Mar 2011 12:14:25 +0100 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> <0BAEA5F0-B298-417D-9416-207321E1AC2D@gmail.com> <1D956153-515B-4E0B-AC95-428426A31BEE@ebi.ac.uk> Message-ID: <47AE0504-62D4-41EF-B234-7F1932026DAF@eaglegenomics.com> Closing BufferedOutputStream will close the parent stream too, as it inherits this behaviour from FilterOutputStream. On 31 Mar 2011, at 12:07, Scooter Willis wrote: > I don't think closing the BufferedOutputStream by contract should close a parent stream. Those that open should be responsible for closing. I have seen cases where you don't close BufferedOutputStream you don't get a flush of all data if you just close parent OutputStream. I will test it. > > >> On Mar 31, 2011 7:01 AM, "Andy Yates" wrote: >> >> Won't that close down the underlying stream which was given in the first place? Not sure if anyone would notice it TBH but it could look odd that the level responsible for creating the original (file) stream isn't responsible for closing it >> >> Andy >> >> On 31 Mar 2011, at 11:59, Scooter Willis wrote: >> >> > Andy >> > >> > I check if OutputStream is an instance... >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)12... 
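The point under discussion here (whether closing a BufferedOutputStream also closes the stream it wraps, and what flush does instead) can be checked with a small experiment. The sketch below is an illustration only: CloseVsFlushDemo and TrackingStream are hypothetical names, and an in-memory stream stands in for the file stream a real writer would receive. It relies only on documented java.io behaviour, namely that BufferedOutputStream inherits close() from FilterOutputStream, which flushes and then closes the wrapped stream, while flush() only pushes buffered bytes through.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo class; none of these names come from BioJava.
class CloseVsFlushDemo {

    /** In-memory sink that remembers whether close() was called on it. */
    static class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Close the wrapper and report whether the underlying stream was closed too. */
    static boolean underlyingClosedAfterClose() throws IOException {
        TrackingStream underlying = new TrackingStream();
        OutputStream buffered = new BufferedOutputStream(underlying);
        buffered.write('A');
        buffered.close();
        return underlying.closed;
    }

    /** Flush the wrapper and report whether the underlying stream was closed. */
    static boolean underlyingClosedAfterFlush() throws IOException {
        TrackingStream underlying = new TrackingStream();
        OutputStream buffered = new BufferedOutputStream(underlying);
        buffered.write('A');
        buffered.flush();
        return underlying.closed;
    }

    /** Number of bytes that reached the underlying stream before and after flush(). */
    static int[] bytesBeforeAndAfterFlush() throws IOException {
        TrackingStream underlying = new TrackingStream();
        OutputStream buffered = new BufferedOutputStream(underlying);
        buffered.write('A');
        int before = underlying.size(); // the byte is still held in the wrapper's buffer
        buffered.flush();
        int after = underlying.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        System.out.println("underlying closed after close(): " + underlyingClosedAfterClose());
        System.out.println("underlying closed after flush(): " + underlyingClosedAfterFlush());
        int[] counts = bytesBeforeAndAfterFlush();
        System.out.println("bytes before/after flush(): " + counts[0] + "/" + counts[1]);
    }
}
```

The third method also illustrates the observation about lost data: if the wrapper is neither flushed nor closed, bytes still sitting in its buffer never reach the underlying stream.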
>> > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Thu Mar 31 08:28:40 2011 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 31 Mar 2011 13:28:40 +0100 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> <0BAEA5F0-B298-417D-9416-207321E1AC2D@gmail.com> <1D956153-515B-4E0B-AC95-428426A31BEE@ebi.ac.uk> <47AE0504-62D4-41EF-B234-7F1932026DAF@eaglegenomics.com> Message-ID: Really it should require the minimum necessary, i.e. a simple OutputStream. If the user wants to improve that performance then they can pass in something better, e.g. BufferedOutputStream. By requiring BufferedOutputStream from the start you rule out users who want to use other mechanisms, e.g. FileOutputStream, ByteArrayOutputStream, etc. On 31 Mar 2011, at 13:23, Scooter Willis wrote: > Richard > > Just tested and it does look like it gets closed. I removed the close > of the BufferedOutputStream and replaced with a flush. In looking > through the BufferedOutputStream.java it looks like it should be > garbage collected as it doesn't pass any references to itself > anywhere. I checked in the change. > > The other option is to only allow FastaWriter to take a > BufferedOutputStream or change the code in FastaWriterHelper to use > BufferedOutputStream. Any suggestions on best practice for the API to > protect the innocent who want speed? Can you think of any reason we > would not use a BufferedOutputStream when writing? 
> > Scooter > > On Thu, Mar 31, 2011 at 7:14 AM, Richard Holland > wrote: >> Closing BufferedOutputStream will close the parent stream too, as it inherits this behaviour from FilterOutputStream. >> >> On 31 Mar 2011, at 12:07, Scooter Willis wrote: >> >>> I don't think closing the BufferedOutputStream by contract should close a parent stream. Those that open should be responsible for closing. I have seen cases where you don't close BufferedOutputStream you don't get a flush of all data if you just close parent OutputStream. I will test it. >>> >>> >>>> On Mar 31, 2011 7:01 AM, "Andy Yates" wrote: >>>> >>>> Won't that close down the underlying stream which was given in the first place? Not sure if anyone would notice it TBH but it could look odd that the level responsible for creating the original (file) stream isn't responsible for closing it >>>> >>>> Andy >>>> >>>> On 31 Mar 2011, at 11:59, Scooter Willis wrote: >>>> >>>>> Andy >>>>> >>>>> I check if OutputStream is an instance... >>>> >>>> -- >>>> Andrew Yates Ensembl Genomes Engineer >>>> >>>> EMBL-EBI Tel: +44-(0)1223-492538 >>>> Wellcome Trust Genome Campus Fax: +44-(0)12... >>>> >>> >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From rmb32 at cornell.edu Thu Mar 31 17:58:52 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 31 Mar 2011 14:58:52 -0700 Subject: [Biojava-l] Reminder: GSoC proposals due in 1 week Message-ID: <4D94F91C.1080005@cornell.edu> Hi all, Just a reminder, Google Summer of Code student applications are due April 8! 
If you're a student planning to apply to GSoC with OBF, it's very much in your best interest to write your proposal *early*, like now, and get it into the hands of the developers and mentors on your subproject (BioPerl/Ruby/Python/etc) so that they can give you some feedback on it. The final proposals must, of course, still be submitted to Google through the GSoC web application, as described on the main GSoC site (http://www.google-melange.com/gsoc/homepage/google/gsoc2011). Rob Buels OBF GSoC 2011 Administrator From rmb32 at cornell.edu Thu Mar 31 18:04:49 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 31 Mar 2011 15:04:49 -0700 Subject: [Biojava-l] GSoC call for mentors Message-ID: <4D94FA81.5090701@cornell.edu> Hi all, For current developers on OBF projects: If you would not mind being a mentor to a Summer of Code student this summer, please make sure you sign up as an OBF mentor in the GSoC web app. There's a link under "mentors: apply now!" midway down the page at http://www.google-melange.com/. If you didn't do last year's summer of code, it would be a good idea to drop me an email introducing yourself, as well, or I won't know whether to approve your request. :-) Being signed up as an OBF GSoC mentor will give you access to the student proposals, as they come in, and the ability to comment on them and assign scores to the ones you think show the most promise. If you sign up as a mentor, please also add yourself to the two OBF GSoC mailing lists: OBF-GSoC and OBF-GSoC-mentors OBF-GSoC list: http://lists.open-bio.org/mailman/listinfo/gsoc OBF mentors: http://lists.open-bio.org/mailman/listinfo/gsoc-mentors Thanks in advance! 
Rob --- Robert Buels OBF GSoC 2011 Administrator
From p.v.troshin at dundee.ac.uk Wed Mar 2 17:06:34 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Wed, 02 Mar 2011 17:06:34 +0000 Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava In-Reply-To: References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> Message-ID: <4D6E791A.1000907@dundee.ac.uk> My apologies, the link to the timeline should have been this http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2011/faqs#timeline I have to say that Google has been pretty consistent with the dates, so the date for the announcement is still the same (18 of March). Peter >>>>Hi Andreas, >>>>It's not a lot to wait before we know whether Google supports OBF this year or not.
>>>>According to the GSoC timeline >>>>(http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs#timeline ) >>>>on the 18 of March they will publish the list of organisations they >>>>will support. Let's wait and see. >>>>Kind regards, >>>>Peter On 02/03/2011 05:12, Andreas Prlic wrote: > Hi Peter, > > we still don;t know yet if we will have support from Google again this > year. Once we have a confirmation we will use the wiki site again for > hosting pages related to GSoC. However we should do this project in > any case... > > Andreas > > On Mon, Feb 28, 2011 at 7:18 AM, Peter Troshin wrote: >>>>> What other functionality would you >>>>> like to see that is currently not there? >> >> I think that the methods below would be a good starting point, then the >> Google Summer of Code student can propose something else that he/she would >> fancy implementing. >> >> Molecular weight >> Extinction coefficient >> Instability index >> Aliphatic index >> Grand Average of Hydropathy >> Isoelectric point >> Number of amino acids in the protein (His, Met, Cys) >> >> I know BioJava projects were managed under Open Bioinformatics Foundation >> (OBF) during last years GSoC. Is there a page for this year GSoC ideas >> somewhere? >> >> Regards, >> Peter >> >> >> On 25/02/2011 05:12, Andreas Prlic wrote: >>> Great, seems we have an agreement that we want to improve >>> functionality for this. How complex is this going to be? From quickly >>> checking the 1.8 source it looks like just a few classes that need to >>> be converted and not too painful. What other functionality would you >>> like to see that is currently not there? >>> >>> Andreas >>> >>> >>> On Thu, Feb 24, 2011 at 8:08 PM, Scooter Willis wrote: >>>> We put in some basics regarding modeling amino acid properties in the >>>> core module but really didn't have any pressing use cases to drive the >>>> api beyond calculating the mass of a peptide. 
We currently have >>>> getMolecularWeight() as a method in AbstractCompound but never added a >>>> getSequenceMolecularWeight() to AbstractSequence. It would be great to >>>> get the attributes/features of amino acids properly modeled in core >>>> and extend when reasonable useful summary methods at higher levels. >>>> You should be able to query mass of a peptide and have it valid for an >>>> amino acid with a PTM which means the amino acid needs to support the >>>> ability to be modified in a flexible manner. I spent the last year+ >>>> developing a software suite for peptide detection in MS data for >>>> deuterium exchange where automated PTM detection was important. Would >>>> be great to get some focused attention on the core to make sure we can >>>> model nucleotides and amino acids with a chemistry friendly API. >>>> >>>> Thanks >>>> >>>> Scooter >>>> >>>> On Thu, Feb 24, 2011 at 2:15 PM, George Waldon >>>> wrote: >>>>> Hello Peter & Andreas >>>>> >>>>> I effectively did some work on these methods, mostly fixing and adding >>>>> the >>>>> ExPASy algorithm that was kindly provided to me. I think it makes a lot >>>>> of >>>>> sense to port all physico-chemical property calculations related to >>>>> amino >>>>> acids and polypeptides to bj3, as suggested by Andreas, and I >>>>> definitively >>>>> support the effort. We could smoothly deprecate the bj1 package when >>>>> this is >>>>> done. Let me know how I could help. >>>>> >>>>> Thanks >>>>> George >>>>> >>>>> Quoting Peter Troshin: >>>>> >>>>>> Hi Andreas, >>>>>> >>>>>> In fact I'd be happy to help with the development of the tools for >>>>>> simple >>>>>> physico-chemical properties calculation for peptides. We could port >>>>>> George's >>>>>> code (assuming he is happy with this) from BioJava 1.8 but we can also >>>>>> provide a few other methods. A couple of projects in the lab where I >>>>>> work >>>>>> would have benefited from having these calculations readily available.
>>>>>> >>>>>> I was thinking about participation in the Google Summer of Code (GSoC) >>>>>> this year as a mentor, and I think this would be an easy project for a >>>>>> student. What do you think about this? >>>>>> >>>>>> Thank you for your prompt reply. >>>>>> >>>>>> Regards, >>>>>> Peter >>>>>> >>>>>> >>>>>> >>>>>> On 24/02/2011 16:54, Andreas Prlic wrote: >>>>>>> Hi Peter, >>>>>>> >>>>>>> if you get a copy of biojava 1.8, it is still there. However I would >>>>>>> like to port this to biojava 3 as well.. George do you want to help me >>>>>>> with that, since you are one of the authors of this package? The basic >>>>>>> support for chemistry in BioJava 3 is a bit better... (e.g. Element >>>>>>> class) >>>>>>> >>>>>>> Andreas >>>>>>> >>>>>>> On Thu, Feb 24, 2011 at 7:33 AM, Peter >>>>>>> Troshin >>>>>>> wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I've noticed that BioJava up to about version 1.7 had an >>>>>>>> org.biojava.bio.proteomics package, which had methods for isoelectric >>>>>>>> point >>>>>>>> and molecular weight calculations for peptides. I could not find this >>>>>>>> package in the BioJava 3.0.1 API. I'd like to use these methods and >>>>>>>> wonder >>>>>>>> if there are any equivalent methods available in the latest version >>>>>>>> of >>>>>>>> BioJava?
>>>>>>>> >>>>>>>> Thank you for your help, >>>>>>>> >>>>>>>> Kind regards, >>>>>>>> Peter >>>>>>>> >>>>>>>> Dr Peter Troshin >>>>>>>> Bioinformatics Software Developer >>>>>>>> Phone: +44 (0)1382 388589 >>>>>>>> Fax: +44 (0)1382 385764 >>>>>>>> The Barton Group >>>>>>>> College of Life Sciences >>>>>>>> Medical Sciences Institute >>>>>>>> University of Dundee >>>>>>>> Dundee >>>>>>>> DD1 5EH >>>>>>>> UK >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>>>>> >>>>> >>>>> _______________________________________________ >>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>> >>> >> > > From p.v.troshin at dundee.ac.uk Wed Mar 2 17:00:16 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Wed, 02 Mar 2011 17:00:16 +0000 Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava In-Reply-To: References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> Message-ID: <4D6E77A0.4060104@dundee.ac.uk> Hi Andreas, It's not a lot to wait before we know whether Google supports OBF this year or not. According to the GSoC timeline (http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs#timeline ) on the 18 of March they will publish the list of organisations they will support. Let's wait and see. Kind regards, Peter On 02/03/2011 05:12, Andreas Prlic wrote: > Hi Peter, > > we still don;t know yet if we will have support from Google again this > year. Once we have a confirmation we will use the wiki site again for > hosting pages related to GSoC. However we should do this project in > any case... 
> > Andreas > > On Mon, Feb 28, 2011 at 7:18 AM, Peter Troshin wrote: >>>>> What other functionality would you >>>>> like to see that is currently not there? >> >> I think that the methods below would be a good starting point, then the >> Google Summer of Code student can propose something else that he/she would >> fancy implementing. >> >> Molecular weight >> Extinction coefficient >> Instability index >> Aliphatic index >> Grand Average of Hydropathy >> Isoelectric point >> Number of amino acids in the protein (His, Met, Cys) >> >> I know BioJava projects were managed under Open Bioinformatics Foundation >> (OBF) during last years GSoC. Is there a page for this year GSoC ideas >> somewhere? >> >> Regards, >> Peter >> >> >> On 25/02/2011 05:12, Andreas Prlic wrote: >>> Great, seems we have an agreement that we want to improve >>> functionality for this. How complex is this going to be? From quickly >>> checking the 1.8 source it looks like just a few classes that need to >>> be converted and not too painful. What other functionality would you >>> like to see that is currently not there? >>> >>> Andreas >>> >>> >>> On Thu, Feb 24, 2011 at 8:08 PM, Scooter Willis wrote: >>>> We put in some basics regarding modeling amino acid properties in the >>>> core module but really didn't have any pressing use cases to drive the >>>> api beyond calculating the mass of a peptide. We currently have >>>> getMolecularWeight() as a method in AbstractCompound but never added a >>>> getSequenceMolecularWeight() to AbstractSequence. It would be great to >>>> get the attributes/features of amino acids properly modeled in core >>>> and extend when reasonable useful summary methods at higher levels. >>>> You should be able to query mass of a peptide and have it valid for an >>>> amino acid with a PTM which means the amino acid needs to support the >>>> ability to be modified in a flexible manner. 
I spent the last year+ >>>> developing a software suite for peptide detection in MS data for >>>> deuterium exchange where automated PTM detection was important. Would >>>> be great to get some focused attention on the core to make sure we can >>>> model nucleotides and amino acids with a chemistry friendly API. >>>> >>>> Thanks >>>> >>>> Scooter >>>> >>>> On Thu, Feb 24, 2011 at 2:15 PM, George Waldon >>>> wrote: >>>>> Hello Peter & Andreas >>>>> >>>>> I effectively did some work on these methods, mostly fixing and adding >>>>> the >>>>> ExPASy algorithm that was kindly provided to me. I think it makes a lot >>>>> of >>>>> sense to port all physico-chemical property calculations related to >>>>> amino >>>>> acids and polypeptides to bj3, as suggested by Andreas, and I >>>>> definitively >>>>> support the effort. We could smoothly deprecate the bj1 package when >>>>> this is >>>>> done. Let me know how I could help. >>>>> >>>>> Thanks >>>>> George >>>>> >>>>> Quoting Peter Troshin: >>>>> >>>>>> Hi Andreas, >>>>>> >>>>>> In fact I'd be happy to help with the development of the tools for >>>>>> simple >>>>>> physico-chemical properties calculation for peptides. We could port >>>>>> George's >>>>>> code (assuming he is happy with this) from BioJava 1.8 but we can also >>>>>> provide a few other methods. A couple of projects in the lab where I >>>>>> work >>>>>> would have benefited from having these calculations readily available. >>>>>> >>>>>> I was thinking about participation in the Google Summer of Code (GSoC) >>>>>> this year as a mentor, and I think this would be an easy project for a >>>>>> student. What do you think about this? >>>>>> >>>>>> Thank you for your prompt reply. >>>>>> >>>>>> Regards, >>>>>> Peter >>>>>> >>>>>> >>>>>> >>>>>> On 24/02/2011 16:54, Andreas Prlic wrote: >>>>>>> Hi Peter, >>>>>>> >>>>>>> if you get a copy of biojava 1.8, it is still there. However I would >>>>>>> like to port this to biojava 3 as well..
George do you want to help me >>>>>>> with that, since you are one of the authors of this package? The basic >>>>>>> support for chemistry in BioJava 3 is a bit better... (e.g. Element >>>>>>> class) >>>>>>> >>>>>>> Andreas >>>>>>> >>>>>>> On Thu, Feb 24, 2011 at 7:33 AM, Peter >>>>>>> Troshin >>>>>>> wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I've noticed that BioJava up to about version 1.7 had an >>>>>>>> org.biojava.bio.proteomics package, which had methods for isoelectric >>>>>>>> point >>>>>>>> and molecular weight calculations for peptides. I could not find this >>>>>>>> package in the BioJava 3.0.1 API. I'd like to use these methods and >>>>>>>> wonder >>>>>>>> if there are any equivalent methods available in the latest version >>>>>>>> of >>>>>>>> BioJava? >>>>>>>> >>>>>>>> Thank you for your help, >>>>>>>> >>>>>>>> Kind regards, >>>>>>>> Peter >>>>>>>> >>>>>>>> Dr Peter Troshin >>>>>>>> Bioinformatics Software Developer >>>>>>>> Phone: +44 (0)1382 388589 >>>>>>>> Fax: +44 (0)1382 385764 >>>>>>>> The Barton Group >>>>>>>> College of Life Sciences >>>>>>>> Medical Sciences Institute >>>>>>>> University of Dundee >>>>>>>> Dundee >>>>>>>> DD1 5EH >>>>>>>> UK >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>>>>> >>>>> >>>>> _______________________________________________ >>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>> >>> >> > > From bart.mesuere at ugent.be Fri Mar 4 07:37:15 2011 From: bart.mesuere at ugent.be (Bart Mesuere) Date: Fri, 4 Mar 2011 08:37:15 +0100 Subject: [Biojava-l] read DBLINK field from genbank file Message-ID: Hi, I'm trying to read some genbank files using the 1.7 legacy code.
All works fine but I'm having trouble extracting the projectid located in the DBLINK field (in bold in the example): LOCUS NC_009926 374161 bp DNA circular BCT > 25-JUL-2008 > DEFINITION Acaryochloris marina MBIC11017 plasmid pREB1, complete > sequence. > ACCESSION NC_009926 > VERSION NC_009926.1 GI:158339488 > *DBLINK Project: 58167* I already inspected the SimpleRichSequence object with a debugger but couldn't find anything useful. Is it possible to read this field using biojava 1.7? Kind regards, Bart Mesuere From andreas at sdsc.edu Fri Mar 4 16:38:10 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 4 Mar 2011 08:38:10 -0800 Subject: [Biojava-l] [Biojava-dev] Bioinformatics Open Source Conference (BOSC 2011)--Call for Abstracts In-Reply-To: <20110304123756.GE27839@sobchak> References: <20110304123756.GE27839@sobchak> Message-ID: Anybody who wants to submit the BioJava abstract for the BOSC meeting this year? - I will be in Vienna and can help, however I will be attending the 3D-SIG meeting which is at the same time... Andreas On Fri, Mar 4, 2011 at 4:37 AM, Brad Chapman wrote: > We invite you to submit an abstract to BOSC 2011! Please forward this > message as appropriate, and forgive multiple postings. 
> > Call for Abstracts for the 12th Annual Bioinformatics Open Source Conference (BOSC 2011) > An ISMB 2011 Special Interest Group (SIG) > > Dates: July 15-16, 2011 > Location: Vienna, Austria > Web site: http://www.open-bio.org/wiki/BOSC_2011 > Email: bosc at open-bio.org > BOSC announcements mailing list: http://lists.open-bio.org/mailman/listinfo/bosc-announce > > Important Dates: > April 18, 2011: Deadline for submitting abstracts to BOSC 2011 > May 9, 2011: Notifications of accepted abstracts emailed to corresponding authors > July 13-14, 2011: Codefest 2011 programming session (see http://www.open-bio.org/wiki/Codefest_2011 for details) > July 15-16, 2011: BOSC 2011 > July 17-19, 2011: ISMB 2011 > > The Bioinformatics Open Source Conference (BOSC) is sponsored by the > Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated > to promoting the practice and philosophy of Open Source software > development within the biological research community. To be considered > for acceptance, software systems representing the central topic in a > presentation submitted to BOSC must be licensed with a recognized Open > Source License, and be freely available for download in source code > form. > > We invite you to submit abstracts for talks and posters. Sessions include: > - Approaches to parallel processing > - Cloud-based approaches to improving software and data accessibility > - The Semantic Web in open source bioinformatics > - Data visualization > - Tools for next-generation sequencing > - Other Open Source software > > In addition to the above sessions, there will be a panel discussion > about "Meeting the challenges of inter-institutional collaboration". We > are also working to arrange a joint session with one of the other ISMB > SIGs. > > Thanks to generous sponsorship from Eagle Genomics and an anonymous > donor, we are pleased to announce a competition for three Student Travel > Awards for BOSC 2011.
> Each winner will be awarded $250 to defray the
> costs of travel to BOSC 2011.
>
> For instructions on submitting your abstract, please visit
> http://www.open-bio.org/wiki/BOSC_2011#Abstract_Submission_Information
>
> BOSC 2011 Organizing Committee:
> Nomi Harris and Peter Rice (co-chairs); Brad Chapman, Peter Cock, Erwin Frise, Darin London, Ron Taylor
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From holland at eaglegenomics.com Fri Mar 4 16:43:44 2011
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 4 Mar 2011 16:43:44 +0000
Subject: [Biojava-l] [Biojava-dev] Bioinformatics Open Source Conference (BOSC 2011)--Call for Abstracts
In-Reply-To:
References: <20110304123756.GE27839@sobchak>
Message-ID:

I will be attending BOSC and chairing a session. It is probably best if I do
not present, though, as I am no longer directly involved with BioJava.

On 4 Mar 2011, at 16:38, Andreas Prlic wrote:

> Anybody who wants to submit the BioJava abstract for the BOSC meeting this year?
>
> - I will be in Vienna and can help, however I will be attending the
> 3D-SIG meeting which is at the same time...
>
> Andreas

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From gwaldon at geneinfinity.org Fri Mar 4 17:16:25 2011
From: gwaldon at geneinfinity.org (George Waldon)
Date: Fri, 04 Mar 2011 11:16:25 -0600
Subject: [Biojava-l] read DBLINK field from genbank file
In-Reply-To:
References:
Message-ID: <20110304111625.100050dk9yj9m6ck@gator1273.hostgator.com>

Hi Bart,

DBLINK is a recent addition to the GenBank format and unfortunately the bj1
parser does not read it. You can check for yourself in
org.biojavax.bio.seq.io.GenbankFormat.

Regards,
George

Quoting Bart Mesuere :

> Hi,
>
> I'm trying to read some genbank files using the 1.7 legacy code. All works
> fine but I'm having trouble extracting the projectid located in the DBLINK
> field (in bold in the example):
>
>> LOCUS       NC_009926     374161 bp    DNA     circular BCT 25-JUL-2008
>> DEFINITION  Acaryochloris marina MBIC11017 plasmid pREB1, complete sequence.
>> ACCESSION   NC_009926
>> VERSION     NC_009926.1  GI:158339488
>> *DBLINK      Project: 58167*
>
> I already inspected the SimpleRichSequence object with a debugger but
> couldn't find anything useful. Is it possible to read this field using
> biojava 1.7?
>
> Kind regards,
> Bart Mesuere
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From hlapp at drycafe.net Fri Mar 4 23:26:25 2011
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Fri, 4 Mar 2011 18:26:25 -0500
Subject: [Biojava-l] Informatics job opportunity at NESCent
Message-ID: <1878F27F-000D-4C80-B9EA-A83F7887828F@drycafe.net>

(Apologies if you receive multiple copies, and also if you are not
interested in job opportunities. In my defense, quite a few people on
Bio* lists might qualify for (let alone enjoy) the position. And if you
know someone who might be interested please forward.)

===================================================
User Interface Design and Web Application Developer
===================================================

The National Evolutionary Synthesis Center (NESCent) seeks a creative
and enthusiastic individual to design user interfaces and web
applications for scientific applications. The incumbent will work as
part of a small informatics team in close collaboration with domain
scientists.

NESCent is an NSF-funded center dedicated to cross-disciplinary
research in evolutionary science. Our informatics team works closely
with visiting and resident scientists to support their custom software
and database development needs. All NESCent software products are
open-source, and the Center has a number of initiatives to actively
promote collaborative development of community software resources
(informatics.nescent.org). Above all, we are enthusiastic about our
work, about the mission of the Center, and about the contribution of
informatics to that mission.

Job description:

The incumbent will design and develop user interfaces and web
applications for databases and other software tools for sponsored
scientists and staff.
The job responsibilities include all stages of the software
development process, including requirements gathering, design,
implementation, release packaging and documentation, as part of a
small team (typically 2-3 individuals) following project management
best practices. We expect the incumbent to present their work at
conferences and contribute to publications with scientific
collaborators; interact regularly with visiting and resident
scientists, other members of the informatics team and Center staff;
and generally serve as an expert resource for Center personnel. The
position provides opportunities for professional development. Most
informatics staff work at our Durham NC offices, located adjacent to
Duke University, but we do support a wide range of technologies for
virtual communication with off-site staff and collaborators.

Required Qualifications:

* Demonstrated success collaborating with clients on custom software
  solutions
* Experience with various stages of the software development cycle
* Expertise in development and testing of user interface designs
* Excellent communication skills, both virtual and face-to-face
* A four-year college degree in Computer Science, Bioinformatics or a
  related field

Preferred Qualifications:

* M.S. or Ph.D. in Computer Science, Bioinformatics or related field
  along with demonstrated interest in science, particularly biology
* Expertise in rapid application development and respective
  programming technologies and languages (e.g., modern scripting
  languages and web-application frameworks such as Python/Django,
  Ruby/Ruby-on-Rails, and Perl/Catalyst), fluency in Java programming,
  and prior experience in relational database programming (PostgreSQL
  or MySQL)
* Expertise in dynamic and interactive web technologies (JavaScript,
  CGI), web service (SOAP, REST, XML, JSON) and semantic web
  technologies
* Experience with open-source, and collaborative, software
  development, software usability design and assessment
* Expertise in graphic design, data visualization and/or scientific
  data integration

How to apply:

Please send cover letter, resume and contact information for three
references to Dr. Karen Cranston, Training Coordinator and
Bioinformatics Project Manager (karen.cranston at nescent.org). Review
of applications will begin March 21, 2011. Informal inquiries or
requests for additional information may be directed to Dr. Cranston by
email or phone (+1-919-613-2275).

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From andreas at sdsc.edu Sat Mar 5 21:56:40 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 5 Mar 2011 13:56:40 -0800
Subject: [Biojava-l] biojava wiki
Message-ID:

Hi,

In order to prevent our wiki from being spammed, there is now a new
plugin being used to block out bots. Let us know if you notice any
problems when signing up or logging into your accounts...

Andreas

From jayunit100 at gmail.com Mon Mar 7 01:54:58 2011
From: jayunit100 at gmail.com (Jay Vyas)
Date: Sun, 6 Mar 2011 20:54:58 -0500
Subject: [Biojava-l] DBREF parsing exception...
Message-ID:

Hi guys,

It looks like the PDB parser for biojava is tripping up on the DBREFs for
PDB ID 3O62. It's a bit of a problem for me because a bunch of exception
stack traces are getting streamed to the screen, making it difficult for
me to debug my code.

Is there a way to disable the reading of DBREF lines or, alternatively, is
there a way to fix the exception?

badly formatted line ... DBREF  3O62 A    1   135  UNP    P84233   H32_XENLA        2    136
java.lang.StringIndexOutOfBoundsException: String index out of range: 68
    at java.lang.String.substring(String.java:1934)
    at org.biojava.bio.structure.io.PDBFileParser.pdb_DBREF_Handler(PDBFileParser.java:1979)
    at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2413)

--
Jay Vyas
MMSB/UCHC

From andreas at sdsc.edu Mon Mar 7 05:15:06 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 6 Mar 2011 21:15:06 -0800
Subject: [Biojava-l] DBREF parsing exception...
In-Reply-To:
References:
Message-ID:

Hi Jay,

are you using the version from SVN or a particular release? I think
this is already fixed in SVN...

Andreas

On Sun, Mar 6, 2011 at 5:54 PM, Jay Vyas wrote:
> Hi guys : It looks like the PDB parser for biojava is tripping up on the
> DBREFs for pdb id 3O62 . Its a little bit of a problem for me because
> a bunch of exception stack traces are getting streamed to the screen and
> making it difficult for me to debug my code....
>
> Is there a way to disable the reading of DBREF lines, or alternatively, is
> there a way to fix the exception ?
>
> badly formatted line ... DBREF  3O62 A    1   135  UNP    P84233   H32_XENLA        2    136
> java.lang.StringIndexOutOfBoundsException: String index out of range: 68
>     at java.lang.String.substring(String.java:1934)
>     at org.biojava.bio.structure.io.PDBFileParser.pdb_DBREF_Handler(PDBFileParser.java:1979)
>     at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2413)
>
> --
> Jay Vyas
> MMSB/UCHC

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From rmb32 at cornell.edu Mon Mar 7 16:37:29 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 07 Mar 2011 11:37:29 -0500
Subject: [Biojava-l] Google Summer of Code project ideas
Message-ID: <4D7509C9.5090008@cornell.edu>

Hi all,

I'm going to be OBF project admin again this year for Google Summer of
Code. OBF's application is due later this week, and we need to update
our project ideas on the OBF wiki page and on each project's individual
wiki pages.

So, for each of the OBF projects that wants to do GSoC again this year,
please:

a.) Update the list of project ideas on your project's GSoC page
    (BioPython, BioPerl, BioRuby, etc). Add new ones, remove ones that
    have already been done or are no longer relevant, etc.

b.) Update the list of project ideas on the main OBF GSoC page
    (http://www.open-bio.org/wiki/Google_Summer_of_Code) to match.

c.) Let me know via email that you have done so and it's ready for
    Google to peruse.

Please have the updates done, if possible, by this Friday (March 11).

The number and quality of the project ideas are part of the evaluation
process for whether OBF is accepted as a Summer of Code organization
again this year, so let's come up with some good ones.
:-)

Rob

----
Robert Buels
(prospective) 2011 OBF GSoC Organization Admin

From jayunit100 at gmail.com Mon Mar 7 18:11:04 2011
From: jayunit100 at gmail.com (Jay Vyas)
Date: Mon, 7 Mar 2011 13:11:04 -0500
Subject: [Biojava-l] New Maven Pom issue
Message-ID:

Hi guys,

I tried to update my biojava pom, and it looks like I missed something.

In particular, the following biojava-core classes are missing:

import org.biojava.bio.BioException;
import org.biojava.bio.proteomics.IsoelectricPointCalc;
import org.biojava.bio.proteomics.MassCalc;
import org.biojava.bio.seq.ProteinTools;

Here's the way my new pom looks. I think there is a problem with
biojava-core?

    <dependency>
        <groupId>org.biojava</groupId>
        <artifactId>biojava3-alignment</artifactId>
        <version>3.0.1</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.biojava</groupId>
        <artifactId>biojava3-core</artifactId>
        <version>3.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.biojava</groupId>
        <artifactId>biojava3-protmod</artifactId>
        <version>3.0.1</version>
        <scope>compile</scope>
    </dependency>

From willishf at ufl.edu Mon Mar 7 18:37:26 2011
From: willishf at ufl.edu (Scooter Willis)
Date: Mon, 7 Mar 2011 13:37:26 -0500
Subject: [Biojava-l] New Maven Pom issue
In-Reply-To:
References:
Message-ID:

I think those are all imports from biojava 1.X. We tried to use
org.biojava3 for packages where code is in biojava3.

On Mon, Mar 7, 2011 at 1:11 PM, Jay Vyas wrote:
> Hi guys : I tried to update my biojava pom , and it looks like I missed
> something.
>
> In particular the following biojava-core classes are missing.
> import org.biojava.bio.BioException;
> import org.biojava.bio.proteomics.IsoelectricPointCalc;
> import org.biojava.bio.proteomics.MassCalc;
> import org.biojava.bio.seq.ProteinTools;
>
> Heres the way my new pom looks : I think there is a problem with
> biojava-core ?
>
>     <dependency>
>         <groupId>org.biojava</groupId>
>         <artifactId>biojava3-alignment</artifactId>
>         <version>3.0.1</version>
>         <scope>compile</scope>
>     </dependency>
>     <dependency>
>         <groupId>org.biojava</groupId>
>         <artifactId>biojava3-core</artifactId>
>         <version>3.0.1</version>
>     </dependency>
>     <dependency>
>         <groupId>org.biojava</groupId>
>         <artifactId>biojava3-protmod</artifactId>
>         <version>3.0.1</version>
>         <scope>compile</scope>
>     </dependency>

From p.v.troshin at dundee.ac.uk Tue Mar 8 11:15:36 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Tue, 08 Mar 2011 11:15:36 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk>
Message-ID: <4D760FD8.2010002@dundee.ac.uk>

Hi guys,

Following the invitation from Robert, I have now registered this idea on
the GSoC page for BioJava:

http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals

I wonder if any of you would fancy co-mentoring a student? It would be
good to have someone with up-to-date knowledge of BioJava to ensure that
all the appropriate data structures are used. My own knowledge of BioJava
is a bit rusty.

Kind regards,
Peter

On 02/03/2011 05:12, Andreas Prlic wrote:
> Hi Peter,
>
> we still don't know yet if we will have support from Google again this
> year. Once we have a confirmation we will use the wiki site again for
> hosting pages related to GSoC. However we should do this project in
> any case...
>
> Andreas

From willishf at ufl.edu Tue Mar 8 11:44:59 2011
From: willishf at ufl.edu (Scooter Willis)
Date: Tue, 8 Mar 2011 06:44:59 -0500
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To: <4D760FD8.2010002@dundee.ac.uk>
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk>
Message-ID:

Peter

Happy to co-mentor and make sure everything gets integrated properly
into either core or another module.

Thanks

Scooter

On Tue, Mar 8, 2011 at 6:15 AM, Peter Troshin wrote:
> Hi guys,
>
> Follow the invitation from Robert, I now registered this idea on the GSoC
> page for BioJava
>
> http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals
>
> I wonder if anyone of you fancy co-mentoring a student?
> It would be good to have someone with up-to-date knowledge of BioJava to
> ensure that all the appropriate data structures are used. My own knowledge
> of BioJava is a bit rusty.
>
> Kind regards,
> Peter

From p.v.troshin at dundee.ac.uk Tue Mar 8 13:08:22 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Tue, 08 Mar 2011 13:08:22 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk>
Message-ID: <4D762A46.3090204@dundee.ac.uk>

Hi Scooter,

Great! Please feel free to update the proposal page accordingly!

http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals

Regards,
Peter

On 08/03/2011 11:44, Scooter Willis wrote:
> Peter
>
> Happy to co-mentor and make sure everything gets integrated properly
> into either core or another module.
>
> Thanks
>
> Scooter
>
> On Tue, Mar 8, 2011 at 6:15 AM, Peter Troshin wrote:
>> Hi guys,
>>
>> Follow the invitation from Robert, I now registered this idea on the GSoC
>> page for BioJava
>>
>> http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals
>>
>> I wonder if anyone of you fancy co-mentoring a student?
>> It would be good to have someone with up-to-date knowledge of BioJava to >> ensure that all the appropriate data structures are used. My own knowledge >> of BioJava is a bit rusty. >> >> Kind regards, >> Peter >> >> >> On 02/03/2011 05:12, Andreas Prlic wrote: >>> Hi Peter, >>> >>> we still don;t know yet if we will have support from Google again this >>> year. Once we have a confirmation we will use the wiki site again for >>> hosting pages related to GSoC. However we should do this project in >>> any case... >>> >>> Andreas >>> >>> On Mon, Feb 28, 2011 at 7:18 AM, Peter Troshin >>> wrote: >>>>>>> What other functionality would you >>>>>>> like to see that is currently not there? >>>> I think that the methods below would be a good starting point, then the >>>> Google Summer of Code student can propose something else that he/she >>>> would >>>> fancy implementing. >>>> >>>> Molecular weight >>>> Extinction coefficient >>>> Instability index >>>> Aliphatic index >>>> Grand Average of Hydropathy >>>> Isoelectric point >>>> Number of amino acids in the protein (His, Met, Cys) >>>> >>>> I know BioJava projects were managed under Open Bioinformatics Foundation >>>> (OBF) during last years GSoC. Is there a page for this year GSoC ideas >>>> somewhere? >>>> >>>> Regards, >>>> Peter >>>> >>>> >>>> On 25/02/2011 05:12, Andreas Prlic wrote: >>>>> Great, seems we have an agreement that we want to improve >>>>> functionality for this. How complex is this going to be? From quickly >>>>> checking the 1.8 source it looks like just a few classes that need to >>>>> be converted and not too painful. What other functionality would you >>>>> like to see that is currently not there? 
>>>>> >>>>> Andreas >>>>> >>>>> >>>>> On Thu, Feb 24, 2011 at 8:08 PM, Scooter Willis >>>>> wrote: >>>>>> We put in some basics regarding modeling amino acid properties in the >>>>>> core module but really didn't have any pressing use cases to drive the >>>>>> api beyond calculating the mass of a peptide. We currently have >>>>>> getMolecularWeight() as a method in AbstractCompound but never added a >>>>>> getSequenceMolecularWeight() to AbstractSequence. It would be great to >>>>>> get the attributes/features of amino acids properly modeled in core >>>>>> and extend when reasonable useful summary methods at higher levels. >>>>>> You should be able to query mass of a peptide and have it valid for an >>>>>> amino acid with a PTM which means the amino acid needs to support the >>>>>> ability to be modified in a flexible manner. I spent the last year+ >>>>>> developing a software suite for peptide detection in MS data for >>>>>> deuterium exchange where automated PTM detection was important. Would >>>>>> be great to get some focused attention on the core to make sure we can >>>>>> model nucleotides and amino acids with a chemistry friendly API. >>>>>> >>>>>> Thanks >>>>>> >>>>>> Scooter >>>>>> >>>>>> On Thu, Feb 24, 2011 at 2:15 PM, George >>>>>> Waldon >>>>>> wrote: >>>>>>> Hello Peter& Andreas >>>>>>> >>>>>>> I effectively did some work on these methods, mostly fixing and adding >>>>>>> the >>>>>>> ExPASy algorithm that was kindly provided to me. I think it makes a >>>>>>> lot >>>>>>> of >>>>>>> sense to port all physico-chemical property calculations related to >>>>>>> amino >>>>>>> acids and polypeptides to bj3, as suggested by Andreas, and I >>>>>>> definitively >>>>>>> support the effort. We could smoothly deprecate the bj1 package when >>>>>>> this is >>>>>>> done. Let me know how I could help. 
>>>>>>> >>>>>>> Thanks >>>>>>> George >>>>>>> >>>>>>> Quoting Peter Troshin: >>>>>>> >>>>>>>> Hi Andreas, >>>>>>>> >>>>>>>> In fact I'd be happy to help with the development of the tools for >>>>>>>> simple >>>>>>>> physico-chemical properties calculation for peptides. We could port >>>>>>>> George?s >>>>>>>> code (assuming he is happy with this) from BioJava 1.8 but we can >>>>>>>> also >>>>>>>> provide a few other methods. A couple of projects in the lab where I >>>>>>>> work >>>>>>>> would have benefited from having these calculations readily >>>>>>>> available. >>>>>>>> >>>>>>>> I was thinking about participation in the Google Summer of Code >>>>>>>> (GoSC) >>>>>>>> this year as a mentor, and I think this would be an easy project for >>>>>>>> a >>>>>>>> student. What do you think about this? >>>>>>>> >>>>>>>> Thank you for your prompt reply. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Peter >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 24/02/2011 16:54, Andreas Prlic wrote: >>>>>>>>> Hi Peter, >>>>>>>>> >>>>>>>>> if you get a copy of biojava 1.8, it is still there. However I would >>>>>>>>> like to port this to biojava 3 as well.. George do you want to help >>>>>>>>> me >>>>>>>>> with that, since you are one of the authors of this package? The >>>>>>>>> basic >>>>>>>>> support for chemistry in BioJava 3 is a bit better... (e.g. Element >>>>>>>>> class) >>>>>>>>> >>>>>>>>> Andreas >>>>>>>>> >>>>>>>>> On Thu, Feb 24, 2011 at 7:33 AM, Peter >>>>>>>>> Troshin >>>>>>>>> wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I've noticed that BioJava up to about version 1.7 had an >>>>>>>>>> org.biojava.bio.proteomics package, which had methods for >>>>>>>>>> isoelectric >>>>>>>>>> point >>>>>>>>>> and molecular weight calculations for peptides. I could not find >>>>>>>>>> this >>>>>>>>>> package in the BioJava 3.0.1 API. I?d like to use these methods and >>>>>>>>>> wonder >>>>>>>>>> if there are any equivalent methods available in the latest version >>>>>>>>>> of >>>>>>>>>> BioJava? 
>>>>>>>>>> >>>>>>>>>> Thank you for your help, >>>>>>>>>> >>>>>>>>>> Kind regards, >>>>>>>>>> Peter >>>>>>>>>> >>>>>>>>>> Dr Peter Troshin >>>>>>>>>> Bioinformatics Software Developer >>>>>>>>>> Phone: +44 (0)1382 388589 >>>>>>>>>> Fax: +44 (0)1382 385764 >>>>>>>>>> The Barton Group >>>>>>>>>> College of Life Sciences >>>>>>>>>> Medical Sciences Institute >>>>>>>>>> University of Dundee >>>>>>>>>> Dundee >>>>>>>>>> DD1 5EH >>>>>>>>>> UK >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>>>> >>> >> >> From uchathuranga at gmail.com Tue Mar 8 13:31:55 2011 From: uchathuranga at gmail.com (udana chathuranga) Date: Tue, 8 Mar 2011 19:01:55 +0530 Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava In-Reply-To: <4D762A46.3090204@dundee.ac.uk> References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk> Message-ID: Hi Peter, On Tue, Mar 8, 2011 at 6:38 PM, Peter Troshin wrote: > Hi Scooter, > > Great! Please feel free to update the proposal page accordingly! > > > http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals > > Regards, > Peter > > > > On 08/03/2011 11:44, Scooter Willis wrote: > >> Peter >> >> Happy to co-mentor and make sure everything gets integrated properly >> into either core or another module. 
>>
>> Thanks
>>
>> Scooter

From uchathuranga at gmail.com Tue Mar 8 13:42:32 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Tue, 8 Mar 2011 19:12:32 +0530
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To: <4D762A46.3090204@dundee.ac.uk>
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk>
Message-ID:

Hi Peter,

I am a student at the University of Moratuwa, reading for my engineering
degree in Computer Science and Engineering, and I am planning to participate
in this year's GSoC. I went through your project idea and it sounds like a
perfect fit for me, as I have taken a bioinformatics course for my degree.

What areas should I study apart from the ones you mention in the idea?

Regards
Udana

On Tue, Mar 8, 2011 at 6:38 PM, Peter Troshin wrote:
> Hi Scooter,
>
> Great! Please feel free to update the proposal page accordingly!
>
> http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals
>
> Regards,
> Peter

From p.v.troshin at dundee.ac.uk Tue Mar 8 14:21:18 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Tue, 08 Mar 2011 14:21:18 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk>
Message-ID: <4D763B5E.5040700@dundee.ac.uk>

Dear M. Rehan,

I am happy to hear that you'd like to take my idea forward and I wish you
the best of luck with your GSoC application.
However, please bear in mind that

1) OBF may not be accepted as a mentor organisation this year
2) my idea may not be funded even if the OBF is accepted as a mentor organisation.
3) You as a student may not be accepted by Google (you have to make an application to them on your own)
4) You may not be the best candidate for the project
5) I have no say in most of the above.

I will be happy to have you as a student once we get to this stage, but I
feel that right now any request for supervision is a little premature.

You can find out how to apply to GSoC here:
http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/faqs

Best of luck,
Peter

On 08/03/2011 13:54, M. Rehan Shaukat wrote:
> Dear Peter,
>
> After participating in Google SoC 2010 last year, I am looking forward
> to participating again in SoC 2011. I was following this thread
> regarding the "Amino acids physico-chemical properties calculation"
> idea and I also read the details on the GSoC page for this project idea.
> This idea sounds very interesting to me and also matches my interests
> and experience (Optimisation, Multi-threading, High Performance
> Computing). I have a passion for working on and contributing to open
> source projects. I am linked with the Medical Research Council (Harwell,
> UK) and contributing to the Europhenome (an open source system for
> handling large datasets and analysing as well as annotating mouse
> phenotyping data) and EUMODIC projects in collaboration with HMGU,
> Germany; ICS, France; and the Sanger Institute, UK.
>
> During my Masters thesis, I worked on a project, "Using Cell
> Processors to Speed up Phylogenetic Inference", that aimed at
> optimising a compute-intensive Bioinformatics application on the Cell
> Broadband Engine using the IBM Cell Broadband Engine SDK and MPI. I have
> over 4 years of research+industrial software development experience. I
> have worked with different programming languages (mainly: Java, C/C++,
> PHP, XML) and a variety of tools and frameworks (J2EE, JUnit, Hibernate,
> Spring, JMS, JMX, CORBA, RMI, Eclipse, Netbeans, SVN, CVS and more).
>
> I am interested in working on this project under your supervision. I
> have plenty of similar experience and would be grateful for your kind
> supervision.
>
> Please find my CV attached.
>
> Thank you & Best Regards,
>
> Muhammad Rehan Shaukat
> Bioinformatician
> Medical Research Council, Harwell
> Mammalian Genetics Unit
> Harwell Science and Innovation Campus
> Oxfordshire
> OX11 0RD
> www.har.mrc.ac.uk

From p.v.troshin at dundee.ac.uk Tue Mar 8 14:01:01 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Tue, 08 Mar 2011 14:01:01 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk>
Message-ID: <4D76369D.8060403@dundee.ac.uk>

Hi Udana,

I'd suggest looking at the BioJava 1.8 code for isoelectric point and
molecular weight calculations for peptides, as this is something you would
need to port to BioJava 3. Study the BioJava 3 API; in particular, examine
the data structures for representing peptides. For more ideas, look at the
http://www.expasy.org/tools/ web site.

Hope that helps.
Good luck with your application!

Regards,
Peter

On 08/03/2011 13:42, udana chathuranga wrote:
> Hi Peter,
>
> I am a student from university of moratuwa reading for my engineering
> degree in Computer Science Engineering and I am planning to
> participate in this year GSoC. I went through your project idea and
> sound like it perfect idea for me as I have done a bioinformatics
> course for my degree.
>
> What are the areas that I have study apart from the one you have
> mention in the idea?.
>
> Regards
> Udana
>
> On Tue, Mar 8, 2011 at 6:38 PM, Peter Troshin wrote:
>
> > Hi Scooter,
> >
> > Great! Please feel free to update the proposal page accordingly!
> >
> > http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals
> >
> > Regards,
> > Peter
> >
> > On 08/03/2011 11:44, Scooter Willis wrote:
> >
> > > Peter
> > >
> > > Happy to co-mentor and make sure everything gets integrated properly
> > > into either core or another module.
> > >
> > > Thanks
> > >
> > > Scooter
> > >
> > > On Tue, Mar 8, 2011 at 6:15 AM, Peter Troshin wrote:
> > >
> > > > Hi guys,
> > > >
> > > > Follow the invitation from Robert, I now registered this idea on
> > > > the GSoC page for BioJava
> > > >
> > > > http://biojava.org/wiki/Google_Summer_of_Code#Project_Proposals
> > > >
> > > > I wonder if anyone of you fancy co-mentoring a student?
> > > > It would be good to have someone with up-to-date knowledge of
> > > > BioJava to ensure that all the appropriate data structures are
> > > > used. My own knowledge of BioJava is a bit rusty.
> > > >
> > > > Kind regards,
> > > > Peter

From fuyu12345 at gmail.com Wed Mar 9 09:53:19 2011
From: fuyu12345 at gmail.com (Richard Fu)
Date: Wed, 9 Mar 2011 17:53:19 +0800
Subject: [Biojava-l] About translation to Chinese
In-Reply-To:
References:
Message-ID:

As a Chinese student interested in biojava, I noticed that there is not a
Chinese version of documentation of biojava 3.0. Thus I would like to
translate some of the javadoc or cookbook. Which part is the essential part
and should be set priority to? Maybe I can help popularize biojava among
more Chinese scientists and students who are enthusiastic about it.

From flf.mib at gmail.com Wed Mar 9 20:12:46 2011
From: flf.mib at gmail.com (François Le Fevre)
Date: Wed, 09 Mar 2011 21:12:46 +0100
Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature
Message-ID: <4D77DF3E.4000609@gmail.com>

Dear all,

I would like to know whether two proteins that have the same amino acid
sequence should be equal. I was expecting that two Sequence objects with
exactly the same sequence string would be equal, but this does not seem to
be the case. Is this normal?

Thanks for your help.

Sequence protein1 = MKRISTTITTTITITTGNGAG hash=801818331
Sequence protein2 = MKRISTTITTTITITTGNGAG hash=700804192

protein1.equals(protein2) returns false.

Francois

--
----------------------
Francois LE FEVRE

From ayates at ebi.ac.uk Wed Mar 9 20:27:39 2011
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 9 Mar 2011 20:27:39 +0000
Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature
In-Reply-To: <4D77DF3E.4000609@gmail.com>
References: <4D77DF3E.4000609@gmail.com>
Message-ID: <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk>

Hi Francois,

Neither Compounds nor Sequences override the equals() and hashCode()
methods, which is why you're seeing the current behaviour.

Andy

On 9 Mar 2011, at 20:12, François Le Fevre wrote:

> Dear all,
>
> I would like to know whether two proteins that have the same amino acid
> sequence should be equal. I was expecting that two Sequence objects with
> exactly the same sequence string would be equal, but this does not seem
> to be the case. Is this normal?
>
> Thanks for your help.
>
> Sequence protein1 = MKRISTTITTTITITTGNGAG hash=801818331
> Sequence protein2 = MKRISTTITTTITITTGNGAG hash=700804192
>
> protein1.equals(protein2) returns false.
> > Francois > > -- > ---------------------- > Francois LE FEVRE > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From andreas at sdsc.edu Thu Mar 10 01:02:11 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 9 Mar 2011 17:02:11 -0800 Subject: [Biojava-l] About translation to Chinese In-Reply-To: References: Message-ID: Hi Richard, great that you want to work on the translation. I don't think there is any part that is more important than others, so ideally you would cover it all... The BioJava 1.7 (legacy) cookbook has been translated into several languages, including simplified Chinese. Probably the best maintained translation is the one by Sylvain Foisy into French. Thanks for volunteering, Andreas On Wed, Mar 9, 2011 at 1:53 AM, Richard Fu wrote: > As a Chinese student interested in biojava, I noticed that there is not a > Chinese version of documentation of biojava 3.0. Thus I would like to > translate some of the javadoc or cookbook. Which part is the essential part > and should be set priority to? Maybe I can help popularize biojava among > more Chinese scientists and students who are enthusiastic about it. > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas at sdsc.edu Thu Mar 10 01:04:55 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 9 Mar 2011 17:04:55 -0800 Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature In-Reply-To: <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk> References: <4D77DF3E.4000609@gmail.com> <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk> Message-ID: Hi Fran?ois, you could try to compare the string representation of the sequences... Andreas 2011/3/9 Andy Yates : > Hi Francois, > > Neither Compounds nor Sequences have an over-ridden equals() & hashcode() method which is why you're seeing the current behaviour. > > Andy > > On 9 Mar 2011, at 20:12, Fran?ois Le Fevre wrote: > >> Dear all, >> >> I would like to know if 2 proteins that have the same sequence of aminoacid should be equals? >> I was wandering that 2 Sequence that have exactly the same string signature should be the same. >> But it seems to be not the case. >> Is it normal? >> >> Thank for your help. >> >> >> Sequence protein1 = MKRISTTITTTITITTGNGAG ? ?hash=801818331 >> Sequence protein2 = MKRISTTITTTITITTGNGAG ? ?hash=700804192 >> >> protein1.equals(protein2) return false. >> >> Francois >> >> -- >> ---------------------- >> Francois LE FEVRE >> >> _______________________________________________ >> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Andrew Yates ? ? ? ? ? ? ? ? ? Ensembl Genomes Engineer > EMBL-EBI ? ? ? ? ? ? ? ? ? ? ? Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus ? Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK ? ? ? ? 
http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From andreas.draeger at uni-tuebingen.de Thu Mar 10 06:42:57 2011 From: andreas.draeger at uni-tuebingen.de (Andreas Draeger) Date: Thu, 10 Mar 2011 07:42:57 +0100 Subject: [Biojava-l] About translation to Chinese In-Reply-To: References: Message-ID: <4D7872F1.6050706@uni-tuebingen.de> > great that you want to work on the translation. I don't think there is > any part that is more important than others, so ideally you would > cover it all... The BioJava 1.7 (legacy) cookbook has been translated > into several languages, including simplified Chinese. Probably the > best maintained translation is the one by Sylvain Foisy into French. Hi, Just one comment on that. Wouldn't it be of more benefit to try to exclude all String messages from BioJava, such as warnings or error messages, to gather these in XML files and to provide a Chinese version? Maybe, this could be a good starting point that others may volunteer and contribute an XML file for their language. In this way, messages from BioJava could be in multiple languages. The structure of these XML files would consist of key-value pairs: Some key to access the String and then the actual String. With the help of ResourceBundle it would be very easy to use it. Cheers Andreas -- Dipl.-Bioinform. 
Andreas Dr?ger University of T?bingen Center for Bioinformatics (ZBIT) Sand 1 72076 T?bingen Germany Phone: +49-7071-29-78982 Fax: +49-7071-29-5091 From ayates at ebi.ac.uk Thu Mar 10 09:17:45 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 10 Mar 2011 09:17:45 +0000 Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature In-Reply-To: References: <4D77DF3E.4000609@gmail.com> <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk> Message-ID: <9D4702B0-F592-4E79-B466-78A1C89F15FB@ebi.ac.uk> I cannot remember the reason why we decided to not include equality for these objects. It's not an unreasonable thing to want though. Assuming I have some time soon I can have a look into implementing it on AbstractCompound, AbstractSequence & the backing stores but it will be some time away. If anyone else wants to give it a shot ... :) Andy On 10 Mar 2011, at 01:04, Andreas Prlic wrote: > Hi Fran?ois, > > you could try to compare the string representation of the sequences... > > Andreas > > 2011/3/9 Andy Yates : >> Hi Francois, >> >> Neither Compounds nor Sequences have an over-ridden equals() & hashcode() method which is why you're seeing the current behaviour. >> >> Andy >> >> On 9 Mar 2011, at 20:12, Fran?ois Le Fevre wrote: >> >>> Dear all, >>> >>> I would like to know if 2 proteins that have the same sequence of aminoacid should be equals? >>> I was wandering that 2 Sequence that have exactly the same string signature should be the same. >>> But it seems to be not the case. >>> Is it normal? >>> >>> Thank for your help. >>> >>> >>> Sequence protein1 = MKRISTTITTTITITTGNGAG hash=801818331 >>> Sequence protein2 = MKRISTTITTTITITTGNGAG hash=700804192 >>> >>> protein1.equals(protein2) return false. 
>>> >>> Francois >>> >>> -- >>> ---------------------- >>> Francois LE FEVRE >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From flf.mib at gmail.com Thu Mar 10 12:22:24 2011 From: flf.mib at gmail.com (Francois Le Fevre) Date: Thu, 10 Mar 2011 13:22:24 +0100 Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature In-Reply-To: References: <4D77DF3E.4000609@gmail.com> <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk> <9D4702B0-F592-4E79-B466-78A1C89F15FB@ebi.ac.uk> Message-ID: This could be great. But for me equals means only s?quence identity and not features. Le 10 mars 2011 10:17, "Andy Yates" a ?crit : I cannot remember the reason why we decided to not include equality for these objects. It's not an unreasonable thing to want though. Assuming I have some time soon I can have a look into implementing it on AbstractCompound, AbstractSequence & the backing stores but it will be some time away. If anyone else wants to give it a shot ... :) Andy On 10 Mar 2011, at 01:04, Andreas Prlic wrote: > Hi Fran?ois, > > you could try to compare the st... -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1... 
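
The behaviour discussed in this thread can be sketched in plain Java. The example below is only an illustration under stated assumptions: PlainSequence is a hypothetical stand-in for a class that, like the BioJava 3 sequences described above, does not override equals() or hashCode(); it is not the BioJava API. The helper methods show the workaround suggested earlier, comparing the string representations instead of the objects.

```java
/** Hypothetical sketch of identity-based equality versus residue comparison. */
public class SequenceEqualitySketch {

    /** Stand-in sequence class that inherits Object.equals() and Object.hashCode(). */
    static final class PlainSequence {
        private final String residues;

        PlainSequence(String residues) {
            this.residues = residues;
        }

        /** Returns the residues as a plain string. */
        String getSequenceAsString() {
            return residues;
        }
    }

    /** True when both sequences spell exactly the same residues. */
    static boolean sameResidues(PlainSequence a, PlainSequence b) {
        return a.getSequenceAsString().equals(b.getSequenceAsString());
    }

    /** As above, but treating upper- and lower-case letters as the same residue. */
    static boolean sameResiduesIgnoreCase(PlainSequence a, PlainSequence b) {
        return a.getSequenceAsString().equalsIgnoreCase(b.getSequenceAsString());
    }

    public static void main(String[] args) {
        PlainSequence protein1 = new PlainSequence("MKRISTTITTTITITTGNGAG");
        PlainSequence protein2 = new PlainSequence("MKRISTTITTTITITTGNGAG");

        // Object.equals() compares references, so two distinct objects differ.
        System.out.println("equals():       " + protein1.equals(protein2));
        // Comparing the residue strings reports the sequences as identical.
        System.out.println("sameResidues(): " + sameResidues(protein1, protein2));
    }
}
```

Because the comparison lives in a separate helper rather than in equals(), it covers residue identity only and says nothing about features or annotations, which matches the distinction raised in this thread.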
From ayates at ebi.ac.uk Thu Mar 10 12:47:32 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 10 Mar 2011 12:47:32 +0000 Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature In-Reply-To: References: <4D77DF3E.4000609@gmail.com> <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk> <9D4702B0-F592-4E79-B466-78A1C89F15FB@ebi.ac.uk> Message-ID: This is where the subject becomes murky & will probably mean that any code written for equals() & hashcode() will have to take them into account where present. However Sequence compound identity would still be available from another method but this will require an extension of the Sequence interface Andy On 10 Mar 2011, at 12:22, Francois Le Fevre wrote: > This could be great. But for me equals means only s?quence identity and not features. > > >> Le 10 mars 2011 10:17, "Andy Yates" a ?crit : >> >> I cannot remember the reason why we decided to not include equality for these objects. It's not an unreasonable thing to want though. Assuming I have some time soon I can have a look into implementing it on AbstractCompound, AbstractSequence & the backing stores but it will be some time away. If anyone else wants to give it a shot ... :) >> >> Andy >> >> On 10 Mar 2011, at 01:04, Andreas Prlic wrote: >> >> > Hi Fran?ois, >> > >> > you could try to compare the st... >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1... >> > -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From rmb32 at cornell.edu Thu Mar 10 17:13:31 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 10 Mar 2011 12:13:31 -0500 Subject: [Biojava-l] update Google Summer of Code project ideas Message-ID: <4D7906BB.3030006@cornell.edu> Hi all, Please make sure the BioJava information is up to date for 2011 on both the OBF and BioJava wikis. 
The current page looks pretty good, just be aware that Google will be
looking at it soon to evaluate whether OBF will be accepted again this year
to GSoC.

OBF wiki page: http://www.open-bio.org/wiki/Google_Summer_of_Code
BioJava wiki: http://biojava.org/wiki/Google_Summer_of_Code

Rob

----
Robert Buels
(prospective) 2011 OBF GSoC Organization Admin

From ayates at ebi.ac.uk Fri Mar 11 22:48:34 2011
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 11 Mar 2011 22:48:34 +0000
Subject: [Biojava-l] equality of proteins based on their aminoacid sequence signature
In-Reply-To:
References: <4D77DF3E.4000609@gmail.com> <59FB45CB-ED7C-4AFC-A53D-9D66352F575A@ebi.ac.uk>
 <9D4702B0-F592-4E79-B466-78A1C89F15FB@ebi.ac.uk>
Message-ID: <5FC99578-CB7E-4659-968E-01AC8A60FA3E@ebi.ac.uk>

Hi Francois,

So I've been thinking about this, and if we add this to a small set of
objects (compounds and compound sets) we can get sequence equality working.
This will be done as part of the SequenceMixin class, and we can do
case-sensitive and case-insensitive versions. We can also do some tricks
with regard to length and compound sets to reject a pair of sequences
without the need to iterate through the sequence. The code will look like

    SequenceMixin.sequenceEquality(dnaOne, dnaTwo);

or

    SequenceMixin.sequenceEqualityIgnoreCase(dnaOne, dnaTwo);

Don't forget you can also use checksums like MD5 and SHA-1 to calculate a
value which should be very unlikely to clash (projects like InterPro use
this technique to cache results against a very quick lookup). You can do
this like:

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    // getInstance() throws the checked NoSuchAlgorithmException
    MessageDigest m = MessageDigest.getInstance("MD5");
    for (Compound c : seq) {
        // feed each compound's short name into the digest, with a fixed charset
        m.update(c.getShortName().getBytes(StandardCharsets.UTF_8));
    }
    BigInteger i = new BigInteger(1, m.digest());
    // zero-padded, 32-character upper-case hex string
    String md5checksum = String.format("%1$032X", i);

HTH

Andy

On 10 Mar 2011, at 12:47, Andy Yates wrote:

> This is where the subject becomes murky & will probably mean that any code
> written for equals() & hashcode() will have to take them into account
> where present.
However Sequence compound identity would still be available from another method but this will require an extension of the Sequence interface > > Andy > > On 10 Mar 2011, at 12:22, Francois Le Fevre wrote: > >> This could be great. But for me equals means only s?quence identity and not features. >> >> >>> Le 10 mars 2011 10:17, "Andy Yates" a ?crit : >>> >>> I cannot remember the reason why we decided to not include equality for these objects. It's not an unreasonable thing to want though. Assuming I have some time soon I can have a look into implementing it on AbstractCompound, AbstractSequence & the backing stores but it will be some time away. If anyone else wants to give it a shot ... :) >>> >>> Andy >>> >>> On 10 Mar 2011, at 01:04, Andreas Prlic wrote: >>> >>>> Hi Fran?ois, >>>> >>>> you could try to compare the st... >>> >>> -- >>> Andrew Yates Ensembl Genomes Engineer >>> EMBL-EBI Tel: +44-(0)1... >>> >> > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From rmb32 at cornell.edu Fri Mar 18 19:25:10 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 18 Mar 2011 15:25:10 -0400 Subject: [Biojava-l] Google Summer of Code is *ON* for OBF projects! Message-ID: <4D83B196.2090403@cornell.edu> Hi all, Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! 

GSoC is a Google-sponsored student internship program for open-source
projects, open to students from around the world (not just US residents).
Students are paid a $5000 USD stipend to work as a developer on an
open-source project for the summer. For more on GSoC, see the GSoC 2011 FAQ
at http://bit.ly/hpoz8W

Student applications are due April 8, 2011 at 19:00 UTC. Students who are
interested in participating should look at the OBF's GSoC page at
http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas
and whom to contact about applying.

For current developers on OBF projects, please consider volunteering to be
a mentor if you have not already, and contribute project ideas. Just list
your name and project ideas on the OBF wiki and on the relevant project's
GSoC wiki page.

Thanks to all who helped make OBF's application to GSoC a success, and
let's have a great, productive summer of code!

Rob Buels
OBF GSoC 2011 Administrator

From andreas at sdsc.edu Fri Mar 18 20:52:57 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 18 Mar 2011 13:52:57 -0700
Subject: [Biojava-l] Problem with Multiple Sequence Alignment in BioJava
In-Reply-To:
References:
Message-ID:

Hi Udana,

sounds like forester.jar is missing from your classpath....

Andreas

On Thu, Feb 10, 2011 at 9:01 AM, udana chathuranga wrote:
> hi all,
>
> I was going through the BioJava cookbook, as I am interested in this
> project. I tried the example on the page
> http://biojava.org/wiki/BioJava:CookBook3:MSA and got a class-not-found
> exception for the line
> "Profile profile = Alignments.getMultipleSequenceAlignment(lst);".
>
> Error Message:
>
> Exception in thread "main" java.lang.NoClassDefFoundError: org/forester/phylogenyinference/DistanceMatrix
>     at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:176)
>     at CookbookMSA.multipleSequenceAlignment(CookbookMSA.java:29)
>     at CookbookMSA.main(CookbookMSA.java:18)
> Caused by: java.lang.ClassNotFoundException: org.forester.phylogenyinference.DistanceMatrix
>     at java.net.URLClassLoader$1.run(Unknown Source)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(Unknown Source)
>     at java.lang.ClassLoader.loadClass(Unknown Source)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>     at java.lang.ClassLoader.loadClass(Unknown Source)
>     at java.lang.ClassLoader.loadClassInternal(Unknown Source)
>
> Is this a known issue, or am I doing something wrong with the code?
> Please help me with this; I have attached the Java source file that I
> tried.
>
> Thanks
> Regards
> udana.
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From uchathuranga at gmail.com Sat Mar 19 08:17:26 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Sat, 19 Mar 2011 13:47:26 +0530
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To: <4D76369D.8060403@dundee.ac.uk>
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk>
 <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com>
 <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk>
 <4D762A46.3090204@dundee.ac.uk> <4D76369D.8060403@dundee.ac.uk>
Message-ID:

hi Peter,

First of all, congratulations to OBF on being selected as a mentoring
organization for GSoC 2011. I was a little busy over the last few days
because of my final-year semester one exams, which is why I couldn't reply
to your mail.

Thanks for the guidance in your reply. I have already studied the BioJava 3
API and looked into the tools on the web site you sent.
I 'd like to know what kind of things I should put in my proposal to make it a good proposal apart from the fact in the topic *When you apply *section in the web site http://www.open-bio.org/wiki/Google_Summer_of_Code.Any hints about that? Thanks Regards udana From shameerainfo at gmail.com Sat Mar 19 17:22:22 2011 From: shameerainfo at gmail.com (Shameera Rathnayaka) Date: Sat, 19 Mar 2011 23:22:22 +0600 Subject: [Biojava-l] GSOC 2011 - contributing to 'Amino acids physico-chemical properties calculation project' Message-ID: hi, Im Shameera Rathnayaka, a third year Undergraduate of Department of Computer Science and Engineering University of Moratuwa ,Sri Lanka.I am interested in implementing "Amino acids physico-chemical properties calculation" project as my GSOC 2011 project.I have previous experience in working with Java, MySql and Algorithms from my university projects. As my most recent project i have developed a visual navigation (XVisualNavigator) plugin for openoffice 3.2.1 using java. I am Currently working as an intern at WSO2 which is an open source middle-ware development company. As a starting point to the project i would like to know about the Equations which are needed to these calculations, how can i get those and more about the project. -- Shameera Rathnayaka Undergraduate Department of Computer Science and Engineering University of Moratuwa. Sri Lanka. T.P. 0719221454 From paiualex12 at gmail.com Tue Mar 22 20:38:47 2011 From: paiualex12 at gmail.com (Alexandru Paiu) Date: Tue, 22 Mar 2011 22:38:47 +0200 Subject: [Biojava-l] Help! Gsoc Project Message-ID: Full Name : Alexandru Paiu Country : Romania E-mail : paiualex12 at gmail.com or paiualex12 at yahoo.com Phone Number : 40733924684 Hi . My name is Alexandru Paiu , I'm a student in third year at University Politehnica of Bucharest , Romania , and I really want to participate in Gsoc . I don't have any professional experience in programming because i've never worked in a company . 
So , I think that this is my chance , of course in gaining real professional experience and I want to practice everything I?ve learned in all this years of study . I was accepted for two summer internships at 2 companies , Java Programming Internships , but they don't offer any sallary so will be kinda hard to work in another town (they are in Bucharest , and I live in Constanta , another town) After all , i'd really like to be accepted in this Gsoc program to gain professional experience , work from home and make some money After searching on the Organisation list , and Project List i found Open Bioinformatics Foundation . It is the only organization that attracted my attention , because I consider that the proposed Projects from BioJava favors me . I want to apply to Amino acids physico-chemical properties calculation Project , because it will be implemented in Java , and especially it uses Threads .*I've never applied for a Gsoc project before so I need some help . I read that i must specify in my application , how may I improve the Project and how should I do it , but I really don't understand what exactly must I do , so I don't have my own Ideeas for now . I really need some informations about the Project . ** *I've started studying Java programming for about one and a half years , alone from Java Books , from internet ( http://download.oracle.com/javase/tutorial/java/index.html) and from school . I've implemented a lot of school projects in Java , and Java Books projects . Three of my Projects that I am proud of are : making a Lanchat Client-Server using TCP/IP , a Lanchat Peer-Peer using UDP multicasting and administrating a Database from a Java Applet with J/Connector and Mysql (you can see the tables and insert/delete/update selected rows from the selected table , and making some reports ). I used Threads in the lanchat projects and in Java game Projects like Brick or Ricochet . 
I really hope that this is the mail for BioJava and that i will get an answer soon . I haven't found an irc for this departament . Thank you very much for your time Alexandru Paiu From p.v.troshin at dundee.ac.uk Wed Mar 23 11:56:35 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Wed, 23 Mar 2011 11:56:35 +0000 Subject: [Biojava-l] Help! Gsoc Project In-Reply-To: References: Message-ID: <4D89DFF3.3040305@dundee.ac.uk> Hello Alexandru, Welcome to the BioJava list! It is good to have you interested in my project. >>> I really need some informations about the Project The ideas behind the project are really simple and most of the methods are not new. So you have a benefit of being able to compare the results of you implementation with others. To get a better understanding of what the project involve I?d suggest you to read more about each of the calculations involved. Google is your friend here. There are plenty of other methods which would be good to add to BioJava, to have a better idea of what that might be I'd suggest you read through the Expasy web site tools section http://expasy.org/tools/ . Please read the documentation for each method and associated papers to have a better idea of what they do. I hope this will get you started. I would also suggest paying a special attention to writing the project plan as through this you can demonstrate your understanding of programming and a task in hands. Kind regards, Peter On 22/03/2011 20:38, Alexandru Paiu wrote: > Full Name : Alexandru Paiu Country : Romania E-mail : > paiualex12 at gmail.com or paiualex12 at yahoo.com Phone Number : > 40733924684 > > Hi . My name is Alexandru Paiu , I'm a student in third year at > University Politehnica of Bucharest , Romania , and I really want to > participate in Gsoc . I don't have any professional experience in > programming because i've never worked in a company . 
So , I think > that this is my chance , of course in gaining real professional > experience and I want to practice everything I?ve learned in all this > years of study . I was accepted for two summer internships at 2 > companies , Java Programming Internships , but they don't offer any > sallary so will be kinda hard to work in another town (they are in > Bucharest , and I live in Constanta , another town) After all , i'd > really like to be accepted in this Gsoc program to gain professional > experience , work from home and make some money > > After searching on the Organisation list , and Project List i found > Open Bioinformatics Foundation . It is the only organization that > attracted my attention , because I consider that the proposed > Projects from BioJava favors me . I want to apply to Amino acids > physico-chemical properties calculation Project , because it will be > implemented in Java , and especially it uses Threads .*I've never > applied for a Gsoc project before so I need some help . I read that > i must specify in my application , how may I improve the Project and > how should I do it , but I really don't understand what exactly must > I do , so I don't have my own Ideeas for now . I really need some > informations about the Project . ** > > > *I've started studying Java programming for about one and a half > years , alone from Java Books , from internet ( > http://download.oracle.com/javase/tutorial/java/index.html) and from > school . I've implemented a lot of school projects in Java , and Java > Books projects . Three of my Projects that I am proud of are : making > a Lanchat Client-Server using TCP/IP , a Lanchat Peer-Peer using UDP > multicasting and administrating a Database from a Java Applet with > J/Connector and Mysql (you can see the tables and > insert/delete/update selected rows from the selected table , and > making some reports ). I used Threads in the lanchat projects and in > Java game Projects like Brick or Ricochet . 
> > I really hope that this is the mail for BioJava and that i will get > an answer soon . I haven't found an irc for this departament . > > Thank you very much for your time > > Alexandru Paiu > > _______________________________________________ Biojava-l mailing > list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From andreas at sdsc.edu Thu Mar 24 01:02:59 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 23 Mar 2011 18:02:59 -0700 Subject: [Biojava-l] Google Summer of Code 2011 Message-ID: Hi, As you probably already have heard, the Open Bioinformatics Foundation has been accepted as a mentoring organization for this year's Google Summer of Code. This means we will be able to offer mentoring through BioJava again this year. Accepted students will get a stipend of 5,000$ from Google. Participation is possible from most countries in the world, as long as you are eligible to work in the country in which you'll reside throughout the duration of the program. If you are interested in working on a BioJava related project, now is the time to start preparing and discussing your proposals. Last year we had many applications for the projects proposed by mentors. If you want to distinguish your application I recommend to propose your own project. Don't forget to discuss any proposal with us before you submit them. We will try to provide feedback and match you with a suitable Mentor. Also see http://biojava.org/wiki/Google_Summer_of_Code and Google's FAQs: http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/faqs The student application deadline is April 8th. Google will announce which proposals got accepted on April 25th. 
Andreas

From p.v.troshin at dundee.ac.uk Fri Mar 25 13:15:07 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Fri, 25 Mar 2011 13:15:07 +0000
Subject: [Biojava-l] coding exercise for amino acids physico-chemical properties calculation GSoC project idea
Message-ID: <4D8C955B.5000203@dundee.ac.uk>

Dear prospective GSoC students,

There has been considerable interest in the project. To help select the best candidate I decided to set a short coding exercise http://biojava.org/wiki/Short_coding_exercise. The exercise is simple and should not take much of your time. What's more, you have complete freedom in devising a solution! Although it's not required, I think it gives you an opportunity to show your skills, so I'd recommend you have a go at it; besides, it should be fun.

Happy coding!
Peter

From p.v.troshin at dundee.ac.uk Fri Mar 25 13:40:43 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Fri, 25 Mar 2011 13:40:43 +0000
Subject: [Biojava-l] GSOC 2011 - contributing to 'Amino acids physico-chemical properties calculation project'
In-Reply-To:
References: <4D872DD2.3030903@dundee.ac.uk> <4D88954D.1050304@dundee.ac.uk>
Message-ID: <4D8C9B5B.5040201@dundee.ac.uk>

Dear Shameera,

>>>As i felt it is required some knowledge about BioSQL so is it a good point to start my next step????

I am sorry, but I do not see how BioSQL relates to this project. Perhaps more explanation from your side would help me to see the connection.

>>>I have some problem with checkout and installing BioJava3

Me too. I was unable to check out BioJava Live from svn://code.open-bio.org/biojava/biojava-live/trunk due to the following error:

" A socket operation was attempted to an unreachable host. svn: Can't connect to host 'code.open-bio.org': A socket operation was attempted to an unreachable host. "

However, a read-only git mirror, http://svn.github.com/biojava/biojava.git , seems to work fine; at least I was able to check out the project.
So I'd suggest you try that.

>>>2. And i downloaded biojava3(v3.0.1) tar.gz and extracted it, as in the Biojava : started says i want to set the CLASSPATH variable as

Have you downloaded the biojava-all package from http://biojava.org/download/bj3.0.1/biojava3.0.1-all.tar.gz or was this something else? I cannot really comment on that one, as I have never tried this. I am copying this message to the BioJava list; let's see if someone else is aware of any problems with this package.

I hope this helps.

Regards,
Peter

On 25/03/2011 04:30, Shameera Rathnayaka wrote:
> Sorry for the delay,
>
> I went through some sources and got a basic understanding of the
> project, and I referred some links in the cookbook also.As i felt it
> is required some knowledge about BioSQL so is it a good point to start
> my next step????
>
> I have some problem with checkout and installing BioJava3
>
> 1. I logged in using openid log in and tried to checkout via Developer
> SVN, but it is asking password for three times and then shows "svn:
> Network connection closed unexpectedly" then i used a USB modem but it
> gives same result.
>
> 2. And i downloaded biojava3(v3.0.1) tar.gz and extracted it, as in
> the Biojava : started says i want to set the CLASSPATH variable as
> "export CLASSPATH=/home/thomas/
> biojava-live.jar:/home/thomas/bytecode.jar:/home/thomas/commons-cli.jar:/home/thomas/commons-collections-
> 2.1.jar:/home/thomas/commons-dbcp-1.1.jar:/home/thomas/commons-pool-1.1.jar:."
> change according to my directory, in UNIX Bourne-type
> shells, but i couldn't find those .jar files in my directory.
>
> Thanks
>
> On Tue, Mar 22, 2011 at 6:25 PM, Peter Troshin wrote:
>
> Hi Shameera,
>
> >>>Yes as you said, equations are not the case. But i needed to
> know will the equations be provided or not :)
>
> No one asks you to devise the equations yourself all you need is
> to be able to read the paper and get the equation out of it.
> Please note that this is YOUR project and a mentor is there to > help but not to provide everything for you. I think the equations > are not hard to find besides if I just give you the equations you > will not understand them thoroughly. As I said before you need to > read the papers and by the source I meant the papers not the > source code. If this is something that sounds too hard for you > then, you may want to reconsider your decision contributing to > BioJava project. BioJava is not only about Java but about Biology > as well. Ability and will to find the information you need is a > crucial for the success in any project. > In addition to it there are plenty of implementations for the > method that you set to implement. My advice to you would be to > Google for molecular weight, extinction coefficient, instability > index etc. I am sure you will find plenty of information on these > topics. > > To get you started I am giving you the link to the software that > calculates half of the properties you need. There are formulas and > plenty of documentation http://expasy.org/tools/protparam-doc.html > > But please next time come back with a little more specific questions. > > P.S. I do not envisage any GUI interfaces for the library you want > to develop however, as a student you are free to propose this if > you think this would be of benefit to the project. > > Regards, > Peter > > > On 21/03/2011 17 :32, Shameera Rathnayaka > wrote: >> >> >> Yes as you said, equations are not the case. But i needed to know >> will the equations be provided or not :) >> And also i'm willing to see those equations. Could you please >> help me to find out those equations, >> then i could figure it out how to apply multi-threading >> technology to speed up the process. >> >> As i think i need to also develop a GUI except to the methods >> isn't it??? >> >> >> > understanding how these things work> >> >> Yes now im referring to biojava source code . 
But im in a little >> bit confusion about how to get a start and where the start should >> be? >> >> >> On Mon, Mar 21, 2011 at 4:52 PM, Peter Troshin >> > wrote: >> >> Hi Shameera, >> >> >> >>> As a >> starting point to >> >> the project i would like to know about the >> >> >> >> >>> Equations which are needed to these >> calculations, how >> >> can i get >> >> >> >> >>> those and more about the project. >> >> It is good that you are interested in the project. >> Did you try to find the formula yourself? I will be happy to >> help you if you cannot find them, but really >> this must not be hard. Also I believe that you are much >> better off reading the sources and >> understanding how these things work than just have a formula. >> >> Regards, >> Peter >> >> >> On 19/03/2011 17:22, Shameera Rathnayaka wrote: >> >> > hi, >> >> >> >> > >> >> >> >> > Im Shameera Rathnayaka, a third year Undergraduate >> of >> >> Department of >> >> >> >> > Computer Science and Engineering University of >> Moratuwa ,Sri >> >> Lanka.I >> >> >> >> > am interested in implementing "Amino acids >> physico-chemical >> >> >> >> > properties calculation" project as my GSOC 2011 >> project.I >> >> have >> >> >> >> > previous experience in working with Java, MySql and >> >> Algorithms from >> >> >> >> > my university projects. As my most recent project i >> have >> >> developed a >> >> >> >> > visual navigation >> >> >> >> >> > >> >> (XVisualNavigator) >> >> >> >> > >> >> >> >> > >> plugin for openoffice 3.2.1 using java. I am Currently >> working as an intern >> > at WSO2 >> >> >> which is an open source >> >> >> >> > middle-ware development company. >> >> >> >> > >> >> >> >> > As a starting point to the project i would like to >> know about >> >> the >> >> >> >> > Equations which are needed to these calculations, >> how can i >> >> get those >> >> >> >> > and more about the project. 
>> >> >> >> >> >> >> >> -- >> Shameera Rathnayaka >> Undergraduate >> Department of Computer Science and Engineering >> University of Moratuwa. >> Sri Lanka. >> T.P. 0719221454 > > > > > -- > Shameera Rathnayaka > Undergraduate > Department of Computer Science and Engineering > University of Moratuwa. > Sri Lanka. > T.P. 0719221454 From p.v.troshin at dundee.ac.uk Fri Mar 25 14:45:34 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Fri, 25 Mar 2011 14:45:34 +0000 Subject: [Biojava-l] Fwd: [Biojava-dev] Google Summer of Code 2011 Message-ID: <4D8CAA8E.10109@dundee.ac.uk> Dear Nirmal, Thanks for your interest in GSoC and the project! My name is Peter and I am the mentor for the project you are interested in. Andreas has kindly sent your email to me. >>>I participated in GSoC 2010 for Apache Derby (RDBMS in Java) project and successfully finished the project. I have recently finished a course module on Bio Informatics and have a basic understanding about few algorithms (Nussinov, Profile HMM, Needle-Wucsh etc.), which made me interested in this area of computer science. It is very good to have someone with prior experience in GSoC, Java and Bioinformatics, this must help you to develop a compelling proposal. >>>I would like to contribute to Bio-Java in this summer, would you please direct me to relevant sources which I should start reading on and also possible guidelines would be highly appreciated. The description of the idea, selection criteria and the coding exercise can be found here: http://biojava.org/wiki/Google_Summer_of_Code The guidelines for GSoC students is available from http://open-bio.org/wiki/Google_Summer_of_Code The ideas behind the project are really simple and most of the methods are not new. So you have a benefit of being able to compare the results of you implementation with others. To get a better understanding of what the project involve I?d suggest you to read more about each of the calculations involved. 
Google is your friend here. There are plenty of other methods which would be good to add to BioJava, to have a better idea of what that might be I'd suggest you read through the Expasy web site tools section http://expasy.org/tools/ . As you studied BioInformatics you should have no difficulties understanding these. I hope I addressed your questions. Regards, Peter -------- Original Message -------- Subject: Fwd: [Biojava-dev] Google Summer of Code 2011 Date: Thu, 24 Mar 2011 20:26:35 -0700 From: Andreas Prlic To: Peter Troshin Hi Peter, do you want to reply to this one? Although he is addressing me his question is about your project.. Thanks, Andreas ---------- Forwarded message ---------- From: Nirmal Fernando Date: Wed, Mar 23, 2011 at 7:44 PM Subject: Re: [Biojava-dev] Google Summer of Code 2011 To: Andreas Prlic Cc: Biojava , biojava-dev On Thu, Mar 24, 2011 at 6:32 AM, Andreas Prlic wrote: > Hi, > > As you probably already have heard, the Open Bioinformatics > Foundation has been accepted as a mentoring organization for this > year's Google Summer of Code. This means we will be able to offer > mentoring through BioJava again this year. Accepted students will get > a stipend of 5,000$ from Google. Participation is possible from most > countries in the world, as long as you are eligible to work in the > country in which you'll reside throughout the duration of the > program. > > If you are interested in working on a BioJava related project, now > is the time to start preparing and discussing your proposals. Last > year we had many applications for the projects proposed by mentors. > If you want to distinguish your application I recommend to propose > your own project. Don't forget to discuss any proposal with us before > you submit them. We will try to provide feedback and match you with > a suitable Mentor. 
> > Also see http://biojava.org/wiki/Google_Summer_of_Code and Google's > FAQs: > http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/faqs > > > The student application deadline is April 8th. Google will announce > which proposals got accepted on April 25th. > > Andreas _______________________________________________ biojava-dev > mailing list biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > Hi Andreas, I'm an undergraduate at Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka, and I'm hoping to have an exciting summer with GSoC 2011. I participated in GSoC 2010 for Apache Derby (RDBMS in Java) project and successfully finished the project. This is a sample of the work (final output) which I've done for Derby last summer (http://nirmalfdo.blogspot.com/p/my-work-at-gsoc-2010.html). You can find my profile and recommendations at LinkedIn (http://www.linkedin.com/profile/view?id=54105394&trk=tab_pro). I have recently finished a course module on Bio Informatics and have a basic understanding about few algorithms (Nussinov, Profile HMM, Needle-Wucsh etc.), which made me interested in this area of computer science. While looking at your ideas page "Amino acids physico-chemical properties calculation" interested me most, since it involves implementation of algorithms. The sounding Java knowledge and the experiences of concurrent programming makes me more comfortable. I would like to contribute to Bio-Java in this summer, would you please direct me to relevant sources which I should start reading on and also possible guidelines would be highly appreciated. Thanks. -- Best Regards, Nirmal C.S.Nirmal J. Fernando Department of Computer Science & Engineering, Faculty of Engineering, University of Moratuwa, Sri Lanka. Blog: http://nirmalfdo.blogspot.com/ -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas at sdsc.edu Fri Mar 25 15:41:34 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 25 Mar 2011 08:41:34 -0700 Subject: [Biojava-l] GSOC 2011 - contributing to 'Amino acids physico-chemical properties calculation project' In-Reply-To: <4D8C9B5B.5040201@dundee.ac.uk> References: <4D872DD2.3030903@dundee.ac.uk> <4D88954D.1050304@dundee.ac.uk> <4D8C9B5B.5040201@dundee.ac.uk> Message-ID: I removed the checkout instructions from the broken anonymus SVN repository. The recommended way to check out the code is now via github, either using SVN or GIT... Andreas On Fri, Mar 25, 2011 at 6:40 AM, Peter Troshin wrote: > Dear Shameera, > >>>>As i felt it is required some knowledge about BioSQL so is it a good >>>> point to start my next step???? > > I am sorry but I do not see how BioSQL relates to this project. Perhaps more > explanation from your side would have helped me to see the connection. > >>>>I have some problem with checkout and installing BioJava3 > > Me too, I was unable to checkout the BioJava Live from > svn://code.open-bio.org/biojava/biojava-live/trunk due to the following > error > > " A socket operation was attempted to an unreachable host. > svn: Can't connect to host 'code.open-bio.org': A socket operation was > attempted to an unreachable host. " > > However, a git read-only mirror http://svn.github.com/biojava/biojava.git > seems to work fine, at least I was able to checkout the project. > So I'd suggest you to try that. > >>>>2. And i downloaded biojava3(v3.0.1) tar.gz and extracted it, as in the >>>> Biojava : started says i want to set the CLASSPATH variable as > > Have you downloaded biojava-all package from > http://biojava.org/download/bj3.0.1/biojava3.0.1-all.tar.gz or was this > something else? 
I cannot really comment on that one, as I have never tried
> this. I am copying this message to the BioJava list let's see if someone else
> is aware of any problems with this package.
>
> I hope this helps.
>
> Regards,
> Peter
>
> On 25/03/2011 04:30, Shameera Rathnayaka wrote:
>>
>> Sorry for the delay,
>>
>> I went through some sources and got a basic understanding of the project,
>> and I referred some links in the cookbook also.As i felt it is required some
>> knowledge about BioSQL so is it a good point to start my next step????
>>
>> I have some problem with checkout and installing BioJava3
>>
>> 1. I logged in using openid log in and tried to checkout via Developer
>> SVN, but it is asking password for three times and then shows "svn: Network
>> connection closed unexpectedly" then i used a USB modem but it gives same
>> result.
>>
>> 2. And i downloaded biojava3(v3.0.1) tar.gz and extracted it, as in the
>> Biojava : started says i want to set the CLASSPATH variable as "export
>> CLASSPATH=/home/thomas/
>> biojava-live.jar:/home/thomas/bytecode.jar:/home/thomas/commons-cli.jar:/home/thomas/commons-collections-
>> 2.1.jar:/home/thomas/commons-dbcp-1.1.jar:/home/thomas/commons-pool-1.1.jar:."
>> change according to my directory, in UNIX Bourne-type shells,
>> but i couldn't find those .jar files in my directory.
>>
>> Thanks
>>
>> On Tue, Mar 22, 2011 at 6:25 PM, Peter Troshin wrote:
>>
>>    Hi Shameera,
>>
>>    >>>Yes as you said, equations are not the case. But i needed to
>>    know will the equations be provided or not :)
>>
>>    No one asks you to devise the equations yourself all you need is
>>    to be able to read the paper and get the equation out of it.
>>    Please note that this is YOUR project and a mentor is there to
>>    help but not to provide everything for you. I think the equations
>>    are not hard to find besides if I just give you the equations you
>>    will not understand them thoroughly. As I said before you need to
>>    read the papers and by the source I meant the papers not the
>>    source code. If this is something that sounds too hard for you
>>    then, you may want to reconsider your decision contributing to
>>    BioJava project. BioJava is not only about Java but about Biology
>>    as well. Ability and will to find the information you need is a
>>    crucial for the success in any project.
>>    In addition to it there are plenty of implementations for the
>>    method that you set to implement. My advice to you would be to
>>    Google for molecular weight, extinction coefficient, instability
>>    index etc. I am sure you will find plenty of information on these
>>    topics.
>>
>>    To get you started I am giving you the link to the software that
>>    calculates half of the properties you need. There are formulas and
>>    plenty of documentation http://expasy.org/tools/protparam-doc.html
>>
>>    But please next time come back with a little more specific questions.
>>
>>    P.S. I do not envisage any GUI interfaces for the library you want
>>    to develop however, as a student you are free to propose this if
>>    you think this would be of benefit to the project.
>>
>>    Regards,
>>    Peter
>>
>>    On 21/03/2011 17:32, Shameera Rathnayaka wrote:
>>>
>>>    Yes as you said, equations are not the case. But i needed to know
>>>    will the equations be provided or not :)
>>>    And also i'm willing to see those equations. Could you please
>>>    help me to find out those equations,
>>>    then i could figure it out how to apply multi-threading
>>>    technology to speed up the process.
>>>
>>>    As i think i need to also develop a GUI except to the methods
>>>    isn't it???
>>>
>>>    understanding how these things work>
>>>
>>>    Yes now im referring to biojava source code . But im in a little
>>>    bit confusion about how to get a start and where the start should
>>>    be?
>>>
>>>    On Mon, Mar 21, 2011 at 4:52 PM, Peter Troshin wrote:
>>>
>>>        Hi Shameera,
>>>
>>>        >>> As a starting point to the project i would like to know about the
>>>        >>> Equations which are needed to these calculations, how can i get
>>>        >>> those and more about the project.
>>>
>>>        It is good that you are interested in the project.
>>>        Did you try to find the formula yourself? I will be happy to
>>>        help you if you cannot find them, but really
>>>        this must not be hard. Also I believe that you are much
>>>        better off reading the sources and
>>>        understanding how these things work than just have a formula.
>>>
>>>        Regards,
>>>        Peter
>>>
>>>        On 19/03/2011 17:22, Shameera Rathnayaka wrote:
>>>
>>>        > hi,
>>>        >
>>>        > Im Shameera Rathnayaka, a third year Undergraduate of Department of
>>>        > Computer Science and Engineering University of Moratuwa ,Sri Lanka.I
>>>        > am interested in implementing "Amino acids physico-chemical
>>>        > properties calculation" project as my GSOC 2011 project.I have
>>>        > previous experience in working with Java, MySql and Algorithms from
>>>        > my university projects. As my most recent project i have developed a
>>>        > visual navigation (XVisualNavigator)
>>>        > plugin for openoffice 3.2.1 using java. I am Currently working as an intern
>>>        > at WSO2 which is an open source
>>>        > middle-ware development company.
>>>        >
>>>        > As a starting point to the project i would like to know about the
>>>        > Equations which are needed to these calculations, how can i get those
>>>        > and more about the project.
>>>
>>>    --
>>>    Shameera Rathnayaka
>>>    Undergraduate
>>>    Department of Computer Science and Engineering
>>>    University of Moratuwa.
>>>    Sri Lanka.
>>>    T.P. 0719221454
>>
>> --
>> Shameera Rathnayaka
>> Undergraduate
>> Department of Computer Science and Engineering
>> University of Moratuwa.
>> Sri Lanka.
>> T.P. 0719221454
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From paiualex12 at gmail.com Fri Mar 25 18:06:37 2011
From: paiualex12 at gmail.com (Alexandru Paiu)
Date: Fri, 25 Mar 2011 20:06:37 +0200
Subject: [Biojava-l] Gsoc Amino acids physico-chemical properties calculation
Message-ID:

Hi Peter,

It's me again, Paiu Alexandru from Romania. I've been studying amino acids for two days, but I didn't find anything in Romanian about the properties I have to implement for this project; I found only material about amino acids in general. I went to the university library to find books about amino acids, but I wasn't so lucky. I'm really stuck in getting information about those properties. I found formulas and some information about every method on Wikipedia, but I really can't understand exactly what each formula is trying to do. I'd need some concrete examples for every method I have to implement. There are too many abbreviations that I don't understand. I tried to get some help from some friends who are studying pharmacy, but they couldn't help me.
I studied the tools at http://expasy.org/tools/, but I haven't figured out yet exactly how those methods work (what the inputs for each method are and how they obtain their outputs).

I've started to work on the goals in the selection criteria. I've finished the first two; now I'm working on using threads. I'll use threads for reading multiple lines from the input file. Each thread will take one line at a time, split it into the two strings separated by '\t', and apply StringOverlapFinder to the two strings. In StringOverlapFinder I look for the last 5 characters of string 1 in string 2. If they are not found, I print string 1. If an overlap is found, there are more cases to take care of. Some examples:

Ex 1: x = "abcdefghijklm" and y = "hijklmnopqrst" (your example); the output is abcdefg
Ex 2: x = "asdcsadvsasevenfiveseven" and y = "sevenfivesevenasdasdas"; the output is asdcsadvsa
Ex 3: x = "ascdaeseven" and y = "bseven"; the output is x

After these three examples I've found the "right" implementation. To explain it I'll take Ex 2, because I think it's the most complex. I find that the last 5 chars of x (seven) are found in y at index 0. I take the substring of y starting at index 0 and ending at the index of the first match, which is also 0. With an empty substring the program would stop here and the output would be asdcsadvsasevenfive. But the real output should be asdcsadvsa. So, after finding that this is a possible output, I should look for a second match in y. It is found at index 9 (the second seven). I again take the substring of y that starts at index 0 and ends at index 8 (9-1), which is sevenfive. The length of this substring is 9. Now I take another substring, this time from x. It starts at index (x.length()-5-9) and ends at index (x.length()-5). In this case the two substrings are equal and the program will write the correct output, which is asdcsadvsa.
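
As an illustration of the overlap rule in the three examples above, here is a minimal Java sketch. It is not the reference solution to the coding exercise and not the StringOverlapFinder described in this message: it assumes the goal is to remove from x its longest suffix that is also a prefix of y, and it treats the minimum overlap length of 5 as a parameter. The class and method names (OverlapSketch, trimOverlap) are hypothetical.

```java
// Hypothetical sketch; the names and the minOverlap parameter are assumptions,
// not part of BioJava or of the exercise text.
final class OverlapSketch {

    /**
     * Returns x without its longest suffix that is also a prefix of y.
     * Overlaps shorter than minOverlap characters are ignored, in which
     * case x is returned unchanged.
     */
    static String trimOverlap(String x, String y, int minOverlap) {
        int longest = Math.min(x.length(), y.length());
        // Try the longest candidate first, so that repeated motifs
        // (such as the two "seven"s in Ex 2) do not end the search early.
        for (int len = longest; len >= minOverlap; len--) {
            if (x.regionMatches(x.length() - len, y, 0, len)) {
                return x.substring(0, x.length() - len);
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // Ex 1: prints abcdefg
        System.out.println(trimOverlap("abcdefghijklm", "hijklmnopqrst", 5));
        // Ex 2: prints asdcsadvsa
        System.out.println(trimOverlap("asdcsadvsasevenfiveseven", "sevenfivesevenasdasdas", 5));
        // Ex 3: prints ascdaeseven (no overlap, x unchanged)
        System.out.println(trimOverlap("ascdaeseven", "bseven", 5));
    }
}
```

Checking candidate lengths from longest to shortest is quadratic in the worst case, which is fine for short test strings; for long sequences a prefix-function (KMP style) computation would be the usual refinement. Because the method has no shared state, it could be called from several threads, one input line each, without extra synchronization.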
Before writing that output, the program should still search for another possible overlap, but in this case there isn't one. In Ex 3 the string "seven" is found in y. The substring taken from y, which is "b", is compared with the substring from x, which is "e". They aren't equal, so there is no overlap. The program will look for another "seven" in y, but there isn't one, so the output will be x, because no overlap was found.

I hope you understand my ideas. I'll send you the jar when it's finished. Is there a deadline for the selection criteria?

That's it for today.

Best regards,
Alex

From paiualex12 at gmail.com Fri Mar 25 18:23:41 2011
From: paiualex12 at gmail.com (Alexandru Paiu)
Date: Fri, 25 Mar 2011 20:23:41 +0200
Subject: [Biojava-l] Question for Peter
Message-ID:

Hi again!

Is there a way to talk in real time? I noticed that there isn't any IRC channel listed for BioJava. Can I find you somewhere on an IRC channel? In Romania we are at GMT+2, like Cairo (Egypt). Where are you from, what is your time zone, and on what channel can I find you?

That's all.
Thanks,
Alex

From sunilthomas13 at gmail.com Sat Mar 26 14:17:50 2011
From: sunilthomas13 at gmail.com (Sunil Thomas)
Date: Sat, 26 Mar 2011 19:47:50 +0530
Subject: [Biojava-l] GSoC 2011
Message-ID:

Hello.

I'm Sunil Thomas. I'm interested in the project proposal 'Amino acids physico-chemical properties calculation'. I'm a proficient Java programmer, and Java has been my main coding language for about six years now. I also have experience with multi-threaded Java applications. I have implemented some reasonably complex algorithms, such as efficient QR factorization of a matrix (although this was in C++). Above all, I have a passion for creating the fastest-running and most efficient code for any algorithm. Therefore I feel that a Java implementation of standard algorithms (as described on the ideas page) is right up my alley.

I have already started working on the coding exercise given in the wiki.
I would like to know if there is a deadline for the coding exercise(like, does it count after the official application period opens up?), as i have some exams in my college right now till next week. Hoping to hear more from the mentors Regards, Sunil Thomas Dept. of Electrical/Electronics Engineering BITS Pilani, Goa Campus INDIA From andreas at sdsc.edu Sat Mar 26 16:43:48 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Sat, 26 Mar 2011 09:43:48 -0700 Subject: [Biojava-l] GSoC 2011 In-Reply-To: References: Message-ID: Hi Sunil, > I have already started working on the coding exercise given in the wiki. > I would like to know if there is a deadline for the coding exercise(like, > does it count after the official application period opens up?), as i have The main criteria to rank student applications will be the proposals. As such I would put most of my energy into that. Last year we did coding tests with some of the top ranking proposals as an additional criteria for the ranking that will get submitted to Google, but that was only after all the proposals had been submitted. At this point I would really emphasize the need for a good project plan. GSoC applications are highly competitive and only a very well thought through proposal will have a chance. Andreas From uchathuranga at gmail.com Sun Mar 27 03:56:28 2011 From: uchathuranga at gmail.com (udana chathuranga) Date: Sun, 27 Mar 2011 09:26:28 +0530 Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava In-Reply-To: <4D872F38.20305@dundee.ac.uk> References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk> <4D76369D.8060403@dundee.ac.uk> <4D872F38.20305@dundee.ac.uk> Message-ID: hi Peter, Thanks for the guide peter.As you mentioned I have to pay more attention to the project plan. 
I have started working on the coding exercise that you have posted at http://biojava.org/wiki/Short_coding_exercise. Is there a deadline for this, or do we have to submit it together with the proposal?

I have some concerns regarding BioJava 1.8 and BioJava 3, so I am planning to start a new thread on the mailing list. I hope you will help me there too.

Thanks
Regards
udana

From shameerainfo at gmail.com Mon Mar 28 09:56:32 2011
From: shameerainfo at gmail.com (Shameera Rathnayaka)
Date: Mon, 28 Mar 2011 15:26:32 +0530
Subject: [Biojava-l] GSOC 2011 - contributing to 'Amino acids physico-chemical properties calculation project'
In-Reply-To:
References: <4D872DD2.3030903@dundee.ac.uk> <4D88954D.1050304@dundee.ac.uk> <4D8C9B5B.5040201@dundee.ac.uk>
Message-ID:

Hi Peter and Andreas,

I was able to check out the source using Git.

I have already implemented the short coding exercise for the 'Amino acids physico-chemical properties calculation' project, but as far as I can see no link is given for submitting the assignment. Where should I submit it, or do I have to submit it with my project proposal?

Thanks in advance!

On Fri, Mar 25, 2011 at 9:11 PM, Andreas Prlic wrote:

> I removed the checkout instructions from the broken anonymus SVN
> repository. The recommended way to check out the code is now via
> github, either using SVN or GIT...
>
> Andreas
>
> On Fri, Mar 25, 2011 at 6:40 AM, Peter Troshin wrote:
> > Dear Shameera,
> >
> >>>>As i felt it is required some knowledge about BioSQL so is it a good
> >>>> point to start my next step????
> >
> > I am sorry but I do not see how BioSQL relates to this project. Perhaps more
> > explanation from your side would have helped me to see the connection.
> > > >>>>I have some problem with checkout and installing BioJava3 > > > > Me too, I was unable to checkout the BioJava Live from > > svn://code.open-bio.org/biojava/biojava-live/trunk due to the following > > error > > > > " A socket operation was attempted to an unreachable host. > > svn: Can't connect to host 'code.open-bio.org': A socket operation was > > attempted to an unreachable host. " > > > > However, a git read-only mirror > http://svn.github.com/biojava/biojava.git > > seems to work fine, at least I was able to checkout the project. > > So I'd suggest you to try that. > > > >>>>2. And i downloaded biojava3(v3.0.1) tar.gz and extracted it, as in the > >>>> Biojava : started says i want to set the CLASSPATH variable as > > > > Have you downloaded biojava-all package from > > http://biojava.org/download/bj3.0.1/biojava3.0.1-all.tar.gz or was this > > something else? I cannot really comment on that one, as I have never > tried > > this. I am coping this message to the BioJava list let?s see if someone > else > > is aware of any problems with this package. > > > > > > I hope this helps. > > > > Regards, > > Peter > > > > > > On 25/03/2011 04:30, Shameera Rathnayaka wrote: > >> > >> Sorry for the delay, > >> > >> > >> I went through some sources and got a basic understanding of the > project, > >> and I referred some links in the cookbook also.As i felt it is required > some > >> knowledge about BioSQL so is it a good point to start my next step???? > >> > >> I have some problem with checkout and installing BioJava3 > >> > >> 1. I logged in using openid log in and tried to checkout via Developer > >> SVN, but it is asking password for three times and then shows "svn: > Network > >> connection closed unexpectedly" then i used a USB modem but it gives > same > >> result. > >> > >> 2. 
And i downloaded biojava3(v3.0.1) tar.gz and extracted it, as in the > >> Biojava : started says i want to set the CLASSPATH variable as "export > >> CLASSPATH=/home/thomas/ > >> > >> > biojava-live.jar:/home/thomas/bytecode.jar:/home/thomas/commons-cli.jar:/home/thomas/commons-collections- > >> > 2.1.jar:/home/thomas/commons-dbcp-1.1.jar:/home/thomas/commons-pool-1.1.jar:." > >> change according to my directory, in UNIX Bourne-type > shells, > >> but i couldn't find those .jar files in my directory. > >> > >> Thanks > >> > >> > >> On Tue, Mar 22, 2011 at 6:25 PM, Peter Troshin < > p.v.troshin at dundee.ac.uk > >> > wrote: > >> > >> Hi Shameera, > >> > >> >>>Yes as you said, equations are not the case. But i needed to > >> know will the equations be provided or not :) > >> > >> No one asks you to devise the equations yourself all you need is > >> to be able to read the paper and get the equation out of it. > >> Please note that this is YOUR project and a mentor is there to > >> help but not to provide everything for you. I think the equations > >> are not hard to find besides if I just give you the equations you > >> will not understand them thoroughly. As I said before you need to > >> read the papers and by the source I meant the papers not the > >> source code. If this is something that sounds too hard for you > >> then, you may want to reconsider your decision contributing to > >> BioJava project. BioJava is not only about Java but about Biology > >> as well. Ability and will to find the information you need is a > >> crucial for the success in any project. > >> In addition to it there are plenty of implementations for the > >> method that you set to implement. My advice to you would be to > >> Google for molecular weight, extinction coefficient, instability > >> index etc. I am sure you will find plenty of information on these > >> topics. 
> >> > >> To get you started I am giving you the link to the software that > >> calculates half of the properties you need. There are formulas and > >> plenty of documentation http://expasy.org/tools/protparam-doc.html > >> > >> But please next time come back with a little more specific questions. > >> > >> P.S. I do not envisage any GUI interfaces for the library you want > >> to develop however, as a student you are free to propose this if > >> you think this would be of benefit to the project. > >> > >> Regards, > >> Peter > >> > >> > >> On 21/03/2011 17 :32, Shameera Rathnayaka > >> wrote: > >>> > >>> > >>> Yes as you said, equations are not the case. But i needed to know > >>> will the equations be provided or not :) > >>> And also i'm willing to see those equations. Could you please > >>> help me to find out those equations, > >>> then i could figure it out how to apply multi-threading > >>> technology to speed up the process. > >>> > >>> As i think i need to also develop a GUI except to the methods > >>> isn't it??? > >>> > >>> > >>> >>> understanding how these things work> > >>> > >>> Yes now im referring to biojava source code . But im in a little > >>> bit confusion about how to get a start and where the start should > >>> be? > >>> > >>> > >>> On Mon, Mar 21, 2011 at 4:52 PM, Peter Troshin > >>> > wrote: > >>> > >>> Hi Shameera, > >>> > >>> > >>> >>> As a > >>> starting point to > >>> > >>> the project i would like to know about the > >>> > >>> > >>> > >>> >>> Equations which are needed to these > >>> calculations, how > >>> > >>> can i get > >>> > >>> > >>> > >>> >>> those and more about the project. > >>> > >>> It is good that you are interested in the project. > >>> Did you try to find the formula yourself? I will be happy to > >>> help you if you cannot find them, but really > >>> this must not be hard. Also I believe that you are much > >>> better off reading the sources and > >>> understanding how these things work than just have a formula. 
> >>> > >>> Regards, > >>> Peter > >>> > >>> > >>> On 19/03/2011 17:22, Shameera Rathnayaka wrote: > >>> > >>> > hi, > >>> > >>> > >>> > >>> > > >>> > >>> > >>> > >>> > Im Shameera Rathnayaka, a third year Undergraduate > >>> of > >>> > >>> Department of > >>> > >>> > >>> > >>> > Computer Science and Engineering University of > >>> Moratuwa ,Sri > >>> > >>> Lanka.I > >>> > >>> > >>> > >>> > am interested in implementing "Amino acids > >>> physico-chemical > >>> > >>> > >>> > >>> > properties calculation" project as my GSOC 2011 > >>> project.I > >>> > >>> have > >>> > >>> > >>> > >>> > previous experience in working with Java, MySql and > >>> > >>> Algorithms from > >>> > >>> > >>> > >>> > my university projects. As my most recent project i > >>> have > >>> > >>> developed a > >>> > >>> > >>> > >>> > visual navigation > >>> > >>> > >>> > >>> > >>> > > >>> > >>> > >>> (XVisualNavigator< > http://extensions.services.openoffice.org/en/project/XVN>) > >>> > >>> > >>> > >>> > > >>> > >>> > >>> > >>> > > >>> plugin for openoffice 3.2.1 using java. I am Currently > >>> working as an intern > >>> > at WSO2 > >>> > >>> > >>> which is an open source > >>> > >>> > >>> > >>> > middle-ware development company. > >>> > >>> > >>> > >>> > > >>> > >>> > >>> > >>> > As a starting point to the project i would like to > >>> know about > >>> > >>> the > >>> > >>> > >>> > >>> > Equations which are needed to these calculations, > >>> how can i > >>> > >>> get those > >>> > >>> > >>> > >>> > and more about the project. > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> -- Shameera Rathnayaka > >>> Undergraduate > >>> Department of Computer Science and Engineering > >>> University of Moratuwa. > >>> Sri Lanka. > >>> T.P. 0719221454 > >> > >> > >> > >> > >> -- > >> Shameera Rathnayaka > >> Undergraduate > >> Department of Computer Science and Engineering > >> University of Moratuwa. > >> Sri Lanka. > >> T.P. 
0719221454 > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > -- Shameera Rathnayaka Undergraduate Department of Computer Science and Engineering University of Moratuwa. Sri Lanka. T.P. 0719221454 From p.v.troshin at dundee.ac.uk Mon Mar 28 09:47:43 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Mon, 28 Mar 2011 10:47:43 +0100 Subject: [Biojava-l] Gsoc Amino acids physico-chemical properties calculation In-Reply-To: References: Message-ID: <4D90593F.8080008@dundee.ac.uk> Dear Alex, The BioJava projects all have some elements of Chemistry and/or Molecular Biology in them. So some understanding of these areas is necessary. If you do not possess them, then it is going to be hard to produce a competitive project plan. So you may be better off applying to pure Java projects. Completing the coding exercise will not be enough to get through! Kind regards, Peter On 25/03/2011 18:06, Alexandru Paiu wrote: > Hi Peter > > It's me again Paiu Alexandru from Romania . > > I've started to study for 2 days about amino acids , but i didn't find > anything in romanian about the properties I have to implement for this > project . I found only about amino acids in general . I went to the > university library to find books about amino acids but i wasn't so lucky . > I'm really stucked in getting some information about that properties . > > I found on wikipedia formulas and some informations about every method , but > I really can't understand exactly what is every formula trying to do . I'd > need some concrete examples for every method I have to implement . There are > to many abbreviations that I don't understand . I tried to get some help > from some friends that are studying pharmacy , but they couldn't help me . 
> I studied that tool > http://expasy.org/tools/, > but i haven't figured out yet exactly how those methods work (which > are > the inputs for each method and how they obtain those outputs ) > > I've started to work the goals on Selection criteria . I've finished the > first 2 , now i'm working on using threads . > I'll use threads for taking multiple lines from the input file . Each Thread > will take a line at a time , will take the 2 strings separated by '\t' and > will be applied StringOverlapFinder over the 2 Strings . In > StringOverlapFinder i look for the last 5 characters of String 1 in String 2 > . If it isn't found i print String 1 . If is found an overlap , then there > are more cases to take care of . Some examples : > Ex 1 : x = "abcdefghijklm" and y = "hijklmnopqrst" (your example) then the > output is =abcdefg > Ex 2 : x="asdcsadvsasevenfiveseven" and y="sevenfivesevenasdasdas" then the > output is = asdcsadvsa > Ex 3: x="ascdaeseven" and y="bseven" and the output is x > > So after this 3 examples i've found the "right" implementation . And for > this I'll take Ex Nr 2 , because I think it's the most complex . > I find that the the last 5 chars (seven) is found on index 0 . I take the > substring from y starting with index 0 and ending with the index of the > first overlap that is 0 too . So having a null substring the program should > stop here and the output would be asdcsadvsasevenfive . But the real output > shoud be asdcsadvsa . > So , after finding that this is a posible output , i should look for a > second overlap in y . It is found at index 9 ( the second seven) . I take > the substring again from y that starts with index 0 and end with index 8 ( > 9-1 ) and that is sevenfive . The lenght of this substring is 9 . Now i take > another substring but this time from x . It starts from index > (x.length()-5-9) and ends with index ( x.length()-5) . 
In this case this > substrings are equall and the program will write the correct output that is > asdcsadvsa . But before that , it should search for another possible overlap > , but in this case there isn't one . > In Ex Nr 3 it is found the "seven" string in y . It's taken the substring > from y , that is "b" , and is compared with the substring from x that is "e" > . They aren't equal so there is not an overlap . The program will look for > another string "seven" in y , but there isn't one so the output will be x , > because there wasn't found any overlap . > I hope you understand my Ideeas . I'll send to you the jar when it's > finished > > Is there a deadline for this selection criteria ? > That's it for today > > Best Regards , > Alex > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From p.v.troshin at dundee.ac.uk Mon Mar 28 10:01:35 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Mon, 28 Mar 2011 11:01:35 +0100 Subject: [Biojava-l] GSoC 2011 In-Reply-To: References: Message-ID: <4D905C7F.6090908@dundee.ac.uk> Hi Sunil, Thanks for the interest. I think your experience with Java and passion for the algorithm development should help you on this project. As for the coding exercise I can only agree with Andreas, the project plan is the first thing you should worry about. You are going to need to complete a coding exercise at later stages of selection process. Good luck with your application. Peter On 26/03/2011 14:17, Sunil Thomas wrote: > Hello. > > I'm Sunil Thomas. I'm interested in the project proposal ' *Amino acids > physico-chemical properties calculation'*. I'm a proficient Java programmer, > and Java has been my main coding language for about six years now. I also > happen to have experience in Multi threaded Java applications. 
> I have done some reasonably complex algorithms such as efficient QR
> factorization of a matrix( although this was using C++).
> Mainly, i have a passion for creating the fastest running and most efficient
> code for any algorithm.
> Therefore i feel Java implementation of standard algorithms(as described in
> the ideas page) is right up my alley.
> I have already started working on the coding exercise given in the wiki.
> I would like to know if there is a deadline for the coding exercise(like,
> does it count after the official application period opens up?), as i have
> some exams in my college right now till next week.
> Hoping to hear more from the mentors
>
> Regards,
>
> Sunil Thomas
> Dept. of Electrical/Electronics Engineering
> BITS Pilani, Goa Campus
> INDIA
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From p.v.troshin at dundee.ac.uk Mon Mar 28 10:21:27 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Mon, 28 Mar 2011 11:21:27 +0100
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk> <4D76369D.8060403@dundee.ac.uk> <4D872F38.20305@dundee.ac.uk>
Message-ID: <4D906127.7020003@dundee.ac.uk>

Hi Udana,

You can submit your coding exercise shortly after the proposal. It would be very good to have your coding exercise no later than the 10th of April. Mentors do not have much time to evaluate proposals either, so the exercise must be available soon after the proposal. We may be able to accommodate later submissions, but I would not guarantee that.

>>>I have some concerns regarding the BioJava 1.8 and BioJava 3 so I am
>>>planning to start a new thread in the mailing list.Hope you will help me
>>>there too.

Good idea. Since people on the list have other jobs to do, it is best to give them some time to look at your questions and not to expect an immediate answer. However, you may get an immediate answer too. That's the nature of the list; you just need to be lucky (:-))

Regards,
Peter

On 27/03/2011 04:56, udana chathuranga wrote:
> hi Peter,
>
> Thanks for the guide peter.As you mentioned I have to pay more
> attention to the project plan.
>
> I have started working on the coding exercise that you have posed in
> http://biojava.org/wiki/Short_coding_exercise.Is there a deadline to
> this or Do we have to submit it together with the proposal?
>
> I have some concerns regarding the BioJava 1.8 and BioJava 3 so I am
> planning to start a new thread in the mailing list.Hope you will help
> me there too.
>
>
> Thanks
> Regards
> udana

From p.v.troshin at dundee.ac.uk Mon Mar 28 10:37:39 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Mon, 28 Mar 2011 11:37:39 +0100
Subject: [Biojava-l] GSOC 2011 - contributing to 'Amino acids physico-chemical properties calculation project'
In-Reply-To:
References: <4D872DD2.3030903@dundee.ac.uk> <4D88954D.1050304@dundee.ac.uk> <4D8C9B5B.5040201@dundee.ac.uk>
Message-ID: <4D9064F3.5090303@dundee.ac.uk>

Hi Shameera,

You need to email the jar file to me or to gsocexercise at gmail.com.
The exercise page is now updated too: http://biojava.org/wiki/Short_coding_exercise#Submission

Thanks for the question!

Regards,
Peter

On 28/03/2011 10:56, Shameera Rathnayaka wrote:
> Hi Peter and Andreas,
>
> I was able to check out the source using GIT.
>
> I have already implemented the short coding exercise for 'Amino acids
> physico-chemical properties calculation project' but as i noticed
> there is not any link given to submit the assignment.
> I need to know that where should i submit it , or do i have to submit
> it with my project proposal?
>
> Thanks in advance!

From uchathuranga at gmail.com Mon Mar 28 12:44:03 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Mon, 28 Mar 2011 18:14:03 +0530
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava - Conserns
Message-ID:

Hi all,

I am starting a new thread about some concerns regarding BioJava 1.8 and its implementation of the isoelectric point calculation and the peptide mass calculation, and about mapping these to BioJava 3.0.

1. Package org.biojava.bio.proteomics - Class_IsoelectricPointCalc
The method "getIsoelectricPoint(SymbolList peptide)" takes a SymbolList as its parameter. If we are going to port this to BioJava 3.0, what could be used as the parameter type instead of the BioJava 1.8 SymbolList? The SymbolList interface in org.biojava.bio.symbol did not help me understand how SymbolList is used in the isoelectric point calculation.

2.
Package org.biojava.bio.proteomics - Class_MassCalc
In the "calcTermMass" method an extra H is added if MH_PLUS is true. I am a little confused about why an extra H has to be added when calculating the terminal mass. What is the importance of the MH_PLUS property?

If there are sample codes or demos showing how to use these classes and methods, can anyone point me to them?

Thanks
Regards
Udana

From uchathuranga at gmail.com Mon Mar 28 13:00:25 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Mon, 28 Mar 2011 18:30:25 +0530
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To: <4D906127.7020003@dundee.ac.uk>
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk> <4D76369D.8060403@dundee.ac.uk> <4D872F38.20305@dundee.ac.uk> <4D906127.7020003@dundee.ac.uk>
Message-ID:

Hi Peter,

Thanks, Peter. I have already looked at the updated short coding exercise page, http://biojava.org/wiki/Short_coding_exercise#Submission.

In my proposal, can I mention the use of the existing BioJava 1.8 methods for calculating molecular weight and isoelectric point? Can I add more related methods to my proposal? One more question regarding the proposal: what will be the input to these methods, a file containing protein sequences or a string holding the protein sequence?
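
To make the string-versus-file question concrete, the following is only a minimal sketch under stated assumptions, not existing BioJava code. The class PeptideMassSketch and the method averageMass are hypothetical names, the input is assumed to be a plain one-letter-code string of unmodified residues, and the table holds standard average residue masses in daltons. The mhPlus flag assumes that MH_PLUS in MassCalc refers to the singly protonated [M+H]+ ion commonly reported in mass spectrometry, approximated here by adding one hydrogen mass.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of BioJava: average mass of an unmodified peptide. */
final class PeptideMassSketch {

    // Average masses (Da) of amino acid residues, i.e. each amino acid minus one water.
    private static final Map<Character, Double> RESIDUE_MASS = new HashMap<Character, Double>();

    static {
        String codes = "GASPVTCLINDQKEMHFRYW";
        double[] masses = {
            57.0519, 71.0788, 87.0782, 97.1167, 99.1326,
            101.1051, 103.1388, 113.1594, 113.1594, 114.1038,
            115.0886, 128.1307, 128.1741, 129.1155, 131.1926,
            137.1411, 147.1766, 156.1875, 163.1760, 186.2132
        };
        for (int i = 0; i < codes.length(); i++) {
            RESIDUE_MASS.put(codes.charAt(i), masses[i]);
        }
    }

    private static final double WATER = 18.01528;   // H on the N-terminus plus OH on the C-terminus
    private static final double HYDROGEN = 1.00794; // assumption: stands in for the extra proton

    /**
     * Sums the residue masses of a one-letter-code peptide and adds the terminal water.
     * If mhPlus is true, one hydrogen mass is added to approximate the [M+H]+ ion.
     */
    static double averageMass(String peptide, boolean mhPlus) {
        double mass = WATER;
        for (int i = 0; i < peptide.length(); i++) {
            char code = Character.toUpperCase(peptide.charAt(i));
            Double residue = RESIDUE_MASS.get(code);
            if (residue == null) {
                throw new IllegalArgumentException("Unknown residue: " + code);
            }
            mass += residue;
        }
        return mhPlus ? mass + HYDROGEN : mass;
    }
}
```

A file-based variant could read the sequences first and then call the same method once per sequence, so the two input options need not exclude each other.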

Thanks
Regards
Udana

From p.v.troshin at dundee.ac.uk Mon Mar 28 14:30:04 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Mon, 28 Mar 2011 15:30:04 +0100
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk> <4D6698E7.3080202@dundee.ac.uk> <20110224131506.17104xy7rpe7n30g@gator1273.hostgator.com> <4D6BBCB7.3010203@dundee.ac.uk> <4D760FD8.2010002@dundee.ac.uk> <4D762A46.3090204@dundee.ac.uk> <4D76369D.8060403@dundee.ac.uk> <4D872F38.20305@dundee.ac.uk> <4D906127.7020003@dundee.ac.uk>
Message-ID: <4D909B6C.7050506@dundee.ac.uk>

>>>>In my proposal can I mentioned the use of existing methods to calculate Molecular weight and Isoelectric pointin BioJava1.8?

Yes, good idea.

>>>Can I add more related methods to my proposal?

You definitely should. This is YOUR project, so you decide what to add or remove. If your ideas are sound and compelling you will have a better chance in the competition!

>>>What will be the input to these methods is it file containing protein sequences or string of the protein?

You should put in the proposal what you think would be the most appropriate way to handle the sequences. This will show how well you understand the task. Whatever you decide will have to be discussed further when it comes to implementation.

I hope that helps.

Regards,
Peter

On 28/03/2011 14:00, udana chathuranga wrote:
> Hi Peter,
>
> Thanks Peter, I have already looked in to the updated short coding
> exercise page http://biojava.org/wiki/Short_coding_exercise#Submission.
>
> In my proposal can I mentioned the use of existing methods to
> calculate Molecular weight and Isoelectric pointin BioJava1.8? Can I
> add more related methods to my proposal? One more question regarding
> the proposal. What will be the input to these methods is it file
> containing protein sequences or string of the protein?.
>
>
>
> Thanks
> Regards
> Udana
>

From Wim.DeSmet at UGent.be Mon Mar 28 14:46:33 2011
From: Wim.DeSmet at UGent.be (Wim De Smet)
Date: Mon, 28 Mar 2011 16:46:33 +0200
Subject: [Biojava-l] aligning sequences with ambiguous bases
Message-ID: <4D909F49.1060605@UGent.be>

Hi

(sorry if you get 2 copies, I sent this to -request by mistake)

Apologies if this has come up before; a quick search didn't turn anything up. I'm attempting to do a pairwise alignment between two DNA sequences using BioJava 3. When I try to construct a DNASequence from a string that contains an ambiguous base (in this case 'y'), I get the following stack trace.

Exception in thread "main" org.biojava3.core.exceptions.CompoundNotFoundError: Compound not found for: Cannot find compound for: y
	at org.biojava3.core.sequence.storage.ArrayListSequenceReader.setContents(ArrayListSequenceReader.java:196)
	at org.biojava3.core.sequence.template.AbstractSequence.<init>(AbstractSequence.java:88)
	at org.biojava3.core.sequence.DNASequence.<init>(DNASequence.java:64)

Should I attempt to mask them somehow? What's the best way to deal with these?

cheers
Wim

--
Wim De Smet
http://www.straininfo.net/

From khalil.elmazouari at gmail.com Mon Mar 28 16:07:15 2011
From: khalil.elmazouari at gmail.com (Khalil El Mazouari)
Date: Mon, 28 Mar 2011 18:07:15 +0200
Subject: [Biojava-l] RichSequence.IOTools performance
Message-ID: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com>

Hi,

I am developing a sequence annotation app. It should handle ~100.000 sequences per run.

When profiling the app (with 10.000 sequences), the total execution time was ~20 seconds, of which 57% was spent in RichSequence.IOTools.writeGenbank!

How could one improve the performance of RichSequence.IOTools?

Thanks.
khalil

From andreas at sdsc.edu Mon Mar 28 16:08:28 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 28 Mar 2011 09:08:28 -0700
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava - Conserns
In-Reply-To:
References:
Message-ID:

Hi Udana,

> 1. Package org.biojava.bio.proteomics - Class_IsoelectricPointCalc
> In the method "getIsoelectricPoint(SymbolList peptide)" they have used a
> SymbolList type as the parameter ,if we are going to port to BioJava3.0
> ,What are the possible use of parameter instead of SymbolList in BioJava1.8?

The counterpart in the 3.x series would be to use the sequence interface directly, or to pass in a string representation of a sequence.

> 2. Package org.biojava.bio.proteomics - Class_MassCalc
> In "calcTermMass" method they have added extra H if MH_PLUS is true.I am
> little bit confuse why we have to add a extra H when calculating term mass?
> and What is the importance of MH_PLUS prpperty?

Not sure; somebody else might be able to say more about this.

> If there are sample codes or demos how to use these classes and method, can
> anyone guide me to those?

Did you see the cookbook?

http://biojava.org/wiki/BioJava:CookBook1.7#Proteomics

Andreas

From paiualex12 at gmail.com Mon Mar 28 16:17:35 2011
From: paiualex12 at gmail.com (Alexandru Paiu)
Date: Mon, 28 Mar 2011 19:17:35 +0300
Subject: [Biojava-l] Final Application (Paiu Alexandru ) (added project plan)
Message-ID:

*1. Your complete contact information*, including full name, physical address, preferred email address, and telephone number, plus other pertinent contact information such as IRC handles, etc.

Full Name : Paiu Alexandru

Address : Country Romania , city Constanta , Bld. Aurel Vlaicu , Nr. 41 , Bl. Pc1 sc. B , Et 6 , Apt. 46

E-mail : paiualex12 at google.com or paiualex12 at yahoo.com

Telephone number : 40733924684

*2. Why you are interested in the project* you are proposing and are well-suited to undertake it.

This project suits me perfectly, because interested students should have a general knowledge of core Java programming and knowledge of multi-threaded programming. I have been learning Java for a year and a half, and I have used threads a lot in applications and projects.

This is the only project I am applying to, because I haven't found a more interesting one.

*3. A summary of your programming* experience and skills.

I have done a lot of mini-projects and applications for school and for myself. I have made projects like:

a) Lanchat client-server using TCP/IP: I wrote two applications, one for the client and one for the server. I used a JApplet with Swing elements for the client. I used threads, especially in the server-side application, and sockets.

b) Lanchat peer-to-peer using UDP and multicasting. I wrote only one application, for the client. I used threads and multicast sockets.

c) A project for administering a database, using a JApplet with Connector/J and MySQL. It has two applications, one for clients and one for the administrator.

*4. Programs or projects you have previously* authored or contributed to available as open-source, including, if applicable, any past Summer of Code involvement.

I haven't worked on any open-source project yet and I have no past experience with GSoC; this is the first time I am applying for an open-source project. I haven't worked for a company either.

*5. A project plan for the project* you are proposing, even if your proposed project is directly based on one of the proposed project ideas for member projects.

I wish to apply to the project called *Amino Acids physico-chemical properties calculation*.

I have been thinking for some time about a possible implementation and I settled on a single one (which I think is the best).

I will use two main classes.
One will represent an atom of a substance (for example He, H, O, etc.) and will have parameters like: atomic weight, name, abbreviation, valence. I'll use the second class for constructing amino acids from the first. So, the second class will extend the class of atoms. For example, say I have to instantiate a molecule of H2O (water). I will have a constructor with a string parameter that will build the substance. Let's say that the second class is called Aminoacids, and the first one Atoms.

Let's say I choose H2O from a combo box (it's only an example). Then I send the string "H2O1" to the Aminoacids class, to instantiate an Aminoacids object. The constructor will evaluate the string char by char. If it finds a char or two chars, that means I have to instantiate an atom for that char or those chars. If it finds a number, that means it is the multiplier of the atom before it.

So the class Aminoacids will have a private Object[] array, which will hold numbers and Atoms objects.

So for H2O the array will look like this: array[0] = atom of H (hydrogen), array[1] = 2, array[2] = O (oxygen), array[3] = 1.

All the known substances will be in a file called atoms.txt with atom mass, name, abbreviation, etc. The Atoms class will have a method to add new atoms to the list.

Calculating the molecular weight is then very simple. We already have array = {H, 2, O, 1}, and the atoms have their atomic weight as a parameter, so all we have to do is:

Mol. Weight = H.weight*2 + O.weight*1

The plan for implementing:

- May 20 to June 20: implementing the two classes and the first two methods

- June 20 to July 20: implementing the rest of the methods

- July 20 until the end: final polishing, documentation for end users, and 1 method proposed by me

*6. Any obligations, vacations, or plans* for the summer that may require scheduling during the GSoC work period.

I will have school final exams from 20 May to 20 June.
So I won't be able to work at full capacity during that time. That's all.

*7. PS*

I hope you have received my short coding exercise program (I got some kind of error when sending a mail with an attachment).

thanks

From p.v.troshin at dundee.ac.uk Fri Mar 25 13:01:17 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Fri, 25 Mar 2011 13:01:17 +0000
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava
In-Reply-To:
References: <4D667A55.5040404@dundee.ac.uk>
Message-ID: <4D8C921D.7040200@dundee.ac.uk>

An HTML attachment was scrubbed...
URL:

From paiualex12 at gmail.com Sun Mar 27 17:59:13 2011
From: paiualex12 at gmail.com (Alexandru Paiu)
Date: Sun, 27 Mar 2011 20:59:13 +0300
Subject: [Biojava-l] Alexandru Paiu(Coding exercise)
Message-ID:

I've attached to this mail my implementation of the Short Coding Exercise.

Thanks
Alexandru Paiu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Paiu Alexandru.rar
Type: application/rar
Size: 22439 bytes
Desc: not available
URL:

From uchathuranga at gmail.com Mon Mar 28 16:51:33 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Mon, 28 Mar 2011 22:21:33 +0530
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava - Conserns
In-Reply-To:
References:
Message-ID:

Hi Andreas,

Thanks for the help. Yes, I will look into the CookBook1.7.

Regards
Udana

From holland at eaglegenomics.com Mon Mar 28 16:15:09 2011
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 28 Mar 2011 17:15:09 +0100
Subject: [Biojava-l] RichSequence.IOTools performance
In-Reply-To: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com>
References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com>
Message-ID: <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com>

I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good!
However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too. On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote: > Hi, > > I am developing a sequence annotation app. It should handle ? 100.000 sequence per run. > > When profiling the app (with 10.000 seq), the total execution time was ? 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbak!! > > How one could improve the RichSequence.IOTools performance? > > Thanks. > > khalil > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From khalil.elmazouari at gmail.com Mon Mar 28 17:11:58 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Mon, 28 Mar 2011 19:11:58 +0200 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> Message-ID: Sequences objects are all in-memory. I agree, 10000 seq in ? 20 sec is not bad. However, scientists will processes 100,000 seqs in each run, and IO is a real bottleneck. So, I am trying, as far as I can, to fine tune the app. Regards, khalil On 28 Mar 2011, at 18:15, Richard Holland wrote: > I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good! However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. 
Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too. > > > On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote: > >> Hi, >> >> I am developing a sequence annotation app. It should handle ? 100.000 sequence per run. >> >> When profiling the app (with 10.000 seq), the total execution time was ? 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbak!! >> >> How one could improve the RichSequence.IOTools performance? >> >> Thanks. >> >> khalil >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > From holland at eaglegenomics.com Mon Mar 28 17:23:44 2011 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 28 Mar 2011 18:23:44 +0100 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> Message-ID: <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> In which case you've got little option but to rewrite the GenbankFormat module to use NIO or other alternative methods for writing files. However before you do that I suggest you investigate the recent BioJava3 developments to see if they've already done anything in this area - Andy Yates is your man there. On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote: > Sequences objects are all in-memory. > I agree, 10000 seq in ? 20 sec is not bad. However, scientists will processes 100,000 seqs in each run, and IO is a real bottleneck. So, I am trying, as far as I can, to fine tune the app. 
> > Regards, > > khalil > > On 28 Mar 2011, at 18:15, Richard Holland wrote: > >> I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good! However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too. >> >> >> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote: >> >>> Hi, >>> >>> I am developing a sequence annotation app. It should handle ? 100.000 sequence per run. >>> >>> When profiling the app (with 10.000 seq), the total execution time was ? 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbak!! >>> >>> How one could improve the RichSequence.IOTools performance? >>> >>> Thanks. >>> >>> khalil >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From khalil.elmazouari at gmail.com Mon Mar 28 18:11:17 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Mon, 28 Mar 2011 20:11:17 +0200 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> Message-ID: <475BBAC9-2448-495F-85D1-D7BFFA2900D8@gmail.com> Hi, I just did some tests with FastaWriterHelper.writeSequence vs RichSequence.IOTools.writeFasta. Result was not expected!! 

FastaWriterHelper.writeSequence: 37% of the execution time.
RichSequence.IOTools.writeFasta: 11.6% of the execution time.

In this test, 9733 protein seq were annotated and 6599 seq written out to multiple fasta file.

In my case at least, biojavax performs better than Biojava3.

Regards,

Khalil

PS: I AM NOT BENCHMARKING BIOJAVAX VS BIOJAVA3!!!

On 28 Mar 2011, at 18:53, Scooter Willis wrote:

> Khalil
>
> Biojava3 has significant speed improvements as a complete rewrite of
> BioJava 1.7 but does not contain the same feature and functions. You
> can do some testing with BioJava3 to see if you find the increased
> performance and add port/write the code for the file formats you need
> to support.
>
> Thanks
>
> Scooter
>
> On Mon, Mar 28, 2011 at 12:07 PM, Khalil El Mazouari
> wrote:
>> Hi,
>>
>> I am developing a sequence annotation app. It should handle ≈ 100.000 sequence per run.
>>
>> When profiling the app (with 10.000 seq), the total execution time was ≈ 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbak!!
>>
>> How one could improve the RichSequence.IOTools performance?
>>
>> Thanks.
>>
>> khalil
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>>

From p.v.troshin at dundee.ac.uk Mon Mar 28 19:44:06 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Mon, 28 Mar 2011 20:44:06 +0100
Subject: [Biojava-l] Final Application (Paiu Alexandru ) (added project plan)
In-Reply-To:
References:
Message-ID: <4D90E506.9090305@dundee.ac.uk>

Hi Alex,

You should not send your proposal directly to me or OBF. Here is what Google said about it:
http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/faqs#directly

Here is the GSoC program:
http://www.google-melange.com/gsoc/program/home/google/gsoc2011

Please do not forget to send your proposal to Google!

Regards,
Peter

On 28/03/2011 17:17, Alexandru Paiu wrote:
> *1.
**1.Your complete contact information*, including full name, > physical address, preferred email address, and telephone number, plus other > pertinent contact information such as IRC handles, etc. > > > > Full Name : Paiu Alexandru > > Address : Country Romania , city Constanta , Bld. Aurel Vlaicu , Nr. 41 , > Bl. Pc1 sc. B , Et 6 , Apt. 46 > > E-mail : paiualex12 at google.com or paiualex12 at yahoo.com > > Telephone number : 40733924684 > > > > *2. **2.Why you are interested in the p*roject you are proposing and > are well-suited to undertake it. > > > > This project suits me perfectly , because the interested students should > have a general knowledge of core Java programming, knowledge of > multi-threaded programming . I?ve started learning Java for 1 and a half > years , and I used a lot of Threads in applications and projects . > > This is the only project that I apply , because I haven?t found a more > interesting project than this one . > > > > *3. **3.A summary of your programming *experience and skills. > > > > I?ve did a lot of miniproject and applications for school and for me . I?ve > made projects like : > > a) Lanchat Client-Server using TCP/IP ? I wrote two applications : one > for the client and one for the server . I used an JApplet for the client > with Swing elements . I?ve used Threads especially in the server sider > application , and sockets > > b) Lanchat Peer-to-Peer using UDP and multicasting . I wrote only a > application for the client . I used Threads and multicast sockets . > > c) A project for administrating a database , using a JApplet with > Connector/J and MySql . It has to applications , one for clients and for the > administrator . > > > *4. **4.Programs or projects you have previously* authored or > contributed to available as open-source, including, if applicable, any past > Summer of Code involvement. 
> > > > I haven?t worked yet for any open-source and a I haven?t any past experience > with GSoc , and it?s the first time a apply for a open-source project . I > haven?t either worked for a company . > > > > *5. **5. A project plan for the project* you are proposing, even if > your proposed project is directly based on one of the proposed project ideas > for member projects. > > > > I wish to apply to the project called *Amino Acids physic-chemical > properties calculation .* > > I?ve been thinking since some time of a possible implementation and I > stopped at a single one (that I think it?s the best) . > > > > I will use two main classes . One that will represent an atom of a substance > ( for example He , H , O , etc ) , that will have params like : atom weight > , name , abbreviation , valence . I?ll use the second class for > constructing amino-acids from this class . So , the second class will extend > the class of atoms . So for example I have to initiate a molecule of H20 > (water) . I will have a constructor with a string param , that will build > the substance . For example , let?s say that the second class it?s called > Aminoacids , and the first one Atoms . > > Let?s say I choose from a Combo box H2O ( it?s only a example) . Then I sad > the string ? H2O1? to the aminoacids class , to intiate an object of > aminoacid . That constructor will be evaluated char by char . If it?s found > a char or two chars that means that I have to initiate an atom of that char > or chars . If it?s found a number , then that means that it?s the multiplier > of that atom before it . > > So the class aminoacids will have a private Object [] array , in which will > be number and objects called atoms . > > So for H20 the array will look like this : array[0] = atom of H (Hidrogen) , > array[1]=2 , array[2]=O (Oxigen) , array[3]=1 . > > All the know substances will be in a file called atoms.txt with atom mass , > name , abbreviation etc . 
The atoms class will have a method to add new > atoms to the list . > > > > And for calculating the molecular weigth the algorithm is very simple . We > already have array={H,2,O,1} , and the atoms will have as params the atoms > weight so all we have to do is just : > > Mol. Weight=H.weight*2+O.weigth.*1 > > > > The plan for implementing : > > > > - May 20-June 20 ? implementing the two classes and the first two > methods > > - June 20 ? 20 July ? Implementing the rest of the methods > > - 20 July ? until the final ? final retouching , docummentation for > end users , and 1 method proposed by me > > - > > *6. **6.Any obligations, vacations, or plans* for the summer that may > require scheduling during the GSoC work period. > > > > I will have School final exams during 20 May ? 20 June . So I won?t be able > to work at maximum capacity . That?s all . I > > > *7. PS * > > I hope you've got my short coding exercise program ( I received a kinda > error for sending a mail will atachement) > > thanks > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From ayates at ebi.ac.uk Mon Mar 28 21:39:54 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 28 Mar 2011 22:39:54 +0100 Subject: [Biojava-l] RichSequence.IOTools performance In-Reply-To: <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> Message-ID: <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> Dang Rich :). At the moment we've not done anything WRT Genbank outputting but would accept anything to help us out with this. As for the performance difference between BJ3 & BJ what happens if you use the writer objects directly with a BufferedOutputStream writer? Have you got any profiling results? 
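
To make the BufferedOutputStream question concrete, here is a minimal sketch of that kind of test. It only assumes that the writer in use accepts any java.io.OutputStream; `RecordWriter` and `writeAll` are hypothetical stand-ins made up for illustration and are not BioJava API, and the timing in `main` is a rough wall-clock measurement rather than a proper benchmark.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class BufferedWriteSketch {

    /** Hypothetical stand-in for a sequence writer; not a BioJava type. */
    public interface RecordWriter {
        void write(OutputStream out, String record) throws IOException;
    }

    /**
     * Sends every record through a single BufferedOutputStream so that many
     * small writes are batched into fewer, larger writes to the file.
     */
    public static void writeAll(Path target, List<String> records, RecordWriter writer)
            throws IOException {
        try (OutputStream out =
                     new BufferedOutputStream(Files.newOutputStream(target), 64 * 1024)) {
            for (String record : records) {
                writer.write(out, record);
            }
        } // closing the stream flushes whatever is still in the buffer
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("records", ".txt");

        // Assumption: a trivial FASTA-like header line stands in for real output.
        RecordWriter fastaLike = (out, record) ->
                out.write((">" + record + "\n").getBytes(StandardCharsets.UTF_8));

        long start = System.nanoTime();
        writeAll(target, Arrays.asList("seq1", "seq2", "seq3"), fastaLike);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;

        System.out.println("wrote " + Files.size(target) + " bytes in "
                + elapsedMicros + " us");
        Files.delete(target);
    }
}
```

Timing the same loop once with and once without the BufferedOutputStream wrapper would give plain before/after numbers to compare alongside the profiler percentages.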
It would be very interesting to see where we've lost the performance ... Andy On 28 Mar 2011, at 18:23, Richard Holland wrote: > In which case you've got little option but to rewrite the GenbankFormat module to use NIO or other alternative methods for writing files. However before you do that I suggest you investigate the recent BioJava3 developments to see if they've already done anything in this area - Andy Yates is your man there. > > On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote: > >> Sequences objects are all in-memory. >> I agree, 10000 seq in ? 20 sec is not bad. However, scientists will processes 100,000 seqs in each run, and IO is a real bottleneck. So, I am trying, as far as I can, to fine tune the app. >> >> Regards, >> >> khalil >> >> On 28 Mar 2011, at 18:15, Richard Holland wrote: >> >>> I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good! However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too. >>> >>> >>> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote: >>> >>>> Hi, >>>> >>>> I am developing a sequence annotation app. It should handle ? 100.000 sequence per run. >>>> >>>> When profiling the app (with 10.000 seq), the total execution time was ? 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbak!! >>>> >>>> How one could improve the RichSequence.IOTools performance? >>>> >>>> Thanks. 
>>>> >>>> khalil >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >> > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From chapman at cs.wisc.edu Tue Mar 29 08:06:48 2011 From: chapman at cs.wisc.edu (Mark Chapman) Date: Tue, 29 Mar 2011 03:06:48 -0500 Subject: [Biojava-l] aligning sequences with ambiguous bases In-Reply-To: <4D909F49.1060605@UGent.be> References: <4D909F49.1060605@UGent.be> Message-ID: <4D919318.8000602@cs.wisc.edu> Hi Wim, The use of ambiguous nucleotides requires you to use the AmbiguityDNACompoundSet when you create your DNASequence, which means any: new DNASequence() changes to: new DNASequence(, AmbiguityDNACompoundSet.getDNACompoundSet()) I hope that helps, Mark On 3/28/2011 9:46 AM, Wim De Smet wrote: > Hi > > (sorry if you get 2 copies, I sent this to -request by mistake) > > Apologies if this has come up before, a quick search didn't turn anything up. > > I'm attempting to do a pairwise alignment between two DNA sequences using > biojava 3. When I try to construct a DNASequence from a string that contains an > ambiguous base though (in this case 'y'), I get the following stacktrace. 
>
> Exception in thread "main" org.biojava3.core.exceptions.CompoundNotFoundError:
> Compound not found for: Cannot find compound for: y
> at
> org.biojava3.core.sequence.storage.ArrayListSequenceReader.setContents(ArrayListSequenceReader.java:196)
>
> at
> org.biojava3.core.sequence.template.AbstractSequence.<init>(AbstractSequence.java:88)
>
> at org.biojava3.core.sequence.DNASequence.<init>(DNASequence.java:64)
>
> Should I attempt to mask them somehow? What's the best way to deal with these?
>
> cheers
> Wim

From Wim.DeSmet at UGent.be Tue Mar 29 08:22:16 2011
From: Wim.DeSmet at UGent.be (Wim De Smet)
Date: Tue, 29 Mar 2011 10:22:16 +0200
Subject: [Biojava-l] aligning sequences with ambiguous bases
In-Reply-To: <4D909F49.1060605@UGent.be>
References: <4D909F49.1060605@UGent.be>
Message-ID: <4D9196B8.6020409@UGent.be>

I believe I figured it out. The constructor of DNASequence can take a CompoundSet and passing an AmbiguityDNACompoundSet in there seems to work. There's not a lot of documentation in the javadoc, but it seems to give the behaviour I want.

cheers
Wim

On 28-03-11 16:46, Wim De Smet wrote:
> Hi
>
> (sorry if you get 2 copies, I sent this to -request by mistake)
>
> Apologies if this has come up before, a quick search didn't turn
> anything up.
>
> I'm attempting to do a pairwise alignment between two DNA sequences
> using biojava 3. When I try to construct a DNASequence from a string
> that contains an ambiguous base though (in this case 'y'), I get the
> following stacktrace.
>
> Exception in thread "main"
> org.biojava3.core.exceptions.CompoundNotFoundError: Compound not found
> for: Cannot find compound for: y
> at
> org.biojava3.core.sequence.storage.ArrayListSequenceReader.setContents(ArrayListSequenceReader.java:196)
>
> at
> org.biojava3.core.sequence.template.AbstractSequence.<init>(AbstractSequence.java:88)
>
> at org.biojava3.core.sequence.DNASequence.<init>(DNASequence.java:64)
>
> Should I attempt to mask them somehow? What's the best way to deal with
> these?
>
> cheers
> Wim

-- 
Wim De Smet
http://www.straininfo.net/

From chapman at cs.wisc.edu Tue Mar 29 08:45:57 2011
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Tue, 29 Mar 2011 03:45:57 -0500
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava - Conserns
In-Reply-To:
References:
Message-ID: <4D919C45.4090102@cs.wisc.edu>

Hi Udana,

>> 2. Package org.biojava.bio.proteomics - Class_MassCalc
>> In "calcTermMass" method they have added extra H if MH_PLUS is true.I am
>> little bit confuse why we have to add a extra H when calculating term mass?
>> and What is the importance of MH_PLUS prpperty?
>
> not sure, somebody else might be able to say more about this
>

Common mass spectrometry techniques use a proton transfer to ionize proteins (or fragments) for analysis. This results in peaks in the mass spectra at the mass of MH+ for each molecule M.

-Mark

From chapman at cs.wisc.edu Tue Mar 29 09:35:48 2011
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Tue, 29 Mar 2011 04:35:48 -0500
Subject: [Biojava-l] question regarding MSA
In-Reply-To:
References:
Message-ID: <4D91A7F4.6010306@cs.wisc.edu>

Hi Bo,

A starting point for formatted output with conservation symbols is already implemented for pairwise alignments. You can try it out by removing one of the protein IDs on line 16 of the cookbook code and replacing line 30 with:

System.out.println(profile.toString(Profile.StringFormat.CLUSTALW));

The code that would need updating for multiple alignments is in SimpleProfile.printConservation and around the call to it in the toString helper method.

-Mark

On 2/24/2011 2:06 AM, Andreas Prlic wrote:
> Hi Bo Li,
>
> The printing method currently does not add those characters to the
> display of the aligned sequences. If you need it you would have to
> patch the printing method...
>
> Andreas
>
> On Tue, Feb 22, 2011 at 11:16 PM, Bo Li wrote:
>> Hi,
>>
>> Sorry for the bothering.
I tried the MSA feature by following the link: >> >> http://www.biojava.org/wiki/BioJava:CookBook3:MSA >> >> However, I can't see the symbols like ".", ":", and "*" like I can see from >> the output ClustalW. >> >> So is there a way for users to obtain such information in the output from >> MSA? >> >> Thanks, >> Bo Li >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From Wim.DeSmet at UGent.be Tue Mar 29 09:59:18 2011 From: Wim.DeSmet at UGent.be (Wim De Smet) Date: Tue, 29 Mar 2011 11:59:18 +0200 Subject: [Biojava-l] aligning sequences with ambiguous bases In-Reply-To: <4D919318.8000602@cs.wisc.edu> References: <4D909F49.1060605@UGent.be> <4D919318.8000602@cs.wisc.edu> Message-ID: <4D91AD76.8060305@UGent.be> Hi Mark Thanks! I guess I should have pressed my Get Mail button before sending my second message. Good to know I chose the "correct" solution. cheers Wim On 29-03-11 10:06, Mark Chapman wrote: > Hi Wim, > > The use of ambiguous nucleotides requires you to use the > AmbiguityDNACompoundSet when you create your DNASequence, which means any: > > new DNASequence() > > changes to: > > new DNASequence(, AmbiguityDNACompoundSet.getDNACompoundSet()) > > I hope that helps, > Mark > > > On 3/28/2011 9:46 AM, Wim De Smet wrote: >> Hi >> >> (sorry if you get 2 copies, I sent this to -request by mistake) >> >> Apologies if this has come up before, a quick search didn't turn >> anything up. >> >> I'm attempting to do a pairwise alignment between two DNA sequences using >> biojava 3. When I try to construct a DNASequence from a string that >> contains an >> ambiguous base though (in this case 'y'), I get the following stacktrace. 
>>
>> Exception in thread "main"
>> org.biojava3.core.exceptions.CompoundNotFoundError:
>> Compound not found for: Cannot find compound for: y
>> at
>> org.biojava3.core.sequence.storage.ArrayListSequenceReader.setContents(ArrayListSequenceReader.java:196)
>>
>>
>> at
>> org.biojava3.core.sequence.template.AbstractSequence.<init>(AbstractSequence.java:88)
>>
>>
>> at org.biojava3.core.sequence.DNASequence.<init>(DNASequence.java:64)
>>
>> Should I attempt to mask them somehow? What's the best way to deal
>> with these?
>>
>> cheers
>> Wim
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

-- 
Wim De Smet
http://www.straininfo.net/

From khalil.elmazouari at gmail.com Tue Mar 29 14:41:13 2011
From: khalil.elmazouari at gmail.com (Khalil El Mazouari)
Date: Tue, 29 Mar 2011 16:41:13 +0200
Subject: [Biojava-l] RichSequence.IOTools performance
In-Reply-To: <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk>
References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com>
 <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com>
 <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com>
 <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk>
Message-ID: <7AACD1B8-6215-4682-9BE6-6646BC6C66CC@gmail.com>

Hi,

using NIO, the app's performance improved considerably.

The app was tested with 6599 annotated Genbank seq.

1. RichSequence.IOTools.writeGenbank(myFileOutputStream, mySeq, null): 57% of app exec time.

2. Writing mySeq -> byteArrayOutputStream -> byteBuffer -> fileChannel (code below): 31% of exec time.

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    RichSequence.IOTools.writeGenbank(baos, mySeq, null);
    ByteBuffer buf = ByteBuffer.wrap(baos.toByteArray());
    fileChannel.write(buf);

Any suggestion on how to improve the performance (further ;-)) is welcome.

Regards,

khalil

On 28 Mar 2011, at 23:39, Andy Yates wrote:

> Dang Rich :).
>
> At the moment we've not done anything WRT Genbank outputting but would accept anything to help us out with this.
>
> As for the performance difference between BJ3 & BJ what happens if you use the writer objects directly with a BufferedOutputStream writer? Have you got any profiling results? It would be very interesting to see where we've lost the performance ...
>
> Andy
>
> On 28 Mar 2011, at 18:23, Richard Holland wrote:
>
>> In which case you've got little option but to rewrite the GenbankFormat module to use NIO or other alternative methods for writing files. However before you do that I suggest you investigate the recent BioJava3 developments to see if they've already done anything in this area - Andy Yates is your man there.
>>
>> On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote:
>>
>>> Sequences objects are all in-memory.
>>> I agree, 10000 seq in ~20 sec is not bad. However, scientists will process 100,000 seqs in each run, and IO is a real bottleneck. So, I am trying, as far as I can, to fine tune the app.
>>>
>>> Regards,
>>>
>>> khalil
>>>
>>> On 28 Mar 2011, at 18:15, Richard Holland wrote:
>>>
>>>> I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good! However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too.
>>>>
>>>>
>>>> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am developing a sequence annotation app. It should handle ~100.000 sequence per run.
>>>>>
>>>>> When profiling the app (with 10.000 seq), the total execution time was ~20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbank!!
>>>>>
>>>>> How one could improve the RichSequence.IOTools performance?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> khalil
>>>>> _______________________________________________
>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
>>>> --
>>>> Richard Holland, BSc MBCS
>>>> Operations and Delivery Director, Eagle Genomics Ltd
>>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>>> http://www.eaglegenomics.com/
>>>>
>>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>
> --
> Andrew Yates                   Ensembl Genomes Engineer
> EMBL-EBI                       Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>
>
>
>

From khalil.elmazouari at gmail.com Tue Mar 29 21:47:37 2011
From: khalil.elmazouari at gmail.com (Khalil El Mazouari)
Date: Tue, 29 Mar 2011 23:47:37 +0200
Subject: [Biojava-l] RichSequence.IOTools performance
In-Reply-To:
References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk>
Message-ID: <0BAEA5F0-B298-417D-9416-207321E1AC2D@gmail.com>

Hi

I am using netbeans profiler.

The total exec time was ~20s (macbook pro i7, 4GB, SSD) for ~10.000 seq.

By writing the RichSequence object to ByteArrayOutputStream -> FileChannel, where appropriate, the total exec time dropped to 7s. Huge improvement, for the app I am developing. The app will be used to analyze ~100,000 sequence per run.

Regards,

khalil

On 29 Mar 2011, at 22:13, Scooter Willis wrote:

> Instead of percentage metrics can you get the time before and after the write execution for comparison without profiling. What profiler are you using?
>
>
>> On Mar 28, 2011 5:39 PM, "Andy Yates" wrote:
>>
>> Dang Rich :).
>>
>> At the moment we've not done anything WRT Genbank outputting but would accept anything to help us out with this.
>>
>> As for the performance difference between BJ3 & BJ what happens if you use the writer objects directly with a BufferedOutputStream writer? Have you got any profiling results? It would be very interesting to see where we've lost the performance ...
>>
>> Andy
>>
>> On 28 Mar 2011, at 18:23, Richard Holland wrote:
>>
>> > In which case you've got little option but to r...
>>
>> --
>> Andrew Yates                   Ensembl Genomes Engineer
>> EMBL-EBI                       Tel: +44-(0)1223-492538
>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>>
>>
>>
>>
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open...
>>
>

From uchathuranga at gmail.com Wed Mar 30 03:05:11 2011
From: uchathuranga at gmail.com (udana chathuranga)
Date: Wed, 30 Mar 2011 08:35:11 +0530
Subject: [Biojava-l] Isoelectric point and molecular weight calculations with BioJava - Conserns
In-Reply-To: <4D919C45.4090102@cs.wisc.edu>
References: <4D919C45.4090102@cs.wisc.edu>
Message-ID:

Hi Mark,

Thanks Mark. Really appreciate your help, If I have any more questions, I
will post to this thread.

Thanks,
Regards,
Udana.

From rmb32 at cornell.edu Tue Mar 29 21:20:41 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 29 Mar 2011 14:20:41 -0700
Subject: [Biojava-l] Announcing OBF Summer of Code - please forward!
Message-ID: <4D924D29.3020707@cornell.edu>

Hi all,

Here's an advertising-ready announcement for OBF's Summer of Code,
thanks to Christian Zmasek and Hilmar Lapp for their excellent writing.

Student applications are due April 8! Please spread it widely, we need
to reach lots of students with it!
Rob Buels
OBF GSoC 2011 Admin

============================================================
*** Please disseminate widely at your local institutions ***
*** including posting to message and job boards, so that ***
*** we reach as many students as possible.               ***
============================================================

OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2011

Applications due 19:00 UTC, April 8, 2011.

http://www.open-bio.org/wiki/Google_Summer_of_Code

The Open Bioinformatics Foundation Summer of Code program provides a
unique opportunity for undergraduate, masters, and PhD students to
obtain hands-on experience writing and extending open-source software
for bioinformatics under the mentorship of experienced developers from
around the world. The program is the participation of the Open
Bioinformatics Foundation (OBF) as a mentoring organization in the
Google Summer of Code(tm) (http://code.google.com/soc/).

Students successfully completing the 3 month program receive a $5,000
USD stipend, and may work entirely from their home or home
institution. Participation is open to students from any country in the
world except countries subject to US trade restrictions. Each student
will have at least one dedicated mentor to show them the ropes and
help them complete their project.

The Open Bioinformatics Foundation is particularly seeking students
interested in both bioinformatics (computational biology) and software
development. Some initial project ideas are listed on the website.
These range from Galaxy phylogenetics pipeline development in
Biopython to lightweight sequence objects and lazy parsing in BioPerl,
a DAS Server for large files on local filesystems, and mapping Java
libraries to Perl/Ruby/Python using Biolib+SWIG+JNI. All project ideas
are flexible and many can be adjusted in scope to match the skills of
the student.
We also welcome and encourage students proposing their own project
ideas; historically some of the most successful Summer of Code
projects are ones proposed by the students themselves.

TO APPLY: Apply online at the Google Summer of Code website
(http://socghop.appspot.com/), where you will also find GSoC program
rules and eligibility requirements. The 12-day application period for
students runs from Monday, March 28 through Friday, April 8th, 2011.

INQUIRIES: We strongly encourage all interested students to get in
touch with us with their ideas as early on as possible. See the OBF
GSoC page for contact details.

2011 OBF Summer of Code:
http://www.open-bio.org/wiki/Google_Summer_of_Code

Google Summer of Code FAQ:
http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/faqs

From paiualex12 at gmail.com Wed Mar 30 12:50:08 2011
From: paiualex12 at gmail.com (Alexandru Paiu)
Date: Wed, 30 Mar 2011 15:50:08 +0300
Subject: [Biojava-l] Short Coding exercise version 2 ( improved version ) (Alexandru Paiu)
Message-ID:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: runmev2.rar
Type: application/rar
Size: 29223 bytes
Desc: not available
URL:

From ayates at ebi.ac.uk Thu Mar 31 07:57:33 2011
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 31 Mar 2011 08:57:33 +0100
Subject: [Biojava-l] RichSequence.IOTools performance
In-Reply-To:
References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> <0BAEA5F0-B298-417D-9416-207321E1AC2D@gmail.com>
Message-ID: <1D956153-515B-4E0B-AC95-428426A31BEE@ebi.ac.uk>

Makes a lot of sense. There's no way of knowing if a stream is buffered unless the top level object given was an instance of BufferedOutputStream. Does this mean that by some fluke we could buffer a buffered stream?

TBH I'm more glad that we've got the speed back :).
Andy

On 30 Mar 2011, at 20:38, Scooter Willis wrote:

> Khalil
>
> For BioJava3 FastaWriter was simply using an OutputStream where its
> use was wrapped by FastaWriterHelper which was not using a
> BufferedOutputStream. I made changes to FastaWriter to check if the
> OutputStream is an instance of BufferedOutputStream and if not create
> one locally and then close it when returning. The writing of 10,000
> sequences or 4.5MB of data went from 15 seconds to 0.6 seconds. I
> checked in the code change if you wanted to test using your code.
>
> Thanks
>
> Scooter
>
> On Tue, Mar 29, 2011 at 5:47 PM, Khalil El Mazouari
> wrote:
>> Hi
>> I am using netbeans profiler.
>> The total exec time was ~20s (macbook pro i7, 4GB, SSD) for ~10.000 seq.
>> By writing the RichSequence object to ByteArrayOutputStream -> FileChannel,
>> where appropriate, the total exec time dropped to 7s. Huge improvement, for
>> the app I am developing. The app will be used to analyze ~100,000 sequence
>> per run.
>> Regards,
>> khalil
>>
>> On 29 Mar 2011, at 22:13, Scooter Willis wrote:
>>
>> Instead of percentage metrics can you get the time before and after the
>> write execution for comparison without profiling. What profiler are you
>> using?
>>
>> On Mar 28, 2011 5:39 PM, "Andy Yates" wrote:
>>
>> Dang Rich :).
>>
>> At the moment we've not done anything WRT Genbank outputting but would
>> accept anything to help us out with this.
>>
>> As for the performance difference between BJ3 & BJ what happens if you use
>> the writer objects directly with a BufferedOutputStream writer? Have you got
>> any profiling results? It would be very interesting to see where we've lost
>> the performance ...
>>
>> Andy
>>
>> On 28 Mar 2011, at 18:23, Richard Holland wrote:
>>
>>> In which case you've got little option but to r...
>>
>> --
>> Andrew Yates                   Ensembl Genomes Engineer
>> EMBL-EBI                       Tel: +44-(0)1223-492538
>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>>
>>
>>
>>
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open...
>>
>>

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From ayates at ebi.ac.uk Thu Mar 31 11:01:33 2011
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 31 Mar 2011 12:01:33 +0100
Subject: [Biojava-l] RichSequence.IOTools performance
In-Reply-To:
References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> <0BAEA5F0-B298-417D-9416-207321E1AC2D@gmail.com> <1D956153-515B-4E0B-AC95-428426A31BEE@ebi.ac.uk>
Message-ID:

Won't that close down the underlying stream which was given in the first place? Not sure if anyone would notice it TBH but it could look odd that the level responsible for creating the original (file) stream isn't responsible for closing it

Andy

On 31 Mar 2011, at 11:59, Scooter Willis wrote:

> Andy
>
> I check if OutputStream is an instance of BufferedOutputStream. If it is, don't do anything. If not, create a local BufferedOutputStream, use it, then close it and return.
>
> Scooter
>
>
>> On Mar 31, 2011 3:57 AM, "Andy Yates" wrote:
>>
>> Makes a lot of sense. There's no way of knowing if a stream is buffered unless the top level object given was an instance of BufferedOutputStream. Does this mean that by some fluke we could buffer a buffered stream?
>>
>> TBH I'm more glad that we've got the speed back :).
>>
>> Andy
>>
>> On 30 Mar 2011, at 20:38, Scooter Willis wrote:
>>
>> > Khalil
>> >
>> > For BioJava3 FastaWriter was simply ...
>>
>> --
>> Andrew Yates                   Ensembl Genomes Engineer
>> EMBL-EBI                       Tel: +44-(0)1...
>>
>

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From holland at eaglegenomics.com Thu Mar 31 11:14:25 2011
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 31 Mar 2011 12:14:25 +0100
Subject: [Biojava-l] RichSequence.IOTools performance
In-Reply-To:
References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> <0BAEA5F0-B298-417D-9416-207321E1AC2D@gmail.com> <1D956153-515B-4E0B-AC95-428426A31BEE@ebi.ac.uk>
Message-ID: <47AE0504-62D4-41EF-B234-7F1932026DAF@eaglegenomics.com>

Closing BufferedOutputStream will close the parent stream too, as it inherits this behaviour from FilterOutputStream.

On 31 Mar 2011, at 12:07, Scooter Willis wrote:

> I don't think closing the BufferedOutputStream by contract should close a parent stream. Those that open should be responsible for closing. I have seen cases where you don't close BufferedOutputStream you don't get a flush of all data if you just close parent OutputStream. I will test it.
>
>
>> On Mar 31, 2011 7:01 AM, "Andy Yates" wrote:
>>
>> Won't that close down the underlying stream which was given in the first place? Not sure if anyone would notice it TBH but it could look odd that the level responsible for creating the original (file) stream isn't responsible for closing it
>>
>> Andy
>>
>> On 31 Mar 2011, at 11:59, Scooter Willis wrote:
>>
>> > Andy
>> >
>> > I check if OutputStream is an instance...
>>
>> --
>> Andrew Yates                   Ensembl Genomes Engineer
>>
>> EMBL-EBI                       Tel: +44-(0)1223-492538
>> Wellcome Trust Genome Campus   Fax: +44-(0)12...
>>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com Thu Mar 31 12:28:40 2011
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 31 Mar 2011 13:28:40 +0100
Subject: [Biojava-l] RichSequence.IOTools performance
In-Reply-To:
References: <85071D61-DC7A-4A32-AF88-0EA633CEBD6E@gmail.com> <209DB3C7-E480-48A1-BC16-50503FD3CA28@eaglegenomics.com> <3D29F154-CCCF-4108-95EF-C1A3ED22171E@eaglegenomics.com> <308AD3B4-FCA8-455F-BBE5-DE4B4649FAB9@ebi.ac.uk> <0BAEA5F0-B298-417D-9416-207321E1AC2D@gmail.com> <1D956153-515B-4E0B-AC95-428426A31BEE@ebi.ac.uk> <47AE0504-62D4-41EF-B234-7F1932026DAF@eaglegenomics.com>
Message-ID:

Really it should require the minimum necessary, i.e. a simple OutputStream. If the user wants to improve that performance then they can pass in something better, e.g. BufferedOutputStream. By requiring BufferedOutputStream from the start you rule out users who want to use other mechanisms, e.g. FileOutputStream, ByteArrayOutputStream, etc.

On 31 Mar 2011, at 13:23, Scooter Willis wrote:

> Richard
>
> Just tested and it does look like it gets closed. I removed the close
> of the BufferedOutputStream and replaced with a flush. In looking
> through the BufferedOutputStream.java it looks like it should be
> garbage collected as it doesn't pass any references to itself
> anywhere. I checked in the change.
>
> The other option is to only allow FastaWriter to take a
> BufferedOutputStream or change the code in FastaWriterHelper to use
> BufferedOutputStream. Any suggestions on best practice for the API to
> protect the innocent who want speed? Can you think of any reason we
> would not use a BufferedOutputStream when writing?
>
> Scooter
>
> On Thu, Mar 31, 2011 at 7:14 AM, Richard Holland
> wrote:
>> Closing BufferedOutputStream will close the parent stream too, as it inherits this behaviour from FilterOutputStream.
>>
>> On 31 Mar 2011, at 12:07, Scooter Willis wrote:
>>
>>> I don't think closing the BufferedOutputStream by contract should close a parent stream. Those that open should be responsible for closing. I have seen cases where you don't close BufferedOutputStream you don't get a flush of all data if you just close parent OutputStream. I will test it.
>>>
>>>
>>>> On Mar 31, 2011 7:01 AM, "Andy Yates" wrote:
>>>>
>>>> Won't that close down the underlying stream which was given in the first place? Not sure if anyone would notice it TBH but it could look odd that the level responsible for creating the original (file) stream isn't responsible for closing it
>>>>
>>>> Andy
>>>>
>>>> On 31 Mar 2011, at 11:59, Scooter Willis wrote:
>>>>
>>>>> Andy
>>>>>
>>>>> I check if OutputStream is an instance...
>>>>
>>>> --
>>>> Andrew Yates                   Ensembl Genomes Engineer
>>>>
>>>> EMBL-EBI                       Tel: +44-(0)1223-492538
>>>> Wellcome Trust Genome Campus   Fax: +44-(0)12...
>>>>
>>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From rmb32 at cornell.edu Thu Mar 31 21:58:52 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 31 Mar 2011 14:58:52 -0700
Subject: [Biojava-l] Reminder: GSoC proposals due in 1 week
Message-ID: <4D94F91C.1080005@cornell.edu>

Hi all,

Just a reminder, Google Summer of Code student applications are due
April 8!
If you're a student planning to apply to GSoC with OBF, it's very much
in your best interest to write your proposal *early*, like now, and get
it into the hands of the developers and mentors on your subproject
(BioPerl/Ruby/Python/etc) so that they can give you some feedback on it.

The final proposals must, of course, still be submitted to Google
through the GSoC web application, as described on the main GSoC site
(http://www.google-melange.com/gsoc/homepage/google/gsoc2011).

Rob Buels
OBF GSoC 2011 Administrator

From rmb32 at cornell.edu Thu Mar 31 22:04:49 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 31 Mar 2011 15:04:49 -0700
Subject: [Biojava-l] GSoC call for mentors
Message-ID: <4D94FA81.5090701@cornell.edu>

Hi all,

For current developers on OBF projects:

If you would not mind being a mentor to a Summer of Code student this
summer, please make sure you sign up as an OBF mentor in the GSoC web
app. There's a link under "mentors: apply now!" midway down the page at
http://www.google-melange.com/. If you didn't do last year's summer of
code, it would be a good idea to drop me an email introducing yourself,
as well, or I won't know whether to approve your request. :-)

Being signed up as an OBF GSoC mentor will give you access to the
student proposals, as they come in, and the ability to comment on them
and assign scores to the ones you think show the most promise.

If you sign up as a mentor, please also add yourself to the two OBF GSoC
mailing lists: OBF-GSoC and OBF-GSoC-mentors

OBF-GSoC list: http://lists.open-bio.org/mailman/listinfo/gsoc
OBF mentors: http://lists.open-bio.org/mailman/listinfo/gsoc-mentors

Thanks in advance!

Rob

---
Robert Buels
OBF GSoC 2011 Administrator
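
Note on the RichSequence.IOTools performance thread above: the approach the thread converges on (accept a plain OutputStream, wrap it in a BufferedOutputStream only when it is not already buffered, and flush rather than close so that whoever opened the stream stays responsible for closing it) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual FastaWriter code; the class name BufferedWriteSketch and the method writeAll are hypothetical and exist only for this example.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class BufferedWriteSketch {

    // Hypothetical helper, not part of BioJava: writes one record per line.
    static void writeAll(OutputStream out, List<String> records) throws IOException {
        // Wrap only if the caller has not already supplied a buffered stream.
        OutputStream buffered = (out instanceof BufferedOutputStream)
                ? out
                : new BufferedOutputStream(out);
        for (String record : records) {
            buffered.write(record.getBytes(StandardCharsets.UTF_8));
            buffered.write('\n');
        }
        // Flush instead of close: closing the wrapper would also close 'out',
        // because BufferedOutputStream inherits close() from FilterOutputStream.
        buffered.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeAll(sink, Arrays.asList(">seq1", "ACGT"));
        // The caller still owns 'sink' and decides when to close it.
        System.out.println(sink.size() + " bytes written");
        sink.close();
    }
}
```

With this shape a caller may pass a FileOutputStream, a ByteArrayOutputStream or an already buffered stream, which matches the "require the minimum necessary" suggestion made in the thread; the instanceof check only avoids an extra layer of buffering and is not needed for correctness.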