From rmb32 at cornell.edu Sun Aug 1 15:17:14 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Sun, 01 Aug 2010 12:17:14 -0700 Subject: [Biojava-l] GMOD Evo Hackathon Open Call for Participation Message-ID: <4C55C83A.3060700@cornell.edu> We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC. This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research: 1. Visualization of comparative genomics data 2. Visualization of phylogenetic data and trees 3. Support for population diversity and phenotype data If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed! How To Apply: Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25. About GMOD: GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well. About Hackathons: A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. 
The mix of people will include domain experts and computer-savvy end-users. More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below. Sincerely, The GMOD EvoHack Organizing Committee (and project affiliations as relevant): Nicole Washington, Chair (LBNL, modENCODE, Phenote) Robert Buels (SGN, Chado NatDiv) Scott Cain (OICR, GMOD) Dave Clements (NESCent, GMOD) Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv) Sheldon McKay (University of Arizona, iPlant, GBrowse_syn) ----------------------------- About the GMOD Evo Hackathon Overview We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation. The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements. The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components. 
Before the Event Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion. During the Event Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest. Deliverables / Event Results The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces. Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically. NESCent This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. 
NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php).

Links

Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon
NESCent: http://www.nescent.org/
GMOD: http://gmod.org
Similar past NESCent events, see: http://hackathon.nescent.org/
GMOD hackathon application: http://bit.ly/gmodevohack

--
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/GMOD_Europe_2010
http://gmod.org/wiki/Help_Desk_Feedback

From darnells at dnastar.com Mon Aug 16 18:26:13 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Mon, 16 Aug 2010 17:26:13 -0500
Subject: [Biojava-l] SITE records in PDBFileReader
Message-ID:

I'm sorry for reposting this message. I accidentally sent the previous one as HTML.

________________________________________
From: Steve Darnell
Sent: Monday, August 16, 2010 5:19 PM
To: 'biojava-l at lists.open-bio.org'
Subject: SITE records in PDBFileReader

Greetings,

I am interested in parsing SITE records from a PDB file. I looked over the org.biojava.bio.structure API, but I was unable to find reference to this functionality. Does the PDBFileReader in BioJava extract SITE record information? If not, would it be possible to add this capability to PDBFileReader and the Structure class?

SITE record format at wwPDB: http://www.wwpdb.org/documentation/format32/sect7.html

Regards,
Steve Darnell

From darnells at dnastar.com Mon Aug 16 18:19:28 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Mon, 16 Aug 2010 17:19:28 -0500
Subject: [Biojava-l] SITE records in PDBFileReader
Message-ID:

Greetings,

I am interested in parsing SITE records from a PDB file. I looked over the org.biojava.bio.structure API, but I was unable to find reference to this functionality. Does the PDBFileReader in BioJava extract SITE record information?
If not, would it be possible to add this capability to PDBFileReader and the Structure class? SITE record format at wwPDB: http://www.wwpdb.org/documentation/format32/sect7.html Regards, Steve Darnell From andreas at sdsc.edu Mon Aug 16 18:49:56 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 16 Aug 2010 15:49:56 -0700 Subject: [Biojava-l] SITE records in PDBFileReader In-Reply-To: References: Message-ID: Hi Steve, thanks for the feature request. I will probably be able to add this at some point in September. If you need it already before that, I will be happy to commit a patch if somebody else provides it... Andreas On Mon, Aug 16, 2010 at 3:26 PM, Steve Darnell wrote: > I'm sorry for reposting this message. I accidentally sent the previous one > as HTML. > > ________________________________________ > From: Steve Darnell > Sent: Monday, August 16, 2010 5:19 PM > To: 'biojava-l at lists.open-bio.org' > Subject: SITE records in PDBFileReader > > Greetings, > > I am interested in parsing SITE records from a PDB file. I looked over the > org.biojava.bio.structure API, but I was unable to find reference to this > functionality. Does the PDBFileReader in BioJava extract SITE record > information? If not, would it be possible to add this capability to > PDBFileReader and the Structure class? > > SITE record format at wwPDB: > http://www.wwpdb.org/documentation/format32/sect7.html > > Regards, > Steve Darnell > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From andreas at sdsc.edu Mon Aug 16 19:58:48 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 16 Aug 2010 16:58:48 -0700
Subject: [Biojava-l] SITE records in PDBFileReader
In-Reply-To:
References:
Message-ID:

- Take a look at PDBFileParser.java and at http://www.wwpdb.org/documentation/format32/sect7.html
- It needs a new Handler method for the Site records that builds up the data containers.
- Create a new bean that will contain the data for the SITE record
- Instead of having fields for insertion code, residue nr and chain IDs, you can use the new PDBResidueNumber.java class to group this together.
- Add a get/set method for the Site beans to the Structure class
- Create a junit test that makes sure the parsing works ok.

Hope that makes sense...

Andreas

On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary wrote:

> If you like It would be my pleasure to do it for you,
> Just tell me where to start (in the code).
>
> Amr
>
> --------------------------------------------------
> From: "Andreas Prlic"
> Sent: Tuesday, August 17, 2010 12:49 AM
> To: "Steve Darnell"
> Cc:
> Subject: Re: [Biojava-l] SITE records in PDBFileReader
>
>> Hi Steve,
>>
>> thanks for the feature request. I will probably be able to add this at some
>> point in September. If you need it already before that, I will be happy to
>> commit a patch if somebody else provides it...
>>
>> Andreas
>>
>> On Mon, Aug 16, 2010 at 3:26 PM, Steve Darnell wrote:
>>
>>> I'm sorry for reposting this message. I accidentally sent the previous one
>>> as HTML.
>>> >>> ________________________________________ >>> From: Steve Darnell >>> Sent: Monday, August 16, 2010 5:19 PM >>> To: 'biojava-l at lists.open-bio.org' >>> Subject: SITE records in PDBFileReader >>> >>> Greetings, >>> >>> I am interested in parsing SITE records from a PDB file. I looked over >>> the >>> org.biojava.bio.structure API, but I was unable to find reference to this >>> functionality. Does the PDBFileReader in BioJava extract SITE record >>> information? If not, would it be possible to add this capability to >>> PDBFileReader and the Structure class? >>> >>> SITE record format at wwPDB: >>> http://www.wwpdb.org/documentation/format32/sect7.html >>> >>> Regards, >>> Steve Darnell >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >> >> >> -- >> ----------------------------------------------------------------------- >> Dr. Andreas Prlic >> Senior Scientist, RCSB PDB Protein Data Bank >> University of California, San Diego >> (+1) 858.246.0526 >> ----------------------------------------------------------------------- >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From amr_alhossary at hotmail.com Mon Aug 16 19:48:18 2010 From: amr_alhossary at hotmail.com (Amr AL-Hossary) Date: Tue, 17 Aug 2010 01:48:18 +0200 Subject: [Biojava-l] SITE records in PDBFileReader In-Reply-To: References: Message-ID: If you like It would be my pleasure to do it for you, Just tell me where to start (in the code). 
Amr -------------------------------------------------- From: "Andreas Prlic" Sent: Tuesday, August 17, 2010 12:49 AM To: "Steve Darnell" Cc: Subject: Re: [Biojava-l] SITE records in PDBFileReader > Hi Steve, > > thanks for the feature request. I will probably be able to add this at > some > point in September. If you need it already before that, I will be happy to > commit a patch if somebody else provides it... > > Andreas > > > On Mon, Aug 16, 2010 at 3:26 PM, Steve Darnell > wrote: > >> I'm sorry for reposting this message. I accidentally sent the previous >> one >> as HTML. >> >> ________________________________________ >> From: Steve Darnell >> Sent: Monday, August 16, 2010 5:19 PM >> To: 'biojava-l at lists.open-bio.org' >> Subject: SITE records in PDBFileReader >> >> Greetings, >> >> I am interested in parsing SITE records from a PDB file. I looked over >> the >> org.biojava.bio.structure API, but I was unable to find reference to this >> functionality. Does the PDBFileReader in BioJava extract SITE record >> information? If not, would it be possible to add this capability to >> PDBFileReader and the Structure class? >> >> SITE record format at wwPDB: >> http://www.wwpdb.org/documentation/format32/sect7.html >> >> Regards, >> Steve Darnell >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > > -- > ----------------------------------------------------------------------- > Dr. 
Andreas Prlic
> Senior Scientist, RCSB PDB Protein Data Bank
> University of California, San Diego
> (+1) 858.246.0526
> -----------------------------------------------------------------------
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From jbdundas at gmail.com Mon Aug 16 22:43:02 2010
From: jbdundas at gmail.com (jitesh dundas)
Date: Tue, 17 Aug 2010 08:13:02 +0530
Subject: [Biojava-l] BioJava 3Proposal tasks
Message-ID:

Dear All,

Sorry for sending this again, but I don't see it anywhere on the list. Please post it.

I went through the BioJava3 proposal as you mentioned earlier. There are a few things that I could take up without much worry.

I can find out how Hibernate can best be deployed for BioJava. Please note that I suggest we use only Hibernate 3 or higher versions. Hibernate 2 has implementation and performance issues. I can also look at Spring after this task is done.

I can find out the architectural and implementation issues in BioJava. I am strong in analysis and could do all this reasonably well. I just want someone to share my concerns with and to validate the findings.

Analyse how BioJava is being used by the community. See the UsageAnalysis page.
- I can do these.

To start from scratch, creating a number of smaller jars as sub-projects within an umbrella BioJava3 project. Each jar would provide tools for a specific purpose. Additional jars would provide cross-purpose tools such as format converters or text-to-object interfaces. Possibly built using Maven instead of Ant.

Although starting from scratch, much existing code could be reused or refactored to suit the new design.

We would take full advantage of Java 6, including generics, (@)annotations, the built-in property change support. Everything would be a bean - absolutely everything.
We would aim to be fully Java EE compliant, with the majority of components fully reusable as a bean in any other application, just like Spring's components are.

We would adhere rigidly to a common coding style and heavily comment the code.

We should make it able to focus on any aspect the user requires and keep its efficiency, removing its dependency on everything being sequence-related. SymbolLists and Alphabets to be rethought as these are the most common stumbling block.

Make methods parallel-aware and take advantage of this when possible, and provide a global variable to specify how much parallelisation can take place.
- I am very interested in this and would like to take it up as soon as possible. JDK 1.5 provides concurrency utilities (java.util.concurrent) that we can use, and we can define a common method or mode for executing existing code or functionality. However, an impact analysis will be needed, as not all code can be made parallel-compliant due to implementation issues; this will need thorough checking. I can do this.

Please reply and advise which I should take up first. Points in bold are of particular interest to me. Even tasks beyond this list are welcome.

Regards,
JD

From darnells at dnastar.com Tue Aug 17 12:00:33 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Tue, 17 Aug 2010 11:00:33 -0500
Subject: [Biojava-l] SITE records in PDBFileReader
In-Reply-To:
References:
Message-ID:

Andreas and Amr,

Thank you very much for agreeing to add this feature. May I make one additional refinement to my request?

REMARK 800 provides a very useful SITE_DESCRIPTION for each SITE_IDENTIFIER code in use in the SITE records. Could the site name also be associated with the site identifier and residues? There is precedent for parsing REMARK records in BioJava (e.g. experiment type, resolution), but this is a special case where REMARK 800 and SITE records are dependent on one another and physically separated in the header.
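The coupling between REMARK 800 and SITE records described above can be illustrated with a small sketch. The class below is hypothetical: it is not part of BioJava, and the name Remark800Sketch is an assumption made for this example. It assumes only the wwPDB 3.2 layout in which each REMARK 800 block carries a SITE_IDENTIFIER line followed by a SITE_DESCRIPTION line, and it collects the descriptions by identifier so that they could later be attached to the residues read from the SITE records. Descriptions that wrap onto continuation lines are not handled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a BioJava class): maps SITE_IDENTIFIER to SITE_DESCRIPTION. */
final class Remark800Sketch {

    private static final String PREFIX = "REMARK 800";
    private static final String ID_KEY = "SITE_IDENTIFIER:";
    private static final String DESC_KEY = "SITE_DESCRIPTION:";

    /** Scans PDB header lines and returns site ID -> description, in file order. */
    static Map<String, String> siteDescriptions(List<String> headerLines) {
        Map<String, String> descriptions = new LinkedHashMap<>();
        String currentId = null;
        for (String line : headerLines) {
            if (!line.startsWith(PREFIX)) {
                continue; // not a REMARK 800 line (e.g. a SITE record)
            }
            String text = line.substring(PREFIX.length()).trim();
            if (text.startsWith(ID_KEY)) {
                // Remember which site the following description belongs to.
                currentId = text.substring(ID_KEY.length()).trim();
            } else if (text.startsWith(DESC_KEY) && currentId != null) {
                descriptions.put(currentId, text.substring(DESC_KEY.length()).trim());
            }
        }
        return descriptions;
    }
}
```

Because the map is keyed by the same three-character identifier that appears in the SITE records, a parser could look up the description once all SITE lines for that identifier have been read, regardless of how far apart the two record types sit in the header.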
Regards,
Steve

________________________________________
From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic
Sent: Monday, August 16, 2010 6:59 PM
To: Amr AL-Hossary
Cc: Steve Darnell; biojava-l at lists.open-bio.org
Subject: Re: [Biojava-l] SITE records in PDBFileReader

- Take a look at PDBFileParser.java and at http://www.wwpdb.org/documentation/format32/sect7.html
- It needs a new Handler method for the Site records that builds up the data containers.
- Create a new bean that will contain the data for the SITE record
- Instead of having fields for insertion code, residue nr and chain IDs, you can use the new PDBResidueNumber.java class to group this together.
- Add a get/set method for the Site beans to the Structure class
- Create a junit test that makes sure the parsing works ok.

Hope that makes sense...

Andreas

On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary wrote:
If you like It would be my pleasure to do it for you,
Just tell me where to start (in the code).

Amr

From amr_alhossary at hotmail.com Tue Aug 17 13:36:55 2010
From: amr_alhossary at hotmail.com (Amr AL-Hossary)
Date: Tue, 17 Aug 2010 19:36:55 +0200
Subject: [Biojava-l] SITE records in PDBFileReader
In-Reply-To:
References:
Message-ID:

I'll look at it in a couple of days. First I have to be able to check the source code out and in. All I have found so far is anonymous access.

Amr

--------------------------------------------------
From: "Steve Darnell"
Sent: Tuesday, August 17, 2010 6:00 PM
To: "Andreas Prlic" ; "Amr AL-Hossary"
Cc:
Subject: RE: [Biojava-l] SITE records in PDBFileReader

> Andreas and Amr,
>
> Thank you very much for agreeing to add this feature. May I make one
> additional refinement to my request?
>
> REMARK 800 provides a very useful SITE_DESCRIPTION for each
> SITE_IDENTIFIER code in use in the SITE records. Could the site name also
> be associated with the site identifier and residues?
> There is precedent for parsing REMARK records in BioJava (e.g. experiment
> type, resolution), but this is a special case where REMARK 800 and SITE
> records are dependent on one another and physically separated in the header.
>
> Regards,
> Steve
>
> ________________________________________
> From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf
> Of Andreas Prlic
> Sent: Monday, August 16, 2010 6:59 PM
> To: Amr AL-Hossary
> Cc: Steve Darnell; biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] SITE records in PDBFileReader
>
> - Take a look at PDBFileParser.java and at
> http://www.wwpdb.org/documentation/format32/sect7.html
> - It needs a new Handler method for the Site records that builds up the
> data containers.
> - Create a new bean that will contain the data for the SITE record
> - Instead of having fields for insertion code, residue nr and chain IDs,
> you can use the new PDBResidueNumber.java class to group this together.
> - Add a get/set method for the Site beans to the Structure class
> - Create a junit test that makes sure the parsing works ok.
>
> Hope that makes sense...
> Andreas
>
> On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary wrote:
> If you like It would be my pleasure to do it for you,
> Just tell me where to start (in the code).
>
> Amr

From andreas at sdsc.edu Tue Aug 17 14:04:19 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 17 Aug 2010 11:04:19 -0700
Subject: [Biojava-l] SITE records in PDBFileReader
In-Reply-To:
References:
Message-ID:

Hi Amr,

thanks for taking this on. For a first time contributor, it is probably best to post your patches to the list, so somebody else can take a look at them first and commit them for you.

Andreas

On Tue, Aug 17, 2010 at 10:36 AM, Amr AL-Hossary wrote:

> I'll see it in a couple of days. I have first to be able to check out & in
> the source code.
> All I found till now is anonymous access.
>
> Amr
>
> --------------------------------------------------
> From: "Steve Darnell"
> Sent: Tuesday, August 17, 2010 6:00 PM
> To: "Andreas Prlic" ; "Amr AL-Hossary" <amr_alhossary at hotmail.com>
> Cc:
> Subject: RE: [Biojava-l] SITE records in PDBFileReader
>
>> Andreas and Amr,
>>
>> Thank you very much for agreeing to add this feature. May I make one
>> additional refinement to my request?
>>
>> REMARK 800 provides a very useful SITE_DESCRIPTION for each
>> SITE_IDENTIFIER code in use in the SITE records. Could the site name also
>> be associated with the site identifier and residues? There is precedent
>> for parsing REMARK records in BioJava (e.g. experiment type, resolution),
>> but this is a special case where REMARK 800 and SITE records are dependent
>> on one another and physically separated in the header.
>>
>> Regards,
>> Steve
>>
>> ________________________________________
>> From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf
>> Of Andreas Prlic
>> Sent: Monday, August 16, 2010 6:59 PM
>> To: Amr AL-Hossary
>> Cc: Steve Darnell; biojava-l at lists.open-bio.org
>> Subject: Re: [Biojava-l] SITE records in PDBFileReader
>>
>> - Take a look at PDBFileParser.java and at
>> http://www.wwpdb.org/documentation/format32/sect7.html
>> - It needs a new Handler method for the Site records that builds up the
>> data containers.
>> - Create a new bean that will contain the data for the SITE record
>> - Instead of having fields for insertion code, residue nr and chain IDs,
>> you can use the new PDBResidueNumber.java class to group this together.
>> - Add a get/set method for the Site beans to the Structure class
>> - Create a junit test that makes sure the parsing works ok.
>>
>> Hope that makes sense...
>> Andreas
>>
>> On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary <amr_alhossary at hotmail.com> wrote:
>> If you like It would be my pleasure to do it for you,
>> Just tell me where to start (in the code).
>>
>> Amr

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From andreas at sdsc.edu Wed Aug 18 14:26:23 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 18 Aug 2010 11:26:23 -0700
Subject: [Biojava-l] Last week of Google Summer of Code
Message-ID:

Hi,

This is the last week of this year's Google Summer of Code project and I am happy to announce that our two students Mark Chapman and Jianjiong Gao did an amazing job on their two projects "All Java Multiple Sequence Alignment" (MSA) and "Identification and Classification of Posttranslational Modification of Proteins" (PTM).

For Multiple Sequence Alignments we now have a flexible and multi-threaded MSA implementation that works in linear space and that, as an option, allows the users to define anchors that are used in the build up of the multiple alignment. The code is available as part of the new biojava3-alignment module.

The Posttranslational Modification module (biojava3-protmod) can detect three different types of protein modifications in protein structures. It comes with an XML file & Java data structures to store information about different types of protein modifications, and contains entries from RESID, PDBCC and PSI-MOD. There is also a visualisation component to display cross linked PTM on a sequence viewer.

Both Mark and Jianjiong have expressed their interest in maintaining and further developing their modules and I am looking forward to interacting more with them in the future.

I want to thank the Mentors and Co-Mentors Peter Rose, Kyle Ellrott and Scooter Willis for their help and guidance for the projects, without them this would not have been possible.
Thanks also to Robert Buels and the Open Bioinformatics Foundation for organizing our applications for GSoC and last, but not least, Google for sponsoring this Summer of Code.

Happy BioJava-ing,

Andreas

From andrew.mcsweeny at rockets.utoledo.edu Wed Aug 18 18:53:54 2010
From: andrew.mcsweeny at rockets.utoledo.edu (McSweeny, Andrew J)
Date: Wed, 18 Aug 2010 22:53:54 +0000
Subject: [Biojava-l] Annotations question
Message-ID: <469B4CD3D7690A418E8F96B7BA4585F81202C15E@BL2PRD0103MB052.prod.exchangelabs.com>

Hello,

I am interested in using BioJava to determine which features are located where on the assembled chromosome 21 (chr21.fa) from the UCSC genome browser website. An example of something I would like to do is to pick a position at random (1-48,129,895) and then determine whether there are any exons or introns on the plus or minus strand. What classes do I need to be familiar with to do this?

-Andrew

From rmb32 at cornell.edu Thu Aug 19 13:09:45 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 19 Aug 2010 10:09:45 -0700
Subject: [Biojava-l] reminder: Aug 25 deadline for GMOD Hackathon application
Message-ID: <4C6D6559.3080809@cornell.edu>

Hi all,

This is your one-week reminder: the deadline for open applications to the GMOD Evo hackathon is Wednesday, August 25th.

Rob

From amr_alhossary at hotmail.com Fri Aug 27 07:57:16 2010
From: amr_alhossary at hotmail.com (Amr AL-Hossary)
Date: Fri, 27 Aug 2010 13:57:16 +0200
Subject: [Biojava-l] SITE records in PDBFileReader
In-Reply-To:
References:
Message-ID:

I sent the updated code as an attachment to the group, as well as to Andreas Prlic; Steve Darnell; jacobsen at ebi.ac.uk; to be reviewed for submission. It seems that the group daemon prevents attachments whatever small is their size.
Please let me know if it wasn't delivered correctly.

The submitted updates handle "SITE" records to a sufficient degree (REMARK 800 is not handled yet).

To achieve this I had to create a new bean called "Residue". It is implemented as a static inner class inside PDBSite (and it can be extracted to a top-level class if needed).

I created it because I couldn't use any of the subclasses of the Group class (e.g. HOH is neither an amino acid nor a nucleotide).

I guess this should be discussed on the biojava-dev mailing list, if anybody is interested and if it suits the list policy.
I also have some comments on the existing code that need to be discussed. To whom should I address them?

Regards
Amr

From: Andreas Prlic
Sent: Tuesday, August 17, 2010 8:04 PM
To: Amr AL-Hossary
Cc: Steve Darnell ; biojava-l at lists.open-bio.org
Subject: Re: [Biojava-l] SITE records in PDBFileReader

Hi Amr,

thanks for taking this on. For a first time contributor, it is probably best to post your patches to the list, so somebody else can take a look at them first and commit them for you.

Andreas

On Tue, Aug 17, 2010 at 10:36 AM, Amr AL-Hossary wrote:
I'll see it in a couple of days. I have first to be able to check out & in the source code.
All I found till now is anonymous access.

Amr

--------------------------------------------------
From: "Steve Darnell"
Sent: Tuesday, August 17, 2010 6:00 PM
To: "Andreas Prlic" ; "Amr AL-Hossary"
Cc:
Subject: RE: [Biojava-l] SITE records in PDBFileReader

Andreas and Amr,

Thank you very much for agreeing to add this feature. May I make one additional refinement to my request?

REMARK 800 provides a very useful SITE_DESCRIPTION for each SITE_IDENTIFIER code in use in the SITE records. Could the site name also be associated with the site identifier and residues? There is precedent for parsing REMARK records in BioJava (e.g.
experiment type, resolution), but this is a special case where REMARK 800 and SITE records are dependent on one another and physically separated in the header.

Regards,
Steve

________________________________________
From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic
Sent: Monday, August 16, 2010 6:59 PM
To: Amr AL-Hossary
Cc: Steve Darnell; biojava-l at lists.open-bio.org
Subject: Re: [Biojava-l] SITE records in PDBFileReader

- Take a look at PDBFileParser.java and at http://www.wwpdb.org/documentation/format32/sect7.html
- It needs a new Handler method for the Site records that builds up the data containers.
- Create a new bean that will contain the data for the SITE record
- Instead of having fields for insertion code, residue nr and chain IDs, you can use the new PDBResidueNumber.java class to group this together.
- Add a get/set method for the Site beans to the Structure class
- Create a junit test that makes sure the parsing works ok.

Hope that makes sense...
Andreas

-
On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary wrote:
If you like It would be my pleasure to do it for you,
Just tell me where to start (in the code).

Amr

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From jbdundas at gmail.com Fri Aug 27 10:44:46 2010
From: jbdundas at gmail.com (jitesh dundas)
Date: Fri, 27 Aug 2010 20:14:46 +0530
Subject: [Biojava-l] SITE records in PDBFileReader
In-Reply-To:
References:
Message-ID:

Hi,

Thanks & nice work. I think you need to tell your module lead about that.

Including Hibernate is not a good idea for BioJava. It is slow and XML-based, so big data files will be affected. I think we need a plugin framework with better features that delivers the functionality biologists look for.
I have been doing analysis on the BioJava 3 proposal and have some concerns on this, besides the other analysis that is present. I will be sending it to my lead, Andreas Sir (not Andreas Prilic) on this. Regards, JD On 8/27/10, Amr AL-Hossary wrote: > I sent the updated code as an attachment to the group, as well as to Andreas > Prlic; Steve Darnell; > jacobsen at ebi.ac.uk; to be reviewed for submission. > > It seems that the group daemon prevents attachments whatever small is their > size. > Please feed me back if it wasn't delivered correctly. > > This submitted updates handle dealing with "SITE" records to a sufficient > degree (but didn't handle REMARK 800 yet) > > to achieve this goal I had to create a new bean called "Residue". It is > implemented as a static inner class inside PDBSite (and it can be extracted > to be a top level class if needed). > > I created it because I couldn't use any of the subclasses of Group class > (e.g. HOH is neither an amino acid, nor a nucleotide). > > I guess this should be discussed on the biojava-dev mail list if any body is > interested and if it suits the list policy. > I also have some comments on the already present code that needs to be > discussed. to whom shall I address my comments? > > Regards > > Amr > From: Andreas Prlic > Sent: Tuesday, August 17, 2010 8:04 PM > To: Amr AL-Hossary > Cc: Steve Darnell ; biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] SITE records in PDBFileReader > > > Hi Amr, > > thanks for taking this on. For a first time contributor, it is probably > best to post your patches to the list, so somebody else can take a look at > them first and commit them for you. > > Andreas > > > > On Tue, Aug 17, 2010 at 10:36 AM, Amr AL-Hossary > wrote: > > I'll see it in a couple of days. I have first to be able to check out & in > the source code. > All I found till now is anonymous access. 
> > Amr > > -------------------------------------------------- > From: "Steve Darnell" > Sent: Tuesday, August 17, 2010 6:00 PM > To: "Andreas Prlic" ; "Amr AL-Hossary" > > Cc: > Subject: RE: [Biojava-l] SITE records in PDBFileReader > > > Andreas and Amr, > > Thank you very much for agreeing to add this feature. May I make one > additional refinement to my request? > > REMARK 800 provides a very useful SITE_DESCRIPTION for each SITE_IDENTIFIER > code in use in the SITE records. Could the site name also be associated > with the site identifier and residues? There is precedence for parsing > REMARK records in BioJava (e.g. experiment type, resolution), but this is a > special case where REMARK 800 and SITE records are dependent on one another > and physically separated in the header. > > Regards, > Steve > > ________________________________________ > From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of > Andreas Prlic > Sent: Monday, August 16, 2010 6:59 PM > To: Amr AL-Hossary > Cc: Steve Darnell; biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] SITE records in PDBFileReader > > > - Take a look at PDBFileParser.java and > athttp://www.wwpdb.org/documentation/format32/sect7.html > > - It needs a new Handler method for the Site records that builds up the data > containers. > - Create a new bean that will contain the data for the SITE record > > - Instead of having fields for insertion code residue nr and chain IDs, you > can use the newPDBResidueNumber.java class to group this together. > > - Add a get/set method for the Site beans to the Structure class > - Create a junit test that make sure the parsing works ok. > > Hope that makes sense... > Andreas > > > - > On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary > wrote: > If you like It would be my pleasure to do it for you, > Just tell me where to start (in the code). > > Amr > > > > > > > -- > ----------------------------------------------------------------------- > Dr. 
Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From amr_alhossary at hotmail.com Fri Aug 27 04:55:11 2010 From: amr_alhossary at hotmail.com (Amr AL-Hossary) Date: Fri, 27 Aug 2010 08:55:11 -0000 Subject: [Biojava-l] SITE records in PDBFileReader In-Reply-To: References: Message-ID: Dear all, Please, some body revise the attached code & checks it in if it is OK, or contact me back for any inquiry. This submitted updates handle dealing with "SITE" records to a sufficient degree (but didn't handle REMARK 800 yet) to achieve this goal I had to create a new bean called "Residue". It is implemented as a static inner class inside PDBSite (and it can be extracted to be a top level class if needed). Why I created it? because I couldn't use any of the subclasses of Group class (e.g. HOH is neither an amino acid, nor a neucleotide). in case some body has another idea, let's open the discussion about it. Regards Amr From: Andreas Prlic Sent: Tuesday, August 17, 2010 8:04 PM To: Amr AL-Hossary Cc: Steve Darnell ; biojava-l at lists.open-bio.org Subject: Re: [Biojava-l] SITE records in PDBFileReader Hi Amr, thanks for taking this on. For a first time contributor, it is probably best to post your patches to the list, so somebody else can take a look at them first and commit them for you. Andreas On Tue, Aug 17, 2010 at 10:36 AM, Amr AL-Hossary wrote: I'll see it in a couple of days. I have first to be able to check out & in the source code. All I found till now is anonymous access. 
Amr -------------------------------------------------- From: "Steve Darnell" Sent: Tuesday, August 17, 2010 6:00 PM To: "Andreas Prlic" ; "Amr AL-Hossary" Cc: Subject: RE: [Biojava-l] SITE records in PDBFileReader Andreas and Amr, Thank you very much for agreeing to add this feature. May I make one additional refinement to my request? REMARK 800 provides a very useful SITE_DESCRIPTION for each SITE_IDENTIFIER code in use in the SITE records. Could the site name also be associated with the site identifier and residues? There is precedence for parsing REMARK records in BioJava (e.g. experiment type, resolution), but this is a special case where REMARK 800 and SITE records are dependent on one another and physically separated in the header. Regards, Steve ________________________________________ From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic Sent: Monday, August 16, 2010 6:59 PM To: Amr AL-Hossary Cc: Steve Darnell; biojava-l at lists.open-bio.org Subject: Re: [Biojava-l] SITE records in PDBFileReader - Take a look at PDBFileParser.java and athttp://www.wwpdb.org/documentation/format32/sect7.html - It needs a new Handler method for the Site records that builds up the data containers. - Create a new bean that will contain the data for the SITE record - Instead of having fields for insertion code residue nr and chain IDs, you can use the newPDBResidueNumber.java class to group this together. - Add a get/set method for the Site beans to the Structure class - Create a junit test that make sure the parsing works ok. Hope that makes sense... Andreas - On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary wrote: If you like It would be my pleasure to do it for you, Just tell me where to start (in the code). Amr -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SITE-specific commits.zip
Type: application/x-zip-compressed
Size: 34069 bytes
Desc: not available
URL:

From sheoran143 at gmail.com Thu Aug 19 20:45:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Fri, 20 Aug 2010 00:45:29 -0000
Subject: [Biojava-l] Required Correction in GenbankLocationParser class
Message-ID: <4C6DD03C.1080909@gmail.com>

There is a problem with the GenbankLocationParser class: it cannot process the GenBank record with accession M32882.
The parser fails at the following lines in the GenBank record:

     gene            join((8298.8300)..10206,1..855)
                     /gene="bcn"
     mRNA            join((8298.8300)..10206,1..855)
                     /gene="bcn"
                     /note="alternative transcript"

The exception stack trace is as follows:

Could not understand position: 10206,1..855
org.biojava.bio.seq.io.ParseException: Could not understand position: 10206,1..855
    at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:277)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:244)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131)

I investigated this and found the defect in the regular expression named "gp" in the GenbankLocationParser class. The error can be fixed by applying the attached patch. For testing, I have created a method which demonstrates that the parser can now understand all the possible combinations of locations. This test class is also attached so that you can test my patch before and after its application.

I don't have access to svn, so please apply this patch for me, and let me know whether you approve it.

Thanks
Deepak Sheoran
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: GenbankLocationParser.patch
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: LocationParserTest.java
URL:

From sheoran143 at gmail.com Thu Aug 19 20:48:23 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Fri, 20 Aug 2010 00:48:23 -0000
Subject: [Biojava-l] Required Correction in GenbankLocationParser class
Message-ID: <4C6DD0E8.8070704@gmail.com>

There is a problem with the GenbankLocationParser class: it cannot process the GenBank record with accession M32882.
The parser fails at the following lines in the GenBank record:

     gene            join((8298.8300)..10206,1..855)
                     /gene="bcn"
     mRNA            join((8298.8300)..10206,1..855)
                     /gene="bcn"
                     /note="alternative transcript"

The exception stack trace is as follows:

Could not understand position: 10206,1..855
org.biojava.bio.seq.io.ParseException: Could not understand position: 10206,1..855
    at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:277)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:244)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131)

I investigated this and found the defect in the regular expression named "gp" in the GenbankLocationParser class. The error can be fixed by applying the attached patch. For testing, I have created a method which demonstrates that the parser can now understand all the possible combinations of locations.
This test class is also attached so that you can test my patch before and after its application.

I don't have access to svn so please apply this patch for me, and let me know if you approve this patch or not.

Thanks
Deepak Sheoran
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: GenbankLocationParser.patch
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: LocationParserTest.java
URL:

From darnells at dnastar.com Mon Aug 16 22:26:13 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Mon, 16 Aug 2010 17:26:13 -0500
Subject: [Biojava-l] SITE records in PDBFileReader
Message-ID:

I'm sorry for reposting this message. I accidentally sent the previous one as HTML.

________________________________________
From: Steve Darnell
Sent: Monday, August 16, 2010 5:19 PM
To: 'biojava-l at lists.open-bio.org'
Subject: SITE records in PDBFileReader

Greetings,

I am interested in parsing SITE records from a PDB file. I looked over the org.biojava.bio.structure API, but I was unable to find reference to this functionality. Does the PDBFileReader in BioJava extract SITE record information?
If not, would it be possible to add this capability to PDBFileReader and the Structure class? SITE record format at wwPDB: http://www.wwpdb.org/documentation/format32/sect7.html Regards, Steve Darnell From darnells at dnastar.com Mon Aug 16 22:19:28 2010 From: darnells at dnastar.com (Steve Darnell) Date: Mon, 16 Aug 2010 17:19:28 -0500 Subject: [Biojava-l] SITE records in PDBFileReader Message-ID: Greetings, I am interested in parsing SITE records from a PDB file. I looked over the org.biojava.bio.structure API, but I was unable to find reference to this functionality. Does the PDBFileReader in BioJava extract SITE record information? If not, would it be possible to add this capability to PDBFileReader and the Structure class? SITE record format at wwPDB: http://www.wwpdb.org/documentation/format32/sect7.html Regards, Steve Darnell From andreas at sdsc.edu Mon Aug 16 22:49:56 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 16 Aug 2010 15:49:56 -0700 Subject: [Biojava-l] SITE records in PDBFileReader In-Reply-To: References: Message-ID: Hi Steve, thanks for the feature request. I will probably be able to add this at some point in September. If you need it already before that, I will be happy to commit a patch if somebody else provides it... Andreas On Mon, Aug 16, 2010 at 3:26 PM, Steve Darnell wrote: > I'm sorry for reposting this message. I accidentally sent the previous one > as HTML. > > ________________________________________ > From: Steve Darnell > Sent: Monday, August 16, 2010 5:19 PM > To: 'biojava-l at lists.open-bio.org' > Subject: SITE records in PDBFileReader > > Greetings, > > I am interested in parsing SITE records from a PDB file. I looked over the > org.biojava.bio.structure API, but I was unable to find reference to this > functionality. Does the PDBFileReader in BioJava extract SITE record > information? If not, would it be possible to add this capability to > PDBFileReader and the Structure class? 
> > SITE record format at wwPDB: > http://www.wwpdb.org/documentation/format32/sect7.html > > Regards, > Steve Darnell > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas at sdsc.edu Mon Aug 16 23:58:48 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 16 Aug 2010 16:58:48 -0700 Subject: [Biojava-l] SITE records in PDBFileReader In-Reply-To: References: Message-ID: - Take a look at PDBFileParser.java and at http://www.wwpdb.org/documentation/format32/sect7.html - It needs a new Handler method for the Site records that builds up the data containers. - Create a new bean that will contain the data for the SITE record - Instead of having fields for insertion code residue nr and chain IDs, you can use the new PDBResidueNumber.java class to group this together. - Add a get/set method for the Site beans to the Structure class - Create a junit test that make sure the parsing works ok. Hope that makes sense... Andreas - On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary wrote: > If you like It would be my pleasure to do it for you, > Just tell me where to start (in the code). > > Amr > > > -------------------------------------------------- > From: "Andreas Prlic" > Sent: Tuesday, August 17, 2010 12:49 AM > To: "Steve Darnell" > Cc: > Subject: Re: [Biojava-l] SITE records in PDBFileReader > > > Hi Steve, >> >> thanks for the feature request. I will probably be able to add this at >> some >> point in September. If you need it already before that, I will be happy to >> commit a patch if somebody else provides it... 
>> >> Andreas >> >> >> On Mon, Aug 16, 2010 at 3:26 PM, Steve Darnell >> wrote: >> >> I'm sorry for reposting this message. I accidentally sent the previous >>> one >>> as HTML. >>> >>> ________________________________________ >>> From: Steve Darnell >>> Sent: Monday, August 16, 2010 5:19 PM >>> To: 'biojava-l at lists.open-bio.org' >>> Subject: SITE records in PDBFileReader >>> >>> Greetings, >>> >>> I am interested in parsing SITE records from a PDB file. I looked over >>> the >>> org.biojava.bio.structure API, but I was unable to find reference to this >>> functionality. Does the PDBFileReader in BioJava extract SITE record >>> information? If not, would it be possible to add this capability to >>> PDBFileReader and the Structure class? >>> >>> SITE record format at wwPDB: >>> http://www.wwpdb.org/documentation/format32/sect7.html >>> >>> Regards, >>> Steve Darnell >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >> >> >> -- >> ----------------------------------------------------------------------- >> Dr. Andreas Prlic >> Senior Scientist, RCSB PDB Protein Data Bank >> University of California, San Diego >> (+1) 858.246.0526 >> ----------------------------------------------------------------------- >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From amr_alhossary at hotmail.com Mon Aug 16 23:48:18 2010 From: amr_alhossary at hotmail.com (Amr AL-Hossary) Date: Tue, 17 Aug 2010 01:48:18 +0200 Subject: [Biojava-l] SITE records in PDBFileReader In-Reply-To: References: Message-ID: If you like It would be my pleasure to do it for you, Just tell me where to start (in the code). Amr -------------------------------------------------- From: "Andreas Prlic" Sent: Tuesday, August 17, 2010 12:49 AM To: "Steve Darnell" Cc: Subject: Re: [Biojava-l] SITE records in PDBFileReader > Hi Steve, > > thanks for the feature request. I will probably be able to add this at > some > point in September. If you need it already before that, I will be happy to > commit a patch if somebody else provides it... > > Andreas > > > On Mon, Aug 16, 2010 at 3:26 PM, Steve Darnell > wrote: > >> I'm sorry for reposting this message. I accidentally sent the previous >> one >> as HTML. >> >> ________________________________________ >> From: Steve Darnell >> Sent: Monday, August 16, 2010 5:19 PM >> To: 'biojava-l at lists.open-bio.org' >> Subject: SITE records in PDBFileReader >> >> Greetings, >> >> I am interested in parsing SITE records from a PDB file. I looked over >> the >> org.biojava.bio.structure API, but I was unable to find reference to this >> functionality. Does the PDBFileReader in BioJava extract SITE record >> information? If not, would it be possible to add this capability to >> PDBFileReader and the Structure class? 
>> >> SITE record format at wwPDB: >> http://www.wwpdb.org/documentation/format32/sect7.html >> >> Regards, >> Steve Darnell >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > > -- > ----------------------------------------------------------------------- > Dr. Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From jbdundas at gmail.com Tue Aug 17 02:43:02 2010 From: jbdundas at gmail.com (jitesh dundas) Date: Tue, 17 Aug 2010 08:13:02 +0530 Subject: [Biojava-l] BioJava 3Proposal tasks Message-ID: Dear All, Sorry I am sending this again ,but I don't see it in the list anywhere.please post it. I went through the BioJava3 proposal as you mentioned earlier..There are a few things that I could take up without much worries... I can find out how Hibernate can be best deployed for BioJava. PLease note that I suggest we use only hibernate3 or higher versions. HIbernate2 has implementation and performace issues.. I can also look at Spring after this task is done.. I can find out the architectural and implementation issues in Biojava. I am strong in Analysis and could do all this reasonably well.. I just want someone to share my concerns with and validate the findings.. Analyse how BioJava is being used by the community. See the UsageAnalysis page. I can do these.. To start from scratch, creating a number of smaller jars as sub-projects within an umbrella BioJava3 project. Each jar would provide tools for a specific purpose. Additional jars would provide cross-purpose tools such as format converters or text-to-object interfaces. 
Possibly built using Maven instead of Ant.

Although starting from scratch, much existing code could be reused or refactored to suit the new design.

We would take full advantage of Java 6, including generics, (@)annotations, and the built-in property change support.

Everything would be a bean - absolutely everything. We would aim to be fully Java EE compliant, with the majority of components fully reusable as a bean in any other application, just like Spring's components are.

We would adhere rigidly to a common coding style and heavily comment the code.

We should make it able to focus on any aspect the user requires and keep its efficiency, removing its dependency on everything being sequence-related.

SymbolLists and Alphabets to be rethought as these are the most common stumbling block.

Make methods parallel-aware and take advantage of this when possible, and provide a global variable to specify how much parallelisation can take place.
- I am very interested in this and would like to take it up as soon as possible. JDK 1.5 has concurrency utilities (java.util.concurrent) that we can use, and we can define a common method or mode for executing existing code or functionality. However, an impact analysis will be needed, as not all code can be made parallel-compliant due to implementation issues; this will need thorough checking. I can do this.

Please reply and advise which I should take up first. Points in bold are of particular interest to me. Even tasks beyond this list are welcome.

Regards,
JD

From darnells at dnastar.com Tue Aug 17 16:00:33 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Tue, 17 Aug 2010 11:00:33 -0500
Subject: [Biojava-l] SITE records in PDBFileReader
In-Reply-To:
References:
Message-ID:

Andreas and Amr,

Thank you very much for agreeing to add this feature. May I make one additional refinement to my request?

REMARK 800 provides a very useful SITE_DESCRIPTION for each SITE_IDENTIFIER code in use in the SITE records.
Could the site name also be associated with the site identifier and residues? There is precedent for parsing REMARK records in BioJava (e.g. experiment type, resolution), but this is a special case where REMARK 800 and SITE records are dependent on one another and physically separated in the header. Regards, Steve ________________________________________ From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic Sent: Monday, August 16, 2010 6:59 PM To: Amr AL-Hossary Cc: Steve Darnell; biojava-l at lists.open-bio.org Subject: Re: [Biojava-l] SITE records in PDBFileReader - Take a look at PDBFileParser.java and at http://www.wwpdb.org/documentation/format32/sect7.html - It needs a new Handler method for the Site records that builds up the data containers. - Create a new bean that will contain the data for the SITE record - Instead of having fields for insertion code residue nr and chain IDs, you can use the new PDBResidueNumber.java class to group this together. - Add a get/set method for the Site beans to the Structure class - Create a junit test that make sure the parsing works ok. Hope that makes sense... Andreas - On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary wrote: If you like It would be my pleasure to do it for you, Just tell me where to start (in the code). Amr From amr_alhossary at hotmail.com Tue Aug 17 17:36:55 2010 From: amr_alhossary at hotmail.com (Amr AL-Hossary) Date: Tue, 17 Aug 2010 19:36:55 +0200 Subject: [Biojava-l] SITE records in PDBFileReader In-Reply-To: References: Message-ID: I'll look at it in a couple of days. I first have to be able to check the source code out and in. All I have found so far is anonymous access. Amr -------------------------------------------------- From: "Steve Darnell" Sent: Tuesday, August 17, 2010 6:00 PM To: "Andreas Prlic" ; "Amr AL-Hossary" Cc: Subject: RE: [Biojava-l] SITE records in PDBFileReader > Andreas and Amr, > > Thank you very much for agreeing to add this feature.
May I make one > additional refinement to my request? > > REMARK 800 provides a very useful SITE_DESCRIPTION for each > SITE_IDENTIFIER code in use in the SITE records. Could the site name also > be associated with the site identifier and residues? There is precedence > for parsing REMARK records in BioJava (e.g. experiment type, resolution), > but this is a special case where REMARK 800 and SITE records are dependent > on one another and physically separated in the header. > > Regards, > Steve > > ________________________________________ > From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf > Of Andreas Prlic > Sent: Monday, August 16, 2010 6:59 PM > To: Amr AL-Hossary > Cc: Steve Darnell; biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] SITE records in PDBFileReader > > - Take a look at PDBFileParser.java and > at http://www.wwpdb.org/documentation/format32/sect7.html > - It needs a new Handler method for the Site records that builds up the > data containers. > - Create a new bean that will contain the data for the SITE record > - Instead of having fields for insertion code residue nr and chain IDs, > you can use the new PDBResidueNumber.java class to group this together. > - Add a get/set method for the Site beans to the Structure class > - Create a junit test that make sure the parsing works ok. > > Hope that makes sense... > Andreas > > > - > On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary > wrote: > If you like It would be my pleasure to do it for you, > Just tell me where to start (in the code). > > Amr > > From andreas at sdsc.edu Tue Aug 17 18:04:19 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 17 Aug 2010 11:04:19 -0700 Subject: [Biojava-l] SITE records in PDBFileReader In-Reply-To: References: Message-ID: Hi Amr, thanks for taking this on. For a first time contributor, it is probably best to post your patches to the list, so somebody else can take a look at them first and commit them for you.
Andreas On Tue, Aug 17, 2010 at 10:36 AM, Amr AL-Hossary wrote: > I'll see it in a couple of days. I have first to be able to check out & in > the source code. > All I found till now is anonymous access. > > Amr > > -------------------------------------------------- > From: "Steve Darnell" > Sent: Tuesday, August 17, 2010 6:00 PM > To: "Andreas Prlic" ; "Amr AL-Hossary" < > amr_alhossary at hotmail.com> > Cc: > Subject: RE: [Biojava-l] SITE records in PDBFileReader > > Andreas and Amr, >> >> Thank you very much for agreeing to add this feature. May I make one >> additional refinement to my request? >> >> REMARK 800 provides a very useful SITE_DESCRIPTION for each >> SITE_IDENTIFIER code in use in the SITE records. Could the site name also >> be associated with the site identifier and residues? There is precedence >> for parsing REMARK records in BioJava (e.g. experiment type, resolution), >> but this is a special case where REMARK 800 and SITE records are dependent >> on one another and physically separated in the header. >> >> Regards, >> Steve >> >> ________________________________________ >> From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf >> Of Andreas Prlic >> Sent: Monday, August 16, 2010 6:59 PM >> To: Amr AL-Hossary >> Cc: Steve Darnell; biojava-l at lists.open-bio.org >> Subject: Re: [Biojava-l] SITE records in PDBFileReader >> >> - Take a look at PDBFileParser.java and at >> http://www.wwpdb.org/documentation/format32/sect7.html >> >> - It needs a new Handler method for the Site records that builds up the >> data containers. >> - Create a new bean that will contain the data for the SITE record >> - Instead of having fields for insertion code residue nr and chain IDs, >> you can use the new PDBResidueNumber.java class to group this together. >> >> - Add a get/set method for the Site beans to the Structure class >> - Create a junit test that make sure the parsing works ok. >> >> Hope that makes sense...
>> Andreas >> >> >> - >> On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary < >> amr_alhossary at hotmail.com> wrote: >> If you like It would be my pleasure to do it for you, >> Just tell me where to start (in the code). >> >> Amr >> >> >> -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas at sdsc.edu Wed Aug 18 18:26:23 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 18 Aug 2010 11:26:23 -0700 Subject: [Biojava-l] Last week of Google Summer of Code Message-ID: Hi, This is the last week of this year's Google Summer of Code project and I am happy to announce that our two students Mark Chapman and Jianjiong Gao did an amazing job on their two projects "All Java Multiple Sequence Alignment" (MSA) and "Identification and Classification of Posttranslational Modification of Proteins" (PTM). For Multiple Sequence Alignments we now have a flexible and multi-threaded MSA implementation that works in linear space and that, as an option, allows the users to define anchors that are used in the build up of the multiple alignment. The code is available as part of the new biojava3-alignment module. The Posttranslational Modification module (biojava3-protmod) can detect three different types of protein modifications in protein structures. It comes with an XML file & Java data structures to store information about different types of protein modifications, and contains entries from RESID, PDBCC and PSI-MOD. There is also a visualisation component to display cross linked PTM on a sequence viewer. Both Mark and Jianjiong have expressed their interest in maintaining and further developing their modules and I am looking forward to interacting more with them in the future.
I want to thank the Mentors and Co-Mentors Peter Rose, Kyle Ellrott and Scooter Willis for their help and guidance for the projects, without them this would not have been possible. Thanks also to Robert Buels and the Open Bioinformatics Foundation for organizing our applications for GSoC and last, but not least, Google for sponsoring this Summer of Code. Happy BioJava-ing, Andreas From andrew.mcsweeny at rockets.utoledo.edu Wed Aug 18 22:53:54 2010 From: andrew.mcsweeny at rockets.utoledo.edu (McSweeny, Andrew J) Date: Wed, 18 Aug 2010 22:53:54 +0000 Subject: [Biojava-l] Annotations question Message-ID: <469B4CD3D7690A418E8F96B7BA4585F81202C15E@BL2PRD0103MB052.prod.exchangelabs.com> Hello, I am interested in using BioJava to determine which features are located where on the assembled chromosome 21 (chr21.fa) from the UCSC genome browser website. An example of something I would like to do is to pick a position at random (1-48,129,895) and then determine whether there are any exons or introns on the plus or minus strand. What classes do I need to be familiar with to do this? -Andrew From rmb32 at cornell.edu Thu Aug 19 17:09:45 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 19 Aug 2010 10:09:45 -0700 Subject: [Biojava-l] reminder: Aug 25 deadline for GMOD Hackathon application Message-ID: <4C6D6559.3080809@cornell.edu> Hi all, This is your one-week reminder: the deadline for open applications to the GMOD Evo hackathon is Wednesday, August 25th. Rob ======================================== We are seeking participants for the GMOD Tools for Evolutionary Biology Hackathon, held November 8-12, 2010 at the US National Evolutionary Synthesis Center (NESCent) in Durham, NC. This hackathon targets three critical gaps in the capabilities of the GMOD toolbox that currently limit its utility for evolutionary research: 1. Visualization of comparative genomics data 2. Visualization of phylogenetic data and trees 3.
Support for population diversity and phenotype data If you are interested in these areas and have relevant expertise, you are strongly encouraged to apply. Relevant areas of expertise include more than just software development: if you are a GMOD power user, visualization guru, domain expert (comparative, phylogenetics, population, ...), or documentation wizard, then your skills are needed! How To Apply: Fill out the online application form at http://bit.ly/gmodevohack. Applications are due August 25. About GMOD: GMOD is an intercompatible suite of open-source software components for storing, managing, analyzing, and visualizing genome-scale data. GMOD includes many widely-used software components: GBrowse and JBrowse, both genome viewers; GBrowse_syn, a comparative genomics viewer; Chado, a generic and modular database schema; CMap, a comparative map viewer; as well as many other components including Apollo, MAKER, BioMart, InterMine, and Galaxy. We hope to extend the functionality of existing GMOD components, and integrate new components as well. About Hackathons: A hackathon is an intense event at which a group of programmers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole. The mix of people will include domain experts and computer-savvy end-users. More details about the event, its motivation, organization, procedures, and attendees, as well as URLs to the hackathon and related websites are included below. 
Sincerely, The GMOD EvoHack Organizing Committee (and project affiliations as relevant): Nicole Washington, Chair (LBNL, modENCODE, Phenote) Robert Buels (SGN, Chado NatDiv) Scott Cain (OICR, GMOD) Dave Clements (NESCent, GMOD) Hilmar Lapp (NESCent, Phenoscape, Chado NatDiv) Sheldon McKay (University of Arizona, iPlant, GBrowse_syn) ----------------------------- About the GMOD Evo Hackathon Overview We are organizing a hackathon to fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation. The event will be hosted at NESCent and bring together a group of about 20+ software developers, end-user representatives, and documentation experts who would otherwise not meet. The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements. The event will provide a unique opportunity to infuse the GMOD developer community with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components. Before the Event Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. 
This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion. During the Event Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest. Deliverables / Event Results The meeting's attendance, working groups, and outcomes will be fully logged and documented on the GMOD wiki (http://gmod.org). Each working group during the event will typically have its own wiki page, linked from the main EvoHack page, where it documents its minutes and design notes, and provides links to the code and documentation it produces. Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically. NESCent This event is sponsored by the US National Evolutionary Synthesis Center (NESCent, http://www.nescent.org) through its Informatics Whitepapers program (http://www.nescent.org/informatics/whitepapers.php). NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries (see http://www.nescent.org/science/proposals.php). 
Links Main GMOD EvoHack page, and full proposal: http://gmod.org/wiki/GMOD_Evo_Hackathon NESCent: http://www.nescent.org/ GMOD: http://gmod.org Similar past NESCent events, see: http://hackathon.nescent.org/ GMOD hackathon application: http://bit.ly/gmodevohack -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/GMOD_Europe_2010 http://gmod.org/wiki/Help_Desk_Feedback From amr_alhossary at hotmail.com Fri Aug 27 11:57:16 2010 From: amr_alhossary at hotmail.com (Amr AL-Hossary) Date: Fri, 27 Aug 2010 13:57:16 +0200 Subject: [Biojava-l] SITE records in PDBFileReader In-Reply-To: References: Message-ID: I sent the updated code as an attachment to the group, as well as to Andreas Prlic; Steve Darnell; jacobsen at ebi.ac.uk; to be reviewed for submission. It seems that the list daemon blocks attachments, however small they are. Please let me know if it wasn't delivered correctly. The submitted updates handle "SITE" records to a sufficient degree (but do not handle REMARK 800 yet). To achieve this goal I had to create a new bean called "Residue". It is implemented as a static inner class inside PDBSite (and it can be extracted to be a top-level class if needed). I created it because I couldn't use any of the subclasses of the Group class (e.g. HOH is neither an amino acid nor a nucleotide). I guess this should be discussed on the biojava-dev mailing list if anybody is interested and if it suits the list policy. I also have some comments on the existing code that need to be discussed. To whom shall I address my comments? Regards Amr From: Andreas Prlic Sent: Tuesday, August 17, 2010 8:04 PM To: Amr AL-Hossary Cc: Steve Darnell ; biojava-l at lists.open-bio.org Subject: Re: [Biojava-l] SITE records in PDBFileReader Hi Amr, thanks for taking this on. For a first time contributor, it is probably best to post your patches to the list, so somebody else can take a look at them first and commit them for you.
Andreas On Tue, Aug 17, 2010 at 10:36 AM, Amr AL-Hossary wrote: I'll see it in a couple of days. I have first to be able to check out & in the source code. All I found till now is anonymous access. Amr -------------------------------------------------- From: "Steve Darnell" Sent: Tuesday, August 17, 2010 6:00 PM To: "Andreas Prlic" ; "Amr AL-Hossary" Cc: Subject: RE: [Biojava-l] SITE records in PDBFileReader Andreas and Amr, Thank you very much for agreeing to add this feature. May I make one additional refinement to my request? REMARK 800 provides a very useful SITE_DESCRIPTION for each SITE_IDENTIFIER code in use in the SITE records. Could the site name also be associated with the site identifier and residues? There is precedence for parsing REMARK records in BioJava (e.g. experiment type, resolution), but this is a special case where REMARK 800 and SITE records are dependent on one another and physically separated in the header. Regards, Steve ________________________________________ From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic Sent: Monday, August 16, 2010 6:59 PM To: Amr AL-Hossary Cc: Steve Darnell; biojava-l at lists.open-bio.org Subject: Re: [Biojava-l] SITE records in PDBFileReader - Take a look at PDBFileParser.java and at http://www.wwpdb.org/documentation/format32/sect7.html - It needs a new Handler method for the Site records that builds up the data containers. - Create a new bean that will contain the data for the SITE record - Instead of having fields for insertion code residue nr and chain IDs, you can use the new PDBResidueNumber.java class to group this together. - Add a get/set method for the Site beans to the Structure class - Create a junit test that make sure the parsing works ok. Hope that makes sense... Andreas - On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary wrote: If you like It would be my pleasure to do it for you, Just tell me where to start (in the code).
Amr -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From jbdundas at gmail.com Fri Aug 27 14:44:46 2010 From: jbdundas at gmail.com (jitesh dundas) Date: Fri, 27 Aug 2010 20:14:46 +0530 Subject: [Biojava-l] SITE records in PDBFileReader In-Reply-To: References: Message-ID: Hi, Thanks & nice work. I think you need to tell your module lead about that. Including Hibernate is not a good idea for BioJava. It is slow and XML-based, so big data files will be affected. I think we need a plugin framework with better features that deploys the functionality biologists look for. I have been doing analysis on the BioJava 3 proposal and have some concerns about it, besides the other analysis that is present. I will be sending it to my lead, Andreas Sir (not Andreas Prlic). Regards, JD On 8/27/10, Amr AL-Hossary wrote: > I sent the updated code as an attachment to the group, as well as to Andreas > Prlic; Steve Darnell; > jacobsen at ebi.ac.uk; to be reviewed for submission. > > It seems that the group daemon prevents attachments whatever small is their > size. > Please feed me back if it wasn't delivered correctly. > > This submitted updates handle dealing with "SITE" records to a sufficient > degree (but didn't handle REMARK 800 yet) > > to achieve this goal I had to create a new bean called "Residue". It is > implemented as a static inner class inside PDBSite (and it can be extracted > to be a top level class if needed). > > I created it because I couldn't use any of the subclasses of Group class > (e.g. HOH is neither an amino acid, nor a nucleotide). > > I guess this should be discussed on the biojava-dev mail list if any body is > interested and if it suits the list policy.
> I also have some comments on the already present code that needs to be > discussed. to whom shall I address my comments? > > Regards > > Amr > From: Andreas Prlic > Sent: Tuesday, August 17, 2010 8:04 PM > To: Amr AL-Hossary > Cc: Steve Darnell ; biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] SITE records in PDBFileReader > > > Hi Amr, > > thanks for taking this on. For a first time contributor, it is probably > best to post your patches to the list, so somebody else can take a look at > them first and commit them for you. > > Andreas > > > > On Tue, Aug 17, 2010 at 10:36 AM, Amr AL-Hossary > wrote: > > I'll see it in a couple of days. I have first to be able to check out & in > the source code. > All I found till now is anonymous access. > > Amr > > -------------------------------------------------- > From: "Steve Darnell" > Sent: Tuesday, August 17, 2010 6:00 PM > To: "Andreas Prlic" ; "Amr AL-Hossary" > > Cc: > Subject: RE: [Biojava-l] SITE records in PDBFileReader > > > Andreas and Amr, > > Thank you very much for agreeing to add this feature. May I make one > additional refinement to my request? > > REMARK 800 provides a very useful SITE_DESCRIPTION for each SITE_IDENTIFIER > code in use in the SITE records. Could the site name also be associated > with the site identifier and residues? There is precedence for parsing > REMARK records in BioJava (e.g. experiment type, resolution), but this is a > special case where REMARK 800 and SITE records are dependent on one another > and physically separated in the header. 
> > Regards, > Steve > > ________________________________________ > From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of > Andreas Prlic > Sent: Monday, August 16, 2010 6:59 PM > To: Amr AL-Hossary > Cc: Steve Darnell; biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] SITE records in PDBFileReader > > > - Take a look at PDBFileParser.java and > at http://www.wwpdb.org/documentation/format32/sect7.html > > - It needs a new Handler method for the Site records that builds up the data > containers. > - Create a new bean that will contain the data for the SITE record > > - Instead of having fields for insertion code residue nr and chain IDs, you > can use the new PDBResidueNumber.java class to group this together. > > - Add a get/set method for the Site beans to the Structure class > - Create a junit test that make sure the parsing works ok. > > Hope that makes sense... > Andreas > > > - > On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary > wrote: > If you like It would be my pleasure to do it for you, > Just tell me where to start (in the code). > > Amr > > > > > > > -- > ----------------------------------------------------------------------- > Dr. Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From amr_alhossary at hotmail.com Fri Aug 27 08:55:11 2010 From: amr_alhossary at hotmail.com (Amr AL-Hossary) Date: Fri, 27 Aug 2010 08:55:11 -0000 Subject: [Biojava-l] SITE records in PDBFileReader In-Reply-To: References: Message-ID: Dear all, Could somebody please review the attached code and check it in if it is OK, or contact me with any questions?
The submitted updates handle "SITE" records to a sufficient degree (but do not handle REMARK 800 yet). To achieve this goal I had to create a new bean called "Residue". It is implemented as a static inner class inside PDBSite (and it can be extracted to be a top-level class if needed). Why did I create it? Because I couldn't use any of the subclasses of the Group class (e.g. HOH is neither an amino acid nor a nucleotide). If somebody has another idea, let's open the discussion about it. Regards Amr From: Andreas Prlic Sent: Tuesday, August 17, 2010 8:04 PM To: Amr AL-Hossary Cc: Steve Darnell ; biojava-l at lists.open-bio.org Subject: Re: [Biojava-l] SITE records in PDBFileReader Hi Amr, thanks for taking this on. For a first time contributor, it is probably best to post your patches to the list, so somebody else can take a look at them first and commit them for you. Andreas On Tue, Aug 17, 2010 at 10:36 AM, Amr AL-Hossary wrote: I'll see it in a couple of days. I have first to be able to check out & in the source code. All I found till now is anonymous access. Amr -------------------------------------------------- From: "Steve Darnell" Sent: Tuesday, August 17, 2010 6:00 PM To: "Andreas Prlic" ; "Amr AL-Hossary" Cc: Subject: RE: [Biojava-l] SITE records in PDBFileReader Andreas and Amr, Thank you very much for agreeing to add this feature. May I make one additional refinement to my request? REMARK 800 provides a very useful SITE_DESCRIPTION for each SITE_IDENTIFIER code in use in the SITE records. Could the site name also be associated with the site identifier and residues? There is precedence for parsing REMARK records in BioJava (e.g. experiment type, resolution), but this is a special case where REMARK 800 and SITE records are dependent on one another and physically separated in the header.
Regards, Steve ________________________________________ From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic Sent: Monday, August 16, 2010 6:59 PM To: Amr AL-Hossary Cc: Steve Darnell; biojava-l at lists.open-bio.org Subject: Re: [Biojava-l] SITE records in PDBFileReader - Take a look at PDBFileParser.java and at http://www.wwpdb.org/documentation/format32/sect7.html - It needs a new Handler method for the Site records that builds up the data containers. - Create a new bean that will contain the data for the SITE record - Instead of having fields for insertion code residue nr and chain IDs, you can use the new PDBResidueNumber.java class to group this together. - Add a get/set method for the Site beans to the Structure class - Create a junit test that make sure the parsing works ok. Hope that makes sense... Andreas - On Mon, Aug 16, 2010 at 4:48 PM, Amr AL-Hossary wrote: If you like It would be my pleasure to do it for you, Just tell me where to start (in the code). Amr -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: SITE-specific commits.zip Type: application/x-zip-compressed Size: 34069 bytes Desc: not available URL: From sheoran143 at gmail.com Fri Aug 20 00:45:29 2010 From: sheoran143 at gmail.com (Deepak Sheoran) Date: Fri, 20 Aug 2010 00:45:29 -0000 Subject: [Biojava-l] Required Correction in GenbankLocationParser class Message-ID: <4C6DD03C.1080909@gmail.com> There is a problem with the GenbankLocationParser class: it fails to process the GenBank record with accession M32882.
The parser fails at the following lines in the GenBank record: gene join((8298.8300)..10206,1..855) /gene="bcn" mRNA join((8298.8300)..10206,1..855) /gene="bcn" /note="alternative transcript" The exception stack trace is as follows: Could not understand position: 10206,1..855 org.biojava.bio.seq.io.ParseException: Could not understand position: 10206,1..855 at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285) at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285) at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:277) at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:244) at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131) I investigated the matter and found the defect in the regular expression named "gp" in the GenbankLocationParser class. The error can be fixed by applying the attached patch. For testing, I have created a method which shows that the parser can now understand all the possible combinations of locations. This test class is also attached so that you can test before and after applying my patch. I don't have access to svn, so please apply this patch for me, and let me know whether you approve it. Thanks Deepak Sheoran -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: GenbankLocationParser.patch URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: LocationParserTest.java URL:
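The failing location string in the report above nests a parenthesised fuzzy position inside join(...), so a comma separates two sub-locations only when it sits outside the inner parentheses. A minimal Java sketch of that idea follows. It is not the attached patch and not BioJava's GenbankLocationParser (neither is shown in this message), and the names LocationSplitSketch, stripOperator and splitTopLevel are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of BioJava and not the attached patch.
public class LocationSplitSketch {

    /** Removes one enclosing join(...) or order(...) wrapper, if present (naive check). */
    static String stripOperator(String loc) {
        for (String op : new String[] {"join", "order"}) {
            if (loc.startsWith(op + "(") && loc.endsWith(")")) {
                return loc.substring(op.length() + 1, loc.length() - 1);
            }
        }
        return loc;
    }

    /** Splits on commas that occur at parenthesis depth 0 only. */
    static List<String> splitTopLevel(String s) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        // Location string taken from the reported record (accession M32882).
        String loc = "join((8298.8300)..10206,1..855)";
        for (String part : splitTopLevel(stripOperator(loc))) {
            System.out.println(part);
        }
    }
}
```

Splitting the join arguments this way yields "(8298.8300)..10206" and "1..855" as separate sub-locations, rather than handing "10206,1..855" to the position parser as in the stack trace. Whether the real fix belongs in the "gp" regular expression or in a pre-splitting step like this one is a design choice for the maintainers.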