From pamedeo at jcvi.org Mon Mar 1 14:59:50 2010
From: pamedeo at jcvi.org (Amedeo, Paolo)
Date: Mon, 1 Mar 2010 14:59:50 -0500
Subject: [BioSQL-l] Quick question about tools using BioSQL
Message-ID:

I'm evaluating the possibility of using BioSQL for a new genome
annotation project that also involves loading annotated genomes from
GenBank.

So far, I have successfully deployed a working copy of the database
under MySQL on my machine and loaded a couple of genomes from GenBank
files, using the script load_seqdatabase.pl found in the scripts
directory of BioPerl-db-1.6.0.

Browsing the database, however, I have noticed a few things that
concern me a little bit.

First, the seqfeature_relationship table is completely empty and I
couldn't identify any obvious way to investigate parent/child
relationships between entities stored in seqfeature (e.g. in the case
of overlapping genes, or genes embedded in introns of other genes, how
could one determine to which gene a given CDS belongs?).

Second, I was unable to find a dedicated script to populate the
ontology table, and I was somewhat surprised that this table was
nonetheless populated with the keywords present in the GenBank files.

Third, I once loaded a genome without first populating the taxon table
and, as a result, noticed that the values assigned to taxon.left_value
and taxon.right_value described a narrow interval that didn't include
at all the taxon_id of the genome loaded in the database.

I then tried to use the script bioentry2flat.pl to write back to a gbf
file the genome that I had loaded in the database. Unfortunately I
couldn't find any documentation for this script, so I tried to use as
values of the various parameters the same strings that I used with the
other script. I had to edit the code to get rid of hard-coded values,
but still I couldn't get the script to run successfully. I suspect that
there is some problem with correctly matching the accession.
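
A note on the taxon.left_value / taxon.right_value observation above:
in the BioSQL schema these two columns appear to encode a nested-set
numbering of the taxonomy tree, so they are positions in a tree
traversal and are not expected to bracket taxon_id. The sketch below
only illustrates that idea. It uses a cut-down stand-in for the taxon
table in an in-memory SQLite database, and every id and value in it is
invented for the example.

```python
import sqlite3

# Simplified, hypothetical stand-in for the BioSQL taxon table
# (the real table has more columns and constraints).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE taxon (
           taxon_id        INTEGER PRIMARY KEY,
           ncbi_taxon_id   INTEGER,
           parent_taxon_id INTEGER,
           left_value      INTEGER,
           right_value     INTEGER)"""
)

# Toy four-node tree. Note that left/right values run from 1 to 8 while
# the (made-up) taxon_id values are 101..104: the two are unrelated.
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?, ?)",
    [
        (101, 9001, None, 1, 8),  # root
        (102, 9002, 101, 2, 7),   # child of root
        (103, 9003, 102, 3, 4),   # leaf
        (104, 9004, 102, 5, 6),   # leaf
    ],
)


def descendants(conn, taxon_id):
    """Return taxon_ids strictly below taxon_id, using nested-set bounds."""
    rows = conn.execute(
        """SELECT child.taxon_id
             FROM taxon AS anc
             JOIN taxon AS child
               ON child.left_value > anc.left_value
              AND child.right_value < anc.right_value
            WHERE anc.taxon_id = ?
            ORDER BY child.left_value""",
        (taxon_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(descendants(conn, 102))  # [103, 104]
```

A narrow left/right interval on a freshly created taxon row would then
simply mean that the node has few or no descendants in the table.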

Obviously I'm doing one or more things wrong and/or I'm not using the
proper set of tools for what I need to do.

I would really appreciate it if somebody could point me to a set of
tools that would allow me to load gbf files into the database and
extract the individual accessions in both gbf and asn.1 (sqn) format,
or teach me how to use these two scripts correctly, so that the
bioentry_relationship table is populated correctly.

Thanks for your consideration!

Paolo Amedeo
Senior Bioinformatics Engineer
J. Craig Venter Institute
9704 Medical Center Dr.
Rockville, MD 20850

From biopython at maubp.freeserve.co.uk Tue Mar 2 06:06:08 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 2 Mar 2010 11:06:08 +0000
Subject: [BioSQL-l] Quick question about tools using BioSQL
In-Reply-To:
References:
Message-ID: <320fb6e01003020306y317f2d00h78b9399a94e05b28@mail.gmail.com>

On Mon, Mar 1, 2010 at 7:59 PM, Amedeo, Paolo wrote:
>
> I'm evaluating the possibility of using BioSQL for a new genome
> annotation project that also involves loading annotated genomes from
> GenBank.

We looked at this too, but one major drawback of the BioSQL schema
is it does not attempt to handle revisions (e.g. biologists manually
curating the annotation). I think my colleague is planning on trying
Artemis with a Chado database for this.

> So far, I have successfully deployed a working copy of the database
> under MySQL on my machine and loaded a couple of genomes from
> GenBank files, using the script load_seqdatabase.pl found in the scripts
> directory of BioPerl-db-1.6.0.

You can also load GenBank files into BioSQL using BioJava, Biopython
etc. They should give almost the same result in the database (since
the BioPerl implementation is the de facto reference implementation).

> Browsing the database, however, I have noticed a few things that
> concern me a little bit.
>
> First, the seqfeature_relationship table is completely empty and I
> couldn't identify any obvious way to investigate parent/child
> relationships between entities stored in seqfeature (e.g. in the case of
> overlapping genes, or genes embedded in introns of other genes,
> how one could determine to which gene a given CDS belongs?).

GenBank files do not hold that information, so why do you expect
the parser to store it in the database? GFF files do hold this kind
of information, and I am aware of a BioPerl script to turn GenBank
files into GFF (GFF3?), but doing this requires a lot of inference
about the gene model etc to generate the parent/child information.

You could load the basic GenBank file into the database and then
run your own analysis to populate the relationship tables with this
kind of information.

> Second, I was unable to find a dedicated script to populate the ontology
> table and I was somehow surprised that this table got somehow populated
> with the keywords present in the GenBank files.

The BioPerl implementation (and others since) use an "ad hoc"
ontology, basically a nasty hack. I don't think any of the Bio*
bindings for BioSQL implement a strict ontology check (but it would
be nice).

> Third, once I have loaded a genome without first populating the taxon
> table and, as a result I have noticed that the values assigned to
> taxon.left_value and taxon.right_value described a narrow interval that
> didn't include at all the taxon_id of the genome loaded in the database.

There are two IDs for each entry in the taxon table, the database
key and the NCBI taxon id, which can be different. Biopython ignores
the left/right values, so I haven't looked at this recently. Could
you clarify with an example?

> I then tried to use the script bioentry2flat.pl to try to write back to
> a gbf file the genome that I had loaded in the database.
> Unfortunately I couldn't find any documentation for this script and
> I've tried to use as values of the various parameters the same strings
> that I used with the other script. I had to edit the code to get rid
> of hard-coded values, but still I couldn't get the script to run
> successfully. I suspect that there is some problem with matching
> correctly the accession.

I've not used that script (I would use Biopython, just a few lines
to retrieve the entry from the BioSQL database, then give it to our
SeqIO module for output as a GenBank file).

> Obviously I'm doing one or more things wrong and/or I'm not using the
> proper set of tools for doing what I need to do.

I'm not sure if BioSQL does exactly what you expect.

> I would really appreciate if somebody could point me to a set of tools
> that would allow me to load gbf files into the database and extract the
> individual accessions in both gbf and asn.1 (sqn) format, or teaches me
> how should I correctly use these two scripts, so that the
> bioentry_relationship table is populated correctly.

When you say gbf do you mean GenBank? Most people use the shorthand
gb or gbk (based on the typical file extensions).

Could you give an example of the sort of information you are hoping
to find in the bioentry_relationship table?
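
For readers wondering what the "few lines" of Biopython mentioned above
might look like, here is a minimal sketch, not a tested recipe. It
assumes Biopython with its BioSQL bindings and a MySQL driver are
installed. The function name export_genbank, the connection details
and the default namespace "genomes" are placeholders invented for this
example and would need to match your own setup.

```python
def export_genbank(accession, out_path, namespace="genomes"):
    """Fetch one record from a BioSQL database and write it as GenBank.

    Returns the number of records written by SeqIO.write.
    """
    # Imported inside the function so the sketch can be loaded even on a
    # machine where Biopython is not installed.
    from Bio import SeqIO
    from BioSQL import BioSeqDatabase

    # Placeholder connection settings (assumptions, adjust to your setup).
    server = BioSeqDatabase.open_database(
        driver="MySQLdb",
        user="biosql_user",
        passwd="secret",
        host="localhost",
        db="bioseqdb",
    )
    try:
        # The namespace is the sub-database name chosen at load time.
        db = server[namespace]
        # Look the entry up by accession; this fails if it is not found.
        record = db.lookup(accession=accession)
        return SeqIO.write(record, out_path, "genbank")
    finally:
        server.close()
```

Calling export_genbank with an accession that was loaded earlier and an
output file name should then produce a GenBank flat file, provided the
namespace matches the one used when loading.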

Peter

From biopython at maubp.freeserve.co.uk Thu Mar 4 18:15:12 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Mar 2010 23:15:12 +0000
Subject: [BioSQL-l] SQLite support
In-Reply-To:
References: <1f864af10812150224y540f1ba6y6b30168102885fcd@mail.gmail.com>
	<320fb6e00907050324i6d64d3abreb4d0c256bf1bdc4@mail.gmail.com>
	<320fb6e00907090529t61239952y1c86963f13c1db78@mail.gmail.com>
	<320fb6e00907280458q56f74ec6iefa420ac1caab8da@mail.gmail.com>
	<320fb6e00911240627o49bc1ec9nc0d26065ebc23423@mail.gmail.com>
	<070E8BA8-B2C1-4E44-AA2D-9934B3742406@illinois.edu>
	<320fb6e00911240907u32dca751ldb488cbc38f0e035@mail.gmail.com>
	<320fb6e00912100703g4e2b7068jb4fea67df3ebd8a8@mail.gmail.com>
	<320fb6e01001130337p1e0a361ci7ea1a5b5a9639731@mail.gmail.com>
Message-ID: <320fb6e01003041515w19cd91bfq2842a915993902fb@mail.gmail.com>

On Wed, Jan 13, 2010 at 5:06 PM, Hilmar Lapp wrote:
>
> Hi Peter, yes, I know I'm remiss on doing that. Will do shortly. Please
> don't stop pestering if I seem to have forgotten :-)
>
>        -hilmar

Hi Hilmar,

This is more a gentle reminder than pestering, but could you review
Brad's BioSQL schema for SQLite for committing to the SVN? We've
not had any issues reported and it has had testing (via the Biopython
unit tests) on Linux, Windows and Mac OS X.

Thanks,

Peter

For anyone interested in the previous posts on the thread,
http://lists.open-bio.org/pipermail/biosql-l/2010-January/001668.html
http://lists.open-bio.org/pipermail/biosql-l/2009-December/001660.html
etc

From pprahul at gmail.com Tue Mar 30 02:53:15 2010
From: pprahul at gmail.com (Rahul Krishnan)
Date: Tue, 30 Mar 2010 12:23:15 +0530
Subject: [BioSQL-l] GSoC Student Proposal
Message-ID: <85fe38611003292353j2d6c1d5bj493a926305665567@mail.gmail.com>

Hi,

I am Rahul Krishnan, an undergrad student of Computer Science. I was
going through the different mentoring organizations when I noticed the
open bio project, and the BioSQL project interested me specifically.
And I am new to this community :)

I have more than a year's experience building and maintaining websites
using various CMS including drupal, wp etc, which includes programming
in php, sql and other web based tools and frameworks. I am also good at
programming in various languages including c, c++, asp, java and am
passionate about learning new technologies head on.

Apart from that, I've been an open source contributor for over 2 years.
This includes developing a java application for Android OS
(http://code.google.com/p/tictactoe4android), contributions to Haiku OS
(http://dev.osdrawer.net/users/224) and open solaris. I am also well
versed in interacting with the community, using IRC, svn / git repos,
and other essential tools for developing applications.

I consider this an opportunity to get introduced to a new community
and further my skills by contributing to open source development.

I read through the various possible enhancements listed at
http://biosql.org/wiki/Enhancement_Requests. I would like to know if I
should stick to these (as I don't have deep knowledge about the openSQL
source code). I would like to know if a combination of these
enhancements would amount to a good GSoC proposal that would add value
to openBio project.

Cheers!

-- 
Rahul Krishnan
Amrita University '12
http://rahulkrishnanblogs.wordpress.com

From alice.garcia at labri.fr Tue Mar 23 07:31:34 2010
From: alice.garcia at labri.fr (Alice Garcia)
Date: Tue, 23 Mar 2010 11:31:34 -0000
Subject: [BioSQL-l] Article about BioSQL
Message-ID: <4BA8A1E5.9000607@labri.fr>

Dear all,

I'm working at the LaBRI in Bordeaux (France) in a team using BioSQL.
For a future article, I would like to know whether there is any article
on BioSQL that we could cite. I could not find any information on the
web.

Thank you for your help.

All the best,

Alice Garcia

From rmb32 at cornell.edu Thu Mar 18 17:28:07 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 18 Mar 2010 21:28:07 -0000
Subject: [BioSQL-l] Google Summer of Code is *ON* for OBF projects!
Message-ID: <4BA29706.8040606@cornell.edu>

Hi all,

Great news: Google announced today that the Open Bioinformatics
Foundation has been accepted as a mentoring organization for this
summer's Google Summer of Code!

GSoC is a Google-sponsored student internship program for open-source
projects, open to students from around the world (not just US
residents). Students are paid a $5000 USD stipend to work as a
developer on an open-source project for the summer. For more on GSoC,
see GSoC 2010 FAQ at http://tinyurl.com/yzemdfo

Student applications are due April 9, 2010 at 19:00 UTC. Students who
are interested in participating should look at the OBF's GSoC page at
http://open-bio.org/wiki/Google_Summer_of_Code, which lists project
ideas, and who to contact about applying.

For current developers on OBF projects, please consider volunteering to
be a mentor if you have not already, and contribute project ideas. Just
list your name and project ideas on OBF wiki and on the relevant
project's GSoC wiki page.

Thanks to all who helped make OBF's application to GSoC a success, and
let's have a great, productive summer of code!

Rob Buels
OBF GSoC 2010 Administrator

From rmb32 at cornell.edu Fri Mar 26 03:44:15 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 07:44:15 -0000
Subject: [BioSQL-l] GSoC mentors mailing list
Message-ID: <4BAC65C9.307@cornell.edu>

Hi all,

If you have volunteered to be a possible GSoC mentor, and have not
already been subscribed to the (mentors-only) gsoc-mentors mailing
list, send me an email and I'll subscribe you.

Rob Buels
OBF GSoC 2010 Admin

From rmb32 at cornell.edu Fri Mar 26 12:30:51 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 16:30:51 -0000
Subject: [BioSQL-l] Announcing OBF Summer of Code - please forward!
Message-ID: <4BACE126.1030500@cornell.edu>

Hi all,

Here's an advertising-ready announcement for OBF's Summer of Code,
thanks to Christian Zmasek and Hilmar Lapp for their excellent writing.
Student applications are due April 9! Please spread it widely, we need
to reach lots of students with it!

Rob Buels
OBF GSoC 2010 Admin

============================================================
*** Please disseminate widely at your local institutions ***
*** including posting to message and job boards, so that ***
*** we reach as many students as possible.               ***
============================================================

OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2010

Applications due 19:00 UTC, April 9, 2010.

http://www.open-bio.org/wiki/Google_Summer_of_Code

The Open Bioinformatics Foundation Summer of Code program provides a
unique opportunity for undergraduate, masters, and PhD students to
obtain hands-on experience writing and extending open-source software
for bioinformatics under the mentorship of experienced developers from
around the world. The program is the participation of the Open
Bioinformatics Foundation (OBF) as a mentoring organization in the
Google Summer of Code(tm) (http://code.google.com/soc/).

Students successfully completing the 3 month program receive a $5,000
USD stipend, and may work entirely from their home or home institution.
Participation is open to students from any country in the world except
countries subject to US trade restrictions. Each student will have at
least one dedicated mentor to show them the ropes and help them
complete their project.

The Open Bioinformatics Foundation is particularly seeking students
interested in both bioinformatics (computational biology) and software
development.
Some initial project ideas are listed on the website. These range from
Galaxy phylogenetics pipeline development in Biopython to lightweight
sequence objects and lazy parsing in BioPerl, a DAS Server for large
files on local filesystems, and mapping Java libraries to
Perl/Ruby/Python using Biolib+SWIG+JNI. All project ideas are flexible
and many can be adjusted in scope to match the skills of the student.
We also welcome and encourage students proposing their own project
ideas; historically some of the most successful Summer of Code projects
are ones proposed by the students themselves.

TO APPLY: Apply online at the Google Summer of Code website
(http://socghop.appspot.com/), where you will also find GSoC program
rules and eligibility requirements. The 12-day application period for
students runs from Monday, March 29 through Friday, April 9th, 2010.

INQUIRIES: We strongly encourage all interested students to get in
touch with us with their ideas as early on as possible. See the OBF
GSoC page for contact details.

2010 OBF Summer of Code:
http://www.open-bio.org/wiki/Google_Summer_of_Code

Google Summer of Code FAQ:
http://socghop.appspot.com/document/show/program/google/gsoc2010/faqs

From lokeshkadyan858 at gmail.com Fri Mar 26 07:18:18 2010
From: lokeshkadyan858 at gmail.com (lokesh kadyan)
Date: Fri, 26 Mar 2010 11:18:18 -0000
Subject: [BioSQL-l] GSOC project
Message-ID: <9be8f3c21003260418v5270038esa3c3c71cdf4d5efd@mail.gmail.com>

Hi, Sir,

I am a student from India pursuing an MSc in Biological Sciences along
with computers. I want to take part in this project. Another thing I
want to ask: I have also applied for the BioJava project. Is there any
problem if I take part in two projects simultaneously?

Regards
Lokesh Kadyan

From pprahul at gmail.com Tue Mar 30 02:49:00 2010
From: pprahul at gmail.com (Rahul Krishnan)
Date: Tue, 30 Mar 2010 06:49:00 -0000
Subject: [BioSQL-l] GSoC Student Proposal
Message-ID: <85fe38611003292348u38a2b685md1cb8bad69713a6a@mail.gmail.com>

Hi,

I am Rahul Krishnan, an undergrad student of Computer Science. I was
going through the different mentoring organizations when I noticed the
open bio project, and the BioSQL project interested me specifically.
And I am new to this community :)

I have more than a year's experience building and maintaining websites
using various CMS including drupal, wp etc, which includes programming
in php, sql and other web based tools and frameworks. I am also good at
programming in various languages including c, c++, asp, java and am
passionate about learning new technologies head on.

Apart from that, I've been an open source contributor for over 2 years.
This includes developing a java application for Android OS
(http://code.google.com/p/tictactoe4android), contributions to Haiku OS
(http://dev.osdrawer.net/users/224) and open solaris. I am also well
versed in interacting with the community, using IRC, svn / git repos,
and other essential tools for developing applications.

I consider this an opportunity to get introduced to a new community
and further my skills by contributing to open source development.

I read through the various possible enhancements listed at
http://biosql.org/wiki/Enhancement_Requests. I would like to know if I
should stick to these (as I don't have deep knowledge about the openSQL
source code). I would like to know if a combination of these
enhancements would amount to a good GSoC proposal that would add value
to openBio project.

Cheers!
-- Rahul Krishnan Amrita University '12 http://rahulkrishnanblogs.wordpress.com From robfsouza at gmail.com Wed Mar 17 21:04:47 2010 From: robfsouza at gmail.com (Robson de Souza) Date: Thu, 18 Mar 2010 01:04:47 -0000 Subject: [BioSQL-l] biosql and ontologies Message-ID: Hi! I have the following scenario to solve: I'm thinking of storing the annotations of genes, genome regions, proteins, etc in a BioSQL database but, since some customized ontologies we are developing are expected to evolve in paralllel with the annotation process and more than one person in my group should be able to edit the ontologies, I need to be able to modify the ontology iteratively in BioSQL. Therefore, my question is: are there any OBOEdit plugins to read/write ontologies directly in BioSQL?? Or Chado? Or some other scheme for community editing of an ontology? Or the same functionality in some other ontology editor? I am not sure whether downloading and reloading the ontology is an option. How does updating of ontogies via load_ont6ology.pl work? Could it be done simultaneously by two people? Cheers! Robson From pamedeo at jcvi.org Mon Mar 1 19:59:50 2010 From: pamedeo at jcvi.org (Amedeo, Paolo) Date: Mon, 1 Mar 2010 14:59:50 -0500 Subject: [BioSQL-l] Quick question about tools using BioSQL Message-ID: I'm evaluating the possibility of using BioSQL for a new genome annotation project that also involves loading annotated genomes from GenBank. So far, I have successfully deployed a working copy of the database under MySQL on my machine and loaded a couple of genomes from GenBank files, using the script load_seqdatabase.pl found in the scripts directory of BioPerl-db-1.6.0. Browsing the database, however, I have noticed a few things that concern me a little bit. First, the seqfeature_relationship table is completely empty and I couldn't identify any obvious way to investigate parent/child relationships between entities stored in seqfeature (e.g. 
in the case of overlapping genes, or genes embedded in introns of other genes, how one could determine to which gene a given CDS belongs?). Second, I was unable to find a dedicated script to populate the ontology table and I was somehow surprised that this table got somehow populated with the keywords present in the GenBank files. Third, once I have loaded a genome without first populating the taxon table and, as a result I have noticed that the values assigned to taxon.left_value and taxon.right_value described a narrow interval that didn't include at all the taxon_id of the genome loaded in the database. I then tried to use the script bioentry2flat.pl to try to write back to a gbf file the genome that I had loaded in the database. Unfortunately I couldn't find any documentation for this script and I've tried to use as values of the various parameters the same strings that I used with the other script. I had to edit the code to get rid of hard-coded values, but still I couldn't get the script to run successfully. I suspect that there is some problem with matching correctly the accession. Obviously I'm doing one or more things wrong and/or I'm not using the proper set of tools for doing what I need to do. I would really appreciate if somebody could point me to a set of tools that would allow me to load gbf files into the database and extract the individual accessions in both gbf and asn.1 (sqn) format, or teaches me how should I correctly use these two scripts, so that the bioentry_relationship table is populated correctly. Thanks for your consideration! Paolo Amedeo Senior Bioinformatics Engineer J. Craig Venter Institute 9704 Medical Center Dr. 
Rockville, MD 20850 From biopython at maubp.freeserve.co.uk Tue Mar 2 11:06:08 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 2 Mar 2010 11:06:08 +0000 Subject: [BioSQL-l] Quick question about tools using BioSQL In-Reply-To: References: Message-ID: <320fb6e01003020306y317f2d00h78b9399a94e05b28@mail.gmail.com> On Mon, Mar 1, 2010 at 7:59 PM, Amedeo, Paolo wrote: > > I'm evaluating the possibility of using BioSQL for a new genome > annotation project that also involves loading annotated genomes from > GenBank. We looked at this too, but one major drawback of the BioSQL schema is it does not attempt to handle revisions (e.g. biologists manually curating the annotation). I think my colleague is planning on trying Artemis with a Chado database for this. > So far, I have successfully deployed a working copy of the database > under MySQL on my machine and loaded a couple of genomes from > GenBank files, using the script load_seqdatabase.pl found in the scripts > directory of BioPerl-db-1.6.0. You can also load GenBank files into BioSQL using BioJava, Biopython etc. They should give almost the same result in the database (since the BioPerl implementation is the de facto reference implementation). > Browsing the database, however, I have noticed a few things that > concern me a little bit. > > First, the seqfeature_relationship table is completely empty and I > couldn't identify any obvious way to investigate parent/child > relationships between entities stored in seqfeature (e.g. in the case of > overlapping genes, or genes embedded in introns of other genes, > how one could determine to which gene a given CDS belongs?). GenBank files do not hold that information, so why do you expect the parser to store it in the database? GFF files do hold this kind of information, and I am aware of a BioPerl script to turn GenBank files into GFF (GFF3?) but to do this requires a lot of inference about the gene model etc to generate the parent/child information. 
You could load the basic GenBank file into the database and then run your own analysis to populate the relationship tables with this kind of information. > Second, I was unable to find a dedicated script to populate the ontology > table and I was somehow surprised that this table got somehow populated > with the keywords present in the GenBank files. The BioPerl implementation (and others since) use an "ad hoc" ontology, basically a nasty hack. I don't think any of the Bio* binding for BioSQL implement a strict ontology check (but it would be nice). > Third, once I have loaded a genome without first populating the taxon > table and, as a result I have noticed that the values assigned to > taxon.left_value and taxon.right_value described a narrow interval that > didn't include at all the taxon_id of the genome loaded in the database. There are two IDs for each entry in the taxon table, the database key and the NCBI taxon id, which can be different. Biopython ignores the left/right values, so I haven't looked at this recently. Could you clarify with an example? > I then tried to use the script bioentry2flat.pl to try to write back to > a gbf file the genome that I had loaded in the database. Unfortunately I > couldn't find any documentation for this script and I've tried to use as > values of the various parameters the same strings that I used with the > other script. ?I had to edit the code to get rid of hard-coded values, > but still I couldn't get the script to run successfully. I suspect that > there is some problem with matching correctly the accession. I've not used that script (I would use Biopython, just a few lines to retrieve the entry from the BioSQL database, then give it to our SeqIO module for output as a GenBank file). > Obviously I'm doing one or more things wrong and/or I'm not using the > proper set of tools for doing what I need to do. I'm not sure if BioSQL does exactly what you expect. 
> I would really appreciate if somebody could point me to a set of tools > that would allow me to load gbf files into the database and extract the > individual accessions in both gbf and asn.1 (sqn) format, or teaches me > how should I correctly use these two scripts, so that the > bioentry_relationship table is populated correctly. When you say gbf do you mean GenBank? Most people use the shorthand gb or gbk (based on the typical file extensions). Could you give an example of the sort information you are hoping to find in the bioentry_relationship table? Peter From biopython at maubp.freeserve.co.uk Thu Mar 4 23:15:12 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Mar 2010 23:15:12 +0000 Subject: [BioSQL-l] SQLite support In-Reply-To: References: <1f864af10812150224y540f1ba6y6b30168102885fcd@mail.gmail.com> <320fb6e00907050324i6d64d3abreb4d0c256bf1bdc4@mail.gmail.com> <320fb6e00907090529t61239952y1c86963f13c1db78@mail.gmail.com> <320fb6e00907280458q56f74ec6iefa420ac1caab8da@mail.gmail.com> <320fb6e00911240627o49bc1ec9nc0d26065ebc23423@mail.gmail.com> <070E8BA8-B2C1-4E44-AA2D-9934B3742406@illinois.edu> <320fb6e00911240907u32dca751ldb488cbc38f0e035@mail.gmail.com> <320fb6e00912100703g4e2b7068jb4fea67df3ebd8a8@mail.gmail.com> <320fb6e01001130337p1e0a361ci7ea1a5b5a9639731@mail.gmail.com> Message-ID: <320fb6e01003041515w19cd91bfq2842a915993902fb@mail.gmail.com> On Wed, Jan 13, 2010 at 5:06 PM, Hilmar Lapp wrote: > > Hi Peter, yes, I know I'm remiss on doing that. Will do shortly. Please > don't stop pestering if I seem to have forgotten :-) > > ? ? ? ?-hilmar Hi Hilmar, This is more a gentle reminder than pestering, but could you review Brad's BioSQL schema for SQLite for committing to the SVN? We've not had any issues reported and it has had testing (via the Biopython unit tests) on Linux, Windows and Mac OS X. 
Thanks, Peter For anyone interested in the previous posts on the thread, http://lists.open-bio.org/pipermail/biosql-l/2010-January/001668.html http://lists.open-bio.org/pipermail/biosql-l/2009-December/001660.html etc From pprahul at gmail.com Tue Mar 30 06:53:15 2010 From: pprahul at gmail.com (Rahul Krishnan) Date: Tue, 30 Mar 2010 12:23:15 +0530 Subject: [BioSQL-l] GSoC Student Proposal Message-ID: <85fe38611003292353j2d6c1d5bj493a926305665567@mail.gmail.com> Hi, I am Rahul Krishnan, an undergrad student of Computer Science. I was going through the different mentoring organizations when I noticed the open bio project, and the BioSQL project interested me specifically. And I am new to this community :) I have more than an year's experience building and maintaining websites using various CMS including drupal, wp etc, which includes programming in php, sql and other web based tools and frameworks. I am also good at programming in various languages including c, c++, asp, java and am passionate about learning new technologies head on. Apart from that, I've been an open source contributor for over 2 years. This includes developing a java application for Android OS ( http://code.google.com/p/tictactoe4android), contributions to Haiku OS ( http://dev.osdrawer.net/users/224) and open solaris. I am also well versed with interacting with the community, using IRC, svn / git repos, and other essential tools for developing applications. I consider this as an opportunity to get introduced to a new community and further my skills by contributing to open source development. I read through the various possible enhancements listed at http://biosql.org/wiki/Enhancement_Requests. I would like to know if I should stick on to these (as I don't have deep knowledge about the openSQL source code). I would like to know if a combination of these enhancements would amount to a good GSoC proposal that would add value to openBio project. Cheers! 
-- Rahul Krishnan Amrita University '12 http://rahulkrishnanblogs.wordpress.com From alice.garcia at labri.fr Tue Mar 23 11:31:34 2010 From: alice.garcia at labri.fr (Alice Garcia) Date: Tue, 23 Mar 2010 11:31:34 -0000 Subject: [BioSQL-l] Article about BioSQL Message-ID: <4BA8A1E5.9000607@labri.fr> Dear all, I'm working in the LaBRI in Bordeaux (France) in a team using BioSQL. For a future article, I want to know if there is any article on BioSQL to cite it. I could not find any information on the web. Thank you for your help. All the best, Alice Garcia From rmb32 at cornell.edu Thu Mar 18 21:28:07 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 18 Mar 2010 21:28:07 -0000 Subject: [BioSQL-l] Google Summer of Code is *ON* for OBF projects! Message-ID: <4BA29706.8040606@cornell.edu> Hi all, Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see GSoC 2010 FAQ at http://tinyurl.com/yzemdfo Student applications are due April 9, 2010 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas, and who to contact about applying. For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on OBF wiki and on the relevant project's GSoC wiki page. Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code! 
Rob Buels OBF GSoC 2010 Administrator From rmb32 at cornell.edu Fri Mar 26 07:44:15 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 26 Mar 2010 07:44:15 -0000 Subject: [BioSQL-l] GSoC mentors mailing list Message-ID: <4BAC65C9.307@cornell.edu> Hi all, If you have volunteered to be a possible GSoC mentor, and have not already been subscribed to the (mentors-only) gsoc-mentors mailing list, send me an email and I'll subscribe you. Rob Buels OBF GSoC 2010 Admin From rmb32 at cornell.edu Fri Mar 26 16:30:51 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 26 Mar 2010 16:30:51 -0000 Subject: [BioSQL-l] Announcing OBF Summer of Code - please forward! Message-ID: <4BACE126.1030500@cornell.edu> Hi all, Here's an advertising-ready announcement for OBF's Summer of Code, thanks to Christian Zmasek and Hilmar Lapp for their excellent writing. Student applications are due April 9! Please spread it widely, we need to reach lots of students with it! Rob Buels OBF GSoC 2010 Admin ============================================================ *** Please disseminate widely at your local institutions *** *** including posting to message and job boards, so that *** *** we reach as many students as possible. *** ============================================================ OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2010 Applications due 19:00 UTC, April 9, 2010. http://www.open-bio.org/wiki/Google_Summer_of_Code The Open Bioinformatics Foundation Summer of Code program provides a unique opportunity for undergraduate, masters, and PhD students to obtain hands-on experience writing and extending open-source software for bioinformatics under the mentorship of experienced developers from around the world. The program is the participation of the Open Bioinformatics Foundation (OBF) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/). 
Students successfully completing the three-month program receive a $5,000 USD stipend, and may work entirely from their home or home institution. Participation is open to students from any country in the world except countries subject to US trade restrictions. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project. The Open Bioinformatics Foundation is particularly seeking students interested in both bioinformatics (computational biology) and software development. Some initial project ideas are listed on the website. These range from Galaxy phylogenetics pipeline development in Biopython to lightweight sequence objects and lazy parsing in BioPerl, a DAS Server for large files on local filesystems, and mapping Java libraries to Perl/Ruby/Python using Biolib+SWIG+JNI. All project ideas are flexible and many can be adjusted in scope to match the skills of the student. We also welcome and encourage students proposing their own project ideas; historically, some of the most successful Summer of Code projects have been ones proposed by the students themselves. TO APPLY: Apply online at the Google Summer of Code website (http://socghop.appspot.com/), where you will also find GSoC program rules and eligibility requirements. The 12-day application period for students runs from Monday, March 29 through Friday, April 9, 2010. INQUIRIES: We strongly encourage all interested students to get in touch with us with their ideas as early as possible. See the OBF GSoC page for contact details.
2010 OBF Summer of Code: http://www.open-bio.org/wiki/Google_Summer_of_Code Google Summer of Code FAQ: http://socghop.appspot.com/document/show/program/google/gsoc2010/faqs From lokeshkadyan858 at gmail.com Fri Mar 26 11:18:18 2010 From: lokeshkadyan858 at gmail.com (lokesh kadyan) Date: Fri, 26 Mar 2010 11:18:18 -0000 Subject: [BioSQL-l] GSOC project Message-ID: <9be8f3c21003260418v5270038esa3c3c71cdf4d5efd@mail.gmail.com> Hi Sir, I am a student from India pursuing an MSc in Biological Sciences along with computers. I want to take part in this project. I have also applied for the BioJava project. Is there any problem if I take part in two projects simultaneously? Regards Lokesh Kadyan From pprahul at gmail.com Tue Mar 30 06:49:00 2010 From: pprahul at gmail.com (Rahul Krishnan) Date: Tue, 30 Mar 2010 06:49:00 -0000 Subject: [BioSQL-l] GSoC Student Proposal Message-ID: <85fe38611003292348u38a2b685md1cb8bad69713a6a@mail.gmail.com> Hi, I am Rahul Krishnan, an undergrad student of Computer Science. I was going through the different mentoring organizations when I noticed the Open Bio project, and the BioSQL project interested me specifically. And I am new to this community :) I have more than a year's experience building and maintaining websites using various CMSs, including Drupal and WordPress, which involves programming in PHP, SQL, and other web-based tools and frameworks. I am also good at programming in various languages, including C, C++, ASP, and Java, and am passionate about learning new technologies head on. Apart from that, I've been an open source contributor for over 2 years. This includes developing a Java application for Android OS (http://code.google.com/p/tictactoe4android) and contributions to Haiku OS (http://dev.osdrawer.net/users/224) and OpenSolaris. I am also well versed in interacting with the community, using IRC, SVN/Git repos, and other essential tools for developing applications.
I consider this an opportunity to get introduced to a new community and further my skills by contributing to open source development. I read through the various possible enhancements listed at http://biosql.org/wiki/Enhancement_Requests. I would like to know if I should stick to these (as I don't have deep knowledge of the BioSQL source code). I would also like to know if a combination of these enhancements would amount to a good GSoC proposal that would add value to the Open Bio project. Cheers! -- Rahul Krishnan Amrita University '12 http://rahulkrishnanblogs.wordpress.com From robfsouza at gmail.com Thu Mar 18 01:04:47 2010 From: robfsouza at gmail.com (Robson de Souza) Date: Thu, 18 Mar 2010 01:04:47 -0000 Subject: [BioSQL-l] biosql and ontologies Message-ID: Hi! I have the following scenario to solve: I'm thinking of storing the annotations of genes, genome regions, proteins, etc. in a BioSQL database. However, some customized ontologies we are developing are expected to evolve in parallel with the annotation process, and more than one person in my group should be able to edit them, so I need to be able to modify the ontologies iteratively in BioSQL. Therefore, my question is: are there any OBO-Edit plugins to read/write ontologies directly in BioSQL? Or Chado? Or some other scheme for community editing of an ontology? Or the same functionality in some other ontology editor? I am not sure whether downloading and reloading the ontology is an option. How does updating of ontologies via load_ontology.pl work? Could it be done simultaneously by two people? Cheers! Robson