From biopython at maubp.freeserve.co.uk Thu Apr 15 13:54:56 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Apr 2010 18:54:56 +0100
Subject: [BioSQL-l] [Biojava-l] Issue with SimpleNCBITaxon class
In-Reply-To:
References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com>
Message-ID:

Hi,

I've CC'd this to the BioSQL mailing list for cross project
discussion.

On Mon, Apr 12, 2010 at 7:57 AM, Richard Holland wrote:
> Thanks Deepak.
>
> I've had a look at the code and I believe it's due to the
> different ways in which BioJava and BioPerl load the
> taxon table.
>
> BioJava sets the ncbi_taxon_id and parent_taxon_id
> columns based on the values from the NCBI taxonomy
> file. The taxon_id column in BioJava is a meaningless
> auto-generated value that is never used.
>
> BioPerl however is generating taxon_id values and
> linking them by setting parent_taxon_id to the
> generated value. The parent value from the NCBI
> taxonomy file is therefore replaced with the BioPerl
> generated parent ID, meaning that instead of linking
> from parent_taxon_id to ncbi_taxon_id as per BioJava,
> the link is to taxon_id instead. (I'm basing this
> comment on looking at load_ncbi_taxonomy.pl from
> the BioSQL archives.)

Note that old versions of load_ncbi_taxonomy.pl
(which is part of BioSQL, not part of BioPerl) would
set taxon_id equal to ncbi_taxon_id, see:
http://bugzilla.open-bio.org/show_bug.cgi?id=2470

This may help explain the confusion.

> I believe if you load the taxonomy table using BioJava,
> you should see BioJava giving correct behaviour.
> Likewise if you load it using BioPerl, BioPerl will
> behave correctly. But if you load with one then query
> with the other, you'll get incorrect results.
>
> This sounds like a case for discussion on both lists -
> a matter of standardisation between the two projects.
> Not quickly/easily solvable for now.

It's not just two projects (BioPerl & BioJava) (grin).
It's at least five projects (BioSQL itself plus BioRuby
and Biopython).

I'm not sure about BioRuby's implementation, but
currently I think BioJava is the odd one out - BioPerl,
Biopython, and the BioSQL's load_ncbi_taxonomy.pl
all make entries in parent_taxon_id reference the
automatically generated taxon_id (please correct
me if I am wrong).

My personal view is that bioperl-db is the reference
implementation and should be followed in the event
of any ambiguity within BioSQL. In this particular
case, there is actually a BioSQL script to check
against too (load_ncbi_taxonomy.pl).

Hopefully Hilmar can give us an official verdict...

Peter

From rmb32 at cornell.edu Sat Apr 3 16:09:27 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 03 Apr 2010 13:09:27 -0700
Subject: [BioSQL-l] Google Summer of Code is *ON* for OBF projects!
Message-ID: <4BB7A077.4070802@cornell.edu>

Hi all,

Reminder: GSoC student proposals must be submitted to Google by
April 9th, 19:00 UTC. That's less than a week away.

Students: you should ALREADY be working with mentors on the project
mailing lists, they can help you get your proposal into shape.

So far, we have 5 proposals submitted to our org in Google's web app.
Keep them coming, and let's see some really good ones!

Rob Buels
OBF GSoC 2010 Administrator

From rmb32 at cornell.edu Sun Apr 4 00:37:38 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 03 Apr 2010 21:37:38 -0700
Subject: [BioSQL-l] Reminder: GSoC student applications due April 9, 19:00 UTC
Message-ID: <4BB81792.8060001@cornell.edu>

Hi all,

Sending this again with a different subject line, just in case.

GSoC student proposals must be submitted to Google through their web
application by *April 9th, 19:00 UTC*. That's less than a week away.

Students: you should ALREADY be working with mentors on the project
mailing lists, they can help you get your proposal into shape.

So far, we have 6 proposals submitted to our org in Google's web app.
Keep them coming, and keep them good!

Rob Buels
OBF GSoC 2010 Administrator

From rohitrrj at gmail.com Mon Apr 5 14:14:30 2010
From: rohitrrj at gmail.com (Rohit Jadhav)
Date: Mon, 5 Apr 2010 23:44:30 +0530
Subject: [BioSQL-l] Student internship program for open-source projects
Message-ID:

Dear Sir/Madam,

This has reference to the Google Summer of Code programme.

I am Rohit Jadhav, a Masters student in Bioinformatics at Indiana
University Purdue University Indianapolis (USA). I am looking for a
Co-op/Internship position for the Summer 2010. My areas of interest
include Bioinformatics, Data Mining and Systems Biology.

It is worth mentioning some of the courses I took recently in the Fall
semester of 2009, which have inspired me to work in the areas of data
mining and systems biology. The introductory courses in Informatics and
Bioinformatics were instrumental in giving me current, state-of-the-art
knowledge in these areas. The advanced course in Biostatistics helped
me solve the statistical problems addressed in most bioinformatics
papers. In Spring 2010 I am taking a course on biological database
management, which will help me improve my knowledge of biological
databases. The course on computational systems biology is research
oriented and will help me keep up with current advances in the area.
The translational bioinformatics course will help me build on my
current knowledge of high-throughput techniques like microarrays.

I have also worked as a web developer at the university's information
technology services department, where I was a part of the web tech
services team. It was a really valuable experience, as it gave me
experience of almost all stages of the website development life cycle,
from understanding the complex problem through data retrieval, design,
development and testing, while applying my knowledge of Perl, C#, Java
and other languages.
I have an undergraduate degree in Bioinformatics and am currently
pursuing further studies in the field as a graduate student. I am an
Indian national with Indian citizenship. I have already applied online
and I'll be glad to furnish any more details about the projects I have
undertaken.

I am looking forward to hearing from you.

Sincerely,
--
Rohit Jadhav
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RohitJadhav-resume.doc
Type: application/msword
Size: 48128 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RohitJadhav-SOP.pdf
Type: application/pdf
Size: 27547 bytes
Desc: not available
URL:

From biopython at maubp.freeserve.co.uk Thu Apr 15 14:23:52 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Apr 2010 19:23:52 +0100
Subject: [BioSQL-l] SQLite support
In-Reply-To:
References: <1f864af10812150224y540f1ba6y6b30168102885fcd@mail.gmail.com> <320fb6e00907050324i6d64d3abreb4d0c256bf1bdc4@mail.gmail.com> <320fb6e00907090529t61239952y1c86963f13c1db78@mail.gmail.com> <320fb6e00907280458q56f74ec6iefa420ac1caab8da@mail.gmail.com> <320fb6e00911240627o49bc1ec9nc0d26065ebc23423@mail.gmail.com> <070E8BA8-B2C1-4E44-AA2D-9934B3742406@illinois.edu> <320fb6e00911240907u32dca751ldb488cbc38f0e035@mail.gmail.com> <320fb6e00912100703g4e2b7068jb4fea67df3ebd8a8@mail.gmail.com> <320fb6e01001130337p1e0a361ci7ea1a5b5a9639731@mail.gmail.com>
Message-ID:

On Wed, Jan 13, 2010 at 6:06 PM, Hilmar Lapp wrote:
>
> Hi Peter, yes, I know I'm remiss on doing that. Will do shortly. Please
> don't stop pestering if I seem to have forgotten :-)
>
> -hilmar

Cough cough ;-)

Peter

From biopython at maubp.freeserve.co.uk Thu Apr 15 14:34:26 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Apr 2010 19:34:26 +0100
Subject: [BioSQL-l] Student internship program for open-source projects
In-Reply-To:
References:
Message-ID:

On Mon, Apr 5, 2010 at 7:14 PM, Rohit Jadhav wrote:
> Dear Sir/Madam,
> This has reference to the Google Summer of Code programme.
>

Hi Rohit,

It seems you (and a few other students) had tried emailing
the BioSQL mailing list without first subscribing, and your
messages were held in a moderation queue until recently.

Hopefully Robert or Hilmar replied to you directly since the
GSoC application deadline has now passed.

Peter

From rmb32 at cornell.edu Tue Apr 27 01:52:57 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 26 Apr 2010 22:52:57 -0700
Subject: [BioSQL-l] Google Summer of Code - accepted students
Message-ID: <4BD67BB9.3000804@cornell.edu>

Hi all,

I'm pleased to announce the acceptance of OBF's 2010 Google Summer of
Code students, listed in alphabetical order with their project titles
and primary mentors:

Mark Chapman (PM Andreas Prlic) - Improvements to BioJava including Implementation of Multiple Sequence Alignment Algorithms

Jianjiong Gao (PM Peter Rose) - BioJava Packages for Identification, Classification, and Visualization of Posttranslational Modification of Proteins

Kazuhiro Hayashi (PM Naohisa Goto) - Ruby 1.9.2 support of BioRuby

Sara Rayburn (PM Christian Zmasek) - Implementing Speciation & Duplication Inference Algorithm for Binary and Non-binary Species Tree

Joao Pedro Garcia Lopes Maia Rodrigues (PM Eric Talevich) - Extending Bio.PDB: broadening the usefulness of BioPython's Structural Biology module

Jun Yin (PM Chris Fields) - BioPerl Alignment Subsystem Refactoring

Congratulations to our accepted students!

All told, we had 52 applications submitted for the 6 slots (5
originally assigned, plus 1 extra) allotted to us by Google.
Proposals were extremely competitive: 6 out of 52 translates to an
11.5% acceptance rate. We received a lot of really excellent
proposals, the decisions were not easy.

Thanks very much to all the students who applied, we very much
appreciate your hard work.

Here's to a great 2010 Summer of Code, I'm sure these students will do
some wonderful work.

Rob Buels
OBF GSoC 2010 Administrator

From sheoran143 at gmail.com Fri Apr 16 14:43:55 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Fri, 16 Apr 2010 18:43:55 -0000
Subject: [BioSQL-l] [Biojava-l] Issue with SimpleNCBITaxon class
In-Reply-To:
References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com>
Message-ID: <4BC8AFEF.70107@gmail.com>

In my experience, we should use taxon_id for this, because it is a
unique key in a local instance of BioSQL. ncbi_taxon_id should be used
only for mapping, so that a person can map a local taxon_id to an
ncbi_taxon_id; otherwise it defeats the sole purpose of having
taxon_id as the primary key in the taxon table. I think the main goal
when BioSQL was designed was to make it independent of any other
organization like GenBank or NCBI, while still providing a way to map
a number given by a known authority (ncbi_taxon_id) to a local number
(taxon_id).

Deepak Sheoran

On 4/15/2010 12:54 PM, Peter wrote:
> Hi,
>
> I've CC'd this to the BioSQL mailing list for cross project
> discussion.
>
> On Mon, Apr 12, 2010 at 7:57 AM, Richard Holland wrote:
>
>> Thanks Deepak.
>>
>> I've had a look at the code and I believe it's due to the
>> different ways in which BioJava and BioPerl load the
>> taxon table.
>>
>> BioJava sets the ncbi_taxon_id and parent_taxon_id
>> columns based on the values from the NCBI taxonomy
>> file. The taxon_id column in BioJava is a meaningless
>> auto-generated value that is never used.
>>
>> BioPerl however is generating taxon_id values and
>> linking them by setting parent_taxon_id to the
>> generated value. The parent value from the NCBI
>> taxonomy file is therefore replaced with the BioPerl
>> generated parent ID, meaning that instead of linking
>> from parent_taxon_id to ncbi_taxon_id as per BioJava,
>> the link is to taxon_id instead. (I'm basing this
>> comment on looking at load_ncbi_taxonomy.pl from
>> the BioSQL archives.)
>>
> Note that old versions of load_ncbi_taxonomy.pl
> (which is part of BioSQL, not part of BioPerl) would
> set taxon_id equal to ncbi_taxon_id, see:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2470
>
> This may help explain the confusion.
>
>
>> I believe if you load the taxonomy table using BioJava,
>> you should see BioJava giving correct behaviour.
>> Likewise if you load it using BioPerl, BioPerl will
>> behave correctly. But if you load with one then query
>> with the other, you'll get incorrect results.
>>
>> This sounds like a case for discussion on both lists -
>> a matter of standardisation between the two projects.
>> Not quickly/easily solvable for now.
>>
> It's not just two projects (BioPerl & BioJava) (grin).
> It's at least five projects (BioSQL itself plus BioRuby
> and Biopython).
>
> I'm not sure about BioRuby's implementation, but
> currently I think BioJava is the odd one out - BioPerl,
> Biopython, and the BioSQL's load_ncbi_taxonomy.pl
> all make entries in parent_taxon_id reference the
> automatically generated taxon_id (please correct
> me if I am wrong).
>
> My personal view is that bioperl-db is the reference
> implementation and should be followed in the event
> of any ambiguity within BioSQL. In this particular
> case, there is actually a BioSQL script to check
> against too (load_ncbi_taxonomy.pl).
>
> Hopefully Hilmar can give us an official verdict...
>
> Peter
>

From rmb32 at cornell.edu Mon Apr 26 18:54:52 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 26 Apr 2010 22:54:52 -0000
Subject: [BioSQL-l] Google Summer of Code - accepted students
Message-ID: <4BD60D63.1040400@cornell.edu>

Hi all,

I'm pleased to announce the acceptance of OBF's 2010 Google Summer of
Code students, listed in alphabetical order with their project titles
and primary mentors:

Mark Chapman (PM Andreas Prlic) - Improvements to BioJava including Implementation of Multiple Sequence Alignment Algorithms

Jianjiong Gao (PM Peter Rose) - BioJava Packages for Identification, Classification, and Visualization of Posttranslational Modification of Proteins

Kazuhiro Hayashi (PM Naohisa Goto) - Ruby 1.9.2 support of BioRuby

Sara Rayburn (PM Christian Zmasek) - Implementing Speciation & Duplication Inference Algorithm for Binary and Non-binary Species Tree

Joao Pedro Garcia Lopes Maia Rodrigues (PM Eric Talevich) - Extending Bio.PDB: broadening the usefulness of BioPython's Structural Biology module

Jun Yin (PM Chris Fields) - BioPerl Alignment Subsystem Refactoring

Congratulations to our accepted students!

All told, we had 52 applications submitted for the 6 slots (5
originally assigned, plus 1 extra) allotted to us by Google.
Proposals were extremely competitive: 6 out of 52 translates to an
11.5% acceptance rate. We received a lot of really excellent
proposals, the decisions were not easy.

Thanks very much to all the students who applied, we very much
appreciate your hard work.

Here's to a great 2010 Summer of Code, I'm sure these students will do
some wonderful work.

Rob Buels
OBF GSoC 2010 Administrator
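
The taxon linking question raised in the SimpleNCBITaxon thread can be illustrated with a small sketch. It is not taken from any of the projects: the table below is a cut-down stand-in for the BioSQL taxon table (only the three columns discussed), the `load` and `detect_convention` helpers are hypothetical names, the generated keys starting at 1000 are arbitrary, and giving the root a NULL parent is a simplification. The sketch loads the same three NCBI nodes under each convention described in the thread, then checks which column the values in parent_taxon_id actually resolve against.

```python
import sqlite3

# Simplified stand-in for the BioSQL taxon table (assumption: only the
# three columns discussed in the thread; the real schema has more).
SCHEMA = """
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    ncbi_taxon_id   INTEGER UNIQUE,
    parent_taxon_id INTEGER
)
"""

# (ncbi_taxon_id, NCBI parent id). The root gets no parent here, which
# is a simplification made for this toy data set.
NCBI_NODES = [(1, None), (131567, 1), (2759, 131567)]


def load(conn, link_to_local_id):
    """Load NCBI_NODES into a fresh taxon table.

    link_to_local_id=True  stores the parent's generated taxon_id
    link_to_local_id=False stores the parent's NCBI id unchanged
    """
    conn.execute(SCHEMA)
    # Arbitrary generated keys, deliberately different from the NCBI ids.
    local_id = {ncbi_id: 1000 + i for i, (ncbi_id, _) in enumerate(NCBI_NODES)}
    for ncbi_id, ncbi_parent in NCBI_NODES:
        if ncbi_parent is None:
            parent = None
        elif link_to_local_id:
            parent = local_id[ncbi_parent]
        else:
            parent = ncbi_parent
        conn.execute(
            "INSERT INTO taxon VALUES (?, ?, ?)",
            (local_id[ncbi_id], ncbi_id, parent),
        )


def detect_convention(conn):
    """Report which column parent_taxon_id values resolve against."""
    via_local = conn.execute(
        "SELECT COUNT(*) FROM taxon c "
        "JOIN taxon p ON c.parent_taxon_id = p.taxon_id"
    ).fetchone()[0]
    via_ncbi = conn.execute(
        "SELECT COUNT(*) FROM taxon c "
        "JOIN taxon p ON c.parent_taxon_id = p.ncbi_taxon_id"
    ).fetchone()[0]
    if via_local > via_ncbi:
        return "taxon_id"
    if via_ncbi > via_local:
        return "ncbi_taxon_id"
    return "ambiguous"


if __name__ == "__main__":
    for flag in (True, False):
        conn = sqlite3.connect(":memory:")
        load(conn, link_to_local_id=flag)
        print(flag, detect_convention(conn))
```

A database loaded one way and queried the other way would join on the wrong column, which matches the mixed-toolkit failure described in the thread. When taxon_id happens to equal ncbi_taxon_id, as reported for old versions of load_ncbi_taxonomy.pl, both joins match equally and this check returns "ambiguous".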