From jillyh0 at gmail.com Mon Mar 1 16:42:25 2010
From: jillyh0 at gmail.com (Jillian E Kozyra)
Date: Mon, 1 Mar 2010 16:42:25 -0500
Subject: [BioRuby] Phylogenetic Trees or Hierarchical Clustering
Message-ID: <9d7d43131003011342s3de1f182oacf6ce1e612a452a@mail.gmail.com>

Dear Colleagues,

We are working on a linguistics project in which we will calculate language similarities. From the language similarity matrix, we would like to create either a hierarchical clustering output or a phylogenetic tree. We seek a pure Ruby plugin with which to do this. Could you give us some guidance?

Thanks,
Jillian

--
917-434-7511
http://sswl.railsplayground.net

From bonnalraoul at ingm.it Mon Mar 8 08:28:16 2010
From: bonnalraoul at ingm.it (Raoul Bonnal)
Date: Mon, 8 Mar 2010 14:28:16 +0100
Subject: [BioRuby] RVM: Ruby Version Manager
Message-ID:

Do you know this tool, http://rvm.beginrescueend.com/, for having multiple Ruby environments installed at the same time?

RVM is a command line tool which allows us to easily install, manage and work with multiple ruby environments from interpreters to sets of gems. RVM itself is easy to install!

I'm using it on a VM for developing and testing, and it is awesome how it handles everything :-)

Give it a try.

--
Raoul J.P. Bonnal
Life Science Informatics
Integrative Biology Program
Fondazione INGM
Via F. Sforza 28
20122 Milano, IT
phone: +39 0 200 662 326
fax: +39 0 200 662 346
http://www.ingm.it

From daniel.lundin at molbio.su.se Tue Mar 9 14:48:11 2010
From: daniel.lundin at molbio.su.se (Daniel Lundin)
Date: Tue, 09 Mar 2010 20:48:11 +0100
Subject: [BioRuby] HMMER 3 parsers?
Message-ID: <4B96A5FB.9060607@molbio.su.se>

Hi,

HMMER 3 is currently available as a first release candidate. With it come several new features, both in the form of new tools and new kinds of data, which means the output formats have changed. Is anybody working on BioRuby parsers for these?

/D

--
Daniel Lundin

Department of Molecular Biology & Functional Genomics
Arrhenius Laboratories for Natural Sciences
Stockholm University, SE-106 91 Stockholm, Sweden

tel. +46 (0)8 16 41 95, mobile: +46 (0)708 123 922, fax. +46 (0)8 16 64 88

Email: daniel.lundin at molbio.su.se

From rutgeraldo at gmail.com Wed Mar 10 08:22:48 2010
From: rutgeraldo at gmail.com (Rutger Vos)
Date: Wed, 10 Mar 2010 13:22:48 +0000
Subject: [BioRuby] RDF Triples in BioRuby, a funding proposal to Google SoC
Message-ID: <2bb9b24a1003100522p68330d6bu3f8e5f3a7f50dd6b@mail.gmail.com>

Dear BioRuby-ites,

my apologies that my first email to this list is so long and tangential. I am trying to find out how to express RDF triples in BioRuby. In this email I'm explaining why I care enough to try to get funding for someone to work on this. If you don't care about any of this, you can stop reading now.

The National Evolutionary Synthesis Center (NESCent.org) is planning to be a mentoring organization for the Google Summer of Code 2010. I have submitted a project idea to this: to develop NeXML I/O and - probably more importantly for you - RDF capabilities for BioRuby. If funded, a student/coder will work on this full time over the summer, under the shared supervision of Jan Aerts and myself. Here is the link: http://tinyurl.com/biorubynexml

NeXML is a data format for phylogenetic data that can be read and written in perl, python, java and (to some extent) c++ and javascript. RDF is the cool "new" thing (as per BioHackathon2010), but as far as I can tell BioRuby isn't completely up to speed for it, yet.

(As an aside: you might ask yourself why there is something like NeXML when there is PhyloXML for BioRuby. The answer is that NeXML solves a different problem: PhyloXML started essentially as a next generation of New Hampshire eXtended (NHX) to meet the annotation needs of comparative genomics, things such as gene duplications and other molecular evolution events, on phylogenetic trees; NeXML started as a complete XML representation of the NEXUS format, providing other comparative data types such as categorical and continuous character state matrices, restriction site matrices, and so on, in addition to trees, taxa, sequence alignments. There is obviously some overlap between the formats, but I guess that is not unique in bioinformatics :))

NeXML has a semantic annotation facility that uses RDFa. This allows us to add additional metadata to a fundamental phylogenetic data object (a tree, taxon, character, etc.) to form a "triple": the fundamental data object is the triple Subject, and the Predicate and Object are added as RDFa attributes. Since NeXML can be transformed using a standard XSL stylesheet to RDF/XML, we can express a limitless number of statements about phylogenetics. However, this means that any NeXML I/O library needs to be able to represent RDF triples. I have studied the BioRuby API as best as I could (but: I don't know ruby) and couldn't identify how to do this.

My questions to you:

* is there a way to express triples in BioRuby?
* if there is not, what would be a good design to express triples in BioRuby so that this would be more useful than just for NeXML?

Thank you!

Rutger

--
Dr. Rutger A.
Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading
RG6 6BX
United Kingdom
Tel: +44 (0) 118 378 7535
http://www.nexml.org
http://rutgervos.blogspot.com

From ktym at hgc.jp Wed Mar 10 09:21:15 2010
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Wed, 10 Mar 2010 23:21:15 +0900
Subject: [BioRuby] RDF Triples in BioRuby, a funding proposal to Google SoC
In-Reply-To: <2bb9b24a1003100522p68330d6bu3f8e5f3a7f50dd6b@mail.gmail.com>
References: <2bb9b24a1003100522p68330d6bu3f8e5f3a7f50dd6b@mail.gmail.com>
Message-ID: <9081A9B5-611C-45C2-A099-44BAF1E524F4@hgc.jp>

Hi Rutger,

Thank you for your input on GSoC 2010!

> * is there a way to express triples in BioRuby?
> * if there is not, what would be a good design to express triples in
> BioRuby so that this would be more useful than just for NeXML?

This is what we discussed during the pre-BioHackathon 2010.

http://hackathon3.dbcls.jp/wiki/BioRuby

My first idea was to make all BioRuby objects have a common output method to render the object contents in various formats (such as RDF/XML, Turtle, HTML, GFF, FASTA etc. if appropriate).

Then, we tried to separate view from logic using erb, but as you can see in the above page, it still looks ugly. This is mainly because view formatting itself requires some additional code, specific to each format.

Therefore, we don't have a solid conclusion on this yet, unfortunately.

Anyway, we already have a PubMed to RDF converter written in Ruby as part of the TogoWS REST API (http://togows.dbcls.jp/site/en/rest.html) at

http://togows.dbcls.jp/entry/pubmed/16381885
--> http://togows.dbcls.jp/entry/pubmed/16381885.ttl

and we are also trying to support KEGG to RDF conversion in this framework as well. I think we can put the code in BioRuby when we have finished.

Your suggestions are welcome. :)

Regards,
Toshiaki

_______________________________________________
BioRuby Project - http://www.bioruby.org/
BioRuby mailing list
BioRuby at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioruby

From rutgeraldo at gmail.com Thu Mar 11 05:22:04 2010
From: rutgeraldo at gmail.com (Rutger Vos)
Date: Thu, 11 Mar 2010 10:22:04 +0000
Subject: [BioRuby] RDF Triples in BioRuby, a funding proposal to Google SoC
In-Reply-To: <9081A9B5-611C-45C2-A099-44BAF1E524F4@hgc.jp>
References: <2bb9b24a1003100522p68330d6bu3f8e5f3a7f50dd6b@mail.gmail.com> <9081A9B5-611C-45C2-A099-44BAF1E524F4@hgc.jp>
Message-ID: <2bb9b24a1003110222h4bd642adv31d1975c9edc0bba@mail.gmail.com>

Hi Toshiaki,

great to hear there's already been a lot of discussion over this. (Well, I'd be surprised if there hadn't been :))

It looks to me like some fairly major bookkeeping would need to be implemented high up in the inheritance tree if *all* bioruby objects are to be serialized into RDF. It also would require all of bioruby to be ontologized in one fell swoop.
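One way to picture the bookkeeping described above is a shared mixin that every class includes, with each class declaring which of its attributes map to which predicates. The following is only a sketch under that assumption: `TripleSerializable`, `rdf_property`, `subject_uri` and `ExampleReference` are hypothetical names that do not exist in BioRuby, and the example.org subject URI is made up for illustration.

```ruby
# Hypothetical sketch: none of these names exist in BioRuby.
module TripleSerializable
  # Runs when a class includes the module; gives that class a place to
  # record which attributes map to which predicate URIs.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # attribute name (Symbol) => predicate URI (String)
    def predicate_map
      @predicate_map ||= {}
    end

    def rdf_property(attribute, predicate_uri)
      predicate_map[attribute] = predicate_uri
    end
  end

  # Returns an Array of [subject, predicate, object] triples.
  # Each including class must define #subject_uri itself.
  def to_triples
    self.class.predicate_map.map do |attribute, predicate_uri|
      [subject_uri, predicate_uri, send(attribute)]
    end
  end
end

# A stand-in for a reference-like object (assumption, not a BioRuby class).
class ExampleReference
  include TripleSerializable

  attr_reader :pmid, :title

  # Dublin Core "title" is used here purely as an example predicate.
  rdf_property :title, 'http://purl.org/dc/elements/1.1/title'

  def initialize(pmid, title)
    @pmid = pmid
    @title = title
  end

  # Made-up subject URI for the example.
  def subject_uri
    "http://example.org/pubmed/#{pmid}"
  end
end

ref = ExampleReference.new('16381885', 'An example title')
ref.to_triples.each { |s, p, o| puts "<#{s}> <#{p}> \"#{o}\" ." }
```

Even in this small form, every class still has to supply its own subject URI and predicate mapping, which is the per-class ontology work referred to above.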
It is perhaps more likely that subdomains are going to be ontologized more or less independently from one another (as you mention, references->RDF, or in my case phylogenetics->RDF) based implicitly on intermediate data formats (pubmed records and nexml, respectively).

That is probably OK, we do things as needs arise.

But what would be handy is if the API were at least general enough that this is extensible and we can make additional statements *about* objects when we serialize them to RDF. For example, in your pubmed turtle file, the subject is always the same resource. Is there a way, programmatically, for me to add additional statements about that resource?

Rutger

--
Dr. Rutger A.
Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading
RG6 6BX
United Kingdom
Tel: +44 (0) 118 378 7535
http://www.nexml.org
http://rutgervos.blogspot.com

From bonnalraoul at ingm.it Thu Mar 11 08:02:23 2010
From: bonnalraoul at ingm.it (Raoul Bonnal)
Date: Thu, 11 Mar 2010 14:02:23 +0100
Subject: [BioRuby] Ruby and Statistics
Message-ID: <2122bfdf-d902-4be1-aef2-95013cea31f6@ingm.it>

Hello Folks,

I need to do statistical computations in Ruby, sometimes very basic operations like mean and standard deviation. Which library do you suggest? I don't want to use rsruby (R) for now, or extend Array every time.

I found this: ruby-statsample, but I don't know if it is the best one.

--
Raoul J.P. Bonnal
Life Science Informatics
Integrative Biology Program
Fondazione INGM
Via F. Sforza 28
20122 Milano, IT
phone: +39 0 200 662 326
fax: +39 0 200 662 346
http://www.ingm.it

From ngoto at gen-info.osaka-u.ac.jp Thu Mar 11 08:53:02 2010
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Thu, 11 Mar 2010 22:53:02 +0900
Subject: [BioRuby] Ruby and Statistics
In-Reply-To: <2122bfdf-d902-4be1-aef2-95013cea31f6@ingm.it>
References: <2122bfdf-d902-4be1-aef2-95013cea31f6@ingm.it>
Message-ID: <20100311135303.8C5201CBC41B@idnmail.gen-info.osaka-u.ac.jp>

Hi,

I found some modules, but I haven't used them.

math-statistics: http://www.notwork.org/~gotoken/ruby/p/statistics/
statarray: http://rubyforge.org/projects/statarray/
ruby-stats: http://pallas.telperion.info/ruby-stats/

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From ngoto at gen-info.osaka-u.ac.jp Thu Mar 11 09:12:49 2010
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Thu, 11 Mar 2010 23:12:49 +0900
Subject: [BioRuby] HMMER 3 parsers?
In-Reply-To: <4B96A5FB.9060607@molbio.su.se>
References: <4B96A5FB.9060607@molbio.su.se>
Message-ID: <20100311141250.789AA1CBC58F@idnmail.gen-info.osaka-u.ac.jp>

Hi,

Christian Zmasek is now working on HMMER 3 support. It would be great if you could help us.

http://github.com/cmzmasek/bioruby/blob/master/lib/bio/appl/hmmer/hmmer3report.rb
http://github.com/cmzmasek/bioruby/blob/master/lib/bio/appl/hmmer3.rb

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From ngoto at gen-info.osaka-u.ac.jp Thu Mar 11 09:59:11 2010
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Thu, 11 Mar 2010 23:59:11 +0900
Subject: [BioRuby] Phylogenetic Trees or Hierarchical Clustering
In-Reply-To: <9d7d43131003011342s3de1f182oacf6ce1e612a452a@mail.gmail.com>
References: <9d7d43131003011342s3de1f182oacf6ce1e612a452a@mail.gmail.com>
Message-ID: <20100311145912.CF9091CBC3DA@idnmail.gen-info.osaka-u.ac.jp>

Hi,

I always use phylogenetic tree construction software such as PHYLIP and MEGA4, and I don't know much about pure Ruby solutions. The following were found with a Google search.

There are some pure Ruby implementations of clustering algorithms, though I haven't used them.

AI4R (Artificial Intelligence for Ruby): http://ai4r.rubyforge.org/
clusterer: http://rubyforge.org/projects/clusterer/

I found a phylogenetic tree visualization implementation written in JRuby, and I found it can also work with normal Ruby 1.8.7.

Egan A et al. (2008) IDEA: Interactive Display for Evolutionary Analyses. BMC Bioinformatics 2008, 9:524
http://www.biomedcentral.com/1471-2105/9/524
http://ideanalyses.sourceforge.net/

Thanks,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From daniel.lundin at molbio.su.se Thu Mar 11 11:18:25 2010
From: daniel.lundin at molbio.su.se (Daniel Lundin)
Date: Thu, 11 Mar 2010 17:18:25 +0100
Subject: [BioRuby] HMMER 3 parsers?
In-Reply-To: <20100311141250.789AA1CBC58F@idnmail.gen-info.osaka-u.ac.jp>
References: <4B96A5FB.9060607@molbio.su.se> <20100311141250.789AA1CBC58F@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <4B9917D1.9000702@molbio.su.se>

Naohisa GOTO wrote:
> Hi,
>
> Christian Zmasek is now working on HMMER 3 support.
> It would be great if you could help us.

Certainly. Since my alternative is writing a parser for myself, I might as well put in my effort for the common good.

Christian, is there anything in particular I could help with? I have started collecting some test cases for my own needs.

/Daniel

--
Daniel Lundin

Department of Molecular Biology & Functional Genomics
Arrhenius Laboratories for Natural Sciences
Stockholm University, SE-106 91 Stockholm, Sweden

tel. +46 (0)8 16 41 95, mobile: +46 (0)708 123 922, fax. +46 (0)8 16 64 88

Email: daniel.lundin at molbio.su.se

From pjotr.public14 at thebird.nl Thu Mar 11 12:17:27 2010
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Thu, 11 Mar 2010 18:17:27 +0100
Subject: [BioRuby] Ruby and Statistics
In-Reply-To: <20100311135303.8C5201CBC41B@idnmail.gen-info.osaka-u.ac.jp>
References: <2122bfdf-d902-4be1-aef2-95013cea31f6@ingm.it> <20100311135303.8C5201CBC41B@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <20100311171727.GD12523@thebird.nl>

Hi Raoul,

Biolib makes the GSL available for Ruby, as well as Rlib. So many standard statistics can be used, including linear regression, etc. If there are other libraries you want to use, we can consider mapping those to Ruby (BOOST is a candidate).

The main problem is that I am still in the process of documenting biolib before its 1.0 release. If you are interested in using these tools, we can work it out between us. Just tell me what functions you want, and I'll help map/document them.

It would be great for Biolib, as testing is a good thing.

Pj.

From rutgeraldo at gmail.com Mon Mar 15 08:27:27 2010
From: rutgeraldo at gmail.com (Rutger Vos)
Date: Mon, 15 Mar 2010 12:27:27 +0000
Subject: [BioRuby] RDF Triples in BioRuby, a funding proposal to Google SoC
In-Reply-To: <2bb9b24a1003110222h4bd642adv31d1975c9edc0bba@mail.gmail.com>
References: <2bb9b24a1003100522p68330d6bu3f8e5f3a7f50dd6b@mail.gmail.com> <9081A9B5-611C-45C2-A099-44BAF1E524F4@hgc.jp> <2bb9b24a1003110222h4bd642adv31d1975c9edc0bba@mail.gmail.com>
Message-ID: <2bb9b24a1003150527p439c135dm1a164e6a5218835f@mail.gmail.com>

To follow up along more practical lines, I've had to deal with similar design issues in Bio::Phylo (perl), TreeBASE and Mesquite (both java).
I've learned it makes sense to have:

- a simple "annotation" object, with getters and setters for the predicate namespace uri, the predicate string, and the value object (either a literal or a uri),
- a get_annotations method for all (fundamental) data objects in the toolkit that returns a collection of these annotation objects.

This way, when you serialize any bioruby object into rdf, you can add as many other statements about that object as you want.

Would a refactoring along those lines have a chance of being acceptable to the bioruby community (of course subsequent to a more detailed RFC, testing, discussion, proof of concept, etc.)?

--
Dr. Rutger A.
Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From ngoto at gen-info.osaka-u.ac.jp Fri Mar 19 01:18:41 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Fri, 19 Mar 2010 14:18:41 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! Message-ID: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> Begin forwarded message: Date: Thu, 18 Mar 2010 17:02:32 -0500 From: Chris Fields To: open-bio-l at lists.open-bio.org Subject: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! (forwarding to the Open-Bio list, as the original post is still clearing the OBF mail filters) Hi all, Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see GSoC 2010 FAQ at http://tinyurl.com/yzemdfo Student applications are due April 9, 2010 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas, and who to contact about applying. For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on OBF wiki and on the relevant project's GSoC wiki page. Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code! 
Rob Buels OBF GSoC 2010 Administrator _______________________________________________ Open-Bio-l mailing list Open-Bio-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/open-bio-l From k.hayashi.info at gmail.com Tue Mar 23 08:20:52 2010 From: k.hayashi.info at gmail.com (Kazuhiro Hayashi) Date: Tue, 23 Mar 2010 21:20:52 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Hi, all My name is Kazuhiro Hayashi. I'm a 1st-year master's degree student at the Department of Computational Biology, Graduate School of Frontier Sciences, the University of Tokyo. The reason why I sent this mail is to ask you some questions about Google Summer of Code 2010. I'm interested in Google Summer of Code 2010, especially the projects about BioRuby. At the moment, I will apply for "Ruby 1.9.2 support of BioRuby", and I'd like to contribute to the BioRuby community through Google Summer of Code 2010. So, I have three questions. Could you answer them? One is about the differences between Ruby 1.8.x and 1.9.2. OBF's GSoC page says that the participant needs to know Ruby 1.9.2. Until now, I've used only Ruby 1.8.7 and never used Ruby 1.9.2. Honestly, I hardly know the differences between Ruby 1.8.x and Ruby 1.9.2. Can I join this project? Another is how many programs in BioRuby run on Ruby 1.9.2. Could you tell me whether you already know this (and how to find out)? The other is the implementation of the unit tests. Does this mean that the participant needs to implement unit tests for all code that doesn't have them yet? Currently, these are all my questions about GSoC 2010. If you have some advice for the applicants, please send a reply to this mailing list. Thank you very much for reading my broken English.
:-) Best regards --
Kazuhiro Hayashi Department of Computational Biology, The University of Tokyo email: k_hayashi at cb.k.u-tokyo.ac.jp tel: 04-7136-3988 From biopython at maubp.freeserve.co.uk Tue Mar 23 09:20:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Mar 2010 13:20:57 +0000 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <320fb6e01003230620l58717628t4d12f67411805c48@mail.gmail.com> On Tue, Mar 23, 2010 at 12:20 PM, Kazuhiro Hayashi wrote: > Hi, all > > My name is Kazuhiro Hayashi. > I'm a 1st-year master's degree student at Depertment of Computational > Biology, Graduate School of Frontier Sciences, the University of > Tokyo. > > The reason why I sent this mail is to ask you some questions about > Google Summer of Code 2010. > > ... > > Thank you very much for reading my broken English. :-) Hello Hayashi-san, I don't know if the BioRuby team have any preference for which language the Google Summer of Code projects will be discussed in (English and/or Japanese). It will probably depend on the mentors. However, there is also a Japanese BioRuby mailing list: http://lists.open-bio.org/mailman/listinfo/bioruby-ja Peter (@Biopython) From ngoto at gen-info.osaka-u.ac.jp Tue Mar 23 11:21:33 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Wed, 24 Mar 2010 00:21:33 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> Hi Kazuhiro, On Tue, 23 Mar 2010 21:20:52 +0900 Kazuhiro Hayashi wrote: > Hi, all > > My name is Kazuhiro Hayashi. > I'm a 1st-year master's degree student at Depertment of Computational > Biology, Graduate School of Frontier Sciences, the University of > Tokyo. 
> > The reason why I sent this mail is to ask you some questions about > Google Summer of Code 2010. > > I'm interested in Google Summer of Code 2010, Especially, the projects > about BioRuby. > At the moment, I will apply "Ruby 1.9.2 support of BioRuby and I'd > like to contribute BioRuby community through Google Summer of Code > 2010. > So, I have three questions. > Could you answer them? > > One is about differences between Ruby 1.8.x and 1.9.2 > OBF's GSoC page says that the participant needs to know Ruby 1.9.2. > Until now, I've used only Ruby 1.8.7 and never used Ruby 1.9.2. > Honestly, I hardly know differences between Ruby 1.8.x and Ruby 1.9.2. > Can I join this project? Yes. You will need to study about them during the project, but not now. I've modified the "needed skills" in the project wiki page to clarify the point. > Another is how many programs in BioRuby run on Ruby 1.9.2. > Could you tell me weather you have already known it or not (and how to know it)? I don't know much. Some programs worked, but some didn't. > The other is implementation of the unit tests. > Does this mean that the participant needs to implement unit tests for > all codes which haven't had them yet. Yes or no, depends on planning. One idea is to implement almost all with rough coding, and to improve them after that. I also think that classes and modules that strongly depend on external program or web service can be skipped. > Currently, These are all my questions about GSoC 2010. > > If you have some advice for the applicants, please send a reply to > this mailing list. > > Thank you very much for reading my broken English. :-) > > Best regards Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From ngoto at gen-info.osaka-u.ac.jp Wed Mar 24 10:22:23 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Wed, 24 Mar 2010 23:22:23 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: <320fb6e01003230620l58717628t4d12f67411805c48@mail.gmail.com> References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <320fb6e01003230620l58717628t4d12f67411805c48@mail.gmail.com> Message-ID: <20100324142225.08B501CBC3D0@idnmail.gen-info.osaka-u.ac.jp> Hi, The objective of the project is software development. I think it is OK to use Japanese for communicating with Japanese-speaking mentors. Using the bioruby-ja mailing list for discussion seems good. Students still need to write application form in English required by Google. It would be great if someone can help English proofreading for ESL students. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Tue, 23 Mar 2010 13:20:57 +0000 Peter wrote: > On Tue, Mar 23, 2010 at 12:20 PM, Kazuhiro Hayashi > wrote: > > Hi, all > > > > My name is Kazuhiro Hayashi. > > I'm a 1st-year master's degree student at Depertment of Computational > > Biology, Graduate School of Frontier Sciences, the University of > > Tokyo. > > > > The reason why I sent this mail is to ask you some questions about > > Google Summer of Code 2010. > > > > ... > > > > Thank you very much for reading my broken English.
:-) > > Hello Hayashi-san, > > I don't know if the BioRuby team have any preference for which > language the Google Summer of Code projects will be discussed > in (English and/or Japanese). It will probably depend on the mentors. > > However, there is also a Japanese BioRuby mailing list: > http://lists.open-bio.org/mailman/listinfo/bioruby-ja > > Peter > (@Biopython) > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From k.hayashi.info at gmail.com Wed Mar 24 10:35:21 2010 From: k.hayashi.info at gmail.com (Kazuhiro Hayashi) Date: Wed, 24 Mar 2010 23:35:21 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Hi. Thank you for your replies. I'd like to communicate with you on this mailing list (and I will write e-mails in English as much as possible). :-) However, if I should do it somewhere else, I will do so. I'm not sure where the best place to talk about GSoC 2010 is. Anyway, I appreciate your advice. By the way, I have one more question. Could you tell me how concretely I have to write the proposal? Do I have to write how I will implement the programs and when I will write each? Best regards Kazuhiro -- Kazuhiro Hayashi Department of Computational Biology, The University of Tokyo email: k_hayashi at cb.k.u-tokyo.ac.jp tel: 04-7136-3988 From biopython at maubp.freeserve.co.uk Wed Mar 24 10:51:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Mar 2010 14:51:46 +0000 Subject: [BioRuby] [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> Message-ID: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields wrote: > > On Mar 24, 2010, at 9:08 AM, Peter wrote: > >> Hi, >> >> This is probably of interest to all the Bio* projects offering access >> to the NCBI Entrez utilities. See forwarded message below.
>> >> I *think* the new guidelines basically say that the email & tool parameters are >> optional BUT if your IP address ever gets banned for excessive use you then >> have to register an email & tool combination. >> >> Regarding the email address, the NCBI say to use the email of the developer >> (not the end user). However, they do not distinguish between the developers >> of a library (like us), and the developers of an application or script using a >> library (who may also be the end user). >> >> Currently we (Biopython) and I think BioPerl ask developers using our libraries >> to populate the email address themselves. I *think* this is still the >> right action. >> >> Peter > > > Basically, that's the same tactic I'm going with for Bio::DB::EUtilities (and I > think with the SOAP-based ones as well). We're providing a specific set of > tools for users to write up their own end applications. I can try > contacting them regarding this to get an official response to clarify this > somewhat. Please give the NCBI an email - you can CC me too if you like. > Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a > default, but always leave the email blank and issue a warning if it isn't > set. We could just as easily leave both blank and issue warnings for both. We currently leave out the email and set the tool parameter to "Biopython" by default but this can be overridden. Currently leaving out the email does cause Biopython to give a warning.
Peter From hlapp at drycafe.net Wed Mar 24 11:27:37 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 24 Mar 2010 11:27:37 -0400 Subject: [BioRuby] [Open-bio-l] [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> Message-ID: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> On Mar 24, 2010, at 10:51 AM, Peter wrote: > Please give the NCBI an email - you can CC me too if you like. Can't this be the developers' mailing list (or lists, the appropriate one for each toolkit)? We can even whitelist all NCBI sender addresses so they can easily email us if there are issues. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From k.hayashi.info at gmail.com Thu Mar 25 13:31:07 2010 From: k.hayashi.info at gmail.com (Kazuhiro Hayashi) Date: Fri, 26 Mar 2010 02:31:07 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Hi, Thank you for your replies. I'd like to communicate with you on this mailing list (and I will write e-mails in English as much as possible). :-) However, if I should do it somewhere else, I will do so. I'm not sure where the best place to talk about GSoC 2010 is. Anyway, I appreciate your advice. By the way, I have one more question. Could you tell me how concretely I have to write the proposal? Do I have to write how I will implement the programs and when I will write each?
Best regards Kazuhiro ( I'm sorry if you have already received the same mail. I sent it yesterday, but I haven't received yet....) -- Kazuhiro Hayashi Department of Computational Biology, The University of Tokyo email: k_hayashi at cb.k.u-tokyo.ac.jp tel: 04-7136-3988 From czmasek at burnham.org Thu Mar 25 20:39:42 2010 From: czmasek at burnham.org (Christian M Zmasek) Date: Thu, 25 Mar 2010 17:39:42 -0700 Subject: [BioRuby] HMMER 3 parsers? In-Reply-To: <4B9917D1.9000702@molbio.su.se> References: <4B96A5FB.9060607@molbio.su.se> <20100311141250.789AA1CBC58F@idnmail.gen-info.osaka-u.ac.jp> <4B9917D1.9000702@molbio.su.se> Message-ID: <4BAC024E.6000009@burnham.org> Hi, Daniel: Sorry for the late reply, for some reasons my email reader suddenly sorts messages wrongly. In any case, the parser for hmmer3 hmmscan and hmmsearch is basically finished. So, if I could somehow get access to your test cases, that would be great! Thank you! Christian Daniel Lundin wrote: > Naohisa GOTO skrev: >> Hi, >> >> Christian Zmasek are now working for the HMMER 3 support. >> It will be great if you can help us. >> > Certainly. Since my alternative is writing a parser for myself, I might > as well put in my effort for the common good. > > Christian, is there anything in particular I could help with? I have > started collecting some test cases for my own needs. > > /Daniel > >> http://github.com/cmzmasek/bioruby/blob/master/lib/bio/appl/hmmer/hmmer3report.rb >> http://github.com/cmzmasek/bioruby/blob/master/lib/bio/appl/hmmer3.rb >> >> Naohisa Goto >> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From ngoto at gen-info.osaka-u.ac.jp Fri Mar 26 08:43:38 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Fri, 26 Mar 2010 21:43:38 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20100326124339.A02641CBC50D@idnmail.gen-info.osaka-u.ac.jp> Hi, It is generally good to write many specific details. However, the most important thing now is whether the proposal is accepted by Google. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From k.hayashi.info at gmail.com Fri Mar 26 11:21:41 2010 From: k.hayashi.info at gmail.com (Kazuhiro Hayashi) Date: Sat, 27 Mar 2010 00:21:41 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: <20100326124339.A02641CBC50D@idnmail.gen-info.osaka-u.ac.jp> References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> <20100326124339.A02641CBC50D@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Hi Goto-san, > It is generally good to write many specific details. > However, the most important thing now is whether the proposal > is accepted by Google. Is it possible to show you a draft of my proposal? I'd like you to proofread my proposal before the deadline for application. Best regards Kazuhiro -- Kazuhiro Hayashi Department of Computational Biology, The University of Tokyo email: k_hayashi at cb.k.u-tokyo.ac.jp tel: 04-7136-3988 From czmasek at burnham.org Fri Mar 26 14:26:54 2010 From: czmasek at burnham.org (Christian M Zmasek) Date: Fri, 26 Mar 2010 11:26:54 -0700 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer o Code is *ON* for OBF projects! In-Reply-To: References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> <20100326124339.A02641CBC50D@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <4BACFC6E.4010303@burnham.org> Hi, Re. "Is it possible to show you a draft of my proposal?" I think this is not only possible, it is highly recommended. From my experience, a detailed, well written, and realistic proposal is very important. Remember, not all projects will get accepted (currently, OBF has 14 projects, I would be very surprised if more than half would get accepted at the end). The better a student's proposal, the more likely it is that the project will get accepted. Christian From sararayburn at gmail.com Sat Mar 27 16:13:01 2010 From: sararayburn at gmail.com (Sara Rayburn) Date: Sat, 27 Mar 2010 15:13:01 -0500 Subject: [BioRuby] GSOC 2010 preliminary proposal question Message-ID: Hello all. My name is Sara Rayburn. I'm a doctoral student at the University of Louisiana at Lafayette. I am planning to submit a proposal to implement the speciation/duplication inference algorithm this summer. I'd like to tackle both the implementation and the extension to non-binary trees. In reading the posted reference on reconciliation in non-binary trees, there are two types of duplications referenced, required and conditional duplications. In an implementation of this approach, would it be better to identify only required duplications and clear speciations, or should there be an additional distinction for the conditional duplications? I hope to post a preliminary project plan and proposal for feedback in the next couple of days. Thanks in advance for your feedback.
Sara Rayburn University of Louisiana at Lafayette sararayburn at gmail.com From czmasek at burnham.org Mon Mar 29 19:32:12 2010 From: czmasek at burnham.org (Christian M Zmasek) Date: Mon, 29 Mar 2010 16:32:12 -0700 Subject: [BioRuby] Beta application for review: BioRuby - Simple duplication inference implementation In-Reply-To: References: Message-ID: <4BB1387C.6090503@burnham.org> Hi, Jure: Your application seems to be on the right track. In general, your timetable needs to be more detailed. For each step you should list: 1. Goal/deliverable (you have that) 2. Approach 3. Time estimation (you have that) 4. Anticipated problems & possible alternative approaches Some more comments: > > *The idea:* > > We would implement the simple and fast duplication inference algorithm > described by Zmasek and Eddy (Zmasek and Eddy, 2001, "A simple algorithm > to infer gene duplication and speciation events on a gene tree"). Finding > gene duplications is an extremely important part of bioinformatics and > biomedical research, as duplications are thought to be powerful drivers > in the evolution of new protein function. I think 'extremely important part of bioinformatics' is somewhat of an exaggeration and too vague. It would be better to write about how gene duplications complicate efforts at gene function prediction, and their significance in (the theory of) molecular evolution. > It is thus important to find > gene duplication sequences, which when translated are more likely to be > functionally different, and distinguish them from gene speciation > sequences, which are more likely functionally equivalent. 'gene duplication sequences' should be 'genes related by a duplication' or similar. 'gene speciation sequence' should be 'genes related by a speciation' or similar. > Currently the algorithm supports rooted fully binary trees and we would > like to change that, by also implementing support for unrooted and > non-binary trees. Goals are like this: 1. Implement algorithm as it is 2.
Allow rooting of unrooted gene trees by minimizing the sum of duplications. Optional: 3. Extend algorithm to work on non-binary species trees 4. Extend algorithm to work on non-binary gene trees > > *The work:* > > There are several milestones to be reached in developing this idea and > this is the work plan I propose: > 1. Development of unit tests with known species and gene trees (1 week). > > 2. Making or reusing necessary data structures, made easier by last > year's GSoC contribution implementing phyloXML in BioRuby (1/2 weeks - 1 > week): > - gene tree, > - species tree, > - tree node, > - children(), > - parent(). > > 3. Developing checks for the correctness of input data for rooted fully > binary trees SDI (1/2 weeks - 1 week): > - making sure trees are rooted and binary, > - all species/gene tree nodes have at least one type of taxonomic data. > - making a taxonomy base from a type of data present in all nodes > (scientific or common name, taxonomy code, id), > - making sure taxonomic data is unique throughout external nodes. > 4. Implementation of the recursive M function (1 week) > - traverse the gene tree in postorder (left subtree, right subtree, root), > - finding occurrences where M(parent) equals M(child 1 or 2) - this is > representative for finding a duplication. If M(parent) matches neither, > the processed node is a speciation. > > 5. Milestone - finished implementation of SDI for rooted fully binary > trees (1/2 week): > - Extensive testing, > - cleaning up. > > 6. Working on unrooted non-binary trees implementation (4-8 weeks): > - Look to the forester java library SDI module for insight (by the > mentor of this project, Zmasek), > - Doing some heavy lifting, > - at this point I consider this implementation a possible pitfall, > because of substantially increased complexity. This needs to be much more detailed. Species trees are always rooted.
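For readers following milestone 4 in the quoted plan, here is a minimal sketch of the mapping step for rooted, fully binary trees. It is an illustration only, not the forester or BioRuby implementation: the `TreeNode` class and the `number_preorder`, `species_leaves`, `map_node` and `infer_events` helpers are hypothetical names made up for this example. It assumes species-tree nodes are numbered in preorder (so every ancestor has a smaller number than its descendants) and that each gene-tree leaf names a species-tree leaf.

```ruby
# Hypothetical minimal node class for this sketch (not a BioRuby class).
class TreeNode
  attr_accessor :name, :children, :parent, :species, :order, :mapped, :event

  def initialize(name, children = [], species: nil)
    @name = name
    @children = children
    @species = species            # only set on gene-tree leaves
    children.each { |child| child.parent = self }
  end

  def leaf?
    children.empty?
  end
end

# Number species-tree nodes in preorder: ancestors get smaller numbers.
def number_preorder(node, counter = [0])
  node.order = (counter[0] += 1)
  node.children.each { |child| number_preorder(child, counter) }
end

# Index species-tree leaves by name.
def species_leaves(node, index = {})
  index[node.name] = node if node.leaf?
  node.children.each { |child| species_leaves(child, index) }
  index
end

# Postorder traversal of the gene tree, computing M(g) for every node.
def map_node(gene, leaves)
  if gene.leaf?
    gene.mapped = leaves.fetch(gene.species)
    return
  end
  gene.children.each { |child| map_node(child, leaves) }
  a, b = gene.children.map(&:mapped)   # assumes exactly two children
  # Climb from the deeper-numbered node until both mappings meet.
  until a.equal?(b)
    if a.order > b.order
      a = a.parent
    else
      b = b.parent
    end
  end
  gene.mapped = a
  # Duplication if M(parent) equals M(child 1) or M(child 2), else speciation.
  duplicated = gene.children.any? { |child| child.mapped.equal?(a) }
  gene.event = duplicated ? :duplication : :speciation
end

def infer_events(gene_root, species_root)
  number_preorder(species_root)
  map_node(gene_root, species_leaves(species_root))
  gene_root
end

# Species tree (A,B); gene tree ((a1,b1),(a2,b2)).
species = TreeNode.new('S', [TreeNode.new('A'), TreeNode.new('B')])
left  = TreeNode.new('left',  [TreeNode.new('a1', [], species: 'A'),
                               TreeNode.new('b1', [], species: 'B')])
right = TreeNode.new('right', [TreeNode.new('a2', [], species: 'A'),
                               TreeNode.new('b2', [], species: 'B')])
root  = TreeNode.new('root', [left, right])
infer_events(root, species)
[root, left, right].each { |n| puts "#{n.name}: #{n.event}" }
```

In this toy case the root of the gene tree is reported as a duplication and its two children as speciations. Input checks (milestone 3) and unrooted or non-binary trees are deliberately left out.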
Unrooted gene trees can be handled naively by rooting them in all possible places, running the SDI algorithm on each differently rooted tree, and keeping the gene tree with the fewest duplications. A more efficient approach for this is described in: Zmasek and Eddy (2002). RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics. 2002 May 16;3:14. See: http://evogsoc2010.wordpress.com/2010/03/25/references-for-gene-duplications-proposal/ > > 7. Finishing up (1 week): > - Extensive testing, > - cleaning up. > > *Why me?:* > > I like to set foot on unknown territory and challenge myself constantly. > That being said, I have long searched for something that would connect > my love of medicine to my love of programming, and now, thanks to GSoC > and OBF, I think I found it - bioinformatics. I am at a stage of my > medical study, where I have to decide what my future will entail, and I > am (now, after thinking about it for a long time) positive that > bioinformatics will be a big part of it. What better way to get my future > off to a good start, than with a Google Summer of Code project? Based on > this enthusiasm alone you can be assured that I'll work really hard on > this project and that I will be happy to see it done. As this would be > my first serious open source engagement, you also have a chance of > forming a completely new addition to the open source world and making an > excellent contributor out of me. > > *Previous experience:* > > 1. I have been working on a simulation of an analytical chemistry method > for the past 2 years now, more specifically we have modeled laser > ablation + inductively coupled plasma mass spectrometry with a simple > model, which aids our elemental mapping projects. For the write-up of > this project I have been awarded with a "Prešernovo priznanje" in 2008 > (PDF upon request).
This work entails several interesting components, > from basics such as: C# development, image input, output, multi-threaded > programming, UI development; to complex themes such as: genetic > algorithms and neural networks. All of which I learned as we worked on > the project without much hassle (source code upon request). This work is > not yet open source, because we are in the final stages of the > paper and will release the source code after publication under an open > source license. > > 2. I have programmed since I was a child and I have developed a wide > range of things in my lifetime (from a full CMS in PHP to an IRC > robot, source code upon request), but I have little experience in fully > open source projects, which I think so highly of. > > *Biography:* > > My name is Jure Triglav and I'm a 24-year-old medical student from > Ljubljana, Slovenia. I was born in a small town of Murska Sobota in > Slovenia, where I went to grade school (graded excellent for all years, > awarded "Zoisova štipendija" for the gifted, which I still hold) and > high-school (excellent, finished as "Zlati maturant" in the company of > about 200 best students in the country). I moved to Ljubljana in 2004 to > study medicine. I am now in the last year of my medical study which I > find challenging and very interesting. > My hobbies are all over the place, from book design to photography, from > web design to typography, from guitar to poetry, from reading to > programming, from traveling to sports. > > > > *Other obligations for the summer:* > > I have 5-hour daily clinical practice every weekday in June, July and > August, which is not nearly as serious as it sounds, especially since > this is the summer rotation which is known for its laid back feel. These > practice sessions start at 8 am and finish at 1 pm, and for students are not > really stressful or exhausting at all.
I have in the past juggled many > research obligations with clinical practice and my studies without > hiccups, but I will not do this this summer and will dedicate 8 hours > daily to Google Summer of Code, as I realize what a great opportunity > this is and how much work is required. I have no other work, research or > vacation obligations for the period of Google Summer of Code. Nevertheless, this sounds like a serious concern. > > *Contact information: * > > (I will provide additional contact information in the final application) > Name: Jure Triglav > E-mail: juretriglav at gmail.com > IRC handle: x` on #obf-soc, #gsoc > From czmasek at burnham.org Mon Mar 29 19:39:29 2010 From: czmasek at burnham.org (Christian M Zmasek) Date: Mon, 29 Mar 2010 16:39:29 -0700 Subject: [BioRuby] Google summer of code 2010 - Stathis Kamperis In-Reply-To: <2218b9af1003290119q1c6b2eeclc3c84ffdbaa97b2a@mail.gmail.com> References: <2218b9af1003290119q1c6b2eeclc3c84ffdbaa97b2a@mail.gmail.com> Message-ID: <4BB13A31.8020203@burnham.org> Hi, Stathis: Thank you for your interest in this proposal! Stathis Kamperis wrote: > Dear Dr. Zmasek, > > my name is Stathis Kamperis and I'm interested in this year's Google > Summer of Code project: > "Implementation of algorithm to infer gene duplications in BioRuby". > > I am a medicine graduate, physics undergraduate and computer > enthusiast. I come from Greece and I am 26 years old. > I have long-standing programming experience with a vast range of > programming languages including, since recently, Ruby. > I also have a decent molecular/biology background. > > I successfully participated in last year's Google Summer of Code > working for the DragonFlyBSD[1] organisation. My work had to do with > POSIX standard conformance audit, regression testing and quality > assurance. > > As I understand, the project is about implementing your algorithm in > BioRuby. Is there any prototype implemented in any language/framework > at the moment ?
Yes, there is. See: http://forester-atv.cvs.sourceforge.net/viewvc/forester-atv/forester-atv/java/src/org/forester/sdi/ Especially, SDI.java and SDIR.java (for unrooted trees) In your abstract you mention: > "We show empirically, using 1750 gene trees constructed from the Pfam > protein family database, that it appears to be a practical (and often > superior) algorithm for analyzing real gene trees." > So, I wonder, what does 'empirically' mean here or how did you conduct > your tests ? Essentially, my Java implementation was used to run these tests. Hope this helps, Christian From czmasek at burnham.org Mon Mar 29 20:01:10 2010 From: czmasek at burnham.org (Christian M Zmasek) Date: Mon, 29 Mar 2010 17:01:10 -0700 Subject: [BioRuby] GSOC 2010 preliminary proposal question In-Reply-To: References: Message-ID: <4BB13F46.7010607@burnham.org> Hi, Sara: Thank you for your interest in this proposal! I think focusing on 'required' duplications is appropriate, since non-binary species trees are oftentimes a means to express uncertainty in the "tree-of-life" and to prevent the introduction of spurious duplications caused by that uncertainty. Christian Sara Rayburn wrote: > Hello all. My name is Sara Rayburn. I'm a doctoral student at the University of Louisiana at Lafayette. I am planning to submit a proposal to implement the speciation/duplication inference algorithm this summer. I'd like to tackle both the implementation and the extension to non-binary trees. In reading the posted reference on reconciliation in non-binary trees, there are two types of duplications referenced, required and conditional duplications. In an implementation of this approach, would it be better to identify only required duplications and clear speciations, or should there be an additional distinction for the conditional duplications? > > I hope to post a preliminary project plan and proposal for feedback in the next couple of days. Thanks in advance for your feedback.
> > > > Sara Rayburn > University of Louisiana at Lafayette > sararayburn at gmail.com > > > > > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From donttrustben at gmail.com Wed Mar 31 20:33:27 2010 From: donttrustben at gmail.com (Ben Woodcroft) Date: Thu, 1 Apr 2010 11:33:27 +1100 Subject: [BioRuby] FlatFile GFF Message-ID: Hi, I have a conceptual question for the list. When I open a gff2 file using Bio::FlatFile, the next_entry method gives me all of the lines at once (in the form of a Bio::GFF::GFF2 object). f = Bio::FlatFile.open(Bio::GFF::GFF2,"some.gff2") => Bio::FlatFile g = f.next_entry => Bio::GFF::GFF2 object g.records => array of GFF2 records To me, this seems a little counter-intuitive. I expected to get info for a single line of the GFF file from FlatFile#next_entry The other problem is that the whole file must be parsed at the beginning, and this can cause memory problems when using large GFF files (e.g. the current WormBase gff2 is 2.6GB). To get around the problem I can use File.foreach('some.gff2') and then parse each line using Bio::GFF::GFF2. I'm not sure what the situation is with other file formats. So, my question is, could we introduce a foreach method into FlatFile that iterates (without parsing all at once so it is light on memory) over the GFF/etc entries in the file? Ideally we could change next_entry, but that wouldn't be backwards compatible I don't think. Thanks, ben -- FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. 
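To make the memory concern in the message above concrete, here is a minimal sketch of the line-at-a-time iteration being asked for. It does not use or change BioRuby: `GffLine` and `each_gff_record` are hypothetical names invented for this example, and the parsing is a simplified assumption (tab-separated GFF2 columns, `#` lines treated as comments). The point is only that each record is yielded as soon as its line is read, so memory use should stay roughly constant regardless of file size.

```ruby
require 'stringio'

# Hypothetical record type for this sketch; it is NOT BioRuby's record class.
GffLine = Struct.new(:seqname, :source, :feature, :start, :stop,
                     :score, :strand, :frame, :attributes)

# Yields one parsed record per feature line, reading the input lazily so
# that only the current line is held in memory. Returns an Enumerator when
# called without a block.
def each_gff_record(io)
  return enum_for(:each_gff_record, io) unless block_given?

  io.each_line do |line|
    line = line.chomp
    next if line.empty? || line.start_with?('#')  # skip blanks and comments

    cols = line.split("\t", 9)
    next if cols.size < 8                          # not a feature line

    yield GffLine.new(cols[0], cols[1], cols[2], cols[3].to_i, cols[4].to_i,
                      cols[5], cols[6], cols[7], cols[8])
  end
end

# Small in-memory demonstration; with a real file one would pass a File
# object opened on the GFF file instead of a StringIO.
sample = StringIO.new(
  "##gff-version 2\n" \
  "chrI\tcurated\tgene\t100\t200\t.\t+\t.\tGene \"abc-1\"\n"
)
each_gff_record(sample) do |rec|
  puts "#{rec.feature} #{rec.start}..#{rec.stop}"
end
```

Returning an Enumerator when no block is given lets callers chain `lazy`, `select` or `first(n)` without reading the whole file, which is the behaviour a `foreach`-style method on FlatFile would presumably need as well.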
From jillyh0 at gmail.com Mon Mar 1 21:42:25 2010 From: jillyh0 at gmail.com (Jillian E Kozyra) Date: Mon, 1 Mar 2010 16:42:25 -0500 Subject: [BioRuby] Phylogenetic Trees or Hierarchical Clustering Message-ID: <9d7d43131003011342s3de1f182oacf6ce1e612a452a@mail.gmail.com> Dear Colleagues, We are working on a linguistics project in which we will calculate language similarities. From the language similarity matrix, we would like to create either a hierarchical clustering output or phylogenetic tree. We seek a pure Ruby plugin with which to do this. Could you give us some guidance? Thanks, Jillian -- 917-434-7511 http://sswl.railsplayground.net From bonnalraoul at ingm.it Mon Mar 8 13:28:16 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Mon, 8 Mar 2010 14:28:16 +0100 Subject: [BioRuby] RVM: Ruby Version Manager Message-ID: Do you know this http://rvm.beginrescueend.com/ tool for having multiple ruby environment installed at the same time ? RVM is a command line tool which allows us to easily install, manage and work with multiple ruby environments from interpreters to sets of gems. RVM itself is easy to install! I'm using it on a vm for developing and testing and it is awesome how it handles everything :-) Give it a try. -- Raoul J.P. Bonnal Life Science Informatics Integrative Biology Program Fondazione INGM Via F. Sforza 28 20122 Milano, IT phone: +39 0 200 662 326 fax: +39 0 200 662 346 http://www.ingm.it From daniel.lundin at molbio.su.se Tue Mar 9 19:48:11 2010 From: daniel.lundin at molbio.su.se (Daniel Lundin) Date: Tue, 09 Mar 2010 20:48:11 +0100 Subject: [BioRuby] HMMER 3 parsers? Message-ID: <4B96A5FB.9060607@molbio.su.se> Hi, HMMER 3 is currently available as a first release candidate. With it comes several news both in the form of new tools and new kinds of data, which means output formats are changed. Is anybody working on BioRuby parsers for these? 
/D -- Daniel Lundin Department of Molecular Biology & Functional Genomics Arrhenius Laboratories for Natural Sciences Stockholm University, SE-106 91 Stockholm, Sweden tel. +46 (0)8 16 41 95, mobile: +46 (0)708 123 922, fax. +46 (0)8 16 64 88 Email: daniel.lundin at molbio.su.se From rutgeraldo at gmail.com Wed Mar 10 13:22:48 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Wed, 10 Mar 2010 13:22:48 +0000 Subject: [BioRuby] RDF Triples in BioRuby, a funding proposal to Google SoC Message-ID: <2bb9b24a1003100522p68330d6bu3f8e5f3a7f50dd6b@mail.gmail.com> Dear BioRuby-ites, my apologies that my first email to this list is so long and tangential. I am trying to find out how to express RDF triples in BioRuby. In this email I'm explaining why I care enough to try to get funding for someone to work on this. If you don't care about any of this, you can stop reading now. The National Evolutionary Synthesis Center (NESCent.org) is planning to be a mentoring organization for the Google Summer of Code 2010. I have submitted a project idea to this: to develop NeXML I/O and - probably more importantly for you - RDF capabilities for BioRuby. If funded, a student/coder will work on this full time over the summer, under the shared supervision of Jan Aerts and myself. Here is the link: http://tinyurl.com/biorubynexml NeXML is a data format for phylogenetic data that can be read and written in perl, python, java and (to some extent) c++ and javascript. RDF is the cool "new" thing (as per BioHackathon2010), but as far as I can tell BioRuby isn't completely up to speed for it, yet. (As an aside: you might ask yourself why there is something like NeXML when there is PhyloXML for BioRuby. 
The answer is that NeXML solves a different problem: PhyloXML started essentially as a next generation of New Hampshire eXtended (NHX) to meet the annotation needs of comparative genomics, things such as gene duplications and other molecular evolution events, on phylogenetic trees; NeXML started as a complete XML representation of the NEXUS format, providing other comparative data types such as categorical and continuous character state matrices, restriction site matrices, and so on, in addition to trees, taxa, sequence alignments. There is obviously some overlap between the formats, but I guess that is not unique in bioinformatics :)) NeXML has a semantic annotation facility that uses RDFa. This allows us to add additional metadata to a fundamental phylogenetic data object (a tree, taxon, character, etc.) to form a "triple": the fundamental data object is the triple Subject, and the Predicate and Object are added as RDFa attributes. Since NeXML can be transformed using a standard XSL stylesheet to RDF/XML, we can express a limitless number of statements about phylogenetics. However, this means that any NeXML I/O library needs to be able to represent RDF triples. I have studied the BioRuby API as best as I could (but: I don't know ruby) and couldn't identify how to do this. My questions to you: * is there a way to express triples in BioRuby? * if there is not, what would be a good design to express triples in BioRuby so that this would be more useful than just for NeXML? Thank you! Rutger -- Dr. Rutger A. 
Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From ktym at hgc.jp Wed Mar 10 14:21:15 2010 From: ktym at hgc.jp (Toshiaki Katayama) Date: Wed, 10 Mar 2010 23:21:15 +0900 Subject: [BioRuby] RDF Triples in BioRuby, a funding proposal to Google SoC In-Reply-To: <2bb9b24a1003100522p68330d6bu3f8e5f3a7f50dd6b@mail.gmail.com> References: <2bb9b24a1003100522p68330d6bu3f8e5f3a7f50dd6b@mail.gmail.com> Message-ID: <9081A9B5-611C-45C2-A099-44BAF1E524F4@hgc.jp> Hi Rutger, Thank you for your inputs on GSoC 2010! > * is there a way to express triples in BioRuby? > * if there is not, what would be a good design to express triples in > BioRuby so that this would be more useful than just for NeXML? This is what we discussed during the pre-BioHackathon 2010. http://hackathon3.dbcls.jp/wiki/BioRuby My first idea was to make all BioRuby object have common output method to render the object contents in various formats (such as RDF/XML, Turtle, HTML, GFF, FASTA etc. if appropriate). Then, we tried to separate view from logic using erb, but as you see in the above page, it still looks ugly. It is mainly because view formatting itself requires some additional codes, specific to each format. Therefore, we don't have a solid conclusion on this yet, unfortunately. Anyway, we already have PubMed to RDF converter written in Ruby as the TogoWS REST API (http://togows.dbcls.jp/site/en/rest.html) at http://togows.dbcls.jp/entry/pubmed/16381885 --> http://togows.dbcls.jp/entry/pubmed/16381885.ttl and, we are also trying to support KEGG to RDF conversion in this framework as well. I think we can put the code in BioRuby when we finished. Your suggestions are welcome. :) Regards, Toshiaki On 2010/03/10, at 22:22, Rutger Vos wrote: > Dear BioRuby-ites, > > my apologies that my first email to this list is so long and > tangential. 
I am trying to find out how to express RDF triples in > BioRuby. In this email I'm explaining why I care enough to try to get > funding for someone to work on this. If you don't care about any of > this, you can stop reading now. > > The National Evolutionary Synthesis Center (NESCent.org) is planning > to be a mentoring organization for the Google Summer of Code 2010. I > have submitted a project idea to this: to develop NeXML I/O and - > probably more importantly for you - RDF capabilities for BioRuby. If > funded, a student/coder will work on this full time over the summer, > under the shared supervision of Jan Aerts and myself. Here is the > link: http://tinyurl.com/biorubynexml > > NeXML is a data format for phylogenetic data that can be read and > written in perl, python, java and (to some extent) c++ and javascript. > RDF is the cool "new" thing (as per BioHackathon2010), but as far as I > can tell BioRuby isn't completely up to speed for it, yet. > > (As an aside: you might ask yourself why there is something like NeXML > when there is PhyloXML for BioRuby. The answer is that NeXML solves a > different problem: PhyloXML started essentially as a next generation > of New Hampshire eXtended (NHX) to meet the annotation needs of > comparative genomics, things such as gene duplications and other > molecular evolution events, on phylogenetic trees; NeXML started as a > complete XML representation of the NEXUS format, providing other > comparative data types such as categorical and continuous character > state matrices, restriction site matrices, and so on, in addition to > trees, taxa, sequence alignments. There is obviously some overlap > between the formats, but I guess that is not unique in bioinformatics > :)) > > NeXML has a semantic annotation facility that uses RDFa. This allows > us to add additional metadata to a fundamental phylogenetic data > object (a tree, taxon, character, etc.) 
to form a "triple": the > fundamental data object is the triple Subject, and the Predicate and > Object are added as RDFa attributes. Since NeXML can be transformed > using a standard XSL stylesheet to RDF/XML, we can express a limitless > number of statements about phylogenetics. However, this means that any > NeXML I/O library needs to be able to represent RDF triples. I have > studied the BioRuby API as best as I could (but: I don't know ruby) > and couldn't identify how to do this. > > My questions to you: > > * is there a way to express triples in BioRuby? > * if there is not, what would be a good design to express triples in > BioRuby so that this would be more useful than just for NeXML? > > Thank you! > > Rutger > > -- > Dr. Rutger A. Vos > School of Biological Sciences > Philip Lyle Building, Level 4 > University of Reading > Reading > RG6 6BX > United Kingdom > Tel: +44 (0) 118 378 7535 > http://www.nexml.org > http://rutgervos.blogspot.com > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From rutgeraldo at gmail.com Thu Mar 11 10:22:04 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Thu, 11 Mar 2010 10:22:04 +0000 Subject: [BioRuby] RDF Triples in BioRuby, a funding proposal to Google SoC In-Reply-To: <9081A9B5-611C-45C2-A099-44BAF1E524F4@hgc.jp> References: <2bb9b24a1003100522p68330d6bu3f8e5f3a7f50dd6b@mail.gmail.com> <9081A9B5-611C-45C2-A099-44BAF1E524F4@hgc.jp> Message-ID: <2bb9b24a1003110222h4bd642adv31d1975c9edc0bba@mail.gmail.com> Hi Toshiaki, great to hear there's already been a lot of discussion over this. (Well, I'd be surprised if there hadn't been :)) It looks to me like some fairly major bookkeeping would need to be implemented high up in the inheritance tree if *all* bioruby objects are to be serialized into RDF. It also would require all of bioruby to be ontologized in one fell swoop. 
It is perhaps more likely that subdomains are going to be ontologized more or less independently from one another (as you mention, references->RDF, or in my case phylogenetics->RDF) based implicitly on intermediate data formats (pubmed records and nexml, respectively). That is probably OK, we do things as needs arise. But what would be handy if the API was at least general enough so that this was extensible and we can make additional statements *about* objects when we serialize them to RDF. For example, in your pubmed turtle file, the subject is always . Is there a way, programmatically, where I can add additional statements about ? Rutger On Wed, Mar 10, 2010 at 2:21 PM, Toshiaki Katayama wrote: > Hi Rutger, > > Thank you for your inputs on GSoC 2010! > >> * is there a way to express triples in BioRuby? >> * if there is not, what would be a good design to express triples in >> BioRuby so that this would be more useful than just for NeXML? > > This is what we discussed during the pre-BioHackathon 2010. > > http://hackathon3.dbcls.jp/wiki/BioRuby > > My first idea was to make all BioRuby object have common output > method to render the object contents in various formats > (such as RDF/XML, Turtle, HTML, GFF, FASTA etc. if appropriate). > > Then, we tried to separate view from logic using erb, but as you > see in the above page, it still looks ugly. It is mainly because > view formatting itself requires some additional codes, specific > to each format. > > Therefore, we don't have a solid conclusion on this yet, unfortunately. > > Anyway, we already have PubMed to RDF converter written in Ruby as > the TogoWS REST API (http://togows.dbcls.jp/site/en/rest.html) at > > http://togows.dbcls.jp/entry/pubmed/16381885 > --> http://togows.dbcls.jp/entry/pubmed/16381885.ttl > > and, we are also trying to support KEGG to RDF conversion in this > framework as well. I think we can put the code in BioRuby when we finished. > > Your suggestions are welcome. 
:) > > Regards, > Toshiaki > > On 2010/03/10, at 22:22, Rutger Vos wrote: > >> Dear BioRuby-ites, >> >> my apologies that my first email to this list is so long and >> tangential. I am trying to find out how to express RDF triples in >> BioRuby. In this email I'm explaining why I care enough to try to get >> funding for someone to work on this. If you don't care about any of >> this, you can stop reading now. >> >> The National Evolutionary Synthesis Center (NESCent.org) is planning >> to be a mentoring organization for the Google Summer of Code 2010. I >> have submitted a project idea to this: to develop NeXML I/O and - >> probably more importantly for you - RDF capabilities for BioRuby. If >> funded, a student/coder will work on this full time over the summer, >> under the shared supervision of Jan Aerts and myself. Here is the >> link: http://tinyurl.com/biorubynexml >> >> NeXML is a data format for phylogenetic data that can be read and >> written in perl, python, java and (to some extent) c++ and javascript. >> RDF is the cool "new" thing (as per BioHackathon2010), but as far as I >> can tell BioRuby isn't completely up to speed for it, yet. >> >> (As an aside: you might ask yourself why there is something like NeXML >> when there is PhyloXML for BioRuby. The answer is that NeXML solves a >> different problem: PhyloXML started essentially as a next generation >> of New Hampshire eXtended (NHX) to meet the annotation needs of >> comparative genomics, things such as gene duplications and other >> molecular evolution events, on phylogenetic trees; NeXML started as a >> complete XML representation of the NEXUS format, providing other >> comparative data types such as categorical and continuous character >> state matrices, restriction site matrices, and so on, in addition to >> trees, taxa, sequence alignments. 
There is obviously some overlap >> between the formats, but I guess that is not unique in bioinformatics >> :)) >> >> NeXML has a semantic annotation facility that uses RDFa. This allows >> us to add additional metadata to a fundamental phylogenetic data >> object (a tree, taxon, character, etc.) to form a "triple": the >> fundamental data object is the triple Subject, and the Predicate and >> Object are added as RDFa attributes. Since NeXML can be transformed >> using a standard XSL stylesheet to RDF/XML, we can express a limitless >> number of statements about phylogenetics. However, this means that any >> NeXML I/O library needs to be able to represent RDF triples. I have >> studied the BioRuby API as best as I could (but: I don't know ruby) >> and couldn't identify how to do this. >> >> My questions to you: >> >> * is there a way to express triples in BioRuby? >> * if there is not, what would be a good design to express triples in >> BioRuby so that this would be more useful than just for NeXML? >> >> Thank you! >> >> Rutger >> >> -- >> Dr. Rutger A. Vos >> School of Biological Sciences >> Philip Lyle Building, Level 4 >> University of Reading >> Reading >> RG6 6BX >> United Kingdom >> Tel: +44 (0) 118 378 7535 >> http://www.nexml.org >> http://rutgervos.blogspot.com >> _______________________________________________ >> BioRuby Project - http://www.bioruby.org/ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby > > -- Dr. Rutger A. 
Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From bonnalraoul at ingm.it Thu Mar 11 13:02:23 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Thu, 11 Mar 2010 14:02:23 +0100 Subject: [BioRuby] Ruby and Statistics Message-ID: <2122bfdf-d902-4be1-aef2-95013cea31f6@ingm.it> Hello Folks, I need to do statistical computations in Ruby, some time very basic operations like mean and stdv Which library do you suggest ? I don't want to use rsruby (R), for now. Er extend every time Array. I found this: ruby-statsample but I don't know if is the best one. -- Raoul J.P. Bonnal Life Science Informatics Integrative Biology Program Fondazione INGM Via F. Sforza 28 20122 Milano, IT phone: +39 0 200 662 326 fax: +39 0 200 662 346 http://www.ingm.it From ngoto at gen-info.osaka-u.ac.jp Thu Mar 11 13:53:02 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Thu, 11 Mar 2010 22:53:02 +0900 Subject: [BioRuby] Ruby and Statistics In-Reply-To: <2122bfdf-d902-4be1-aef2-95013cea31f6@ingm.it> References: <2122bfdf-d902-4be1-aef2-95013cea31f6@ingm.it> Message-ID: <20100311135303.8C5201CBC41B@idnmail.gen-info.osaka-u.ac.jp> Hi, I found some modules, but I haven't used them. math-statistics: http://www.notwork.org/~gotoken/ruby/p/statistics/ statarray: http://rubyforge.org/projects/statarray/ ruby-stats: http://pallas.telperion.info/ruby-stats/ Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Thu, 11 Mar 2010 14:02:23 +0100 "Raoul Bonnal" wrote: > Hello Folks, > I need to do statistical computations in Ruby, some time very basic operations like mean and stdv > Which library do you suggest ? > I don't want to use rsruby (R), for now. Er extend every time Array. > > I found this: ruby-statsample but I don't know if is the best one. > > -- > Raoul J.P. 
Bonnal > Life Science Informatics > Integrative Biology Program > Fondazione INGM > Via F. Sforza 28 > 20122 Milano, IT > phone: +39 0 200 662 326 > fax: +39 0 200 662 346 > http://www.ingm.it > > > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From ngoto at gen-info.osaka-u.ac.jp Thu Mar 11 14:12:49 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Thu, 11 Mar 2010 23:12:49 +0900 Subject: [BioRuby] HMMER 3 parsers? In-Reply-To: <4B96A5FB.9060607@molbio.su.se> References: <4B96A5FB.9060607@molbio.su.se> Message-ID: <20100311141250.789AA1CBC58F@idnmail.gen-info.osaka-u.ac.jp> Hi, Christian Zmasek are now working for the HMMER 3 support. It will be great if you can help us. http://github.com/cmzmasek/bioruby/blob/master/lib/bio/appl/hmmer/hmmer3report.rb http://github.com/cmzmasek/bioruby/blob/master/lib/bio/appl/hmmer3.rb Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Tue, 09 Mar 2010 20:48:11 +0100 Daniel Lundin wrote: > Hi, > > HMMER 3 is currently available as a first release candidate. With it > comes several news both in the form of new tools and new kinds of data, > which means output formats are changed. Is anybody working on BioRuby > parsers for these? > > /D > > -- > Daniel Lundin > > Department of Molecular Biology & Functional Genomics > Arrhenius Laboratories for Natural Sciences > Stockholm University, SE-106 91 Stockholm, Sweden > > tel. +46 (0)8 16 41 95, mobile: +46 (0)708 123 922, fax. 
+46 (0)8 16 64 88 > > Email: daniel.lundin at molbio.su.se > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From ngoto at gen-info.osaka-u.ac.jp Thu Mar 11 14:59:11 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Thu, 11 Mar 2010 23:59:11 +0900 Subject: [BioRuby] Phylogenetic Trees or Hierarchical Clustering In-Reply-To: <9d7d43131003011342s3de1f182oacf6ce1e612a452a@mail.gmail.com> References: <9d7d43131003011342s3de1f182oacf6ce1e612a452a@mail.gmail.com> Message-ID: <20100311145912.CF9091CBC3DA@idnmail.gen-info.osaka-u.ac.jp> Hi, I always use phylogenetic tree construction software such as PHYLIP and MEGA4, and I don't know much about the pure Ruby solutions. Below are found by using Google search. There are some pure Ruby implementations of clustering algorithms, though I haven't used them. AI4R (Artificial Intelligence for Ruby): http://ai4r.rubyforge.org/ clusterer: http://rubyforge.org/projects/clusterer/ I found a phylogenetic tree visualization implementation written in JRuby, and I found it can also work with normal Ruby 1.8.7. Egan A et al. (2008) IDEA: Interactive Display for Evolutionary Analyses. BMC Bioinformatics 2008, 9:524 http://www.biomedcentral.com/1471-2105/9/524 http://ideanalyses.sourceforge.net/ Thanks, Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Mon, 1 Mar 2010 16:42:25 -0500 Jillian E Kozyra wrote: > Dear Colleagues, > > We are working on a linguistics project in which we will calculate language > similarities. From the language similarity matrix, we would like to create > either a hierarchical clustering output or phylogenetic tree. We seek a pure > Ruby plugin with which to do this. Could you give us some guidance? 
> > Thanks, > Jillian > > -- > 917-434-7511 > http://sswl.railsplayground.net > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From daniel.lundin at molbio.su.se Thu Mar 11 16:18:25 2010 From: daniel.lundin at molbio.su.se (Daniel Lundin) Date: Thu, 11 Mar 2010 17:18:25 +0100 Subject: [BioRuby] HMMER 3 parsers? In-Reply-To: <20100311141250.789AA1CBC58F@idnmail.gen-info.osaka-u.ac.jp> References: <4B96A5FB.9060607@molbio.su.se> <20100311141250.789AA1CBC58F@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <4B9917D1.9000702@molbio.su.se> Naohisa GOTO skrev: > Hi, > > Christian Zmasek are now working for the HMMER 3 support. > It will be great if you can help us. > Certainly. Since my alternative is writing a parser for myself, I might as well put in my effort for the common good. Christian, is there anything in particular I could help with? I have started collecting some test cases for my own needs. /Daniel > http://github.com/cmzmasek/bioruby/blob/master/lib/bio/appl/hmmer/hmmer3report.rb > http://github.com/cmzmasek/bioruby/blob/master/lib/bio/appl/hmmer3.rb > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > On Tue, 09 Mar 2010 20:48:11 +0100 > Daniel Lundin wrote: > >> Hi, >> >> HMMER 3 is currently available as a first release candidate. With it >> comes several news both in the form of new tools and new kinds of data, >> which means output formats are changed. Is anybody working on BioRuby >> parsers for these? >> >> /D >> >> -- >> Daniel Lundin >> >> Department of Molecular Biology & Functional Genomics >> Arrhenius Laboratories for Natural Sciences >> Stockholm University, SE-106 91 Stockholm, Sweden >> >> tel. +46 (0)8 16 41 95, mobile: +46 (0)708 123 922, fax. 
+46 (0)8 16 64 88 >> >> Email: daniel.lundin at molbio.su.se >> _______________________________________________ >> BioRuby Project - http://www.bioruby.org/ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby > -- Daniel Lundin Department of Molecular Biology & Functional Genomics Arrhenius Laboratories for Natural Sciences Stockholm University, SE-106 91 Stockholm, Sweden tel. +46 (0)8 16 41 95, mobile: +46 (0)708 123 922, fax. +46 (0)8 16 64 88 Email: daniel.lundin at molbio.su.se From pjotr.public14 at thebird.nl Thu Mar 11 17:17:27 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Thu, 11 Mar 2010 18:17:27 +0100 Subject: [BioRuby] Ruby and Statistics In-Reply-To: <20100311135303.8C5201CBC41B@idnmail.gen-info.osaka-u.ac.jp> References: <2122bfdf-d902-4be1-aef2-95013cea31f6@ingm.it> <20100311135303.8C5201CBC41B@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20100311171727.GD12523@thebird.nl> Hi Raoul, Biolib makes the GSL available for Ruby, as well as Rlib. So many standard statistics can be used, including linear regression, etc. If there is other libraries you want to use we can consider mapping those to Ruby (BOOST is a candidate). Main problem is that I am still in the process of documenting biolib before its release 1.0. If you are interested in using these tools, we can work it out between us. Just tell me what functions you want, and I'll help map/document them. Be great for Biolib - as testing is a good thing. Pj. On Thu, Mar 11, 2010 at 10:53:02PM +0900, Naohisa GOTO wrote: > Hi, > > I found some modules, but I haven't used them. 
> > math-statistics: http://www.notwork.org/~gotoken/ruby/p/statistics/ > > statarray: http://rubyforge.org/projects/statarray/ > > ruby-stats: http://pallas.telperion.info/ruby-stats/ > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > On Thu, 11 Mar 2010 14:02:23 +0100 > "Raoul Bonnal" wrote: > > > Hello Folks, > > I need to do statistical computations in Ruby, some time very basic operations like mean and stdv > > Which library do you suggest ? > > I don't want to use rsruby (R), for now. Er extend every time Array. > > > > I found this: ruby-statsample but I don't know if is the best one. > > > > -- > > Raoul J.P. Bonnal > > Life Science Informatics > > Integrative Biology Program > > Fondazione INGM > > Via F. Sforza 28 > > 20122 Milano, IT > > phone: +39 0 200 662 326 > > fax: +39 0 200 662 346 > > http://www.ingm.it > > > > > > > > _______________________________________________ > > BioRuby Project - http://www.bioruby.org/ > > BioRuby mailing list > > BioRuby at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioruby > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From rutgeraldo at gmail.com Mon Mar 15 12:27:27 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Mon, 15 Mar 2010 12:27:27 +0000 Subject: [BioRuby] RDF Triples in BioRuby, a funding proposal to Google SoC In-Reply-To: <2bb9b24a1003110222h4bd642adv31d1975c9edc0bba@mail.gmail.com> References: <2bb9b24a1003100522p68330d6bu3f8e5f3a7f50dd6b@mail.gmail.com> <9081A9B5-611C-45C2-A099-44BAF1E524F4@hgc.jp> <2bb9b24a1003110222h4bd642adv31d1975c9edc0bba@mail.gmail.com> Message-ID: <2bb9b24a1003150527p439c135dm1a164e6a5218835f@mail.gmail.com> To follow up along more practical lines, I've had to deal with similar design issues in Bio::Phylo (perl), TreeBASE and Mesquite (both java). 
I've learned it makes sense to have:

- a simple "annotation" object, with getters and setters for the
  predicate namespace uri, the predicate string, and the value object
  (either a literal or a uri),
- a get_annotations method for all (fundamental) data objects in the
  toolkit that returns a collection of these annotation objects.

This way, when you serialize any bioruby object into rdf, you can add
as many other statements about that object as you want.

Would a refactoring along those lines have a chance of being
acceptable to the bioruby community (of course subsequent to a more
detailed RFC, testing, discussion, proof of concept, etc.)?

On Thursday, March 11, 2010, Rutger Vos wrote:
> Hi Toshiaki,
>
> great to hear there's already been a lot of discussion over this.
> (Well, I'd be surprised if there hadn't been :))
>
> It looks to me like some fairly major bookkeeping would need to be
> implemented high up in the inheritance tree if *all* bioruby objects
> are to be serialized into RDF. It also would require all of bioruby to
> be ontologized in one fell swoop.
>
> It is perhaps more likely that subdomains are going to be ontologized
> more or less independently from one another (as you mention,
> references->RDF, or in my case phylogenetics->RDF) based implicitly on
> intermediate data formats (pubmed records and nexml, respectively).
>
> That is probably OK, we do things as needs arise.
>
> But what would be handy if the API was at least general enough so that
> this was extensible and we can make additional statements *about*
> objects when we serialize them to RDF. For example, in your pubmed
> turtle file, the subject is always
> . Is there a way,
> programmatically, where I can add additional statements about
> ?
>
> Rutger
>
> On Wed, Mar 10, 2010 at 2:21 PM, Toshiaki Katayama wrote:
>> Hi Rutger,
>>
>> Thank you for your inputs on GSoC 2010!
>>
>>> * is there a way to express triples in BioRuby?
>>> * if there is not, what would be a good design to express triples in >>> BioRuby so that this would be more useful than just for NeXML? >> >> This is what we discussed during the pre-BioHackathon 2010. >> >> http://hackathon3.dbcls.jp/wiki/BioRuby >> >> My first idea was to make all BioRuby object have common output >> method to render the object contents in various formats >> (such as RDF/XML, Turtle, HTML, GFF, FASTA etc. if appropriate). >> >> Then, we tried to separate view from logic using erb, but as you >> see in the above page, it still looks ugly. It is mainly because >> view formatting itself requires some additional codes, specific >> to each format. >> >> Therefore, we don't have a solid conclusion on this yet, unfortunately. >> >> Anyway, we already have PubMed to RDF converter written in Ruby as >> the TogoWS REST API (http://togows.dbcls.jp/site/en/rest.html) at >> >> http://togows.dbcls.jp/entry/pubmed/16381885 >> --> http://togows.dbcls.jp/entry/pubmed/16381885.ttl >> >> and, we are also trying to support KEGG to RDF conversion in this >> framework as well. I think we can put the code in BioRuby when we finished. >> >> Your suggestions are welcome. :) >> >> Regards, >> Toshiaki >> >> On 2010/03/10, at 22:22, Rutger Vos wrote: >> >>> Dear BioRuby-ites, >>> >>> my apologies that my first email to this list is so long and >>> tangential. I am trying to find out how to express RDF triples in >>> BioRuby. In this email I'm explaining why I care enough to try to get >>> funding for someone to work on this. If you don't care about any of >>> this, you can stop reading now. >>> >>> The National Evolutionary Synthesis Center (NESCent.org) is planning >>> to be a mentoring organization for the Google Summer of Code 2010. I >>> have submitted a project idea to this: to develop NeXML I/O and - >>> probably more importantly for you - RDF capabilities for BioRuby. 
If >>> funded, a student/coder will work on this full time over the summer, >>> under the shared supervision of Jan Aerts and myself. Here is the >>> link: http://tinyurl.com/biorubynexml >>> >>> NeXML is a data format for phylogenetic data that can be read and >>> written in perl, python, java and (to some extent) c++ and javascript. >>> RDF is the cool "new" thing (as per BioHackathon2010), but as far as I >>> can tell BioRuby isn't completely up to speed for it, yet. >>> >>> (As an aside: you might ask yourself why there is something like NeXML >>> when there is PhyloXML for BioRuby. The answer is that NeXML solves a >>> different problem: PhyloXML started essentially as a next generation >>> of New Hampshire eXtended (NHX) to meet the annotation needs of >>> comparative genomics, things such as gene duplications and other >>> molecular evolution events, on phylogenetic trees; NeXML started as a >>> complete XML representation of the NEXUS format, providing other >>> comparative data types such as categorical and continuous character >>> state matrices, restriction site matrices, and so on, in addition to >>> trees, taxa, sequence alignments. There is obviously some overlap >>> between the formats, but I guess that is not unique in bioinformatics >>> :)) >>> >>> NeXML has a semantic annotation facility that uses RDFa. This allows >>> us to add additional metadata to a fundamental phylogenetic data >>> object (a tree, taxon, character, etc.) to form a "triple": the >>> fundamental data object is the triple Subject, and the Predicate and >>> Object are added as RDFa attributes. Since NeXML can be transformed >>> using a standard XSL stylesheet to RDF/XML, we can express a limitless >>> number of statements about phylogenetics. H -- Dr. Rutger A. 
Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From ngoto at gen-info.osaka-u.ac.jp Fri Mar 19 05:18:41 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Fri, 19 Mar 2010 14:18:41 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! Message-ID: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> Begin forwarded message: Date: Thu, 18 Mar 2010 17:02:32 -0500 From: Chris Fields To: open-bio-l at lists.open-bio.org Subject: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! (forwarding to the Open-Bio list, as the original post is still clearing the OBF mail filters) Hi all, Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see GSoC 2010 FAQ at http://tinyurl.com/yzemdfo Student applications are due April 9, 2010 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas, and who to contact about applying. For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on OBF wiki and on the relevant project's GSoC wiki page. Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code! 
Rob Buels
OBF GSoC 2010 Administrator

_______________________________________________
Open-Bio-l mailing list
Open-Bio-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/open-bio-l

From k.hayashi.info at gmail.com Tue Mar 23 12:20:52 2010
From: k.hayashi.info at gmail.com (Kazuhiro Hayashi)
Date: Tue, 23 Mar 2010 21:20:52 +0900
Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects!
In-Reply-To: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp>
References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp>
Message-ID:

Hi, all

My name is Kazuhiro Hayashi.
I'm a 1st-year master's degree student at the Department of Computational
Biology, Graduate School of Frontier Sciences, the University of
Tokyo.

The reason why I sent this mail is to ask you some questions about
Google Summer of Code 2010.

I'm interested in Google Summer of Code 2010, especially the projects
about BioRuby.
At the moment, I plan to apply for "Ruby 1.9.2 support of BioRuby", and I'd
like to contribute to the BioRuby community through Google Summer of Code
2010.
So, I have three questions.
Could you answer them?

One is about the differences between Ruby 1.8.x and 1.9.2.
OBF's GSoC page says that the participant needs to know Ruby 1.9.2.
Until now, I've used only Ruby 1.8.7 and never used Ruby 1.9.2.
Honestly, I hardly know the differences between Ruby 1.8.x and Ruby 1.9.2.
Can I join this project?

Another is how many programs in BioRuby run on Ruby 1.9.2.
Could you tell me whether you already know this (and how to find out)?

The other is the implementation of the unit tests.
Does this mean that the participant needs to implement unit tests for
all code that doesn't have them yet?

Currently, these are all my questions about GSoC 2010.

If you have some advice for the applicants, please send a reply to
this mailing list.

Thank you very much for reading my broken English.
:-) Best regards 2010/3/19 Naohisa GOTO : > Begin forwarded message: > > Date: Thu, 18 Mar 2010 17:02:32 -0500 > From: Chris Fields > To: open-bio-l at lists.open-bio.org > Subject: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! > > > (forwarding to the Open-Bio list, as the original post is still clearing the OBF mail filters) > > Hi all, > > Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! > > GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see GSoC 2010 FAQ at http://tinyurl.com/yzemdfo > > Student applications are due April 9, 2010 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas, and who to contact about applying. > > For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on OBF wiki and on the relevant project's GSoC wiki page. > > Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code! > > Rob Buels > OBF GSoC 2010 Administrator > > > > _______________________________________________ > Open-Bio-l mailing list > Open-Bio-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/open-bio-l > > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > -- ??? 
Kazuhiro Hayashi
Department of Computational Biology, The University of Tokyo
email: k_hayashi at cb.k.u-tokyo.ac.jp
tel: 04-7136-3988

From biopython at maubp.freeserve.co.uk Tue Mar 23 13:20:57 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Mar 2010 13:20:57 +0000
Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects!
In-Reply-To:
References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <320fb6e01003230620l58717628t4d12f67411805c48@mail.gmail.com>

On Tue, Mar 23, 2010 at 12:20 PM, Kazuhiro Hayashi wrote:
> Hi, all
>
> My name is Kazuhiro Hayashi.
> I'm a 1st-year master's degree student at Department of Computational
> Biology, Graduate School of Frontier Sciences, the University of
> Tokyo.
>
> The reason why I sent this mail is to ask you some questions about
> Google Summer of Code 2010.
>
> ...
>
> Thank you very much for reading my broken English. :-)

Hello Hayashi-san,

I don't know if the BioRuby team have any preference for which
language the Google Summer of Code projects will be discussed
in (English and/or Japanese). It will probably depend on the mentors.

However, there is also a Japanese BioRuby mailing list:
http://lists.open-bio.org/mailman/listinfo/bioruby-ja

Peter
(@Biopython)

From ngoto at gen-info.osaka-u.ac.jp Tue Mar 23 15:21:33 2010
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Wed, 24 Mar 2010 00:21:33 +0900
Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects!
In-Reply-To:
References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp>

Hi Kazuhiro,

On Tue, 23 Mar 2010 21:20:52 +0900
Kazuhiro Hayashi wrote:

> Hi, all
>
> My name is Kazuhiro Hayashi.
> I'm a 1st-year master's degree student at Department of Computational
> Biology, Graduate School of Frontier Sciences, the University of
> Tokyo.
> > The reason why I sent this mail is to ask you some questions about > Google Summer of Code 2010. > > I'm interested in Google Summer of Code 2010, Especially, the projects > about BioRuby. > At the moment, I will apply "Ruby 1.9.2 support of BioRuby and I'd > like to contribute BioRuby community through Google Summer of Code > 2010. > So, I have three questions. > Could you answer them? > > One is about differences between Ruby 1.8.x and 1.9.2 > OBF's GSoC page says that the participant needs to know Ruby 1.9.2. > Until now, I've used only Ruby 1.8.7 and never used Ruby 1.9.2. > Honestly, I hardly know differences between Ruby 1.8.x and Ruby 1.9.2. > Can I join this project? Yes. You will need to study about them during the project, but not now. I've modified the "needed skills" in the project wiki page to clarify the point. > Another is how many programs in BioRuby run on Ruby 1.9.2. > Could you tell me weather you have already known it or not (and how to know it)? I don't know much. Some programs worked, but some didn't. > The other is implementation of the unit tests. > Does this mean that the participant needs to implement unit tests for > all codes which haven't had them yet. Yes or no, depends on planning. One idea is to implement almost all with rough coding, and to improve them after that. I also think that classes and modules that strongly depend on external program or web service can be skipped. > Currently, These are all my questions about GSoC 2010. > > If you have some advice for the applicants, please send a reply to > this mailing list. > > Thank you very much for reading my broken English. :-) > > Best regards > > > 2010/3/19 Naohisa GOTO : > > Begin forwarded message: > > > > Date: Thu, 18 Mar 2010 17:02:32 -0500 > > From: Chris Fields > > To: open-bio-l at lists.open-bio.org > > Subject: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! 
> > > > > > (forwarding to the Open-Bio list, as the original post is still clearing the OBF mail filters) > > > > Hi all, > > > > Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! > > > > GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see GSoC 2010 FAQ at http://tinyurl.com/yzemdfo > > > > Student applications are due April 9, 2010 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas, and who to contact about applying. > > > > For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on OBF wiki and on the relevant project's GSoC wiki page. > > > > Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code! > > > > Rob Buels > > OBF GSoC 2010 Administrator > > > > > > > > _______________________________________________ > > Open-Bio-l mailing list > > Open-Bio-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/open-bio-l > > > > > > _______________________________________________ > > BioRuby Project - http://www.bioruby.org/ > > BioRuby mailing list > > BioRuby at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioruby > > > > > -- > ??? 
> Kazuhiro Hayashi > Department of Computational Biology, The University of Tokyo > email: k_hayashi at cb.k.u-tokyo.ac.jp > tel: 04-7136-3988 > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From ngoto at gen-info.osaka-u.ac.jp Wed Mar 24 14:22:23 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Wed, 24 Mar 2010 23:22:23 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: <320fb6e01003230620l58717628t4d12f67411805c48@mail.gmail.com> References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <320fb6e01003230620l58717628t4d12f67411805c48@mail.gmail.com> Message-ID: <20100324142225.08B501CBC3D0@idnmail.gen-info.osaka-u.ac.jp> Hi, The objective of the project is software development. I think it is OK to use Japanese for communicating with Japanese-speaking mentors. Using the bioruby-ja mailing list for discussion seems good. Students still need to write application form in English required by Google. It would be great if someone can help English proofreading for ESL students. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Tue, 23 Mar 2010 13:20:57 +0000 Peter wrote: > On Tue, Mar 23, 2010 at 12:20 PM, Kazuhiro Hayashi > wrote: > > Hi, all > > > > My name is Kazuhiro Hayashi. > > I'm a 1st-year master's degree student at Depertment of Computational > > Biology, Graduate School of Frontier Sciences, the University of > > Tokyo. > > > > The reason why I sent this mail is to ask you some questions about > > Google Summer of Code 2010. > > > > ... > > > > Thank you very much for reading my broken English. 
:-) > > Hello Hayashi-san, > > I don't know if the BioRuby team have any preference for which > language the Google Summer of Code projects will be discussed > in (English and/or Japanese). It will probably depend on the mentors. > > However, there is also a Japanese BioRuby mailing list: > http://lists.open-bio.org/mailman/listinfo/bioruby-ja > > Peter > (@Biopython) > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From k.hayashi.info at gmail.com Wed Mar 24 14:35:21 2010 From: k.hayashi.info at gmail.com (Kazuhiro Hayashi) Date: Wed, 24 Mar 2010 23:35:21 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Hi. Thank you for your replies. I'd like to communicate with you on this mailing list (and I will write e-mails in English as much as possible ). :- ) However, If I should do it on somewhere else, I will do so. I'm not sure where is the best place to talk about GSoC 2010. Anyway, I appriciate your advice. By the way, I have one more question. Could you tell me how much I have to write the proposal concretely? I have to write how to implement the programs and when I write each? Best regards Kazuhiro 2010/3/23 Peter : > On Tue, Mar 23, 2010 at 12:20 PM, Kazuhiro Hayashi > wrote: >> Hi, all >> >> My name is Kazuhiro Hayashi. >> I'm a 1st-year master's degree student at Depertment of Computational >> Biology, Graduate School of Frontier Sciences, the University of >> Tokyo. >> >> The reason why I sent this mail is to ask you some questions about >> Google Summer of Code 2010. >> >> ... >> >> Thank you very much for reading my broken English. 
:-) > > Hello Hayashi-san, > > I don't know if the BioRuby team have any preference for which > language the Google Summer of Code projects will be discussed > in (English and/or Japanese). It will probably depend on the mentors. > > However, there is also a Japanese BioRuby mailing list: > http://lists.open-bio.org/mailman/listinfo/bioruby-ja > > Peter > (@Biopython) > 2010?3?24?0:21 Naohisa GOTO : > Hi Kazuhiro, > > On Tue, 23 Mar 2010 21:20:52 +0900 > Kazuhiro Hayashi wrote: > >> Hi, all >> >> My name is Kazuhiro Hayashi. >> I'm a 1st-year master's degree student at Depertment of Computational >> Biology, Graduate School of Frontier Sciences, the University of >> Tokyo. >> >> The reason why I sent this mail is to ask you some questions about >> Google Summer of Code 2010. >> >> I'm interested in Google Summer of Code 2010, Especially, the projects >> about BioRuby. >> At the moment, I will apply "Ruby 1.9.2 support of BioRuby and I'd >> like to contribute BioRuby community through Google Summer of Code >> 2010. >> So, I have three questions. >> Could you answer them? >> >> One is about differences between Ruby 1.8.x and 1.9.2 >> OBF's GSoC page says that the participant needs to know Ruby 1.9.2. >> Until now, I've used only Ruby 1.8.7 and never used Ruby 1.9.2. >> Honestly, I hardly know differences between Ruby 1.8.x and Ruby 1.9.2. >> Can I join this project? > > Yes. > You will need to study about them during the project, but not now. > I've modified the "needed skills" in the project wiki page > to clarify the point. > >> Another is how many programs in BioRuby run on Ruby 1.9.2. >> Could you tell me weather you have already known it or not (and how to know it)? > > I don't know much. Some programs worked, but some didn't. > >> The other is implementation of the unit tests. >> Does this mean that the participant needs to implement unit tests for >> all codes which haven't had them yet. > > Yes or no, depends on planning. 
One idea is to implement > almost all with rough coding, and to improve them after that. > I also think that classes and modules that strongly depend > on external program or web service can be skipped. > >> Currently, These are all my questions about GSoC 2010. >> >> If you have some advice for the applicants, please send a reply to >> this mailing list. >> >> Thank you very much for reading my broken English. :-) >> >> Best regards >> >> >> 2010/3/19 Naohisa GOTO : >> > Begin forwarded message: >> > >> > Date: Thu, 18 Mar 2010 17:02:32 -0500 >> > From: Chris Fields >> > To: open-bio-l at lists.open-bio.org >> > Subject: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! >> > >> > >> > (forwarding to the Open-Bio list, as the original post is still clearing the OBF mail filters) >> > >> > Hi all, >> > >> > Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! >> > >> > GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see GSoC 2010 FAQ at http://tinyurl.com/yzemdfo >> > >> > Student applications are due April 9, 2010 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas, and who to contact about applying. >> > >> > For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on OBF wiki and on the relevant project's GSoC wiki page. >> > >> > Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code! 
>> > >> > Rob Buels >> > OBF GSoC 2010 Administrator >> > >> >> >> -- >> Kazuhiro Hayashi >> Department of Computational Biology, The University of Tokyo >> email: k_hayashi at cb.k.u-tokyo.ac.jp >> tel: 04-7136-3988 > > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > -- Kazuhiro Hayashi Department of Computational Biology, The University of Tokyo email: k_hayashi at cb.k.u-tokyo.ac.jp tel: 04-7136-3988 From biopython at maubp.freeserve.co.uk Wed Mar 24 14:51:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Mar 2010 14:51:46 +0000 Subject: [BioRuby] [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> Message-ID: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields wrote: > > On Mar 24, 2010, at 9:08 AM, Peter wrote: > >> Hi, >> >> This is probably of interest to all the Bio* projects offering access >> to the NCBI Entrez utilities. See forwarded message below.
>> >> I *think* the new guidelines basically say that the email & tool parameters are >> optional BUT if your IP address ever gets banned for excessive use you then >> have to register an email & tool combination. >> >> Regarding the email address, the NCBI say to use the email of the developer >> (not the end user). However, they do not distinguish between the developers >> of a library (like us), and the developers of an application or script using a >> library (who may also be the end user). >> >> Currently we (Biopython) and I think BioPerl ask developers using our libraries >> to populate the email address themselves. I *think* this is still the >> right action. >> >> Peter > > > Basically, that's the same tactic I'm using with Bio::DB::EUtilities (and I > think with the SOAP-based ones as well). We're providing a specific set of > tools for users to write their own end applications. I can try > contacting them regarding this to get an official response to clarify this > somewhat. Please give the NCBI an email - you can CC me too if you like. > Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a > default, but always leave the email blank and issue a warning if it isn't > set. We could just as easily leave both blank and issue warnings for both. We currently leave out the email and set the tool parameter to "Biopython" by default, but this can be overridden. Leaving out the email currently causes Biopython to issue a warning.
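For BioRuby readers, the behaviour described above (a default tool name, an email supplied by the application developer, and a warning when the email is missing) could be sketched roughly as follows. This is only an illustration: the helper name eutils_url, the default tool value "bioruby" and the example address are assumptions made for this sketch, not part of BioRuby, BioPerl or Biopython.

```ruby
require "uri"

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical helper (not an existing BioRuby method): builds an
# E-utilities URL, always sending a tool name and warning when the
# calling developer has not supplied an email address.
def eutils_url(program, params, tool: "bioruby", email: nil)
  missing_email = email.nil? || email.empty?
  warn "E-utilities request without an email parameter" if missing_email

  query = params.merge("tool" => tool)
  query["email"] = email unless missing_email
  "#{EUTILS_BASE}/#{program}.fcgi?#{URI.encode_www_form(query)}"
end

# The application developer, not the library, fills in the email.
puts eutils_url("efetch",
                { "db" => "nucleotide", "id" => "12345", "rettype" => "fasta" },
                email: "dev@example.org")
```

Keeping the email as a caller-supplied argument matches the "developers populate the email address themselves" approach discussed in this thread; the library only contributes the overridable tool default.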
Peter From hlapp at drycafe.net Wed Mar 24 15:27:37 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 24 Mar 2010 11:27:37 -0400 Subject: [BioRuby] [Open-bio-l] [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> Message-ID: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> On Mar 24, 2010, at 10:51 AM, Peter wrote: > Please give the NCBI an email - you can CC me too if you like. Can't this be the developers' mailing list (or lists, the appropriate one for each toolkit)? We can even whitelist all NCBI sender addresses so they can easily email us if there are issues. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From k.hayashi.info at gmail.com Thu Mar 25 17:31:07 2010 From: k.hayashi.info at gmail.com (Kazuhiro Hayashi) Date: Fri, 26 Mar 2010 02:31:07 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Hi, Thank you for your replies. I'd like to communicate with you on this mailing list (and I will write e-mails in English as much as possible ). :- ) However, If I should do it on somewhere else, I will do so. I'm not sure where is the best place to talk about GSoC 2010. Anyway, I appreciate your advice. By the way, I have one more question. Could you tell me how much I have to write the proposal concretely? I have to write how to implement the programs and when I write each? 
Best regards Kazuhiro ( I'm sorry if you have already received the same mail. I sent it yesterday, but I haven't received yet....) -- ??? Kazuhiro Hayashi Department of Computational Biology, The University of Tokyo email: k_hayashi at cb.k.u-tokyo.ac.jp tel: 04-7136-3988 From czmasek at burnham.org Fri Mar 26 00:39:42 2010 From: czmasek at burnham.org (Christian M Zmasek) Date: Thu, 25 Mar 2010 17:39:42 -0700 Subject: [BioRuby] HMMER 3 parsers? In-Reply-To: <4B9917D1.9000702@molbio.su.se> References: <4B96A5FB.9060607@molbio.su.se> <20100311141250.789AA1CBC58F@idnmail.gen-info.osaka-u.ac.jp> <4B9917D1.9000702@molbio.su.se> Message-ID: <4BAC024E.6000009@burnham.org> Hi, Daniel: Sorry for the late reply, for some reasons my email reader suddenly sorts messages wrongly. In any case, the parser for hmmer3 hmmscan and hmmsearch is basically finished. So, if I could somehow get access to your test cases, that would be great! Thank you! Christian Daniel Lundin wrote: > Naohisa GOTO skrev: >> Hi, >> >> Christian Zmasek are now working for the HMMER 3 support. >> It will be great if you can help us. >> > Certainly. Since my alternative is writing a parser for myself, I might > as well put in my effort for the common good. > > Christian, is there anything in particular I could help with? I have > started collecting some test cases for my own needs. > > /Daniel > >> http://github.com/cmzmasek/bioruby/blob/master/lib/bio/appl/hmmer/hmmer3report.rb >> http://github.com/cmzmasek/bioruby/blob/master/lib/bio/appl/hmmer3.rb >> >> Naohisa Goto >> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org >> >> On Tue, 09 Mar 2010 20:48:11 +0100 >> Daniel Lundin wrote: >> >>> Hi, >>> >>> HMMER 3 is currently available as a first release candidate. With it >>> comes several news both in the form of new tools and new kinds of data, >>> which means output formats are changed. Is anybody working on BioRuby >>> parsers for these? 
>>> >>> /D >>> >>> -- >>> Daniel Lundin >>> >>> Department of Molecular Biology & Functional Genomics >>> Arrhenius Laboratories for Natural Sciences >>> Stockholm University, SE-106 91 Stockholm, Sweden >>> >>> tel. +46 (0)8 16 41 95, mobile: +46 (0)708 123 922, fax. +46 (0)8 16 64 88 >>> >>> Email: daniel.lundin at molbio.su.se >>> _______________________________________________ >>> BioRuby Project - http://www.bioruby.org/ >>> BioRuby mailing list >>> BioRuby at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioruby > > From ngoto at gen-info.osaka-u.ac.jp Fri Mar 26 12:43:38 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Fri, 26 Mar 2010 21:43:38 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20100326124339.A02641CBC50D@idnmail.gen-info.osaka-u.ac.jp> Hi, It is generally good to write many specific details. However, the most important thing now is whether the proposal is accepted by Google. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Fri, 26 Mar 2010 02:31:07 +0900 Kazuhiro Hayashi wrote: > Hi, > > Thank you for your replies. > > I'd like to communicate with you on this mailing list (and I will > write e-mails in English as much as possible ). :- ) > However, If I should do it on somewhere else, I will do so. > I'm not sure where is the best place to talk about GSoC 2010. > Anyway, I appreciate your advice. > > > By the way, I have one more question. > Could you tell me how much I have to write the proposal concretely? > I have to write how to implement the programs and when I write each? > > Best regards > > Kazuhiro > > ( I'm sorry if you have already received the same mail. I sent it > yesterday, but I haven't received yet....) > > -- > ??? 
> Kazuhiro Hayashi > Department of Computational Biology, The University of Tokyo > email: k_hayashi at cb.k.u-tokyo.ac.jp > tel: 04-7136-3988 From k.hayashi.info at gmail.com Fri Mar 26 15:21:41 2010 From: k.hayashi.info at gmail.com (Kazuhiro Hayashi) Date: Sat, 27 Mar 2010 00:21:41 +0900 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: <20100326124339.A02641CBC50D@idnmail.gen-info.osaka-u.ac.jp> References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> <20100326124339.A02641CBC50D@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Hi Goto-san, > It is generally good to write many specific details. > However, the most important thing now is whether the proposal > is accepted by Google. Is it possible to show you a draft of my proposal? I'd like you to proofread my proposal before the deadline for application. Best regards Kazuhiro On 26 March 2010 at 21:43, Naohisa GOTO wrote: > Hi, > > It is generally good to write many specific details. > However, the most important thing now is whether the proposal > is accepted by Google. > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > On Fri, 26 Mar 2010 02:31:07 +0900 > Kazuhiro Hayashi wrote: > >> Hi, >> >> Thank you for your replies. >> >> I'd like to communicate with you on this mailing list (and I will >> write e-mails in English as much as possible ). :- ) >> However, If I should do it on somewhere else, I will do so. >> I'm not sure where is the best place to talk about GSoC 2010. >> Anyway, I appreciate your advice. >> >> >> By the way, I have one more question. >> Could you tell me how much I have to write the proposal concretely? >> I have to write how to implement the programs and when I write each? >> >> Best regards >> >> Kazuhiro >> >> ( I'm sorry if you have already received the same mail. I sent it >> yesterday, but I haven't received yet....) >> >> --
>> Kazuhiro Hayashi >> Department of Computational Biology, The University of Tokyo >> email: k_hayashi at cb.k.u-tokyo.ac.jp >> tel: 04-7136-3988 > > -- Kazuhiro Hayashi Department of Computational Biology, The University of Tokyo email: k_hayashi at cb.k.u-tokyo.ac.jp tel: 04-7136-3988 From czmasek at burnham.org Fri Mar 26 18:26:54 2010 From: czmasek at burnham.org (Christian M Zmasek) Date: Fri, 26 Mar 2010 11:26:54 -0700 Subject: [BioRuby] Fw: [Open-bio-l] Fwd: [Bioperl-l] Google Summer o Code is *ON* for OBF projects! In-Reply-To: References: <20100319051842.7B8751CBC46A@idnmail.gen-info.osaka-u.ac.jp> <20100323152133.B310E1CBC409@idnmail.gen-info.osaka-u.ac.jp> <20100326124339.A02641CBC50D@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <4BACFC6E.4010303@burnham.org> Hi, Re. "Is it possible to show you a draft of my proposal?" I think this is not only possible, it is highly recommended. From my experience, a detailed, well written, and realistic proposal is very important. Remember, not all projects will get accepted (currently, OBF has 14 projects, I would be very surprised if more than half would get accepted at the end). The better a student's proposal, the more likely it is that the project will get accepted. Christian Kazuhiro Hayashi wrote: > Hi Goto-san, > >> It is generally good to write many specific details. >> However, the most important thing now is whether the proposal >> is accepted by Google. > > Is it possible to show you a draft of my proposal? > I'd like you to proofread my proposal before the deadline for application. > > Best regards > > Kazuhiro > > 2010?3?26?21:43 Naohisa GOTO : >> Hi, >> >> It is generally good to write many specific details. >> However, the most important thing now is whether the proposal >> is accepted by Google. >> >> Naohisa Goto >> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org >> >> On Fri, 26 Mar 2010 02:31:07 +0900 >> Kazuhiro Hayashi wrote: >> >>> Hi, >>> >>> Thank you for your replies. 
>>> >>> I'd like to communicate with you on this mailing list (and I will >>> write e-mails in English as much as possible ). :- ) >>> However, If I should do it on somewhere else, I will do so. >>> I'm not sure where is the best place to talk about GSoC 2010. >>> Anyway, I appreciate your advice. >>> >>> >>> By the way, I have one more question. >>> Could you tell me how much I have to write the proposal concretely? >>> I have to write how to implement the programs and when I write each? >>> >>> Best regards >>> >>> Kazuhiro >>> >>> ( I'm sorry if you have already received the same mail. I sent it >>> yesterday, but I haven't received yet....) >>> >>> -- >>> ??? >>> Kazuhiro Hayashi >>> Department of Computational Biology, The University of Tokyo >>> email: k_hayashi at cb.k.u-tokyo.ac.jp >>> tel: 04-7136-3988 >> > > > From sararayburn at gmail.com Sat Mar 27 20:13:01 2010 From: sararayburn at gmail.com (Sara Rayburn) Date: Sat, 27 Mar 2010 15:13:01 -0500 Subject: [BioRuby] GSOC 2010 preliminary proposal question Message-ID: Hello all. My name is Sara Rayburn. I'm a doctoral student at the University of Louisiana at Lafayette. I am planning to submit a proposal to implement the speciation/duplication inference algorithm this summer. I'd like to tackle both the implementation and the extension to non-binary trees. In reading the posted reference on reconciliation in non-binary trees, there are two types of duplications referenced, required and conditional duplications. In an implementation of this approach, would it be better to identify only required duplications and clear speciations, or should there be an additional distinction for the conditional duplications? I hope to post a preliminary project plan and proposal for feedback in the next couple of days. Thanks in advance for your feedback. 
Sara Rayburn University of Louisiana at Lafayette sararayburn at gmail.com From czmasek at burnham.org Mon Mar 29 23:32:12 2010 From: czmasek at burnham.org (Christian M Zmasek) Date: Mon, 29 Mar 2010 16:32:12 -0700 Subject: [BioRuby] Beta application for review: BioRuby - Simple duplication inference implementation In-Reply-To: References: Message-ID: <4BB1387C.6090503@burnham.org> Hi, Jure: Your application seems to be on the right track. In general, your timetable needs to be more detailed. For each step you should list: 1. Goal/deliverable (you have that) 2. Approach 3. Time estimate (you have that) 4. Anticipated problems & possible alternative approaches Some more comments: > > *The idea:* > > We would implement the simple and fast duplication inference algorithm > described by Zmasek and Eddy (Zmasek and Eddy, 2001, "A simple algorithm > to infer gene duplication and speciation events on a gene tree"). Finding > gene duplications is an extremely important part of bioinformatics and > biomedical research, as duplications are thought to be powerful drivers > in the evolution of new protein function. I think 'extremely important part of bioinformatics' is somewhat of an exaggeration and too vague. It would be better to write about how gene duplications complicate gene function prediction, and about their significance in (the theory of) molecular evolution. > It is thus important to find > gene duplication sequences, which when translated are more likely to be > functionally different, and distinguish them from gene speciation > sequences, which are more likely functionally equivalent. 'gene duplication sequences' should be 'genes related by a duplication' or similar. 'gene speciation sequence' should be 'genes related by a speciation' or similar. > Currently the algorithm supports rooted fully binary trees and we would > like to change that, by also implementing support for unrooted and > non-binary trees. The goals should be: 1. Implement algorithm as it is 2.
Allow rooting of unrooted gene trees by minimizing the sum of duplications. Optional: 3. Extend algorithm to work on non-binary species trees 4. Extend algorithm to work on non-binary gene trees > > *The work:* > > There are several milestones to be reached in developing this idea and > this is the work plan I propose: > 1. Development of unit tests with known species and gene trees (1 week). > > 2. Making or reusing necessary data structures, made easier by last > year's GSoC contribution implementing phyloXML in BioRuby (1/2 weeks - 1 > week): > - gene tree, > - species tree, > - tree node, > - children(), > - parent(). > > 3. Developing checks for the correctness of input data for rooted fully > binary trees SDI (1/2 weeks - 1 week): > - making sure trees are rooted and binary, > - all species/gene tree nodes have at least one type of taxonomic data. > - making a taxonomy base from a type of data present in all nodes > (scientific or common name, taxonomy code, id), > - making sure taxonomic data is unique throughout external nodes. > 4. Implementation of the recursive M function (1 week) > - traverse the gene tree in postorder (left subtree, right subtree, root), > - finding occurrences where M(parent) equals M(child 1 or 2) - this is > representative for finding a duplication. If M(parent) matches neither, > the processed node is a speciation. > > 5. Milestone - finished implementation of SDI for rooted fully binary > trees (1/2 week): > - Extensive testing, > - cleaning up. > > 6. Working on unrooted non-binary trees implementation (4-8 weeks): > - Look to the forester java library SDI module for insight (by the > mentor of this project, Zmasek), > - Doing some heavy lifting, > - at this point I consider this implementation a possible pitfall, > because of substantially increased complexity. This needs to be much more detailed. Species trees are always rooted.
Unrooted gene trees can be handled naively by rooting them in all possible places, and running the SDI algorithm on each differently rooted tree, and keeping the gene tree which has the lowest number of duplications. A more efficient approach for this is described in: Zmasek and Eddy (2002). RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics. 2002 May 16;3:14. See: http://evogsoc2010.wordpress.com/2010/03/25/references-for-gene-duplications-proposal/ > > 7. Finishing up (1 week): > - Extensive testing, > - cleaning up. > > *Why me?:* > > I like to set foot on unknown territory and challenge myself constantly. > That being said, I have long searched for something that would connect > my love of medicine to my love of programming, and now, thanks to GSoC > and OBF, I think I found it - bioinformatics. I am at a stage of my > medical study, where I have to decide what my future will entail, and I > am (now, after thinking about it for a long time) positive that > bioinformatics will be a big part of it. What better way to get future > off to a good start, than with a Google Summer of Code project? Based on > this enthusiasm alone you can be assured that I'll work really hard on > this project and that I will be happy to see it done. As this would be > my first serious open source engagement, you also have a chance of > forming a completely new addition to the open source world and making an > excellent contributor out of me. > > *Previous experience:* > > 1. I have been working on a simulation of an analytical chemistry method > for the past 2 years now, more specifically we have modeled laser > ablation + inductively coupled plasma mass spectrometry with a simple > model, which aids our elemental mapping projects. For the write-up of > this project I have been awarded with a "Pre?ernovo priznanje" in 2008 > (PDF upon request). 
This work entails several interesting components, > from basics such as: C# development, image input, output, multi-threaded > programming, UI development; to complex themes such as: genetic > algorithms and neural networks. All of which I learned as we worked on > the project without much hassle (source code upon request). This work is > not yet open source, because we are in the finalizing stages of the > paper and will release the source code after publication under an open > source license. > > 2. I have programmed since I was a child and I have developed a wide > specter of things in my lifetime (from a full CMS in PHP to an IRC > robot, source code upon request), but I have little experience in fully > open source projects, which I think so highly of. > > *Biography:* > > My name is Jure Triglav and I'm a 24 year old medical student from > Ljubljana, Slovenia. I was born in a small town of Murska Sobota in > Slovenia, where I went to grade school (graded excellent for all years, > awarded "Zoisova ?tipendija" for the gifted, which I still hold) and > high-school (excellent, finished as "Zlati maturant" in the company of > about 200 best students in the country). I moved to Ljubljana in 2004 to > study medicine. I am now in the last year of my medical study which I > find challenging and very interesting. > My hobbies are all over the place, from book design to photography, from > web design to typography, from guitar to poetry, from reading to > programming, from traveling to sports. > > > > *Other obligations for the summer:* > > I have 5-hour daily clinical practice every weekday in June, July and > August, which is not nearly as serious as it sounds, especially since > this is the summer rotation which is known for its laid back feel. These > practice start at 8 am and finish at 1 pm, and for students are not > really stressful or exhausting at all. 
I have in the past juggled many > research obligations with clinical practice and my studies without > hiccups, but I will not do this this summer and will dedicate 8 hours > daily to Google Summer of Code, as I realize what a great opportunity > this is and how much work is required. I have no other work, research or > vacation obligations for the period of Google Summer of Code. Nevertheless, this sounds like a serious concern. > > *Contact information: * > > (I will provide additional contact information in the final application) > Name: Jure Triglav > E-mail: juretriglav at gmail.com > IRC handle: x` on #obf-soc, #gsoc > From czmasek at burnham.org Mon Mar 29 23:39:29 2010 From: czmasek at burnham.org (Christian M Zmasek) Date: Mon, 29 Mar 2010 16:39:29 -0700 Subject: [BioRuby] Google summer of code 2010 - Stathis Kamperis In-Reply-To: <2218b9af1003290119q1c6b2eeclc3c84ffdbaa97b2a@mail.gmail.com> References: <2218b9af1003290119q1c6b2eeclc3c84ffdbaa97b2a@mail.gmail.com> Message-ID: <4BB13A31.8020203@burnham.org> Hi, Stathis: Thank you for your interest in this proposal! Stathis Kamperis wrote: > Dear Dr. Zmasek, > > my name is Stathis Kamperis and I'm interested in this year's Google > Summer of Code project: > "Implementation of algorithm to infer gene duplications in BioRuby". > > I am a medicine graduate, physics undergraduate and computer > enthusiast. I come from Greece and I am 26 years old. > I have a long standing programming experience with a vast range of > programming languages including, since recently, Ruby. > I also have a decent molecular/biology background. > > I successfully participated in last years Google Summer of Code > working for the DragonFlyBSD[1] organisation. My work had to do with > POSIX standard conformance audit, regression testing and quality > assurance. > > As I understand, the project is about implementing your algorithm to > BioRuby. Is there any prototype implemented in any language/framework > at the moment ?
Yes, there is. See: http://forester-atv.cvs.sourceforge.net/viewvc/forester-atv/forester-atv/java/src/org/forester/sdi/ Especially, SDI.java and SDIR.java (for unrooted trees) > In your abstract you mention: > "We show empirically, using 1750 gene trees constructed from the Pfam > protein family database, that it appears to be a practical (and often > superior) algorithm for analyzing real gene trees." > So, I wonder, what does 'empirically' mean here or how did you conduct > your tests ? Essentially, my Java implementation was used to run these tests. Hope this helps, Christian From czmasek at burnham.org Tue Mar 30 00:01:10 2010 From: czmasek at burnham.org (Christian M Zmasek) Date: Mon, 29 Mar 2010 17:01:10 -0700 Subject: [BioRuby] GSOC 2010 preliminary proposal question In-Reply-To: References: Message-ID: <4BB13F46.7010607@burnham.org> Hi, Sara: Thank you for your interest in this proposal! I think focusing on 'required' duplications is appropriate, since non-binary species trees are often a means to express uncertainty in the "tree-of-life" and to prevent that uncertainty from introducing spurious duplications. Christian Sara Rayburn wrote: > Hello all. My name is Sara Rayburn. I'm a doctoral student at the University of Louisiana at Lafayette. I am planning to submit a proposal to implement the speciation/duplication inference algorithm this summer. I'd like to tackle both the implementation and the extension to non-binary trees. In reading the posted reference on reconciliation in non-binary trees, there are two types of duplications referenced, required and conditional duplications. In an implementation of this approach, would it be better to identify only required duplications and clear speciations, or should there be an additional distinction for the conditional duplications? > > I hope to post a preliminary project plan and proposal for feedback in the next couple of days. Thanks in advance for your feedback.
> > Sara Rayburn > University of Louisiana at Lafayette > sararayburn at gmail.com
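For anyone following the SDI proposals in this thread, the core step for rooted, fully binary trees (the recursive M function traversed in postorder, with a node called a duplication when M(parent) equals M of one of its children) can be sketched in plain Ruby as below. This is a minimal illustration under stated assumptions, not the forester code and not an existing BioRuby API: the Node class and the methods leaves_by_name, lca and sdi are hypothetical, gene-tree leaves are assumed to carry the species name directly, and the last common ancestor is found by comparing node depths rather than by whatever node numbering a real implementation would use.

```ruby
# Hypothetical minimal tree node for illustration; BioRuby's own tree
# classes are deliberately not used here.
class Node
  attr_reader :name, :children
  attr_accessor :parent, :mapping, :event

  def initialize(name = nil, children = [])
    @name = name
    @children = children
    children.each { |child| child.parent = self }
  end

  def leaf?
    children.empty?
  end

  # Number of edges between this node and the root.
  def depth
    parent ? parent.depth + 1 : 0
  end
end

# Index the external nodes of the species tree by species name.
def leaves_by_name(node, index = {})
  if node.leaf?
    index[node.name] = node
  else
    node.children.each { |child| leaves_by_name(child, index) }
  end
  index
end

# Last common ancestor of two species-tree nodes: repeatedly move the
# deeper node (or the first one on a tie) up to its parent.
def lca(a, b)
  until a.equal?(b)
    if a.depth >= b.depth
      a = a.parent
    else
      b = b.parent
    end
  end
  a
end

# Postorder traversal of a rooted binary gene tree. Leaves map to the
# species-tree leaf of the same name; an internal node maps to the LCA
# of its children's mappings and is a duplication if that mapping
# equals the mapping of either child, otherwise a speciation.
def sdi(gene_node, species_leaves)
  if gene_node.leaf?
    gene_node.mapping = species_leaves.fetch(gene_node.name)
  else
    gene_node.children.each { |child| sdi(child, species_leaves) }
    left, right = gene_node.children
    gene_node.mapping = lca(left.mapping, right.mapping)
    duplicated = gene_node.children.any? { |c| c.mapping.equal?(gene_node.mapping) }
    gene_node.event = duplicated ? :duplication : :speciation
  end
  gene_node
end

# Species tree ((A,B),C) and gene tree ((A,A),B).
species = Node.new("ABC", [Node.new("AB", [Node.new("A"), Node.new("B")]), Node.new("C")])
inner   = Node.new(nil, [Node.new("A"), Node.new("A")])
gene    = Node.new(nil, [inner, Node.new("B")])

sdi(gene, leaves_by_name(species))
puts inner.event  # duplication
puts gene.event   # speciation
```

The naive handling of unrooted gene trees mentioned in this thread would call sdi once per candidate root and keep the rooting with the fewest :duplication events; that loop is left out here to keep the sketch short.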