From pjotr.public14 at thebird.nl Wed Apr 1 04:08:08 2009
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Wed, 1 Apr 2009 10:08:08 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors
In-Reply-To: <49D2C051.9090002@burnham.org>
References: <49D2C051.9090002@burnham.org>
Message-ID: <20090401080808.GA30972@thebird.nl>

Hi Christian,

I am happy to help.

Pj.

On Tue, Mar 31, 2009 at 06:16:01PM -0700, Christian M Zmasek wrote:
> Hi:
>
> As you probably read in previous messages, NESCent is again having a
> Google Summer of Code program
> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>
> I am serving as the mentor of a project entitled "phyloXML support in
> BioRuby" (see:
> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>
> In order to ensure a successful outcome (should the project be
> accepted), as well as to improve the chances of being accepted, I am
> looking for people willing to serve as co-mentors.
>
> Christian
>
>
> PS: Here is the full description of the project:
>
>
> phyloXML support in BioRuby
>
> Rationale
> Evolutionary trees are central to comparative genomics studies.
> Trees used in this context are usually annotated with a variety of
> data elements, such as taxonomic information, genome-related data
> (gene names, functional annotations) and gene duplication events, as
> well as information related to the evolutionary tree itself (branch
> lengths, support values). phyloXML is an XML data exchange standard
> that can represent this data. Trees in phyloXML format can be
> displayed and analyzed with Archaeopteryx (the successor to ATV),
> which also allows manipulation and navigation of the tree.
> While tools exist to convert other formats (such as the widely used
> Newick and Nexus formats) to phyloXML, there is currently support for
> phyloXML in only one of the open source Bio* projects (in BioPerl, as
> a result of Google's Summer of Code 2008).
>
> Approach
> Build phyloXML support in the increasingly popular, dynamic, and
> fully object-oriented language Ruby. More specifically, extend the
> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has
> just been released). This will entail (i) the development of objects
> to represent all the elements of phyloXML (sequences, taxonomic
> data, annotations, etc.), (ii) the development of a parser to read in
> phyloXML, and (iii) a phyloXML writer.
>
> Challenges
> Relating the data elements specific to phyloXML to the tree classes
> already in BioRuby while maintaining the standards of the BioRuby
> project. Development of a time- and memory-efficient phyloXML parser
> (the parser has to be able to process trees with thousands of
> external nodes, at least).
>
> Involved toolkits or projects
> BioRuby, phyloXML
>
> Degree of difficulty and needed skills
> Medium. Requires experience in an object-oriented programming
> language (such as C++, Java, or, ideally, Ruby). Experience in
> genomics or a related biological field is also critical. Knowledge
> of BioRuby will obviously help, as well as familiarity with XML.
>
> Mentors
> Christian Zmasek
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From bonnalraoul at ingm.it Wed Apr 1 06:37:27 2009
From: bonnalraoul at ingm.it (Raoul JP Bonnal)
Date: Wed, 01 Apr 2009 12:37:27 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors
In-Reply-To: <49D2C051.9090002@burnham.org>
References: <49D2C051.9090002@burnham.org>
Message-ID: <49D343E7.1080007@ingm.it>

Dear Christian,

One of the outcomes of Biohackathon 09, for me, was to implement Phylo
in BioSQL. Could that be of interest for this proposal?

Let me know.
From czmasek at burnham.org Wed Apr 1 14:20:23 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Wed, 1 Apr 2009 11:20:23 -0700
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors
In-Reply-To: <49D343E7.1080007@ingm.it>
References: <49D2C051.9090002@burnham.org> <49D343E7.1080007@ingm.it>
Message-ID: <49D3B067.3050101@burnham.org>

Hi Raoul:

Thank you for your reply.
While your project sounds interesting and much needed, the project I was
referring to is to implement phyloXML (www.phyloxml.org) support in
BioRuby.
Given that the deadline for Google Summer of Code projects is April 3rd,
I think it would be almost impossible to propose a new project and find
a student for it before this deadline.

Sincerely,

Christian

Raoul JP Bonnal wrote:
> Dear Christian,
> One of the outcomes of Biohackathon 09, for me, was to implement Phylo
> in BioSQL. Could that be of interest for this proposal?
>
> Let me know.
From czmasek at burnham.org Wed Apr 1 14:31:12 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Wed, 1 Apr 2009 11:31:12 -0700
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors
In-Reply-To: <2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu>
References: <49D2C051.9090002@burnham.org> <2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu>
Message-ID: <49D3B2F0.5030605@burnham.org>

Hi, Joel:

 From my own experience as a co-mentor last year, the time requirement
is rather limited.
The project I co-mentored was "phyloXML in BioPerl". Since I am
knowledgeable about phyloXML, I mainly helped out with questions related
to it.

Of course, for this year's project, "phyloXML in BioRuby", it will be the
opposite. Since I am the main mentor, I am handling all the issues
related to the application process, the midterm evaluation, the final
evaluation, and phyloXML. But I would feel more comfortable if I had one
or two co-mentors with significant Ruby and BioRuby experience who could
help out if questions specific to Ruby or BioRuby arise.

Christian

PS: The project still needs to be accepted by the NESCent organization.
Last year, there were about 15 projects, and only five got accepted
(since Google gave NESCent five slots, NESCent had to decide which
projects they liked the most).

Joel Dudley wrote:
> I am interested. What's involved in being a co-mentor? I am one of the
> authors of MEGA (http://www.megasoftware.net) and I have been
> programming with Ruby daily for about 3 years now.
>
> -- Joel
From ngoto at gen-info.osaka-u.ac.jp Thu Apr 2 01:08:38 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Thu, 2 Apr 2009 14:08:38 +0900
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors
In-Reply-To: <49D3B2F0.5030605@burnham.org>
References: <49D2C051.9090002@burnham.org> <2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu> <49D3B2F0.5030605@burnham.org>
Message-ID: <20090402050839.73F151CBC3BD@idnmail.gen-info.osaka-u.ac.jp>

Hi Christian,

I'll answer any questions about BioRuby as much as I can.

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
From hlapp at gmx.net Thu Apr 2 01:25:43 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 2 Apr 2009 01:25:43 -0400
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors
In-Reply-To: <49D3B2F0.5030605@burnham.org>
References: <49D2C051.9090002@burnham.org> <2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu> <49D3B2F0.5030605@burnham.org>
Message-ID:

On Apr 1, 2009, at 2:31 PM, Christian M Zmasek wrote:

> But I would feel more comfortable, if I had one or two co-mentors
> with significant Ruby and BioRuby experience

I would echo this and add that, from NESCent's perspective as the
mentoring organization, having a committed co-mentor as a backup when
the primary mentor is, for whatever reason, temporarily incommunicado is
highly desirable.

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From bonnalraoul at ingm.it Thu Apr 2 04:43:34 2009
From: bonnalraoul at ingm.it (Raoul JP Bonnal)
Date: Thu, 02 Apr 2009 10:43:34 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors
In-Reply-To: <49D3B067.3050101@burnham.org>
References: <49D2C051.9090002@burnham.org> <49D343E7.1080007@ingm.it> <49D3B067.3050101@burnham.org>
Message-ID: <49D47AB6.8090304@ingm.it>

OK, I probably was not so clear: I am available for co-mentoring if you
want.
From czmasek at burnham.org Thu Apr 2 19:01:13 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Thu, 2 Apr 2009 16:01:13 -0700
Subject: [BioRuby] [PhyloSoC] Project plan for phyloXML integration with BioRuby
In-Reply-To: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com>
References: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com>
Message-ID: <49D543B9.2050200@burnham.org>

Diana:

Thank you for your interest in this project!

Indeed, the hour is late, and your proposal still needs significant
work in order to be competitive.

I think you already got some comments from Hilmar (I am at work and I
cannot use IRC).

Besides those, I'd like to suggest:

1. Please make sure that all the studying is done before the coding
   begins (May 23), i.e. your "week 1" should be during the "community
   bonding period".
2. You do not need to develop classes for objects already present in
   BioRuby (such as phylogenetic trees). Actually, the fewer new classes
   you have to introduce the better -- reuse!
3. I am happy to see that you include unit tests early on, this is good!
   You have to make extensive use of BioRuby's test suite.
4. In general, your weekly goals are not described in enough detail. It
   might be a good idea to discuss goals, deliverables, anticipated
   problems/difficulties (and possible solutions) for each week.
5. Documentation is very important(!)
6. Do you plan to maintain the code after the summer?

Since the abstract is due April 3rd and cannot be changed after that,
it is best to concentrate on the abstract first, though. (The project
plan can still be tweaked after April 3rd, I understand.)

In the abstract you have to make clear that you understand the _biology_
behind the project. Why does phyloXML have the elements it has?
Why is it useful? You might have a look at: http://www.phyloxml.org and
http://www.tdwg.org/proceedings/article/view/437.

Can you show that you understand what evolutionary trees are? Where and
why are they used? Are they only important in 'tree of life'
applications? (E.g. see:
http://www.liebertonline.com/doi/pdf/10.1089/omi.2006.10.231)
What is "phylogenomics", what is "comparative genomics"? You could also
go to [http://monochrome-effect.net/publications.html] and have a quick
look at some of the papers there; most are related to the issues at
hand, and some show real-world applications of phylogenetic trees.

How might participating in this project help your career? What do you
plan to learn? Why are you a good candidate for this?

Hope this helps some,

CZ

Diana Jaunzeikare wrote:
> Hi everybody,
>
> I know this is kinda late and I should have contacted you earlier, but
> better late than never. I found out about Phyloinformatics Summer of
> Code just last night when I was doing homework for a Bioinformatics lab
> on Phylogenetic reconstruction and Parsimony. I was reading various
> bio-tech related blogs in Google Reader, and in the Google Top
> Recommendations bar I saw the blog of The Tree of Life. There I read
> the blog post about Phyloinformatics Summer of Code. This was very
> exciting news for me! I almost jumped off the sofa with excitement :)
> For two years already I have wanted to participate in Google Summer of
> Code, but it never really fit with what I was doing at the time. When
> I saw the project about integrating phyloXML with BioRuby I knew it
> was for me! I am a big fan of Ruby! Last semester, for my Computational
> Biology seminar, I wrote a bunch of scripts to deal with the PDB
> database for my final project. Also, it has been a pleasure to develop
> in Ruby on Rails. What is even more exciting is that my research
> interests lie in Bioinformatics. In fact, I had thought before about
> developing for BioRuby, but I didn't have a good enough reason.
>
> Here is my project plan for building support for phyloXML in BioRuby.
> I think the emphasis should be on ease of use for biologists and on
> a lot of example code.
>
>
> Week 1:
>  * Get familiar with BioRuby, its structure, classes (like Bio::Tree),
>    coding conventions, documentation conventions. See other
>    implementations of XML parsers in BioRuby (like BLAST XML).
>  * Get familiar with phyloXML, its structure, typical uses. Get a data
>    set of many different files in phyloXML format for testing.
>
> Week 2:
>  * Try to write a program which would use phyloXML data in order to
>    understand what would be the easiest way to use it. (Later it will
>    be used for unit testing.)
>  * Design the architecture of the phyloXML class, parser and writer,
>    and the interface with other classes (like the alignment class).
>
> Week 3: Develop the basic, most essential objects of phyloXML
> (Phylogeny, Clade, Taxonomy).
>
> Weeks 4-5: Develop the phyloXML parser.
>
> Weeks 6-7: Develop the phyloXML writer.
>
> Week 8: Develop the rest of the objects of phyloXML.
>
> Week 9: Update the parser.
>
> Week 10: Update the writer.
>
> Week 11: Finish up documentation. Write extensive examples of how to
> use the code.
>
> Week 12: Do the write-up of the project.
>
>
> What do you think about the project plan? Anything missing?
>
> Thanks,
>
> Diana
>
>
> Diana Jaunzeikare
> Smith College
> Computer Science and Math double major '10
> CS Department Liaison and Master Tutor
> email: djaunzei at email.smith.edu
> cell: 413-387-2083

From czmasek at burnham.org Thu Apr 2 23:15:41 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Thu, 2 Apr 2009 20:15:41 -0700
Subject: [BioRuby] [PhyloSoC] Project plan for phyloXML integration with BioRuby
In-Reply-To: <4057d3bf0904021957o73caa39bu835b949493fcab85@mail.gmail.com>
References: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com>
    <49D543B9.2050200@burnham.org>
    <4057d3bf0904021957o73caa39bu835b949493fcab85@mail.gmail.com>
Message-ID: <49D57F5D.7070703@burnham.org>

Hi, Diana:

Looks better.

I think you need to point out how evolutionary trees are used outside of
'tree of life' applications (e.g. phylogenomics, phylogeography, gene
function prediction, ...), as those are important applications for which
phyloXML has been designed.

Also, it is not expected that _you_ benchmark various XML parsers. It's
good enough to rely on published results. The important point is that
you, together with the BioRuby community, determine which one integrates
best with BioRuby (i.e. ideally creates no additional dependencies) and
still provides acceptable performance.

CZ


Diana Jaunzeikare wrote:
> Hi,
>
> I posted Abstract and new project plan.
>
> http://socghop.appspot.com/student_proposal/show/google/gsoc2009/dianaj/t123872262150
>
> Diana
>
> On Thu, Apr 2, 2009 at 7:01 PM, Christian M Zmasek wrote:
>
>     Diana:
>
>     Thank you for your interest in this project!
>
>     Indeed, the hour is late, and your proposal still needs
>     significant work in order to be competitive.
>
>     I think you already got some comments from Hilmar (I am at work
>     and I cannot use IRC).
>
>     Besides those, I'd like to suggest:
>
>     1. Please make sure that all the studying is done before the
>        coding begins (May 23) (i.e. your "week 1" should be during the
>        "community bonding period").
>     2. You do not need to develop classes for objects already present
>        in BioRuby (such as phylogenetic trees). Actually, the fewer new
>        classes you have to introduce the better -- reuse!
>     3. I am happy to see that you include unit tests early on, this is
>        good! You have to make extensive use of BioRuby's test suite.
>     4. In general, your weekly goals are not described in enough
>        detail. It might be a good idea to discuss goals, deliverables,
>        anticipated problems/difficulties (and possible solutions) for
>        each week.
>     5. Documentation is very important(!)
>     6. Do you plan to maintain the code after the summer?
>
>     Since the abstract is due April 3rd and cannot be changed after
>     that, it is best to concentrate on the abstract first, though.
>     (The project plan can still be tweaked after April 3rd, I understand.)
>
>     In the abstract you have to make clear that you understand the
>     _biology_ behind the project. Why does phyloXML have the elements
>     it has? Why is it useful? You might have a look at:
>     http://www.phyloxml.org and
>     http://www.tdwg.org/proceedings/article/view/437.
>
>     Can you show that you understand what evolutionary trees are?
>     Where and why are they used?
>     Are they only important in 'tree of life' applications? (e.g. see:
>     http://www.liebertonline.com/doi/pdf/10.1089/omi.2006.10.231)
>     What is "phylogenomics", what is "comparative genomics"?
>     You could also go to
>     [http://monochrome-effect.net/publications.html] and have a quick
>     look at some of the papers there; most are related to the issues
>     at hand, and some show real world applications of phylogenetic trees.
>
>     How might participating in this project help your career? What do
>     you plan to learn? Why are you a good candidate for this?
>
>     Hope this helps some,
>
>     CZ
>
>
>     Diana Jaunzeikare wrote:
>
>         Hi everybody,
>
>         I know this is kinda late and I should have contacted you
>         earlier, but better late than never.
>         I found out about
>         Phyloinformatics Summer of Code just last night when I was
>         doing homework for Bioinformatics lab on Phylogenetic
>         reconstruction and Parsimony. I was reading various bio-tech
>         related blogs in Google reader and in the Google Top
>         Recommendations bar I saw the blog of The Tree of Life. There I
>         read the blog post about Phyloinformatics Summer of Code. This
>         was very exciting news for me! I almost jumped off the sofa from
>         the excitement :) Already for two years I wanted to
>         participate in Google summer of code, but it never really fit
>         with what I was doing at the time. When I saw the project
>         about integrating phyloXML with BioRuby I knew it was for me!
>         I am a big fan of Ruby! Last semester for my Computational
>         Biology seminar I wrote a bunch of scripts to deal with the PDB
>         database for my final project. Also it has been a pleasure to
>         develop in Ruby on Rails. What is even more exciting is that
>         my research interests lie in Bioinformatics. In fact, I had
>         thoughts before to develop for BioRuby, but I didn't have a
>         good enough reason before.
>
>         Here is my project plan for building support for phyloXML for
>         BioRuby. I think the emphasis should be on the ease of use for
>         biologists and a lot of example code.
>
>
>         Week 1:
>         * Get familiar with BioRuby, its structure, classes (like
>           Bio::Tree), coding conventions, documentation conventions. See
>           other implementations of XML parsers in BioRuby (like BLAST XML).
>         * Get familiar with phyloXML, its structure, typical uses.
>           Get a data set of many different files in phyloXML format for
>           testing.
>         Week 2:
>         * Try to write a program which would use phyloXML data in
>           order to understand what would be the easiest way to use it.
>           (Later it will be used for unit testing.)
>         * Design the architecture of the phyloXML class, parser and
>           writer, and the interface with other classes (like the
>           alignment class).
>         Week 3: Develop the most essential objects of
>         phyloXML (Phylogeny, Clade, Taxonomy).
>         Weeks 4-5: Develop phyloXML parser.
>         Weeks 6-7: Develop phyloXML writer.
>         Week 8: Develop the rest of the objects of phyloXML.
>         Week 9: Update parser.
>
>         Week 10: Update writer.
>         Week 11: Finish up documentation.
>         Write extensive examples of how to use the code.
>
>         Week 12: Do write up of the project.
>
>         What do you think about the project plan? Anything missing?
>
>         Thanks,
>
>         Diana
>
>
>         Diana Jaunzeikare
>         Smith College
>         Computer Science and Math double major '10
>         CS Department Liaison and Master Tutor
>         email: djaunzei at email.smith.edu
>         cell: 413-387-2083
>
>

From kenglish at gmail.com Sun Apr 5 20:13:37 2009
From: kenglish at gmail.com (Kevin English)
Date: Sun, 5 Apr 2009 14:13:37 -1000
Subject: [BioRuby] Blast with file as a query option?
Message-ID:

Hello,
I have two very large local fasta files that I wish to blast against one
another and parse the results in BioRuby. I'm wondering if there is a way
to mimic the behavior of this blast command:

blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2

where Large_list_sequences_2 is a formatted fasta db. My current
implementation opens Large_list_sequences_1.fasta and goes through it
sequence by sequence. It seems to run pretty slowly. I'm wondering if I
can in some way do the above blast command and loop through the results
and get a performance gain.

For anyone curious, my code is on github:

http://github.com/kenglishhi/bioflexrails/tree/master

The file that is doing the blasts is under app/model/biodatabase.rb.

I'm trying to write a Rails app that uses a biosql db and allows this
biologist to organize his sequences. I'm very new to bioinformatics but
have a lot of experience with Ruby on Rails.

Thanks in advance for your help.

Kevin

From ngoto at gen-info.osaka-u.ac.jp Tue Apr 7 00:03:03 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 7 Apr 2009 13:03:03 +0900
Subject: [BioRuby] Blast with file as a query option?
In-Reply-To:
References:
Message-ID: <20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Sun, 5 Apr 2009 14:13:37 -1000
Kevin English wrote:

> Hello,
> I have two very large local fasta files that I wish to blast against one
> another and parse the results in BioRuby. I'm wondering if there is a way
> to mimic the behavior of this blast command:
>
> blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2
>
>
> where Large_list_sequences_2 is a formatted fasta db. My current
> implementation opens Large_list_sequences_1.fasta and goes through it
> sequence by sequence. It seems to run pretty slowly. I'm wondering if I
> can in some way do the above blast command and loop through the results
> and get a performance gain.

To gain performance, adding options to BLAST is strongly recommended.
  -e  Expectation value (E) [Real]
    default = 10.0
  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
    default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
    default = 250

Changing the above to smaller values will reduce the output report size,
which means a performance gain.

Executing BLAST with multiple query sequences can also gain performance.
In addition, when you have query sequences in a local file, calling the
blastall command directly without Bio::Blast may be good.

For example,

  require 'bio'
  require 'tempfile'

  # %w(...) builds an Array of words, so the "-o" option can be appended
  # below and the whole Array passed to system() without going via a shell.
  command = %w( blastall -p blastn -i Large_list_sequences_1.fasta
                -d Large_list_sequences_2 -e 0.0001 -b 20 -v 20 )
  tempfile = Tempfile.new('blastout')
  tempfile.close(false)
  command = command + [ "-o", tempfile.path ]
  system(*command)
  # After system(), error checks will be needed but skipped.
  tempfile.open
  ff = Bio::FlatFile.open(tempfile)
  ff.each do |report|
    # For example, prints query_def and target_def
    report.each do |hit|
      print report.query_def, "\t", hit.target_def, "\n"
    end
  end
  ff.close
  tempfile.close(true)

> For anyone curious, my code is on github:
>
> http://github.com/kenglishhi/bioflexrails/tree/master
>
> The file that is doing the blasts is under app/model/biodatabase.rb.
>
> I'm trying to write a Rails app that uses a biosql db and allows this
> biologist to organize his sequences. I'm very new to bioinformatics but
> have a lot of experience with Ruby on Rails.
>
> Thanks in advance for your help.

In general, a BLAST search against a very large database takes
a very long time, and using a batch queueing system might be needed.

Thanks,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From donttrustben at gmail.com Tue Apr 7 00:30:09 2009
From: donttrustben at gmail.com (Ben Woodcroft)
Date: Tue, 7 Apr 2009 14:30:09 +1000
Subject: [BioRuby] Blast with file as a query option?
In-Reply-To: <20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>
References: <20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>
Message-ID:

And there is the -a flag, for specifying you want to use multiple CPUs.
ben

2009/4/7 Naohisa GOTO

> Hi,
>
> On Sun, 5 Apr 2009 14:13:37 -1000
> Kevin English wrote:
>
> > Hello,
> > I have two very large local fasta files that I wish to blast against one
> > another and parse the results in BioRuby. I'm wondering if there is a way
> > to mimic the behavior of this blast command:
> >
> > blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2
> >
> >
> > where Large_list_sequences_2 is a formatted fasta db. My current
> > implementation opens Large_list_sequences_1.fasta and goes through it
> > sequence by sequence. It seems to run pretty slowly. I'm wondering if I
> > can in some way do the above blast command and loop through the results
> > and get a performance gain.
>
> To gain performance, adding options to BLAST is strongly recommended.
>  -e  Expectation value (E) [Real]
>    default = 10.0
>  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
>    default = 500
>  -b  Number of database sequence to show alignments for (B) [Integer]
>    default = 250
>
> Changing the above to smaller values will reduce the output report size,
> which means a performance gain.
>
> Executing BLAST with multiple query sequences can also gain performance.
> In addition, when you have query sequences in a local file, calling the
> blastall command directly without Bio::Blast may be good.
>
> For example,
>
>  require 'bio'
>  require 'tempfile'
>
>  # %w(...) builds an Array of words, so "-o" can be appended below.
>  command = %w( blastall -p blastn -i Large_list_sequences_1.fasta
>                -d Large_list_sequences_2 -e 0.0001 -b 20 -v 20 )
>  tempfile = Tempfile.new('blastout')
>  tempfile.close(false)
>  command = command + [ "-o", tempfile.path ]
>  system(*command)
>  # After system(), error checks will be needed but skipped.
>  tempfile.open
>  ff = Bio::FlatFile.open(tempfile)
>  ff.each do |report|
>    # For example, prints query_def and target_def
>    report.each do |hit|
>      print report.query_def, "\t", hit.target_def, "\n"
>    end
>  end
>  ff.close
>  tempfile.close(true)
>
> > For anyone curious, my code is on github:
> >
> > http://github.com/kenglishhi/bioflexrails/tree/master
> >
> > The file that is doing the blasts is under app/model/biodatabase.rb.
> >
> > I'm trying to write a Rails app that uses a biosql db and allows this
> > biologist to organize his sequences. I'm very new to bioinformatics but
> > have a lot of experience with Ruby on Rails.
> >
> > Thanks in advance for your help.
>
> In general, a BLAST search against a very large database takes
> a very long time, and using a batch queueing system might be needed.
>
> Thanks,
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>

--
FYI: My email addresses at unimelb, uq and gmail all redirect to the same
place.

From davide.rambaldi at ifom-ieo-campus.it Tue Apr 7 05:05:48 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Tue, 7 Apr 2009 11:05:48 +0200
Subject: [BioRuby] bioruby XML and libruby1.8 (ubuntu/debian)
Message-ID:

Hi, I got this ugly bug:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502535

Also reported by Naohisa Goto.

Is there any solution provided by Ubuntu/Debian?

Or is it better to recompile Ruby from source?

Thanks.

Davide Rambaldi,
Bioinformatics PhD student.
-----------------------------------------------------
Bioinformatic Group IFOM-IEO Campus
Via Adamello 16, Milano
I-20139 Italy

[t] +39 02574303 066
[e] davide.rambaldi at ifom-ieo-campus.it
[i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi (homepage)
[i] http://www.semm.it (PhD school)
[i] http://www.btbs.unimib.it/ (Master)
-----------------------------------------------------

From davide.rambaldi at ifom-ieo-campus.it Tue Apr 7 05:09:46 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Tue, 7 Apr 2009 11:09:46 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors
In-Reply-To: <49D2C051.9090002@burnham.org>
References: <49D2C051.9090002@burnham.org>
Message-ID: <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it>

Hi, I am available to help if there is need.

I have only a little experience with BioRuby but I am really interested in
participating in a project with BioRuby.

Maybe I can help in testing the code?

Davide Il giorno 01/apr/09, alle ore 03:16, Christian M Zmasek ha scritto: > Hi: > > As you probably read on previous messages, NESCent is again having a > Google Summer of Code program (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 > ). > > I am serving as the mentor of a project entitled "phyloXML support > in BioRuby" (see: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009 > #phyloXML_support_in_BioRuby). > > In order to ensure a successful outcome (should the project be > accepted), as well as to improve the chances of being accepted, I am > looking for people willing to serve as co-mentors. > > Christian > > > PS: Here is the full description of the project: > > > phyloXML support in BioRuby > > Rationale Evolutionary trees are central to comparative genomics > studies. > Trees used in this context are usually annotated with a variety of > data elements, such as taxonomic information, genome-related data > (gene names, functional annotations) and gene duplication events, as > well as information related to the evolutionary tree itself (branch > lengths, support values). phyloXML is an XML data exchange standard > that can represent this data. Trees in phyloXML format can be > displayed and analyzed with Archaeopteryx > (the successor to ATV > >), > which also allows manipulation and navigation of the tree. While > tools exist to convert other formats (such as the widely used Newick > and Nexus formats) to phyloXML, there is currently support for > phyloXML in only one of the open source Bio* projects (in BioPerl > , as a result of > Google's Summer of Code 2008). Approach Build phyloXML support > in the increasingly popular, dynamic, and > fully objected oriented language Ruby. More specifically, extend the > open source BioRuby project to support phyloXML (BioRuby 1.3.0 has > just been released). 
This will entail (i) the development of objects > to represent all the elements of phyloXML (sequences, taxonomic > data, annotations, etc), (ii) the development of a parser to read in > phyloXML, and (iii) a phyloXML writer. Challenges Relating the > data elements specific to phyloXML to the tree classes > already in BioRuby while maintaining the standards of the BioRuby > project. Development of a time and memory efficient phyloXML parser > (the parser has to be able to process trees with thousands of > external nodes, at least). Involved toolkits or projects BioRuby > , phyloXML Degree > of difficulty and needed skills Medium. Requires experience in an > object oriented programming > language (such as C++, Java, or, ideally, Ruby). Experience in > genomics or a related biological field is also critical. Knowledge > of BioRuby will obviously help, as well as familiarity with XML. > Mentors Christian Zmasek > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby Davide Rambaldi, Bioinformatics PhD student. ----------------------------------------------------- Bioinformatic Group IFOM-IEO Campus Via Adamello 16, Milano I-20139 Italy [t] +39 02574303 066 [e] davide.rambaldi at ifom-ieo-campus.it [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi (homepage) [i] http://www.semm.it (PhD school) [i] http://www.btbs.unimib.it/ (Master) ----------------------------------------------------- From ngoto at gen-info.osaka-u.ac.jp Tue Apr 7 07:06:57 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 7 Apr 2009 20:06:57 +0900 Subject: [BioRuby] bioruby XML and libruby1.8 (ubuntu/debian) In-Reply-To: References: Message-ID: <20090407110659.0CD521CBC3F4@idnmail.gen-info.osaka-u.ac.jp> Hi, For Debian, the regression have been fixed with DSA-1695-1. 
http://www.debian.org/security/2009/dsa-1695 Normally, aptitude update && aptitude upgrade will fix the problem. I know nothing about Ubuntu. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Tue, 7 Apr 2009 11:05:48 +0200 Davide Rambaldi wrote: > Hi, I got this ugly bug: > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502535 > > Reported also by Naohisa Goto. > > There is any solution provided by ubuntu/debian? > > Or is better to recompile ruby from source? > > thanks. > > Davide Rambaldi, > Bioinformatics PhD student. > ----------------------------------------------------- > Bioinformatic Group IFOM-IEO Campus > Via Adamello 16, Milano > I-20139 Italy > > [t] +39 02574303 066 > [e] davide.rambaldi at ifom-ieo-campus.it > [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi > (homepage) > [i] http://www.semm.it (PhD school) > [i] http://www.btbs.unimib.it/ (Master) > > ----------------------------------------------------- > > > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From czmasek at burnham.org Wed Apr 8 19:14:35 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Wed, 8 Apr 2009 16:14:35 -0700 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors In-Reply-To: <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it> References: <49D2C051.9090002@burnham.org> <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it> Message-ID: <49DD2FDB.4080909@burnham.org> Hi, Davide: Yes, help in testing code is always very welcome! Hopefully the project will get accepted! Christian Davide Rambaldi wrote: > Hi, I am available to help is there is need. > > I have a small experience with bioruby but I am really interested in > partecipate in a project with bioruby. > > May be I can help in testing the code? 
> > Davide > > > Il giorno 01/apr/09, alle ore 03:16, Christian M Zmasek ha scritto: > > >> Hi: >> >> As you probably read on previous messages, NESCent is again having a >> Google Summer of Code program (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 >> ). >> >> I am serving as the mentor of a project entitled "phyloXML support >> in BioRuby" (see: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009 >> #phyloXML_support_in_BioRuby). >> >> In order to ensure a successful outcome (should the project be >> accepted), as well as to improve the chances of being accepted, I am >> looking for people willing to serve as co-mentors. >> >> Christian >> >> >> PS: Here is the full description of the project: >> >> >> phyloXML support in BioRuby >> >> Rationale Evolutionary trees are central to comparative genomics >> studies. >> Trees used in this context are usually annotated with a variety of >> data elements, such as taxonomic information, genome-related data >> (gene names, functional annotations) and gene duplication events, as >> well as information related to the evolutionary tree itself (branch >> lengths, support values). phyloXML is an XML data exchange standard >> that can represent this data. Trees in phyloXML format can be >> displayed and analyzed with Archaeopteryx >> (the successor to ATV >> > >>> ), >>> >> which also allows manipulation and navigation of the tree. While >> tools exist to convert other formats (such as the widely used Newick >> and Nexus formats) to phyloXML, there is currently support for >> phyloXML in only one of the open source Bio* projects (in BioPerl >> , as a result of >> Google's Summer of Code 2008). Approach Build phyloXML support >> in the increasingly popular, dynamic, and >> fully objected oriented language Ruby. More specifically, extend the >> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has >> just been released). 
This will entail (i) the development of objects >> to represent all the elements of phyloXML (sequences, taxonomic >> data, annotations, etc), (ii) the development of a parser to read in >> phyloXML, and (iii) a phyloXML writer. Challenges Relating the >> data elements specific to phyloXML to the tree classes >> already in BioRuby while maintaining the standards of the BioRuby >> project. Development of a time and memory efficient phyloXML parser >> (the parser has to be able to process trees with thousands of >> external nodes, at least). Involved toolkits or projects >> BioRuby , phyloXML > www.phyloxml.org> Degree of difficulty and needed skills Medium. >> Requires experience in an object oriented programming >> language (such as C++, Java, or, ideally, Ruby). Experience in >> genomics or a related biological field is also critical. Knowledge >> of BioRuby will obviously help, as well as familiarity with XML. >> Mentors Christian Zmasek >> >> >> _______________________________________________ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby >> > > Davide Rambaldi, > Bioinformatics PhD student. 
> ----------------------------------------------------- > Bioinformatic Group IFOM-IEO Campus > Via Adamello 16, Milano > I-20139 Italy > > [t] +39 02574303 066 > [e] davide.rambaldi at ifom-ieo-campus.it > [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi > (homepage) > [i] http://www.semm.it (PhD school) > [i] http://www.btbs.unimib.it/ (Master) > > ----------------------------------------------------- > > > > > From ixpata at gmail.com Wed Apr 8 19:23:47 2009 From: ixpata at gmail.com (S M) Date: Wed, 8 Apr 2009 19:23:47 -0400 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors In-Reply-To: <49DD2FDB.4080909@burnham.org> References: <49D2C051.9090002@burnham.org> <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it> <49DD2FDB.4080909@burnham.org> Message-ID: I'd be interested in helping test too or in any other way I could help. I'm not experienced enough to program but maybe I can help on other ways. Sent from my iPhone On Apr 8, 2009, at 7:14 PM, Christian M Zmasek wrote: > Hi, Davide: > > Yes, help in testing code is always very welcome! > > Hopefully the project will get accepted! > > Christian > > > Davide Rambaldi wrote: >> Hi, I am available to help is there is need. >> >> I have a small experience with bioruby but I am really interested >> in partecipate in a project with bioruby. >> >> May be I can help in testing the code? >> >> Davide >> >> >> Il giorno 01/apr/09, alle ore 03:16, Christian M Zmasek ha scritto: >> >> >>> Hi: >>> >>> As you probably read on previous messages, NESCent is again having >>> a Google Summer of Code program (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 >>> ). >>> >>> I am serving as the mentor of a project entitled "phyloXML >>> support in BioRuby" (see: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009 >>> #phyloXML_support_in_BioRuby). 
>>> >>> In order to ensure a successful outcome (should the project be >>> accepted), as well as to improve the chances of being accepted, I >>> am looking for people willing to serve as co-mentors. >>> >>> Christian >>> >>> >>> PS: Here is the full description of the project: >>> >>> >>> phyloXML support in BioRuby >>> >>> Rationale Evolutionary trees are central to comparative >>> genomics studies. >>> Trees used in this context are usually annotated with a variety of >>> data elements, such as taxonomic information, genome-related data >>> (gene names, functional annotations) and gene duplication events, >>> as >>> well as information related to the evolutionary tree itself (branch >>> lengths, support values). phyloXML is an XML data exchange standard >>> that can represent this data. Trees in phyloXML format can be >>> displayed and analyzed with Archaeopteryx >>> (the successor to ATV >>> >>> ), >>>> >>> which also allows manipulation and navigation of the tree. While >>> tools exist to convert other formats (such as the widely used >>> Newick >>> and Nexus formats) to phyloXML, there is currently support for >>> phyloXML in only one of the open source Bio* projects (in BioPerl >>> , as a result of >>> Google's Summer of Code 2008). Approach Build phyloXML >>> support in the increasingly popular, dynamic, and >>> fully objected oriented language Ruby. More specifically, extend >>> the >>> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has >>> just been released). This will entail (i) the development of >>> objects >>> to represent all the elements of phyloXML (sequences, taxonomic >>> data, annotations, etc), (ii) the development of a parser to read >>> in >>> phyloXML, and (iii) a phyloXML writer. Challenges Relating >>> the data elements specific to phyloXML to the tree classes >>> already in BioRuby while maintaining the standards of the BioRuby >>> project. 
Development of a time and memory efficient phyloXML parser >>> (the parser has to be able to process trees with thousands of >>> external nodes, at least). Involved toolkits or projects >>> BioRuby , phyloXML >> > Degree of difficulty and needed skills Medium. Requires >>> experience in an object oriented programming >>> language (such as C++, Java, or, ideally, Ruby). Experience in >>> genomics or a related biological field is also critical. Knowledge >>> of BioRuby will obviously help, as well as familiarity with XML. >>> Mentors Christian Zmasek >>> >>> >>> _______________________________________________ >>> BioRuby mailing list >>> BioRuby at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioruby >>> >> >> Davide Rambaldi, >> Bioinformatics PhD student. >> ----------------------------------------------------- >> Bioinformatic Group IFOM-IEO Campus >> Via Adamello 16, Milano >> I-20139 Italy >> >> [t] +39 02574303 066 >> [e] davide.rambaldi at ifom-ieo-campus.it >> [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/ >> DavideRambaldi (homepage) >> [i] http://www.semm.it (PhD school) >> [i] http://www.btbs.unimib.it/ (Master) >> >> ----------------------------------------------------- >> >> >> >> >> > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From mkikkawa at gmail.com Fri Apr 10 03:20:19 2009 From: mkikkawa at gmail.com (Masahide Kikkawa) Date: Fri, 10 Apr 2009 16:20:19 +0900 Subject: [BioRuby] Bio::PubMed.efetch, bug? Message-ID: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com> Hi, I recently installed bioruby-1.3.0 and encountered a bug. Here is what I did result = Bio::PubMed.efetch([123,456,789]) I expected the result to be Array of MEDLINE formatted String as documented here: http://bioruby.org/rdoc/classes/Bio/PubMed.html However, the return value was one String containing all the results. 
In bioruby-1.2.1, it does return an Array of MEDLINE-formatted Strings.
Therefore I would recommend applying the following patch to ncbirest.rb

<
<         result.strip!
<         if opts["retmode"] == "text"
<           result = result.split(/\n\n+/)
<         end
<
<         return result
---
>         return result.strip
>         #return result.strip.split(/\n\n+/)

---------------------------------------------------------------
Masahide Kikkawa, M.D., Ph. D.
Professor
Structural Biology
Graduate School of Science
Kyoto University
Oiwake, Kitashirakawa, Sakyo-ku, Kyoto, 606-8502
JAPAN
http://structure.biophys.kyoto-u.ac.jp/
Tel: +81-75-753-9421
FAX: +81-75-753-4218
---------------------------------------------------------------

From jan.aerts at gmail.com Fri Apr 10 03:30:32 2009
From: jan.aerts at gmail.com (Jan Aerts)
Date: Fri, 10 Apr 2009 08:30:32 +0100
Subject: [BioRuby] searching through large sets of features
Message-ID: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>

Hi all,
I need to be able to handle millions of features for a visualization and
therefore be able to search through them efficiently. I've been made aware
of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data structure
that could do just that. One of the issues that this approach would solve
is that it doesn't need to load every single feature when the display
resolution is not high enough (no use trying to plot 5 million points next
to each other on a screen 800 pixels wide; you can only plot 800
points....)

I'm now trying to write a Ruby implementation of an R-Tree that is a better
fit for loci (i.e. one less dimension than an R-Tree, and it has to handle
the chromosome as well) at http://github.com/jandot/locustree. I'm trying to
make it a module that can be mixed in, so anything which has a chromosome,
start and stop can be indexed and searched using this R-Tree adaptation.

Does anyone know of a Ruby implementation of an R-Tree? Or is anyone
interested in helping me out?

Thanks,
jan.
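
The point made in the message above (there is no use loading every feature when the display cannot resolve them) can be illustrated with fixed-size bins per chromosome. What follows is only a minimal sketch under stated assumptions: it is a flat binning scheme, not an R-Tree, and it is not the LocusTree API. `Feature`, `BinnedLoci`, `overlapping`, `density` and the bin size are all hypothetical names chosen for illustration.

```ruby
# Hypothetical sketch: fixed-size bins per chromosome. Not an R-Tree and
# not the LocusTree API; all class and method names are illustrative.
Feature = Struct.new(:chromosome, :start, :stop)

class BinnedLoci
  def initialize(bin_size)
    @bin_size = bin_size
    # chromosome => { bin number => [features overlapping that bin] }
    @bins = Hash.new do |by_chr, chr|
      by_chr[chr] = Hash.new { |by_bin, i| by_bin[i] = [] }
    end
  end

  # Register a feature in every bin it overlaps.
  def add(feature)
    bins_for(feature.start, feature.stop).each do |i|
      @bins[feature.chromosome][i] << feature
    end
    self
  end

  # Individual features overlapping [start, stop]: only the bins touched
  # by the query are inspected, then candidates are filtered exactly.
  def overlapping(chromosome, start, stop)
    return [] unless @bins.key?(chromosome)
    candidates = bins_for(start, stop).flat_map do |i|
      @bins[chromosome].fetch(i, [])
    end
    candidates.uniq { |f| f.object_id }
              .select { |f| f.start <= stop && f.stop >= start }
  end

  # Per-bin counts for a zoomed-out view, where drawing single features
  # is pointless. A feature spanning several bins is counted in each.
  def density(chromosome, start, stop)
    return {} unless @bins.key?(chromosome)
    bins_for(start, stop).each_with_object({}) do |i, counts|
      counts[i] = @bins[chromosome].fetch(i, []).size
    end
  end

  private

  def bins_for(start, stop)
    (start / @bin_size)..(stop / @bin_size)
  end
end

index = BinnedLoci.new(1_000)
index.add(Feature.new("1", 100, 200)).add(Feature.new("1", 950, 1_050))
p index.overlapping("1", 150, 960).size
p index.density("1", 0, 1_999)
```

A single bin size is the simplest possible choice; a tree of progressively coarser bins, closer in spirit to the R-Tree idea, would let the overview query touch fewer entries, at the cost of more bookkeeping on insert.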
From mail at michaelbarton.me.uk Fri Apr 10 09:33:30 2009
From: mail at michaelbarton.me.uk (Michael Barton)
Date: Fri, 10 Apr 2009 14:33:30 +0100
Subject: [BioRuby] searching through large sets of features
In-Reply-To:
References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>
Message-ID:

There are also some Ruby search algorithms here
http://algorithms.rubyforge.org/

I'm happy to try and help if I can

Cheers

Mike

2009/4/10 Michael Barton

> Would a Bloom filter fit this problem?
> http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
>
> 2009/4/10 Jan Aerts
>
>> Hi all,
>> I need to be able to handle millions of features for a visualization and
>> therefore be able to search through them efficiently. I've been made aware
>> of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data structure
>> that could do just that. One of the issues that this approach would solve
>> is that it doesn't need to load every single feature when the display
>> resolution is not high enough (no use trying to plot 5 million points next
>> to each other on a screen 800 pixels wide; you can only plot 800
>> points....)
>>
>> I'm now trying to write a Ruby implementation of an R-Tree that is a
>> better fit for loci (i.e. one less dimension than an R-Tree, and it has to
>> handle the chromosome as well) at http://github.com/jandot/locustree. I'm
>> trying to make it a module that can be mixed in, so anything which has a
>> chromosome, start and stop can be indexed and searched using this R-Tree
>> adaptation.
>>
>> Does anyone know of a Ruby implementation of an R-Tree? Or is anyone
>> interested in helping me out?
>>
>> Thanks,
>> jan.
>> _______________________________________________
>> BioRuby mailing list
>> BioRuby at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>
>
>

From jan.aerts at gmail.com Fri Apr 10 09:57:20 2009
From: jan.aerts at gmail.com (Jan Aerts)
Date: Fri, 10 Apr 2009 14:57:20 +0100
Subject: [BioRuby] searching through large sets of features
In-Reply-To:
References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>
Message-ID: <4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com>

That is a _really_ good resource. Unfortunately, R-Trees are not in there
(or I didn't look closely enough). I was thinking of sending him an email to
ask if he could quickly add that to his library :-), but saw that he's
switching to python. (But that particular announcement was on the 1st of
April, so let's hope it's a joke)
j.

2009/4/10 Michael Barton

> There are also some Ruby search algorithms here
> http://algorithms.rubyforge.org/
>
> I'm happy to try and help if I can
>
> Cheers
>
> Mike
>
> 2009/4/10 Michael Barton
>
> > Would a Bloom filter fit this problem?
> > http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
> >
> > 2009/4/10 Jan Aerts
> >
> >> Hi all,
> >> I need to be able to handle millions of features for a visualization and
> >> therefore be able to search through them efficiently. I've been made
> >> aware of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data
> >> structure that could do just that. One of the issues that this approach
> >> would solve is that it doesn't need to load every single feature when
> >> the display resolution is not high enough (no use trying to plot 5
> >> million points next to each other on a screen 800 pixels wide; you can
> >> only plot 800 points....)
> >>
> >> I'm now trying to write a Ruby implementation of an R-Tree that is a
> >> better fit for loci (i.e. one less dimension than an R-Tree, and it has
> >> to handle the chromosome as well) at http://github.com/jandot/locustree.
> >> I'm trying to make it a module that can be mixed in, so anything which
> >> has a chromosome, start and stop can be indexed and searched using this
> >> R-Tree adaptation.
> >>
> >> Does anyone know of a Ruby implementation of an R-Tree? Or is anyone
> >> interested in helping me out?
> >>
> >> Thanks,
> >> jan.
> >> _______________________________________________
> >> BioRuby mailing list
> >> BioRuby at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioruby
> >>
> >
> >
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>

From mail at michaelbarton.me.uk Fri Apr 10 10:51:10 2009
From: mail at michaelbarton.me.uk (Michael Barton)
Date: Fri, 10 Apr 2009 15:51:10 +0100
Subject: [BioRuby] searching through large sets of features
In-Reply-To: <4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com>
References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>
    <4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com>
Message-ID:

You could use the kd-tree to narrow the search to the 1000 features each
side of the middle point, then traverse this set to exclude those outside of
the visualisation range. It's a bit of a hack but it could save creating an
R-tree if there's no existing implementation.

2009/4/10 Jan Aerts

> That is a _really_ good resource. Unfortunately, R-Trees are not in there
> (or I didn't look closely enough). I was thinking of sending him an email
> to ask if he could quickly add that to his library :-), but saw that he's
> switching to python. (But that particular announcement was on the 1st of
> April, so let's hope it's a joke)
> j.
>
>
> 2009/4/10 Michael Barton
>
>> There are also some Ruby search algorithms here
>> http://algorithms.rubyforge.org/
>>
>> I'm happy to try and help if I can
>>
>> Cheers
>>
>> Mike
>>
>> 2009/4/10 Michael Barton
>>
>> > Would a Bloom filter fit this problem?
>> > http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
>> >
>> > 2009/4/10 Jan Aerts
>> >
>> >> Hi all,
>> >> I need to be able to handle millions of features for a visualization
>> >> and therefore be able to search through them efficiently. I've been
>> >> made aware of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree)
>> >> data structure that could do just that. One of the issues that this
>> >> approach would solve is that it doesn't need to load every single
>> >> feature when the display resolution is not high enough (no use trying
>> >> to plot 5 million points next to each other on a screen 800 pixels
>> >> wide; you can only plot 800 points....)
>> >>
>> >> I'm now trying to write a Ruby implementation of an R-Tree that is a
>> >> better fit for loci (i.e. one less dimension than an R-Tree, and it has
>> >> to handle the chromosome as well) at http://github.com/jandot/locustree.
>> >> I'm trying to make it a module that can be mixed in, so anything which
>> >> has a chromosome, start and stop can be indexed and searched using this
>> >> R-Tree adaptation.
>> >>
>> >> Does anyone know of a Ruby implementation of an R-Tree? Or is anyone
>> >> interested in helping me out?
>> >>
>> >> Thanks,
>> >> jan.
>> >> _______________________________________________
>> >> BioRuby mailing list
>> >> BioRuby at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/bioruby
>> >>
>> >
>> >
>> _______________________________________________
>> BioRuby mailing list
>> BioRuby at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>
>
>

From fredjoha at bioreg.kyushu-u.ac.jp Mon Apr 13 00:16:00 2009
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 13 Apr 2009 13:16:00 +0900
Subject: [BioRuby] Parsing MSF alignment file
Message-ID: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>

I tried to parse an alignment file given in the MSF format by using
Bio::GCG::Msf. It turned out though that points/dots (.) were used as a
gap character in the alignment and that can't be handled by
Bio::GCG::Msf. So, for what it's worth, I made these changes to
bio/appl/gcg/msf.rb:

$ diff msf.rb.old msf.rb.new
33,35c33,36
<         if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ str[/.*/] then
<           @heading = str[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
<           str.sub!(/.*/, '')
---
>         preamble, @data = str.split(/^\/\/$/)
>         if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ preamble[/.*/] then
>           @heading = preamble[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
>           preamble.sub!(/.*/, '')
37c38
<         str.sub!(/.*\.\.$/m, '')
---
>         preamble.sub!(/.*\.\.$/m, '')
48,49d48
<         str.sub!(/.*\/\/$/m, '')
<         a = $&.to_s.split(/^/)
51c50
<         a.each do |x|
---
>         preamble.split(/^/).each do |x|
59d57
<         @data = str

Best regards,
Fredrik Johansson

--
***********************************
Fredrik Johansson, grad.
student
Division of Bioinformatics
Medical Institute of Bioregulation
Kyushu University
3-1-1 Maidashi, Higashi-ku
Fukuoka 812-8582, Japan
fredjoha at bioreg.kyushu-u.ac.jp
***********************************

From ngoto at gen-info.osaka-u.ac.jp Mon Apr 13 08:54:54 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Mon, 13 Apr 2009 21:54:54 +0900
Subject: [BioRuby] Parsing MSF alignment file
In-Reply-To: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>
References: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>
Message-ID: <20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Mon, 13 Apr 2009 13:16:00 +0900
Fredrik Johansson wrote:

> I tried to parse an alignment file given in the MSF format by using
> Bio::GCG::Msf. It turned out though that points/dots (.) were used as a
> gap character in the alignment and that can't be handled by
> Bio::GCG::Msf. So, for what it's worth, I made these changes to
> bio/appl/gcg/msf.rb:

Thank you very much. Do you mean that Bio::GCG::Msf fails to
parse an alignment that contains two contiguous gaps (..) at the
end of a line? This is a bug, and the patch will soon be
applied to the git repository.

> $ diff msf.rb.old msf.rb.new

Next time, please use "diff -u" (unified format).

Thanks,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From fredjoha at bioreg.kyushu-u.ac.jp Mon Apr 13 09:19:27 2009
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 13 Apr 2009 22:19:27 +0900
Subject: [BioRuby] Parsing MSF alignment file
In-Reply-To: <20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp>
References: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>
 <20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <49E33BDF.70200@bioreg.kyushu-u.ac.jp>

Yes, that's what happened. A regular expression matched all the way to
the last occurrence of two dots at the end of a line.

> Thank you very much. Do you mean that Bio::GCG::Msf fails to
> parse an alignment that contains two contiguous gaps (..) at the
> end of a line? This is a bug, and the patch will soon be
> applied to the git repository.
>
>
>> $ diff msf.rb.old msf.rb.new
>>
>
> Next time, please use "diff -u" (unified format).
>
>

Ok, I'll attach the output of diff -u here:

--- /usr/lib/ruby/gems/1.8/gems/bio-1.3.0/lib/bio/appl/gcg/msf.rb.old 2009-04-13 11:32:53.000000000 +0900
+++ /usr/lib/ruby/gems/1.8/gems/bio-1.3.0/lib/bio/appl/gcg/msf.rb 2009-04-13 13:36:26.000000000 +0900
@@ -30,11 +30,12 @@
       # Creates a new Msf object.
       def initialize(str)
         str = str.sub(/\A[\r\n]+/, '')
-        if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ str[/.*/] then
-          @heading = str[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
-          str.sub!(/.*/, '')
+        preamble, @data = str.split(/^\/\/$/)
+        if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ preamble[/.*/] then
+          @heading = preamble[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
+          preamble.sub!(/.*/, '')
         end
-        str.sub!(/.*\.\.$/m, '')
+        preamble.sub!(/.*\.\.$/m, '')
         @description = $&.to_s.sub(/^.*\.\.$/, '').to_s
         d = $&.to_s
         if m = /(.+)\s+MSF\:\s+(\d+)\s+Type\:\s+(\w)\s+(.+)\s+(Comp)?Check\:\s+(\d+)/.match(d) then
@@ -45,10 +46,8 @@
           @checksum = (m[6] ? m[6].to_i : nil)
         end
 
-        str.sub!(/.*\/\/$/m, '')
-        a = $&.to_s.split(/^/)
         @seq_info = []
-        a.each do |x|
+        preamble.split(/^/).each do |x|
           if /Name\: / =~ x then
             s = {}
             x.scan(/(\S+)\: +(\S*)/) { |y| s[$1] = $2 }
@@ -56,7 +55,6 @@
           end
         end
 
-        @data = str
         @description.sub!(/\A(\r\n|\r|\n)/, '')
         @align = nil
       end

From andrew.j.grimm at gmail.com Tue Apr 14 05:30:16 2009
From: andrew.j.grimm at gmail.com (Andrew Grimm)
Date: Tue, 14 Apr 2009 19:30:16 +1000
Subject: [BioRuby] Status of bioruby with Debian packages
Message-ID:

According to http://packages.qa.debian.org/b/bioruby.html , the latest
bioruby available as a Debian package is 1.1.0 or 1.1.0-1.

Are there plans to update bioruby's debian package?
This isn't a major issue for me - our ubuntu system already has bioruby
installed for ruby 1.8, and I was merely hoping to use ruby 1.9 for
faster performance. If a debian update isn't likely, then we might try
using rubygems on ubuntu and hope that the package management police
don't arrest us!

Andrew

PS: Is there anything better than just plain web search for searching
bioruby's mailing list archives?

From davide.rambaldi at ifom-ieo-campus.it Wed Apr 15 05:24:03 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Wed, 15 Apr 2009 11:24:03 +0200
Subject: [BioRuby] ACE assembly format
Message-ID: <7d40e94d36b1af5fd01fb9a908512aa2@ifom-ieo-campus.it>

Hi all,

is there a parser for the .ace format in bioruby?

Something like Bio::Assembly::IO::ace in BioPerl?

Many thanks.

Davide Rambaldi

From ngoto at gen-info.osaka-u.ac.jp Wed Apr 15 07:04:00 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Wed, 15 Apr 2009 20:04:00 +0900
Subject: [BioRuby] Status of bioruby with Debian packages
In-Reply-To:
References:
Message-ID: <20090415110401.744641CBC544@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Tue, 14 Apr 2009 19:30:16 +1000
Andrew Grimm wrote:

> According to http://packages.qa.debian.org/b/bioruby.html , the latest
> bioruby available as a Debian package is 1.1.0 or 1.1.0-1.
>
> Are there plans to update bioruby's debian package?

I don't know, but I'm also a debian user, and I've filed a wishlist
request in the Debian BTS.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=524180

In general, Debian-specific issues should be raised on the Debian
mailing lists, in the Bug Tracking System, and/or with the package
maintainers.

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From fredjoha at bioreg.kyushu-u.ac.jp Sun Apr 19 23:35:27 2009
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 20 Apr 2009 12:35:27 +0900
Subject: [BioRuby] PubMed.efetch error
Message-ID: <49EBED7F.4010509@bioreg.kyushu-u.ac.jp>

PubMed.efetch does not give me correct output if I give an array as
argument. I think it is supposed to return an array of MEDLINE entries,
but it returns all entries in one string. I can get an array if I type

Bio::PubMed.efetch(entries).split("\n\n")

But that can't be the intended behaviour, right?

Regards,
Fredrik Johansson

--
***********************************
Fredrik Johansson, grad. student
Division of Bioinformatics
Medical Institute of Bioregulation
Kyushu University
3-1-1 Maidashi, Higashi-ku
Fukuoka 812-8582, Japan
fredjoha at bioreg.kyushu-u.ac.jp
***********************************

From tomoakin at kenroku.kanazawa-u.ac.jp Tue Apr 21 04:29:26 2009
From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA)
Date: Tue, 21 Apr 2009 17:29:26 +0900
Subject: [BioRuby] Read/write of simple fasta file increases the identifier.
Message-ID:

Hi,

With bioruby-1.3.0, reading a FASTA file, converting each entry to
Bio::Sequence with to_seq, and then writing it with output(:fasta)
makes the definition line longer each time the file is processed.
Is there a better interface that keeps the definition line as in the
original?

Perhaps this is because some formats have clearly separate entry_id
and definition fields, while the distinction is ambiguous in FASTA
format. However, it would be better to be able to recover the original
definition line easily.

The following is a simple example case. It is so simple that the entry
would not need to be converted to Bio::Sequence, but in practice I
would like to manipulate the sequence, for example by taking a
subsequence, adding some sequence, translating, etc.

% cat > simple_fasta
>abc
acgttgac
% cat fastacat
#!/usr/local/bin/ruby
require 'bio'
ff = Bio::FlatFile.open(Bio::FastaFormat, ARGF)
while fe = ff.next_entry
  seq = fe.to_seq
  puts seq.output(:fasta)
end
% ruby fastacat simple_fasta
>abc abc
acgttgac
% ruby fastacat simple_fasta | ruby fastacat
>abc abc abc
acgttgac

Sincerely yours,
--
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan

From ngoto at gen-info.osaka-u.ac.jp Tue Apr 21 10:17:26 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 21 Apr 2009 23:17:26 +0900
Subject: [BioRuby] Bio::PubMed.efetch, bug?
In-Reply-To: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com>
References: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com>
Message-ID: <20090421141727.C5B6A1CBC4A8@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Fri, 10 Apr 2009 16:20:19 +0900
Masahide Kikkawa wrote:

> Hi,
>
>
> I recently installed bioruby-1.3.0 and encountered a bug.
>
> Here is what I did
> result = Bio::PubMed.efetch([123,456,789])
>
> I expected the result to be an Array of MEDLINE-formatted Strings
> as documented here:
> http://bioruby.org/rdoc/classes/Bio/PubMed.html
>
> However, the return value was one String containing all the results.
>
> In bioruby-1.2.1, it does return an Array of MEDLINE-formatted Strings.

Thank you for reporting a bug. This is a regression introduced with
the refactoring of Bio::PubMed between 1.2.1 and 1.3.0.

> Therefore I would recommend applying the following patch to ncbirest.rb
>
> <
> <       result.strip!
> <       if opts["retmode"] == "text"
> <         result = result.split(/\n\n+/)
> <       end
> <
> <       return result
> ---
> >       return result.strip
> >       #return result.strip.split(/\n\n+/)
>

Bio::NCBI::REST is not only for PubMed but also for many other
databases at NCBI, and efetch may return free text, depending on the
database. Thus, the processing of the result should only be applied
in Bio::PubMed.
http://github.com/bioruby/bioruby/commit/a48a9a35b87dead069fe328ba7086977304af995

To confirm the bug fix, tests for Bio::PubMed were also added.
http://github.com/bioruby/bioruby/commit/bf5ba6d4503f3ddb0ca31673882f5b396a932bbe

In addition, I found a bug in the handling of "retstart" and fixed it.
http://github.com/bioruby/bioruby/commit/fc0339fe8a42cd00199cfdc938590ae9626551bc

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From rozziite at gmail.com Thu Apr 30 21:37:07 2009
From: rozziite at gmail.com (Diana Jaunzeikare)
Date: Thu, 30 Apr 2009 21:37:07 -0400
Subject: [BioRuby] Google Summer of Code Intro: PhyloXML support in BioRuby
Message-ID: <4057d3bf0904301837r302bfb2buaa8a644c448267fa@mail.gmail.com>

Hi all,

I would like to introduce myself. My name is Diana and I have been
accepted for Google Summer of Code to implement PhyloXML support for
BioRuby. I am a junior at Smith College double majoring in Computer
Science and Math. I am interested in Bioinformatics, especially protein
structure based phylogenetics.

Here is the project abstract:
===
Phylogenetic trees are used in important applications, including
phylogenomics, phylogeography, gene function prediction, cladistics and
the study of molecular evolution.
In order to foster successful analysis, exchange, storage and reuse of phylogenetic trees and associated data, the phyloXML format was developed. It can store all necessary information about the phylogenetic tree, like clade, sequence, name and distance. The goal of this project is to implement support for phyloXML in BioRuby. === Here is wiki: https://www.nescent.org/wg_phyloinformatics/PhyloSoC:PhyloXML_support_in_BioRuby Any comments are welcome! Cheers, Diana From czmasek at burnham.org Wed Apr 1 01:16:01 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Tue, 31 Mar 2009 18:16:01 -0700 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors Message-ID: <49D2C051.9090002@burnham.org> Hi: As you probably read on previous messages, NESCent is again having a Google Summer of Code program (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009). I am serving as the mentor of a project entitled "phyloXML support in BioRuby" (see: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby). In order to ensure a successful outcome (should the project be accepted), as well as to improve the chances of being accepted, I am looking for people willing to serve as co-mentors. Christian PS: Here is the full description of the project: phyloXML support in BioRuby Rationale Evolutionary trees are central to comparative genomics studies. Trees used in this context are usually annotated with a variety of data elements, such as taxonomic information, genome-related data (gene names, functional annotations) and gene duplication events, as well as information related to the evolutionary tree itself (branch lengths, support values). phyloXML is an XML data exchange standard that can represent this data. Trees in phyloXML format can be displayed and analyzed with Archaeopteryx (the successor to ATV ), which also allows manipulation and navigation of the tree. 
While tools exist to convert other formats (such as the widely used Newick and Nexus formats) to phyloXML, there is currently support for phyloXML in only one of the open source Bio* projects (in BioPerl , as a result of Google's Summer of Code 2008). Approach Build phyloXML support in the increasingly popular, dynamic, and fully objected oriented language Ruby. More specifically, extend the open source BioRuby project to support phyloXML (BioRuby 1.3.0 has just been released). This will entail (i) the development of objects to represent all the elements of phyloXML (sequences, taxonomic data, annotations, etc), (ii) the development of a parser to read in phyloXML, and (iii) a phyloXML writer. Challenges Relating the data elements specific to phyloXML to the tree classes already in BioRuby while maintaining the standards of the BioRuby project. Development of a time and memory efficient phyloXML parser (the parser has to be able to process trees with thousands of external nodes, at least). Involved toolkits or projects BioRuby , phyloXML Degree of difficulty and needed skills Medium. Requires experience in an object oriented programming language (such as C++, Java, or, ideally, Ruby). Experience in genomics or a related biological field is also critical. Knowledge of BioRuby will obviously help, as well as familiarity with XML. Mentors Christian Zmasek From pjotr.public14 at thebird.nl Wed Apr 1 08:08:08 2009 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Wed, 1 Apr 2009 10:08:08 +0200 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors In-Reply-To: <49D2C051.9090002@burnham.org> References: <49D2C051.9090002@burnham.org> Message-ID: <20090401080808.GA30972@thebird.nl> Hi Christian, I am happy to help. Pj. 
On Tue, Mar 31, 2009 at 06:16:01PM -0700, Christian M Zmasek wrote: > Hi: > > As you probably read on previous messages, NESCent is again having a > Google Summer of Code program > (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009). > > I am serving as the mentor of a project entitled "phyloXML support in > BioRuby" (see: > https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby). > > In order to ensure a successful outcome (should the project be > accepted), as well as to improve the chances of being accepted, I am > looking for people willing to serve as co-mentors. > > Christian > > > PS: Here is the full description of the project: > > > phyloXML support in BioRuby > > Rationale Evolutionary trees are central to comparative genomics > studies. > Trees used in this context are usually annotated with a variety of > data elements, such as taxonomic information, genome-related data > (gene names, functional annotations) and gene duplication events, as > well as information related to the evolutionary tree itself (branch > lengths, support values). phyloXML is an XML data exchange standard > that can represent this data. Trees in phyloXML format can be > displayed and analyzed with Archaeopteryx > (the successor to ATV > ), > which also allows manipulation and navigation of the tree. While > tools exist to convert other formats (such as the widely used Newick > and Nexus formats) to phyloXML, there is currently support for > phyloXML in only one of the open source Bio* projects (in BioPerl > , as a result of > Google's Summer of Code 2008). Approach Build phyloXML support in > the increasingly popular, dynamic, and > fully objected oriented language Ruby. More specifically, extend the > open source BioRuby project to support phyloXML (BioRuby 1.3.0 has > just been released). 
This will entail (i) the development of objects > to represent all the elements of phyloXML (sequences, taxonomic > data, annotations, etc), (ii) the development of a parser to read in > phyloXML, and (iii) a phyloXML writer. Challenges Relating the data > elements specific to phyloXML to the tree classes > already in BioRuby while maintaining the standards of the BioRuby > project. Development of a time and memory efficient phyloXML parser > (the parser has to be able to process trees with thousands of > external nodes, at least). Involved toolkits or projects BioRuby > , phyloXML Degree of > difficulty and needed skills Medium. Requires experience in an object > oriented programming > language (such as C++, Java, or, ideally, Ruby). Experience in > genomics or a related biological field is also critical. Knowledge > of BioRuby will obviously help, as well as familiarity with XML. > Mentors Christian Zmasek > > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From bonnalraoul at ingm.it Wed Apr 1 10:37:27 2009 From: bonnalraoul at ingm.it (Raoul JP Bonnal) Date: Wed, 01 Apr 2009 12:37:27 +0200 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors In-Reply-To: <49D2C051.9090002@burnham.org> References: <49D2C051.9090002@burnham.org> Message-ID: <49D343E7.1080007@ingm.it> Dear Christian, one of the outcomes from Biohackathon 09, for me, was to implement Phylo into BioSQL, could be of interest for this proposal ? Let me know. Christian M Zmasek ha scritto: > Hi: > > As you probably read on previous messages, NESCent is again having a > Google Summer of Code program > (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009). 
> > I am serving as the mentor of a project entitled "phyloXML support in > BioRuby" (see: > https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby). > > > In order to ensure a successful outcome (should the project be > accepted), as well as to improve the chances of being accepted, I am > looking for people willing to serve as co-mentors. > > Christian > > > PS: Here is the full description of the project: > > > phyloXML support in BioRuby > > Rationale Evolutionary trees are central to comparative genomics > studies. > Trees used in this context are usually annotated with a variety of > data elements, such as taxonomic information, genome-related data > (gene names, functional annotations) and gene duplication events, as > well as information related to the evolutionary tree itself (branch > lengths, support values). phyloXML is an XML data exchange standard > that can represent this data. Trees in phyloXML format can be > displayed and analyzed with Archaeopteryx > (the successor to ATV > > ), > > which also allows manipulation and navigation of the tree. While > tools exist to convert other formats (such as the widely used Newick > and Nexus formats) to phyloXML, there is currently support for > phyloXML in only one of the open source Bio* projects (in BioPerl > , as a result of > Google's Summer of Code 2008). Approach Build phyloXML support > in the increasingly popular, dynamic, and > fully objected oriented language Ruby. More specifically, extend the > open source BioRuby project to support phyloXML (BioRuby 1.3.0 has > just been released). This will entail (i) the development of objects > to represent all the elements of phyloXML (sequences, taxonomic > data, annotations, etc), (ii) the development of a parser to read in > phyloXML, and (iii) a phyloXML writer. 
Challenges Relating the > data elements specific to phyloXML to the tree classes > already in BioRuby while maintaining the standards of the BioRuby > project. Development of a time and memory efficient phyloXML parser > (the parser has to be able to process trees with thousands of > external nodes, at least). Involved toolkits or projects BioRuby > , phyloXML Degree > of difficulty and needed skills Medium. Requires experience in an > object oriented programming > language (such as C++, Java, or, ideally, Ruby). Experience in > genomics or a related biological field is also critical. Knowledge > of BioRuby will obviously help, as well as familiarity with XML. > Mentors Christian Zmasek > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From czmasek at burnham.org Wed Apr 1 18:20:23 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Wed, 1 Apr 2009 11:20:23 -0700 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors In-Reply-To: <49D343E7.1080007@ingm.it> References: <49D2C051.9090002@burnham.org> <49D343E7.1080007@ingm.it> Message-ID: <49D3B067.3050101@burnham.org> Hi Raoul: Thank you for your reply. While your project sounds interesting and much needed, the project I was referring to is to implement phyloXML (www.phyloxml.org) support in BioRuby. Given that the deadline for Google Summer of Code projects is April 3rd, I think it would be almost impossible to propose a new project and find a student for it before this deadline. Sincerely, Christian Raoul JP Bonnal wrote: > Dear Christian, > one of the outcomes from Biohackathon 09, for me, was to implement Phylo > into BioSQL, could be of interest for this proposal ? > > Let me know. 
> > Christian M Zmasek ha scritto: > > >> Hi: >> >> As you probably read on previous messages, NESCent is again having a >> Google Summer of Code program >> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009). >> >> I am serving as the mentor of a project entitled "phyloXML support in >> BioRuby" (see: >> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby). >> >> >> In order to ensure a successful outcome (should the project be >> accepted), as well as to improve the chances of being accepted, I am >> looking for people willing to serve as co-mentors. >> >> Christian >> >> >> PS: Here is the full description of the project: >> >> >> phyloXML support in BioRuby >> >> Rationale Evolutionary trees are central to comparative genomics >> studies. >> Trees used in this context are usually annotated with a variety of >> data elements, such as taxonomic information, genome-related data >> (gene names, functional annotations) and gene duplication events, as >> well as information related to the evolutionary tree itself (branch >> lengths, support values). phyloXML is an XML data exchange standard >> that can represent this data. Trees in phyloXML format can be >> displayed and analyzed with Archaeopteryx >> (the successor to ATV >> >> ), >> >> which also allows manipulation and navigation of the tree. While >> tools exist to convert other formats (such as the widely used Newick >> and Nexus formats) to phyloXML, there is currently support for >> phyloXML in only one of the open source Bio* projects (in BioPerl >> , as a result of >> Google's Summer of Code 2008). Approach Build phyloXML support >> in the increasingly popular, dynamic, and >> fully objected oriented language Ruby. More specifically, extend the >> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has >> just been released). 
This will entail (i) the development of objects >> to represent all the elements of phyloXML (sequences, taxonomic >> data, annotations, etc), (ii) the development of a parser to read in >> phyloXML, and (iii) a phyloXML writer. Challenges Relating the >> data elements specific to phyloXML to the tree classes >> already in BioRuby while maintaining the standards of the BioRuby >> project. Development of a time and memory efficient phyloXML parser >> (the parser has to be able to process trees with thousands of >> external nodes, at least). Involved toolkits or projects BioRuby >> , phyloXML Degree >> of difficulty and needed skills Medium. Requires experience in an >> object oriented programming >> language (such as C++, Java, or, ideally, Ruby). Experience in >> genomics or a related biological field is also critical. Knowledge >> of BioRuby will obviously help, as well as familiarity with XML. >> Mentors Christian Zmasek >> >> >> _______________________________________________ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby >> > > > From czmasek at burnham.org Wed Apr 1 18:31:12 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Wed, 1 Apr 2009 11:31:12 -0700 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors In-Reply-To: <2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu> References: <49D2C051.9090002@burnham.org> <2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu> Message-ID: <49D3B2F0.5030605@burnham.org> Hi, Joel: From my own experience as a co-mentor last year, the time requirement is rather limited. The project I co-mentored was "phyloXML in BioRuby". Since I am knowledgeable about phyloXML I mainly helped out with questions related to it. Of course, for this year's project "phyloXML in BioRuby", it will be the opposite. 
Since I am the main-mentor I will/am handling all the issues related to the application process/midterm evaluation/final-evaluation/phyloXML. But I would feel more comfortable, if I had one or two co-mentors with significant Ruby and BioRuby experience who could help out if questions specific to these languages arise. Christian PS: The project still needs to be accepted by the Nescent organization. Last year, there were about 15 projects, and only five got accepted (since Google gave Nescent five slots, Nescent had to decide which projects they liked the most). Joel Dudley wrote: > I am interested. What's involved in being a co-mentor? I am one of the > authors of MEGA (http://www.megasoftware.net) and I have been > programming with Ruby daily for about 3 years now. > > -- Joel > > On Mar 31, 2009, at 6:16 PM, Christian M Zmasek wrote: > > >> Hi: >> >> As you probably read on previous messages, NESCent is again having a >> Google Summer of Code program (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 >> ). >> >> I am serving as the mentor of a project entitled "phyloXML support >> in BioRuby" (see: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009 >> #phyloXML_support_in_BioRuby). >> >> In order to ensure a successful outcome (should the project be >> accepted), as well as to improve the chances of being accepted, I am >> looking for people willing to serve as co-mentors. >> >> Christian >> >> >> PS: Here is the full description of the project: >> >> >> phyloXML support in BioRuby >> >> Rationale Evolutionary trees are central to comparative genomics >> studies. >> Trees used in this context are usually annotated with a variety of >> data elements, such as taxonomic information, genome-related data >> (gene names, functional annotations) and gene duplication events, as >> well as information related to the evolutionary tree itself (branch >> lengths, support values). 
phyloXML is an XML data exchange standard >> that can represent this data. Trees in phyloXML format can be >> displayed and analyzed with Archaeopteryx >> (the successor to ATV >> > >>> ), >>> >> which also allows manipulation and navigation of the tree. While >> tools exist to convert other formats (such as the widely used Newick >> and Nexus formats) to phyloXML, there is currently support for >> phyloXML in only one of the open source Bio* projects (in BioPerl >> , as a result of >> Google's Summer of Code 2008). Approach Build phyloXML support >> in the increasingly popular, dynamic, and >> fully objected oriented language Ruby. More specifically, extend the >> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has >> just been released). This will entail (i) the development of objects >> to represent all the elements of phyloXML (sequences, taxonomic >> data, annotations, etc), (ii) the development of a parser to read in >> phyloXML, and (iii) a phyloXML writer. Challenges Relating the >> data elements specific to phyloXML to the tree classes >> already in BioRuby while maintaining the standards of the BioRuby >> project. Development of a time and memory efficient phyloXML parser >> (the parser has to be able to process trees with thousands of >> external nodes, at least). Involved toolkits or projects >> BioRuby , phyloXML > www.phyloxml.org> Degree of difficulty and needed skills Medium. >> Requires experience in an object oriented programming >> language (such as C++, Java, or, ideally, Ruby). Experience in >> genomics or a related biological field is also critical. Knowledge >> of BioRuby will obviously help, as well as familiarity with XML. 
>> Mentors Christian Zmasek >> >> >> _______________________________________________ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby >> > > =============Email Policy============= > I now only check mail at the following times: > - 8 a.m., Noon, 5 p.m. Mon-Fri > - 8 a.m. Sat-Sun > If this is a true emergency, or you require an > immediate response from me, please contact > me via my mobile phone: (408) 464-8258 > ==================================== > > From ngoto at gen-info.osaka-u.ac.jp Thu Apr 2 05:08:38 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Thu, 2 Apr 2009 14:08:38 +0900 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors In-Reply-To: <49D3B2F0.5030605@burnham.org> References: <49D2C051.9090002@burnham.org> <2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu> <49D3B2F0.5030605@burnham.org> Message-ID: <20090402050839.73F151CBC3BD@idnmail.gen-info.osaka-u.ac.jp> Hi Christian, I'll answer any questions about BioRuby as much as I can. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Wed, 1 Apr 2009 11:31:12 -0700 Christian M Zmasek wrote: > Hi, Joel: > > From my own experience as a co-mentor last year, the time requirement > is rather limited. > The project I co-mentored was "phyloXML in BioRuby". Since I am > knowledgeable about phyloXML I mainly helped out with questions related > to it. > > Of course, for this year's project "phyloXML in BioRuby", it will be the > opposite. Since I am the main-mentor I will/am handling all the issues > related to the application process/midterm > evaluation/final-evaluation/phyloXML. But I would feel more comfortable, > if I had one or two co-mentors with significant Ruby and BioRuby > experience who could help out if questions specific to these languages > arise. > > Christian > > PS: The project still needs to be accepted by the Nescent organization. 
> Last year, there were about 15 projects, and only five got accepted > (since Google gave Nescent five slots, Nescent had to decide which > projects they liked the most). > > > > Joel Dudley wrote: > > I am interested. What's involved in being a co-mentor? I am one of the > > authors of MEGA (http://www.megasoftware.net) and I have been > > programming with Ruby daily for about 3 years now. > > > > -- Joel > > > > On Mar 31, 2009, at 6:16 PM, Christian M Zmasek wrote: > > > > > >> Hi: > >> > >> As you probably read on previous messages, NESCent is again having a > >> Google Summer of Code program (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 > >> ). > >> > >> I am serving as the mentor of a project entitled "phyloXML support > >> in BioRuby" (see: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009 > >> #phyloXML_support_in_BioRuby). > >> > >> In order to ensure a successful outcome (should the project be > >> accepted), as well as to improve the chances of being accepted, I am > >> looking for people willing to serve as co-mentors. > >> > >> Christian > >> > >> > >> PS: Here is the full description of the project: > >> > >> > >> phyloXML support in BioRuby > >> > >> Rationale Evolutionary trees are central to comparative genomics > >> studies. > >> Trees used in this context are usually annotated with a variety of > >> data elements, such as taxonomic information, genome-related data > >> (gene names, functional annotations) and gene duplication events, as > >> well as information related to the evolutionary tree itself (branch > >> lengths, support values). phyloXML is an XML data exchange standard > >> that can represent this data. Trees in phyloXML format can be > >> displayed and analyzed with Archaeopteryx > >> (the successor to ATV > >> >> > >>> ), > >>> > >> which also allows manipulation and navigation of the tree. 
While > >> tools exist to convert other formats (such as the widely used Newick > >> and Nexus formats) to phyloXML, there is currently support for > >> phyloXML in only one of the open source Bio* projects (in BioPerl > >> , as a result of > >> Google's Summer of Code 2008). Approach Build phyloXML support > >> in the increasingly popular, dynamic, and > >> fully objected oriented language Ruby. More specifically, extend the > >> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has > >> just been released). This will entail (i) the development of objects > >> to represent all the elements of phyloXML (sequences, taxonomic > >> data, annotations, etc), (ii) the development of a parser to read in > >> phyloXML, and (iii) a phyloXML writer. Challenges Relating the > >> data elements specific to phyloXML to the tree classes > >> already in BioRuby while maintaining the standards of the BioRuby > >> project. Development of a time and memory efficient phyloXML parser > >> (the parser has to be able to process trees with thousands of > >> external nodes, at least). Involved toolkits or projects > >> BioRuby , phyloXML >> www.phyloxml.org> Degree of difficulty and needed skills Medium. > >> Requires experience in an object oriented programming > >> language (such as C++, Java, or, ideally, Ruby). Experience in > >> genomics or a related biological field is also critical. Knowledge > >> of BioRuby will obviously help, as well as familiarity with XML. 
> >> Mentors Christian Zmasek > >> > >> > >> _______________________________________________ > >> BioRuby mailing list > >> BioRuby at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioruby > >> From hlapp at gmx.net Thu Apr 2 05:25:43 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 2 Apr 2009 01:25:43 -0400 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors In-Reply-To: <49D3B2F0.5030605@burnham.org> References: <49D2C051.9090002@burnham.org> <2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu> <49D3B2F0.5030605@burnham.org> Message-ID: On Apr 1, 2009, at 2:31 PM, Christian M Zmasek wrote: > But I would feel more comfortable, if I had one or two co-mentors > with significant Ruby and BioRuby experience I would echo this and add that from NESCent's perspective as the mentoring organization having a committed co-mentor as a backup when the primary mentor for whatever reason is temporarily incommunicado is highly desirable. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bonnalraoul at ingm.it Thu Apr 2 08:43:34 2009 From: bonnalraoul at ingm.it (Raoul JP Bonnal) Date: Thu, 02 Apr 2009 10:43:34 +0200 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors In-Reply-To: <49D3B067.3050101@burnham.org> References: <49D2C051.9090002@burnham.org> <49D343E7.1080007@ingm.it> <49D3B067.3050101@burnham.org> Message-ID: <49D47AB6.8090304@ingm.it> Ok, probably was not so clear, I'm available for co-mentoring if you want. Christian M Zmasek ha scritto: > Hi Raoul: > > Thank you for your reply. > While your project sounds interesting and much needed, the project I > was referring to is to implement phyloXML (www.phyloxml.org) support > in BioRuby. 
> Given that the deadline for Google Summer of Code projects is April > 3rd, I think it would be almost impossible to propose a new project > and find a student for it before this deadline. > > Sincerely, > > Christian > > > > Raoul JP Bonnal wrote: >> Dear Christian, >> one of the outcomes from Biohackathon 09, for me, was to implement >> Phylo into BioSQL, could be of interest for this proposal ? >> >> Let me know. >> >> Christian M Zmasek ha scritto: >> >> >>> Hi: >>> >>> As you probably read on previous messages, NESCent is again having a >>> Google Summer of Code program >>> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009). >>> >>> I am serving as the mentor of a project entitled "phyloXML support >>> in BioRuby" (see: >>> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby). >>> >>> >>> In order to ensure a successful outcome (should the project be >>> accepted), as well as to improve the chances of being accepted, I am >>> looking for people willing to serve as co-mentors. >>> >>> Christian >>> >>> >>> PS: Here is the full description of the project: >>> >>> >>> phyloXML support in BioRuby >>> >>> Rationale Evolutionary trees are central to comparative genomics >>> studies. >>> Trees used in this context are usually annotated with a variety of >>> data elements, such as taxonomic information, genome-related data >>> (gene names, functional annotations) and gene duplication events, as >>> well as information related to the evolutionary tree itself (branch >>> lengths, support values). phyloXML is an XML data exchange standard >>> that can represent this data. Trees in phyloXML format can be >>> displayed and analyzed with Archaeopteryx >>> (the successor to ATV >>> >>> ), >>> >>> which also allows manipulation and navigation of the tree. 
While >>> tools exist to convert other formats (such as the widely used Newick >>> and Nexus formats) to phyloXML, there is currently support for >>> phyloXML in only one of the open source Bio* projects (in BioPerl >>> , as a result of >>> Google's Summer of Code 2008). Approach Build phyloXML support >>> in the increasingly popular, dynamic, and >>> fully objected oriented language Ruby. More specifically, extend the >>> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has >>> just been released). This will entail (i) the development of objects >>> to represent all the elements of phyloXML (sequences, taxonomic >>> data, annotations, etc), (ii) the development of a parser to read in >>> phyloXML, and (iii) a phyloXML writer. Challenges Relating the >>> data elements specific to phyloXML to the tree classes >>> already in BioRuby while maintaining the standards of the BioRuby >>> project. Development of a time and memory efficient phyloXML parser >>> (the parser has to be able to process trees with thousands of >>> external nodes, at least). Involved toolkits or projects >>> BioRuby , phyloXML >>> Degree of difficulty and needed skills >>> Medium. Requires experience in an object oriented programming >>> language (such as C++, Java, or, ideally, Ruby). Experience in >>> genomics or a related biological field is also critical. Knowledge >>> of BioRuby will obviously help, as well as familiarity with XML. 
>>> Mentors Christian Zmasek
>>>
>>>
>>> _______________________________________________
>>> BioRuby mailing list
>>> BioRuby at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>>
>>
>>
>>
>

From czmasek at burnham.org Thu Apr 2 23:01:13 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Thu, 2 Apr 2009 16:01:13 -0700
Subject: [BioRuby] [PhyloSoC] Project plan for phyloXML integration with BioRuby
In-Reply-To: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com>
References: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com>
Message-ID: <49D543B9.2050200@burnham.org>

Diana:

Thank you for your interest in this project!

Indeed, the hour is late, and your proposal still needs significant work
in order to be competitive.

I think you already got some comments from Hilmar (I am at work and I
cannot use IRC).

Besides those, I'd like to suggest:

1. Please make sure that all the studying is done before the coding
begins (May 23) (i.e. your "week 1" should be during the "community
bonding period").
2. You do not need to develop classes for objects already present in
BioRuby (such as phylogenetic trees). Actually, the fewer new classes you
have to introduce, the better -- reuse!
3. I am happy to see that you include unit tests early on, this is good!
You have to make extensive use of BioRuby's test suite.
4. In general, your weekly goals are not described in enough detail. It
might be a good idea to discuss goals, deliverables, anticipated
problems/difficulties (and possible solutions) for each week.
5. Documentation is very important(!)
6. Do you plan to maintain the code after the summer?

Since the abstract is due April 3rd and cannot be changed after that, it
is best to concentrate on the abstract first, though.
(The project plan can still be tweaked after April 3rd, I understand.)

In the abstract you have to make clear that you understand the _biology_
behind the project. Why does phyloXML have the elements it has?
Why is it useful? You might have a look at: http://www.phyloxml.org and
http://www.tdwg.org/proceedings/article/view/437.

Can you show that you understand what evolutionary trees are? Where and
why are they used?
Are they only important in 'tree of life' applications (e.g. see:
http://www.liebertonline.com/doi/pdf/10.1089/omi.2006.10.231)?
What is "phylogenomics", what is "comparative genomics"?
You could also go to [http://monochrome-effect.net/publications.html] and
have a quick look at some of the papers there; most are related to the
issues at hand, and some show real world applications of phylogenetic
trees.

How might participating in this project help your career? What do you
plan to learn? Why are you a good candidate for this?

Hope this helps some,

CZ


Diana Jaunzeikare wrote:
> Hi everybody,
>
> I know this is kinda late and I should have contacted you earlier, but
> better late than never. I found out about Phyloinformatics Summer of
> Code just last night when I was doing homework for Bioinformatics lab
> on Phylogenetic reconstruction and Parsimony. I was reading various
> bio-tech related blogs in Google Reader and in the Google Top
> Recommendations bar I saw the blog of The Tree of Life. There I read
> the blog post about Phyloinformatics Summer of Code. This was very
> exciting news for me! I almost jumped off the sofa with excitement
> :) For two years already I have wanted to participate in Google Summer
> of Code, but it never really fit with what I was doing at the time. When
> I saw the project about integrating phyloXML with BioRuby I knew it
> was for me! I am a big fan of Ruby! Last semester for my Computational
> Biology seminar I wrote a bunch of scripts to deal with the PDB database
> for my final project. Also it has been a pleasure to develop in Ruby on
> Rails. What is even more exciting is that my research interests lie
> in Bioinformatics. In fact, I had thought before about developing for
> BioRuby, but I didn't have a good enough reason before.
>
> Here is my project plan for building support for phyloXML for BioRuby.
> I think the emphasis should be on the ease of use for biologists and a
> lot of example code.
>
>
> Week 1:
> * Get familiar with BioRuby, its structure, classes (like Bio::Tree),
> coding conventions, documentation conventions. See other
> implementations of XML parsers in BioRuby (like BLAST XML).
> * Get familiar with phyloXML, its structure, typical uses. Get a data
> set of many different files in phyloXML format for testing.
>
> Week 2:
> * Try to write a program which would use phyloXML data in order to
> understand what would be the easiest way to use it. (Later will be
> used for unit testing).
> * Design the architecture of the phyloXML class, parser and writer, and
> the interface with other classes (like the alignment class).
>
> Week 3: Develop the basic, most essential objects of phyloXML
> (Phylogeny, Clade, Taxonomy).
>
> Weeks 4-5: Develop phyloXML parser.
>
> Weeks 6-7: Develop phyloXML writer.
>
> Week 8: Develop the rest of the objects of phyloXML.
>
> Week 9: Update parser.
>
> Week 10: Update writer.
>
> Week 11: Finish up documentation. Write extensive examples of how to
> use the code.
>
> Week 12: Do write up of the project.
>
>
> What do you think about the project plan? Anything missing?
>
> Thanks,
>
> Diana
>
>
> Diana Jaunzeikare
> Smith College
> Computer Science and Math double major '10
> CS Department Liaison and Master Tutor
> email: djaunzei at email.smith.edu
> cell: 413-387-2083

From czmasek at burnham.org Fri Apr 3 03:15:41 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Thu, 2 Apr 2009 20:15:41 -0700
Subject: [BioRuby] [PhyloSoC] Project plan for phyloXML integration with BioRuby
In-Reply-To: <4057d3bf0904021957o73caa39bu835b949493fcab85@mail.gmail.com>
References: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com> <49D543B9.2050200@burnham.org> <4057d3bf0904021957o73caa39bu835b949493fcab85@mail.gmail.com>
Message-ID: <49D57F5D.7070703@burnham.org>

Hi, Diana:

Looks better.

I think you need to point out how evolutionary trees are used outside of
'tree of life' applications (i.e. phylogenomics, phylogeography, gene
function prediction, ...) as those are important applications for which
phyloXML has been designed.

Also, it is not expected that _you_ benchmark various XML parsers. It's
good enough to rely on published results. The important point is that
you, together with the BioRuby community, determine which one integrates
best with BioRuby (i.e. ideally create no additional dependencies) and
still provides acceptable performance.

CZ


Diana Jaunzeikare wrote:
> Hi,
>
> I posted Abstract and new project plan.
>
> http://socghop.appspot.com/student_proposal/show/google/gsoc2009/dianaj/t123872262150
>
> Diana
>
> On Thu, Apr 2, 2009 at 7:01 PM, Christian M Zmasek wrote:
>
> Diane:
>
> Thank you for your interest in this project!
>
> Indeed, the hour is late, and your proposal still needs
> significant work in order to be competitive.
>
> I think you already got some comments from Hilmar (I am at work
> and I cannot use IRC).
>
> Beside those, I'd like to suggest:
>
> 1. please make sure that all the studying is done before the
> coding begins (May 23) (i.e.
your "week 1" should be during the > "community bonding period"). > 2. You do not need to develop classes for objects already present > in BioRuby (such as phylogenetic trees). Actually, the less new > classes you have to introduce to better -- reuse! > 3. I am happy to see that you include unit test early on, this is > good! You have to make extensive use of BioRuby's test suite. > 4. In general, your weekly goals are not described in enough > detail. It might be a good idea to discuss goals, deliverables, > anticipated problems/difficulties (and possible solutions) for > each week. > 5. Documentation is very important(!) > 6. Do you plan to maintain the code after the summer? > > Since the abstract is due April 3rd and cannot be changed after > that it is best to concentrate on the abstract first, though. > (The project plan can still be tweaked after April 3rd, I understand.) > > In the abstract you have to make clear that you understand the > _biology_ behind the project. Why does phyloXML have the elements > it has? Why is it useful? Might have a look at: > http://www.phyloxml.org and > http://www.tdwg.org/proceedings/article/view/437. > > Can you show that you understand what evolutionary trees are? > Where and why are they used? > Are they only important in 'tree of life' applications (e.g. see: > http://www.liebertonline.com/doi/pdf/10.1089/omi.2006.10.231) > What is "phylogenomics", what is "comparative genomics"? > You could also go to > [http://monochrome-effect.net/publications.html] and have a quick > look at some of the papers there, most are related to the issues > at hand, and some show real world applications of phylogenetic trees. > > How might participating in this project help your career? What do > plan to learn? Why are you a good candidate for this? > > Hope this helps some, > > CZ > > > Diana Jaunzeikare wrote: > > Hi everybody, > > I know this is kinda late and I should have contacted you > earlier, but better later than never. 
I found out about > Phyloinformatics Summer of Code just last night when I was > doing homework for Bioinformatics lab on Phylogenetic > reconstruction and Parsimony. I was reading various bio-tech > related blogs in Google reader and in the Google Top > Recomendations bar I saw the blog of The Tree of Life. There I > read the blog post about Phyloinformatics Summer of Code. This > was very exciting news for me! I almost jumped off the sofa of > the excitement :) Already for two years I wanted to > participate in Google summer of code, but it never really fit > with what i was doing at the time. When I saw the project > about integrating phyloXML with BioRuby I knew it was for me! > I am a big fun of Ruby! Last semester for my Computational > Biology seminar I wrote bunch of scripts to deal with PDB > database for my final project. Also it has been pleasure to > develop in Ruby on Rails. What is even more exciting, is that > my research interests lie in Bioinformatics. In fact, I had > thoughts before to develop for BioRuby, but I didn't have a > good enough reason before. > > Here is my project plan for building support for phyloXML for > BioRuby. I think the emphasis should be on the ease of use for > biologists and a lot of example code. > > > Week1 : > * Get familiar with BioRuby, its structure, classes (like > Bio:Tree), coding conventions, documentation conventions. See > other implementations of XML parsers in BioRuby (like BLAST XML). > * Get familiar with phyloXML, its structure, typical uses. > Get data set of many different files in phyloXML format for > testing. > Week 2: > * Try to write a program which would use phyloXMl data in > order to understand what would be the easiest way to use it. > (Later will be used for unit testing). > * Desing the architecture of phyloXML class, parser and > writer, interface with other classes (like alignment class). > Week 3: Develop the some/basic/the most essential objects of > phyloXML (Phylogeny, Clade, Taxonomy). 
> Weeks 4-5: Develop phyloXML parser. > Weeks 6-7: Develop phyloXML writer. > Week 8: Develop the rest of the objects of phyloXML. > Week 9: Update parser. > > Week 10: Update writer. Week 11: Finish up documentation. > Write extensive examples of how to use the code. > > Week 12: Do write up of the project. > > What do you think about the project plan? Anything missing? > > Thanks, > > Diana > > > Diana Jaunzeikare > Smith College > Computer Science and Math double major '10 > CS Department Liaison and Master Tutor > email: djaunzei at email.smith.edu > > > > cell: 413-387-2083 > > > From kenglish at gmail.com Mon Apr 6 00:13:37 2009 From: kenglish at gmail.com (Kevin English) Date: Sun, 5 Apr 2009 14:13:37 -1000 Subject: [BioRuby] Blast with file as a query option? Message-ID: Hello, I have to very large local fasta files that wish to blast against one another and parse the results in bio ruby. I'm wondering if there is a way to mimic the behavior of this blast command: blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2 where Large_list_sequences_2 is a formatted fasta db. My current implementation opens Large_list_sequences_1.fasta and goes through it sequence by sequence. It seems to run pretty slow. I'm wondering if I can in some way do the above blast command and loop through the results and get a performance gain. For any curious, my code is on github: http://github.com/kenglishhi/bioflexrails/tree/master The file that is doing the blasts is under app/model/biodatabase.rb. I'm trying to write a rails app uses biosql db and allows this biologist to organize his sequences. I'm very new to bioinformatics but have a lot experience with Ruby on Rails. Thanks in advance for you help. Kevin From ngoto at gen-info.osaka-u.ac.jp Tue Apr 7 04:03:03 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 7 Apr 2009 13:03:03 +0900 Subject: [BioRuby] Blast with file as a query option? 
In-Reply-To:
References:
Message-ID: <20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Sun, 5 Apr 2009 14:13:37 -1000
Kevin English wrote:

> Hello,
> I have to very large local fasta files that wish to blast against one
> another and parse the results in bio ruby. I'm wondering if there is a way
> to mimic the behavior of this blast command:
>
> blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2
>
>
> where Large_list_sequences_2 is a formatted fasta db. My current
> implementation opens Large_list_sequences_1.fasta and goes through it
> sequence by sequence. It seems to run pretty slow. I'm wondering if I can in
> some way do the above blast command and loop through the results and get a
> performance gain.

To gain performance, adding options to BLAST is strongly recommended.

  -e  Expectation value (E) [Real]
      default = 10.0
  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
      default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
      default = 250

Changing the above to smaller values reduces the size of the output
report, which improves performance.

Executing BLAST with multiple query sequences can also improve performance.
In addition, when you have query sequences in a local file, calling the
blastall command directly without Bio::Blast may be good.

For example,

  require 'bio'
  require 'tempfile'

  # %w(...) builds an array of arguments, so that "-o" and the output
  # path can be appended and the whole array passed to system().
  command = %w( blastall -p blastn -i Large_list_sequences_1.fasta
                -d Large_list_sequences_2 -e 0.0001 -b 20 -v 20 )
  tempfile = Tempfile.new('blastout')
  tempfile.close(false)
  command = command + [ "-o", tempfile.path ]
  system(*command)
  # After system(), error checks will be needed but skipped.
  tempfile.open
  ff = Bio::FlatFile.open(tempfile)
  ff.each do |report|
    # For example, prints query_def and target_def
    report.each do |hit|
      print report.query_def, "\t", hit.target_def, "\n"
    end
  end
  ff.close
  tempfile.close(true)

> For any curious, my code is on github:
>
> http://github.com/kenglishhi/bioflexrails/tree/master
>
> The file that is doing the blasts is under app/model/biodatabase.rb.
>
> I'm trying to write a rails app uses biosql db and allows this biologist to
> organize his sequences. I'm very new to bioinformatics but have a lot
> experience with Ruby on Rails.
>
> Thanks in advance for you help.

In general, a BLAST search against a very large database takes a
very long time, and a batch queueing system might be needed.

Thanks,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From donttrustben at gmail.com Tue Apr 7 04:30:09 2009
From: donttrustben at gmail.com (Ben Woodcroft)
Date: Tue, 7 Apr 2009 14:30:09 +1000
Subject: [BioRuby] Blast with file as a query option?
In-Reply-To: <20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>
References: <20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>
Message-ID:

And there is the -a flag, for specifying how many CPUs you want to use.
ben

2009/4/7 Naohisa GOTO

> Hi,
>
> On Sun, 5 Apr 2009 14:13:37 -1000
> Kevin English wrote:
>
> > Hello,
> > I have to very large local fasta files that wish to blast against one
> > another and parse the results in bio ruby. I'm wondering if there is a way
> > to mimic the behavior of this blast command:
> >
> > blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2
> >
> >
> > where Large_list_sequences_2 is a formatted fasta db. My current
> > implementation opens Large_list_sequences_1.fasta and goes through it
> > sequence by sequence. It seems to run pretty slow. I'm wondering if I can in
> > some way do the above blast command and loop through the results and get a
> > performance gain.
> > To gain performance, adding options to BLAST is strongly recommended. > -e Expectation value (E) [Real] > default = 10.0 > -v Number of database sequences to show one-line descriptions for (V) > [Integer] > default = 500 > -b Number of database sequence to show alignments for (B) [Integer] > default = 250 > > Changing above to smaller values will reduce output report size > which means performance gain. > > Executing BLAST with multiple query sequences can also gain performance. > In addition, when you have query sequences in a local file, calling > blastall command directly without Bio::Blast may be good. > > For example, > > require 'bio' > require 'tempfile' > > command = %( blastall -p blastn -i Large_list_sequences_1.fasta > -d Large_list_sequences_2 -e 0.0001 -b 20 -v 20 ) > tempfile = Tempfile.new('blastout') > tempfile.close(false) > command = command + [ "-o", tempfile.path ] > system(*command) > # After system(), error checks will be needed but skipped. > tempfile.open > ff = Bio::FlatFile.open(tempfile) > ff.each do |report| > # For example, prints query_def and target_def > report.each do |hit| > print report.query_def, "\t", hit.target_def, "\n" > end > end > ff.close > tempfile.close(true) > > > For any curious, my code is on github: > > > > http://github.com/kenglishhi/bioflexrails/tree/master > > > > The file that is doing the blasts is under app/model/biodatabase.rb. > > > > I'm trying to write a rails app uses biosql db and allows this biologist > to > > organize his sequences. I'm very new to bioinformatics but have a lot > > experience with Ruby on Rails. > > > > Thanks in advance for you help. > > In general, a BLAST search against a very large database takes > very long time, and using batch queueing system might be needed. 
>
> Thanks,
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>

--
FYI: My email addresses at unimelb, uq and gmail all redirect to the same
place.

From davide.rambaldi at ifom-ieo-campus.it Tue Apr 7 09:05:48 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Tue, 7 Apr 2009 11:05:48 +0200
Subject: [BioRuby] bioruby XML and libruby1.8 (ubuntu/debian)
Message-ID:

Hi, I got this ugly bug:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502535

It was also reported by Naohisa Goto.

Is there any solution provided by Ubuntu/Debian?
Or is it better to recompile Ruby from source?

Thanks.

Davide Rambaldi,
Bioinformatics PhD student.
-----------------------------------------------------
Bioinformatic Group
IFOM-IEO Campus
Via Adamello 16, Milano
I-20139 Italy
[t] +39 02574303 066
[e] davide.rambaldi at ifom-ieo-campus.it
[i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi (homepage)
[i] http://www.semm.it (PhD school)
[i] http://www.btbs.unimib.it/ (Master)
-----------------------------------------------------

From davide.rambaldi at ifom-ieo-campus.it Tue Apr 7 09:09:46 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Tue, 7 Apr 2009 11:09:46 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors
In-Reply-To: <49D2C051.9090002@burnham.org>
References: <49D2C051.9090002@burnham.org>
Message-ID: <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it>

Hi, I am available to help if there is need. I have a little experience
with BioRuby, but I am really interested in participating in a project
with BioRuby. Maybe I can help in testing the code?
Davide Il giorno 01/apr/09, alle ore 03:16, Christian M Zmasek ha scritto: > Hi: > > As you probably read on previous messages, NESCent is again having a > Google Summer of Code program (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 > ). > > I am serving as the mentor of a project entitled "phyloXML support > in BioRuby" (see: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009 > #phyloXML_support_in_BioRuby). > > In order to ensure a successful outcome (should the project be > accepted), as well as to improve the chances of being accepted, I am > looking for people willing to serve as co-mentors. > > Christian > > > PS: Here is the full description of the project: > > > phyloXML support in BioRuby > > Rationale Evolutionary trees are central to comparative genomics > studies. > Trees used in this context are usually annotated with a variety of > data elements, such as taxonomic information, genome-related data > (gene names, functional annotations) and gene duplication events, as > well as information related to the evolutionary tree itself (branch > lengths, support values). phyloXML is an XML data exchange standard > that can represent this data. Trees in phyloXML format can be > displayed and analyzed with Archaeopteryx > (the successor to ATV > >), > which also allows manipulation and navigation of the tree. While > tools exist to convert other formats (such as the widely used Newick > and Nexus formats) to phyloXML, there is currently support for > phyloXML in only one of the open source Bio* projects (in BioPerl > , as a result of > Google's Summer of Code 2008). Approach Build phyloXML support > in the increasingly popular, dynamic, and > fully objected oriented language Ruby. More specifically, extend the > open source BioRuby project to support phyloXML (BioRuby 1.3.0 has > just been released). 
This will entail (i) the development of objects > to represent all the elements of phyloXML (sequences, taxonomic > data, annotations, etc), (ii) the development of a parser to read in > phyloXML, and (iii) a phyloXML writer. Challenges Relating the > data elements specific to phyloXML to the tree classes > already in BioRuby while maintaining the standards of the BioRuby > project. Development of a time and memory efficient phyloXML parser > (the parser has to be able to process trees with thousands of > external nodes, at least). Involved toolkits or projects BioRuby > , phyloXML Degree > of difficulty and needed skills Medium. Requires experience in an > object oriented programming > language (such as C++, Java, or, ideally, Ruby). Experience in > genomics or a related biological field is also critical. Knowledge > of BioRuby will obviously help, as well as familiarity with XML. > Mentors Christian Zmasek > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby Davide Rambaldi, Bioinformatics PhD student. ----------------------------------------------------- Bioinformatic Group IFOM-IEO Campus Via Adamello 16, Milano I-20139 Italy [t] +39 02574303 066 [e] davide.rambaldi at ifom-ieo-campus.it [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi (homepage) [i] http://www.semm.it (PhD school) [i] http://www.btbs.unimib.it/ (Master) ----------------------------------------------------- From ngoto at gen-info.osaka-u.ac.jp Tue Apr 7 11:06:57 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 7 Apr 2009 20:06:57 +0900 Subject: [BioRuby] bioruby XML and libruby1.8 (ubuntu/debian) In-Reply-To: References: Message-ID: <20090407110659.0CD521CBC3F4@idnmail.gen-info.osaka-u.ac.jp> Hi, For Debian, the regression have been fixed with DSA-1695-1. 
http://www.debian.org/security/2009/dsa-1695 Normally, aptitude update && aptitude upgrade will fix the problem. I know nothing about Ubuntu. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Tue, 7 Apr 2009 11:05:48 +0200 Davide Rambaldi wrote: > Hi, I got this ugly bug: > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502535 > > Reported also by Naohisa Goto. > > There is any solution provided by ubuntu/debian? > > Or is better to recompile ruby from source? > > thanks. > > Davide Rambaldi, > Bioinformatics PhD student. > ----------------------------------------------------- > Bioinformatic Group IFOM-IEO Campus > Via Adamello 16, Milano > I-20139 Italy > > [t] +39 02574303 066 > [e] davide.rambaldi at ifom-ieo-campus.it > [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi > (homepage) > [i] http://www.semm.it (PhD school) > [i] http://www.btbs.unimib.it/ (Master) > > ----------------------------------------------------- > > > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From czmasek at burnham.org Wed Apr 8 23:14:35 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Wed, 8 Apr 2009 16:14:35 -0700 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors In-Reply-To: <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it> References: <49D2C051.9090002@burnham.org> <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it> Message-ID: <49DD2FDB.4080909@burnham.org> Hi, Davide: Yes, help in testing code is always very welcome! Hopefully the project will get accepted! Christian Davide Rambaldi wrote: > Hi, I am available to help is there is need. > > I have a small experience with bioruby but I am really interested in > partecipate in a project with bioruby. > > May be I can help in testing the code? 
> > Davide > > > Il giorno 01/apr/09, alle ore 03:16, Christian M Zmasek ha scritto: > > >> Hi: >> >> As you probably read on previous messages, NESCent is again having a >> Google Summer of Code program (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 >> ). >> >> I am serving as the mentor of a project entitled "phyloXML support >> in BioRuby" (see: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009 >> #phyloXML_support_in_BioRuby). >> >> In order to ensure a successful outcome (should the project be >> accepted), as well as to improve the chances of being accepted, I am >> looking for people willing to serve as co-mentors. >> >> Christian >> >> >> PS: Here is the full description of the project: >> >> >> phyloXML support in BioRuby >> >> Rationale Evolutionary trees are central to comparative genomics >> studies. >> Trees used in this context are usually annotated with a variety of >> data elements, such as taxonomic information, genome-related data >> (gene names, functional annotations) and gene duplication events, as >> well as information related to the evolutionary tree itself (branch >> lengths, support values). phyloXML is an XML data exchange standard >> that can represent this data. Trees in phyloXML format can be >> displayed and analyzed with Archaeopteryx >> (the successor to ATV >> > >>> ), >>> >> which also allows manipulation and navigation of the tree. While >> tools exist to convert other formats (such as the widely used Newick >> and Nexus formats) to phyloXML, there is currently support for >> phyloXML in only one of the open source Bio* projects (in BioPerl >> , as a result of >> Google's Summer of Code 2008). Approach Build phyloXML support >> in the increasingly popular, dynamic, and >> fully objected oriented language Ruby. More specifically, extend the >> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has >> just been released). 
This will entail (i) the development of objects >> to represent all the elements of phyloXML (sequences, taxonomic >> data, annotations, etc), (ii) the development of a parser to read in >> phyloXML, and (iii) a phyloXML writer. Challenges Relating the >> data elements specific to phyloXML to the tree classes >> already in BioRuby while maintaining the standards of the BioRuby >> project. Development of a time and memory efficient phyloXML parser >> (the parser has to be able to process trees with thousands of >> external nodes, at least). Involved toolkits or projects >> BioRuby , phyloXML > www.phyloxml.org> Degree of difficulty and needed skills Medium. >> Requires experience in an object oriented programming >> language (such as C++, Java, or, ideally, Ruby). Experience in >> genomics or a related biological field is also critical. Knowledge >> of BioRuby will obviously help, as well as familiarity with XML. >> Mentors Christian Zmasek >> >> >> _______________________________________________ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby >> > > Davide Rambaldi, > Bioinformatics PhD student. 
> ----------------------------------------------------- > Bioinformatic Group IFOM-IEO Campus > Via Adamello 16, Milano > I-20139 Italy > > [t] +39 02574303 066 > [e] davide.rambaldi at ifom-ieo-campus.it > [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi > (homepage) > [i] http://www.semm.it (PhD school) > [i] http://www.btbs.unimib.it/ (Master) > > ----------------------------------------------------- > > > > > From ixpata at gmail.com Wed Apr 8 23:23:47 2009 From: ixpata at gmail.com (S M) Date: Wed, 8 Apr 2009 19:23:47 -0400 Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors In-Reply-To: <49DD2FDB.4080909@burnham.org> References: <49D2C051.9090002@burnham.org> <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it> <49DD2FDB.4080909@burnham.org> Message-ID: I'd be interested in helping test too or in any other way I could help. I'm not experienced enough to program but maybe I can help on other ways. Sent from my iPhone On Apr 8, 2009, at 7:14 PM, Christian M Zmasek wrote: > Hi, Davide: > > Yes, help in testing code is always very welcome! > > Hopefully the project will get accepted! > > Christian > > > Davide Rambaldi wrote: >> Hi, I am available to help is there is need. >> >> I have a small experience with bioruby but I am really interested >> in partecipate in a project with bioruby. >> >> May be I can help in testing the code? >> >> Davide >> >> >> Il giorno 01/apr/09, alle ore 03:16, Christian M Zmasek ha scritto: >> >> >>> Hi: >>> >>> As you probably read on previous messages, NESCent is again having >>> a Google Summer of Code program (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 >>> ). >>> >>> I am serving as the mentor of a project entitled "phyloXML >>> support in BioRuby" (see: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009 >>> #phyloXML_support_in_BioRuby). 
>>> >>> In order to ensure a successful outcome (should the project be >>> accepted), as well as to improve the chances of being accepted, I >>> am looking for people willing to serve as co-mentors. >>> >>> Christian >>> >>> >>> PS: Here is the full description of the project: >>> >>> >>> phyloXML support in BioRuby >>> >>> Rationale Evolutionary trees are central to comparative >>> genomics studies. >>> Trees used in this context are usually annotated with a variety of >>> data elements, such as taxonomic information, genome-related data >>> (gene names, functional annotations) and gene duplication events, >>> as >>> well as information related to the evolutionary tree itself (branch >>> lengths, support values). phyloXML is an XML data exchange standard >>> that can represent this data. Trees in phyloXML format can be >>> displayed and analyzed with Archaeopteryx >>> (the successor to ATV >>> >>> ), >>>> >>> which also allows manipulation and navigation of the tree. While >>> tools exist to convert other formats (such as the widely used >>> Newick >>> and Nexus formats) to phyloXML, there is currently support for >>> phyloXML in only one of the open source Bio* projects (in BioPerl >>> , as a result of >>> Google's Summer of Code 2008). Approach Build phyloXML >>> support in the increasingly popular, dynamic, and >>> fully objected oriented language Ruby. More specifically, extend >>> the >>> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has >>> just been released). This will entail (i) the development of >>> objects >>> to represent all the elements of phyloXML (sequences, taxonomic >>> data, annotations, etc), (ii) the development of a parser to read >>> in >>> phyloXML, and (iii) a phyloXML writer. Challenges Relating >>> the data elements specific to phyloXML to the tree classes >>> already in BioRuby while maintaining the standards of the BioRuby >>> project. 
Development of a time and memory efficient phyloXML parser >>> (the parser has to be able to process trees with thousands of >>> external nodes, at least). Involved toolkits or projects >>> BioRuby , phyloXML >> > Degree of difficulty and needed skills Medium. Requires >>> experience in an object oriented programming >>> language (such as C++, Java, or, ideally, Ruby). Experience in >>> genomics or a related biological field is also critical. Knowledge >>> of BioRuby will obviously help, as well as familiarity with XML. >>> Mentors Christian Zmasek >>> >>> >>> _______________________________________________ >>> BioRuby mailing list >>> BioRuby at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioruby >>> >> >> Davide Rambaldi, >> Bioinformatics PhD student. >> ----------------------------------------------------- >> Bioinformatic Group IFOM-IEO Campus >> Via Adamello 16, Milano >> I-20139 Italy >> >> [t] +39 02574303 066 >> [e] davide.rambaldi at ifom-ieo-campus.it >> [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/ >> DavideRambaldi (homepage) >> [i] http://www.semm.it (PhD school) >> [i] http://www.btbs.unimib.it/ (Master) >> >> ----------------------------------------------------- >> >> >> >> >> > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From mkikkawa at gmail.com Fri Apr 10 07:20:19 2009 From: mkikkawa at gmail.com (Masahide Kikkawa) Date: Fri, 10 Apr 2009 16:20:19 +0900 Subject: [BioRuby] Bio::PubMed.efetch, bug? Message-ID: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com> Hi, I recently installed bioruby-1.3.0 and encountered a bug. Here is what I did result = Bio::PubMed.efetch([123,456,789]) I expected the result to be Array of MEDLINE formatted String as documented here: http://bioruby.org/rdoc/classes/Bio/PubMed.html However, the return value was one String containing all the results. 

In the bioruby-1.2.1, it does return Array of MEDLINE formatted String.

Therefore I would recommend to apply following patch to ncbirest.rb

<
<       result.strip!
<       if opts["retmode"] == "text"
<         result = result.split(/\n\n+/)
<       end
<
<       return result
---
>       return result.strip
>       #return result.strip.split(/\n\n+/)

---------------------------------------------------------------
Masahide Kikkawa, M.D., Ph. D.
Professor
Structural Biology
Graduate School of Science
Kyoto University
Oiwake, Kitashirakawa, Sakyo-ku, Kyoto, 606-8502 JAPAN
http://structure.biophys.kyoto-u.ac.jp/
Tel: +81-75-753-9421
FAX: +81-75-753-4218
---------------------------------------------------------------

From jan.aerts at gmail.com Fri Apr 10 07:30:32 2009
From: jan.aerts at gmail.com (Jan Aerts)
Date: Fri, 10 Apr 2009 08:30:32 +0100
Subject: [BioRuby] searching through large sets of features
Message-ID: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>

Hi all,
I need to be able to handle millions of features for a visualization and
therefore be able to search through them efficiently. I've been made aware
of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data structure
that could do just that. One of the issues that this approach would solve is
that it doesn't need to load every single feature when the display
resolution is not high enough (no use trying to plot 5 million points next
to each other on a screen of 800 pixels wide; you can only plot 800
points....)

I'm now trying to write a ruby-implementation of an R-Tree that is more fit
for loci (i.e. one less dimension than an R-Tree, and has to handle
chromosome as well) at http://github.com/jandot/locustree. I'm trying to
make it a module that can be mixed in, so anything which has a chromosome,
start and stop can be indexed and searched using this R-Tree adaptation.

Does anyone know a ruby-implementation of R-Tree? Or is anyone interested in
helping me out?

Thanks,
jan.

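For readers following this thread: the idea of indexing anything that has a chromosome, start and stop can be sketched with a much simpler structure than an R-Tree, namely fixed-size bins per chromosome. The code below is only an illustration under that assumption. It is not the LocusTree code and not part of BioRuby; `Feature`, `LocusBins` and the bin size of 1000 are hypothetical names and values chosen for the example.

```ruby
# Hypothetical sketch: a fixed-bin index over loci. It is not the LocusTree
# implementation and not part of BioRuby; all names here are made up.
Feature = Struct.new(:chromosome, :start, :stop)

class LocusBins
  def initialize(bin_size = 1000)
    @bin_size = bin_size
    # [chromosome, bin number] => features touching that bin
    @bins = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a feature in every bin it overlaps.
  def add(feature)
    bins_for(feature.start, feature.stop).each do |bin|
      @bins[[feature.chromosome, bin]] << feature
    end
  end

  # Features overlapping chromosome:start-stop (inclusive coordinates).
  def overlapping(chromosome, start, stop)
    hits = []
    bins_for(start, stop).each do |bin|
      # fetch avoids creating empty bins as a side effect of querying
      @bins.fetch([chromosome, bin], []).each do |f|
        hits << f if f.start <= stop && f.stop >= start
      end
    end
    # a feature spanning several bins is seen more than once
    hits.uniq { |f| f.object_id }
  end

  # Per-bin feature counts, usable for a zoomed-out view without
  # touching the individual features.
  def counts(chromosome, start, stop)
    bins_for(start, stop).map { |bin| @bins.fetch([chromosome, bin], []).size }
  end

  private

  def bins_for(start, stop)
    (start / @bin_size)..(stop / @bin_size)
  end
end

index = LocusBins.new(1000)
a = Feature.new("1", 100, 200)
b = Feature.new("1", 950, 1050)   # spans bins 0 and 1
c = Feature.new("2", 100, 200)
[a, b, c].each { |f| index.add(f) }

p index.overlapping("1", 150, 960).size
p index.counts("1", 0, 2999)
```

The `counts` method is the part that relates to the 800-pixel remark: when the view is zoomed out, a display could draw one summary value per bin instead of loading every feature. A real R-Tree adaptation would nest such summaries at several levels rather than use a single fixed bin size.
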
From mail at michaelbarton.me.uk Fri Apr 10 13:33:30 2009 From: mail at michaelbarton.me.uk (Michael Barton) Date: Fri, 10 Apr 2009 14:33:30 +0100 Subject: [BioRuby] searching through large sets of features In-Reply-To: References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com> Message-ID: There's also some ruby searching algorithms here http://algorithms.rubyforge.org/ I'm happy to try and help if I can Cheers Mike 2009/4/10 Michael Barton > Would a bloom tree fit this problem? > http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/ > > 2009/4/10 Jan Aerts > > Hi all, >> I need to be able to handle millions of features for a visualization and >> therefore be able to search through them efficiently. I've been made aware >> of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data >> structure >> that could do just that. One of the issues that this approach would solve >> is >> that it doesn't need to load every single feature when the display >> resolution is not high enough (no use trying to plot 5 million points next >> to each other on a screen of 800 pixels wide; you can only plot 800 >> points....) >> >> I'm now trying to write a ruby-implementation of an R-Tree that is more >> fit >> for loci (i.e. one less dimension than an R-Tree, and has to handle >> chromosome as well) at http://github.com/jandot/locustree. I'm trying to >> make it a module that can be mixed in, so anything which has a chromosome, >> start and stop can be indexed and searched using this R-Tree adaptation. >> >> Does anyone know a ruby-implementation of R-Tree? Or is anyone interested >> in >> helping me out? >> >> Thanks, >> jan. 
>> _______________________________________________ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby >> > > From jan.aerts at gmail.com Fri Apr 10 13:57:20 2009 From: jan.aerts at gmail.com (Jan Aerts) Date: Fri, 10 Apr 2009 14:57:20 +0100 Subject: [BioRuby] searching through large sets of features In-Reply-To: References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com> Message-ID: <4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com> That is a _really_ good resource. Unfortunately, R-Trees are not in there (or I didn't look well enough). I was thinking of sending him an email to ask if he could quickly add that to his library :-), but saw that he's switching to python. (But that particular announcement was on the 1st of April, so let's hope it's a joke) j. 2009/4/10 Michael Barton > There's also some ruby searching algorithms here > http://algorithms.rubyforge.org/ > > I'm happy to try and help if I can > > Cheers > > Mike > > 2009/4/10 Michael Barton > > > Would a bloom tree fit this problem? > > > http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/ > > > > 2009/4/10 Jan Aerts > > > > Hi all, > >> I need to be able to handle millions of features for a visualization and > >> therefore be able to search through them efficiently. I've been made > aware > >> of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data > >> structure > >> that could do just that. One of the issues that this approach would > solve > >> is > >> that it doesn't need to load every single feature when the display > >> resolution is not high enough (no use trying to plot 5 million points > next > >> to each other on a screen of 800 pixels wide; you can only plot 800 > >> points....) > >> > >> I'm now trying to write a ruby-implementation of an R-Tree that is more > >> fit > >> for loci (i.e. 
one less dimension than an R-Tree, and has to handle > >> chromosome as well) at http://github.com/jandot/locustree. I'm trying > to > >> make it a module that can be mixed in, so anything which has a > chromosome, > >> start and stop can be indexed and searched using this R-Tree adaptation. > >> > >> Does anyone know a ruby-implementation of R-Tree? Or is anyone > interested > >> in > >> helping me out? > >> > >> Thanks, > >> jan. > >> _______________________________________________ > >> BioRuby mailing list > >> BioRuby at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioruby > >> > > > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From mail at michaelbarton.me.uk Fri Apr 10 14:51:10 2009 From: mail at michaelbarton.me.uk (Michael Barton) Date: Fri, 10 Apr 2009 15:51:10 +0100 Subject: [BioRuby] searching through large sets of features In-Reply-To: <4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com> References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com> <4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com> Message-ID: You could use the kd-tree to narrow the search to the 1000 features each side of the middle point, then traverse this set to exclude those outside of the visualisation range. It's a bit of hack but it could save creating an R-tree if there's no existing implementation. 2009/4/10 Jan Aerts > That is a _really_ good resource. Unfortunately, R-Trees are not in there > (or I didn't look well enough). I was thinking of sending him an email to > ask if he could quickly add that to his library :-), but saw that he's > switching to python. (But that particular announcement was on the 1st of > April, so let's hope it's a joke) > j. 
> > > 2009/4/10 Michael Barton > >> There's also some ruby searching algorithms here >> http://algorithms.rubyforge.org/ >> >> I'm happy to try and help if I can >> >> Cheers >> >> Mike >> >> 2009/4/10 Michael Barton >> >> > Would a bloom tree fit this problem? >> > >> http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/ >> > >> > 2009/4/10 Jan Aerts >> > >> > Hi all, >> >> I need to be able to handle millions of features for a visualization >> and >> >> therefore be able to search through them efficiently. I've been made >> aware >> >> of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data >> >> structure >> >> that could do just that. One of the issues that this approach would >> solve >> >> is >> >> that it doesn't need to load every single feature when the display >> >> resolution is not high enough (no use trying to plot 5 million points >> next >> >> to each other on a screen of 800 pixels wide; you can only plot 800 >> >> points....) >> >> >> >> I'm now trying to write a ruby-implementation of an R-Tree that is more >> >> fit >> >> for loci (i.e. one less dimension than an R-Tree, and has to handle >> >> chromosome as well) at http://github.com/jandot/locustree. I'm trying >> to >> >> make it a module that can be mixed in, so anything which has a >> chromosome, >> >> start and stop can be indexed and searched using this R-Tree >> adaptation. >> >> >> >> Does anyone know a ruby-implementation of R-Tree? Or is anyone >> interested >> >> in >> >> helping me out? >> >> >> >> Thanks, >> >> jan. 

From fredjoha at bioreg.kyushu-u.ac.jp Mon Apr 13 04:16:00 2009
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 13 Apr 2009 13:16:00 +0900
Subject: [BioRuby] Parsing MSF alignment file
Message-ID: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>

I tried to parse an alignment file given in the MSF format by using
Bio::GCG::Msf. It turned out though that points/dots (.) were used as a
gap character in the alignment and that can't be handled by
Bio::GCG::Msf. So, for what it's worth, I made these changes to
bio/appl/gcg/msf.rb:

$ diff msf.rb.old msf.rb.new
33,35c33,36
<         if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ str[/.*/] then
<           @heading = str[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
<           str.sub!(/.*/, '')
---
>         preamble, @data = str.split(/^\/\/$/)
>         if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ preamble[/.*/] then
>           @heading = preamble[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
>           preamble.sub!(/.*/, '')
37c38
<         str.sub!(/.*\.\.$/m, '')
---
>         preamble.sub!(/.*\.\.$/m, '')
48,49d48
<         str.sub!(/.*\/\/$/m, '')
<         a = $&.to_s.split(/^/)
51c50
<         a.each do |x|
---
>         preamble.split(/^/).each do |x|
59d57
<         @data = str

Best regards,
Fredrik Johansson

--
***********************************
Fredrik Johansson, grad.
student Division of Bioinformatics Medical Institute of Bioregulation Kyushu University 3-1-1 Maidashi, Higashi-ku Fukuoka 812-8582, Japan fredjoha at bioreg.kyushu-u.ac.jp *********************************** From ngoto at gen-info.osaka-u.ac.jp Mon Apr 13 12:54:54 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Mon, 13 Apr 2009 21:54:54 +0900 Subject: [BioRuby] Parsing MSF alignment file In-Reply-To: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp> References: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp> Message-ID: <20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp> Hi, On Mon, 13 Apr 2009 13:16:00 +0900 Fredrik Johansson wrote: > I tried to parse an alignment file given in the MSF format by using > Bio::GCG::Msf. It turned out though that points/dots (.) were used as a > gap character in the alignment and that can't be handled by > Bio::GCG::Msf. So, for what it's worth, I made these changes to > bio/appl/gcg/msf.rb: Thank you very much. Do you mean that Bio::GCG::Msf fails to parse an alignment that contain two contiguous gaps (..) in the end of a line? This is a bug, and the patch will soon be applied to the git repository. > $ diff msf.rb.old msf.rb.new In the next time, please use "diff -u" (unified context format). Thanks, Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From fredjoha at bioreg.kyushu-u.ac.jp Mon Apr 13 13:19:27 2009 From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson) Date: Mon, 13 Apr 2009 22:19:27 +0900 Subject: [BioRuby] Parsing MSF alignment file In-Reply-To: <20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp> References: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp> <20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <49E33BDF.70200@bioreg.kyushu-u.ac.jp> Yes, that's what happened. A regular expression matched all the way to the last occurrence of two dots at the end of a line. > Thank you very much. 
Do you mean that Bio::GCG::Msf fails to
> parse an alignment that contain two contiguous gaps (..) in the
> end of a line? This is a bug, and the patch will soon be
> applied to the git repository.
>
>> $ diff msf.rb.old msf.rb.new
>
> In the next time, please use "diff -u" (unified context format).

Ok, I'll attach the output of diff -u here:

--- /usr/lib/ruby/gems/1.8/gems/bio-1.3.0/lib/bio/appl/gcg/msf.rb.old 2009-04-13 11:32:53.000000000 +0900
+++ /usr/lib/ruby/gems/1.8/gems/bio-1.3.0/lib/bio/appl/gcg/msf.rb 2009-04-13 13:36:26.000000000 +0900
@@ -30,11 +30,12 @@
       # Creates a new Msf object.
       def initialize(str)
         str = str.sub(/\A[\r\n]+/, '')
-        if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ str[/.*/] then
-          @heading = str[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
-          str.sub!(/.*/, '')
+        preamble, @data = str.split(/^\/\/$/)
+        if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ preamble[/.*/] then
+          @heading = preamble[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
+          preamble.sub!(/.*/, '')
         end
-        str.sub!(/.*\.\.$/m, '')
+        preamble.sub!(/.*\.\.$/m, '')
         @description = $&.to_s.sub(/^.*\.\.$/, '').to_s
         d = $&.to_s
         if m = /(.+)\s+MSF\:\s+(\d+)\s+Type\:\s+(\w)\s+(.+)\s+(Comp)?Check\:\s+(\d+)/.match(d) then
@@ -45,10 +46,8 @@
           @checksum = (m[6] ? m[6].to_i : nil)
         end
 
-        str.sub!(/.*\/\/$/m, '')
-        a = $&.to_s.split(/^/)
         @seq_info = []
-        a.each do |x|
+        preamble.split(/^/).each do |x|
           if /Name\: / =~ x then
             s = {}
             x.scan(/(\S+)\: +(\S*)/) { |y| s[$1] = $2 }
@@ -56,7 +55,6 @@
           end
         end
 
-        @data = str
         @description.sub!(/\A(\r\n|\r|\n)/, '')
         @align = nil
       end

From andrew.j.grimm at gmail.com Tue Apr 14 09:30:16 2009
From: andrew.j.grimm at gmail.com (Andrew Grimm)
Date: Tue, 14 Apr 2009 19:30:16 +1000
Subject: [BioRuby] Status of bioruby with Debian packages
Message-ID:

According to http://packages.qa.debian.org/b/bioruby.html , the latest
bioruby available as a Debian package is 1.1.0 or 1.1.0-1.

Are there plans to update bioruby's debian package?

This isn't a major issue for me - our ubuntu system already has
bioruby installed for ruby 1.8, and I was merely hoping to use ruby 1.9
for faster performance.

If a debian update isn't likely, then we might try using rubygems on
ubuntu and hope that the package management police don't arrest us!

Andrew

PS: Is there anything better than just plain web search for searching
bioruby's mailing list archives?

From davide.rambaldi at ifom-ieo-campus.it Wed Apr 15 09:24:03 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Wed, 15 Apr 2009 11:24:03 +0200
Subject: [BioRuby] ACE assembly format
Message-ID: <7d40e94d36b1af5fd01fb9a908512aa2@ifom-ieo-campus.it>

Hi all,

is there a parser for the .ace format in bioruby?
Something like Bio::Assembly::IO::ace in BioPerl?

Many thanks.

Davide Rambaldi

From ngoto at gen-info.osaka-u.ac.jp Wed Apr 15 11:04:00 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Wed, 15 Apr 2009 20:04:00 +0900
Subject: [BioRuby] Status of bioruby with Debian packages
In-Reply-To:
References:
Message-ID: <20090415110401.744641CBC544@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Tue, 14 Apr 2009 19:30:16 +1000
Andrew Grimm wrote:

> According to http://packages.qa.debian.org/b/bioruby.html , the latest
> bioruby available as a Debian package is 1.1.0 or 1.1.0-1.
>
> Are there plans to update bioruby's debian package?

I don't know, but I'm also a debian user, and I've sent a wishlist
report to the Debian BTS.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=524180

In general, Debian-specific issues should be raised on the Debian
mailing lists, in the Bug Tracking System, and/or with the package
maintainers.

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From fredjoha at bioreg.kyushu-u.ac.jp Mon Apr 20 03:35:27 2009
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 20 Apr 2009 12:35:27 +0900
Subject: [BioRuby] PubMed.efetch error
Message-ID: <49EBED7F.4010509@bioreg.kyushu-u.ac.jp>

PubMed.efetch does not give me correct output if I give an array as
argument. I think it is supposed to return an array of MEDLINE entries,
but it returns all entries in one string.

I can get an array if I type
Bio::PubMed.efetch(entries).split("\n\n")
But that can't be the intended behaviour, right?

Regards,
Fredrik Johansson

--
***********************************
Fredrik Johansson, grad. student
Division of Bioinformatics
Medical Institute of Bioregulation
Kyushu University
3-1-1 Maidashi, Higashi-ku
Fukuoka 812-8582, Japan
fredjoha at bioreg.kyushu-u.ac.jp
***********************************

From tomoakin at kenroku.kanazawa-u.ac.jp Tue Apr 21 08:29:26 2009
From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA)
Date: Tue, 21 Apr 2009 17:29:26 +0900
Subject: [BioRuby] Read/write of simple fasta file increases the identifier.
Message-ID:

Hi,

With bioruby-1.3.0, reading a FASTA file, converting each entry to
Bio::Sequence with to_seq, and then writing it with output(:fasta)
makes the definition line longer each time the file is processed.
Is there a better interface that keeps the definition line as in the
original?

Perhaps this is because some formats have clearly separate entry_id
and definition fields, while the distinction is ambiguous in FASTA
format. However, it would be better to be able to recover the original
definition line easily.

The following is a simple example. Admittedly, this case is so simple
that the entry need not be converted to Bio::Sequence, but in practice
I would like to manipulate the sequence, such as getting a subsequence,
adding some sequence, translating, etc.

% cat > simple_fasta
>abc
acgttgac
% cat fastacat
#!/usr/local/bin/ruby
require 'bio'
ff = Bio::FlatFile.open(Bio::FastaFormat, ARGF)
while fe = ff.next_entry
  seq = fe.to_seq
  puts seq.output(:fasta)
end
% ruby fastacat simple_fasta
>abc abc
acgttgac
% ruby fastacat simple_fasta | ruby fastacat
>abc abc abc
acgttgac

Sincerely yours,
--
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan

From ngoto at gen-info.osaka-u.ac.jp Tue Apr 21 14:17:26 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 21 Apr 2009 23:17:26 +0900
Subject: [BioRuby] Bio::PubMed.efetch, bug?
In-Reply-To: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com>
References: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com>
Message-ID: <20090421141727.C5B6A1CBC4A8@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Fri, 10 Apr 2009 16:20:19 +0900
Masahide Kikkawa wrote:

> Hi,
>
>
> I recently installed bioruby-1.3.0 and encountered a bug.
>
> Here is what I did
> result = Bio::PubMed.efetch([123,456,789])
>
> I expected the result to be Array of MEDLINE formatted String
> as documented here:
> http://bioruby.org/rdoc/classes/Bio/PubMed.html
>
> However, the return value was one String containing all the results.
>
> In the bioruby-1.2.1, it does return Array of MEDLINE formatted String.

Thank you for reporting the bug. This is a regression introduced by the
refactoring of Bio::PubMed between 1.2.1 and 1.3.0.

> Therefore I would recommend to apply following patch to ncbirest.rb
>
> <
> <     result.strip!
> <     if opts["retmode"] == "text"
> <       result = result.split(/\n\n+/)
> <     end
> <
> <     return result
> ---
> >     return result.strip
> >     #return result.strip.split(/\n\n+/)
>

Bio::NCBI::REST is used not only for PubMed but also for many other
NCBI databases, and efetch may return free text, depending on the
database. Thus, this post-processing of the result should be applied
only in Bio::PubMed.

http://github.com/bioruby/bioruby/commit/a48a9a35b87dead069fe328ba7086977304af995

To confirm the bug fix, tests for Bio::PubMed have also been added.

http://github.com/bioruby/bioruby/commit/bf5ba6d4503f3ddb0ca31673882f5b396a932bbe

In addition, I found and fixed a bug related to "retstart".

http://github.com/bioruby/bioruby/commit/fc0339fe8a42cd00199cfdc938590ae9626551bc

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
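
For readers still on 1.3.0, the splitting step discussed in this thread
can be sketched in plain Ruby. This is only an illustration, not the
code from the commits above: split_medline_entries is a hypothetical
helper name, the sample records are made up, and it assumes that
text-mode efetch output separates MEDLINE records with one or more
blank lines, as the /\n\n+/ pattern in the quoted patch suggests.

```ruby
# Hypothetical helper, not part of BioRuby: turns one text blob into an
# Array with one String per MEDLINE record, using the same /\n\n+/ rule
# as the patch quoted in the thread.
def split_medline_entries(text)
  # strip first so leading/trailing blank lines do not yield empty entries
  text.strip.split(/\n\n+/)
end

# Made-up sample standing in for the single String returned by efetch.
sample = "PMID- 123\nTI  - First title\n\n\nPMID- 456\nTI  - Second title\n"

entries = split_medline_entries(sample)
puts entries.length   # prints 2
```

Doing this on the caller's side leaves Bio::NCBI::REST untouched, which
fits the point made above that efetch may return free text for other
databases.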