From pjotr.public14 at thebird.nl Wed Apr 1 04:08:08 2009
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Wed, 1 Apr 2009 10:08:08 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking
for co-mentors
In-Reply-To: <49D2C051.9090002@burnham.org>
References: <49D2C051.9090002@burnham.org>
Message-ID: <20090401080808.GA30972@thebird.nl>
Hi Christian,
I am happy to help.
Pj.
On Tue, Mar 31, 2009 at 06:16:01PM -0700, Christian M Zmasek wrote:
> Hi:
>
> As you probably read on previous messages, NESCent is again having a
> Google Summer of Code program
> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>
> I am serving as the mentor of a project entitled "phyloXML support in
> BioRuby" (see:
> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>
> In order to ensure a successful outcome (should the project be
> accepted), as well as to improve the chances of being accepted, I am
> looking for people willing to serve as co-mentors.
>
> Christian
>
>
> PS: Here is the full description of the project:
>
>
> phyloXML support in BioRuby
>
> Rationale
> Evolutionary trees are central to comparative genomics studies.
> Trees used in this context are usually annotated with a variety of
> data elements, such as taxonomic information, genome-related data
> (gene names, functional annotations) and gene duplication events, as
> well as information related to the evolutionary tree itself (branch
> lengths, support values). phyloXML is an XML data exchange standard
> that can represent this data. Trees in phyloXML format can be
> displayed and analyzed with Archaeopteryx (the successor to ATV),
> which also allows manipulation and navigation of the tree. While
> tools exist to convert other formats (such as the widely used Newick
> and Nexus formats) to phyloXML, there is currently support for
> phyloXML in only one of the open source Bio* projects (in BioPerl,
> as a result of Google's Summer of Code 2008).
>
> Approach
> Build phyloXML support in the increasingly popular, dynamic, and
> fully object-oriented language Ruby. More specifically, extend the
> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has
> just been released). This will entail (i) the development of objects
> to represent all the elements of phyloXML (sequences, taxonomic
> data, annotations, etc), (ii) the development of a parser to read in
> phyloXML, and (iii) a phyloXML writer.
>
> Challenges
> Relating the data elements specific to phyloXML to the tree classes
> already in BioRuby while maintaining the standards of the BioRuby
> project. Development of a time and memory efficient phyloXML parser
> (the parser has to be able to process trees with thousands of
> external nodes, at least).
>
> Involved toolkits or projects
> BioRuby, phyloXML
>
> Degree of difficulty and needed skills
> Medium. Requires experience in an object-oriented programming
> language (such as C++, Java, or, ideally, Ruby). Experience in
> genomics or a related biological field is also critical. Knowledge
> of BioRuby will obviously help, as well as familiarity with XML.
>
> Mentors
> Christian Zmasek
>
>
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
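
[Illustrative sketch] The project description above asks for objects that represent phyloXML elements and for a writer. The following is only a rough idea of what that could look like in plain Ruby, with no dependencies. All class and method names here (Clade, Taxonomy, clade_to_xml, phylogeny_to_xml) are hypothetical and are not BioRuby's API; only a few phyloXML elements (clade, name, branch_length, taxonomy, scientific_name) are covered.

```ruby
# Hypothetical data objects for a small subset of phyloXML.
# None of these names come from BioRuby; they are assumptions for illustration.
Taxonomy = Struct.new(:scientific_name)

class Clade
  attr_reader :name, :branch_length, :taxonomy, :children

  def initialize(name: nil, branch_length: nil, taxonomy: nil)
    @name = name
    @branch_length = branch_length
    @taxonomy = taxonomy
    @children = []
  end

  # Attach a child clade and return self so calls can be chained.
  def add(child)
    @children << child
    self
  end

  # An external node (leaf) has no child clades.
  def external?
    @children.empty?
  end
end

# Escape the five characters that are special in XML text.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/,
                 '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                 '"' => '&quot;', "'" => '&apos;')
end

# Recursively serialize one clade and its descendants.
def clade_to_xml(clade, indent = 0)
  pad = '  ' * indent
  lines = ["#{pad}<clade>"]
  lines << "#{pad}  <name>#{xml_escape(clade.name)}</name>" if clade.name
  if clade.branch_length
    lines << "#{pad}  <branch_length>#{clade.branch_length}</branch_length>"
  end
  if clade.taxonomy
    lines << "#{pad}  <taxonomy>"
    lines << "#{pad}    <scientific_name>" \
             "#{xml_escape(clade.taxonomy.scientific_name)}</scientific_name>"
    lines << "#{pad}  </taxonomy>"
  end
  clade.children.each { |child| lines << clade_to_xml(child, indent + 1) }
  lines << "#{pad}</clade>"
  lines.join("\n")
end

# Wrap the root clade in a single-phylogeny phyloXML document.
def phylogeny_to_xml(root, rooted: true)
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<phyloxml xmlns="http://www.phyloxml.org">',
    "  <phylogeny rooted=\"#{rooted}\">",
    clade_to_xml(root, 2),
    '  </phylogeny>',
    '</phyloxml>'
  ].join("\n")
end
```

A real implementation would, as the description says, relate these elements to the tree classes already in BioRuby rather than define a separate tree structure.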
From bonnalraoul at ingm.it Wed Apr 1 06:37:27 2009
From: bonnalraoul at ingm.it (Raoul JP Bonnal)
Date: Wed, 01 Apr 2009 12:37:27 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D2C051.9090002@burnham.org>
References: <49D2C051.9090002@burnham.org>
Message-ID: <49D343E7.1080007@ingm.it>
Dear Christian,
One of the outcomes of BioHackathon 09, for me, was to implement Phylo
in BioSQL. Could this be of interest for this proposal?
Let me know.
Christian M Zmasek wrote:
> Hi:
>
> As you probably read on previous messages, NESCent is again having a
> Google Summer of Code program
> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>
> I am serving as the mentor of a project entitled "phyloXML support in
> BioRuby" (see:
> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>
>
> In order to ensure a successful outcome (should the project be
> accepted), as well as to improve the chances of being accepted, I am
> looking for people willing to serve as co-mentors.
>
> Christian
>
>
From czmasek at burnham.org Wed Apr 1 14:20:23 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Wed, 1 Apr 2009 11:20:23 -0700
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D343E7.1080007@ingm.it>
References: <49D2C051.9090002@burnham.org> <49D343E7.1080007@ingm.it>
Message-ID: <49D3B067.3050101@burnham.org>
Hi Raoul:
Thank you for your reply.
While your project sounds interesting and much needed, the project I was
referring to is to implement phyloXML (www.phyloxml.org) support in BioRuby.
Given that the deadline for Google Summer of Code projects is April 3rd,
I think it would be almost impossible to propose a new project and find
a student for it before this deadline.
Sincerely,
Christian
Raoul JP Bonnal wrote:
> Dear Christian,
> one of the outcomes from Biohackathon 09, for me, was to implement Phylo
> into BioSQL, could be of interest for this proposal ?
>
> Let me know.
>
> Christian M Zmasek wrote:
>
>
>> Hi:
>>
>> As you probably read on previous messages, NESCent is again having a
>> Google Summer of Code program
>> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>>
>> I am serving as the mentor of a project entitled "phyloXML support in
>> BioRuby" (see:
>> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>>
>>
>> In order to ensure a successful outcome (should the project be
>> accepted), as well as to improve the chances of being accepted, I am
>> looking for people willing to serve as co-mentors.
>>
>> Christian
>>
>>
From czmasek at burnham.org Wed Apr 1 14:31:12 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Wed, 1 Apr 2009 11:31:12 -0700
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu>
References: <49D2C051.9090002@burnham.org>
<2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu>
Message-ID: <49D3B2F0.5030605@burnham.org>
Hi, Joel:
From my own experience as a co-mentor last year, the time requirement
is rather limited.
The project I co-mentored was "phyloXML in BioPerl". Since I am
knowledgeable about phyloXML I mainly helped out with questions related
to it.
Of course, for this year's project "phyloXML in BioRuby", it will be the
opposite. Since I am the main-mentor I will/am handling all the issues
related to the application process/midterm
evaluation/final-evaluation/phyloXML. But I would feel more comfortable,
if I had one or two co-mentors with significant Ruby and BioRuby
experience who could help out if questions specific to these languages
arise.
Christian
PS: The project still needs to be accepted by the Nescent organization.
Last year, there were about 15 projects, and only five got accepted
(since Google gave Nescent five slots, Nescent had to decide which
projects they liked the most).
Joel Dudley wrote:
> I am interested. What's involved in being a co-mentor? I am one of the
> authors of MEGA (http://www.megasoftware.net) and I have been
> programming with Ruby daily for about 3 years now.
>
> -- Joel
>
> On Mar 31, 2009, at 6:16 PM, Christian M Zmasek wrote:
>
>
>> Hi:
>>
>> As you probably read on previous messages, NESCent is again having a
>> Google Summer of Code program (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009
>> ).
>>
>> I am serving as the mentor of a project entitled "phyloXML support
>> in BioRuby" (see: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009
>> #phyloXML_support_in_BioRuby).
>>
>> In order to ensure a successful outcome (should the project be
>> accepted), as well as to improve the chances of being accepted, I am
>> looking for people willing to serve as co-mentors.
>>
>> Christian
>>
>>
>
> =============Email Policy=============
> I now only check mail at the following times:
> - 8 a.m., Noon, 5 p.m. Mon-Fri
> - 8 a.m. Sat-Sun
> If this is a true emergency, or you require an
> immediate response from me, please contact
> me via my mobile phone: (408) 464-8258
> ====================================
>
>
From ngoto at gen-info.osaka-u.ac.jp Thu Apr 2 01:08:38 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Thu, 2 Apr 2009 14:08:38 +0900
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D3B2F0.5030605@burnham.org>
References: <49D2C051.9090002@burnham.org>
<2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu>
<49D3B2F0.5030605@burnham.org>
Message-ID: <20090402050839.73F151CBC3BD@idnmail.gen-info.osaka-u.ac.jp>
Hi Christian,
I'll answer any questions about BioRuby as much as I can.
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
On Wed, 1 Apr 2009 11:31:12 -0700
Christian M Zmasek wrote:
> Hi, Joel:
>
> From my own experience as a co-mentor last year, the time requirement
> is rather limited.
> The project I co-mentored was "phyloXML in BioPerl". Since I am
> knowledgeable about phyloXML I mainly helped out with questions related
> to it.
>
> Of course, for this year's project "phyloXML in BioRuby", it will be the
> opposite. Since I am the main-mentor I will/am handling all the issues
> related to the application process/midterm
> evaluation/final-evaluation/phyloXML. But I would feel more comfortable,
> if I had one or two co-mentors with significant Ruby and BioRuby
> experience who could help out if questions specific to these languages
> arise.
>
> Christian
>
> PS: The project still needs to be accepted by the Nescent organization.
> Last year, there were about 15 projects, and only five got accepted
> (since Google gave Nescent five slots, Nescent had to decide which
> projects they liked the most).
>
>
>
From hlapp at gmx.net Thu Apr 2 01:25:43 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 2 Apr 2009 01:25:43 -0400
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D3B2F0.5030605@burnham.org>
References: <49D2C051.9090002@burnham.org>
<2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu>
<49D3B2F0.5030605@burnham.org>
Message-ID:
On Apr 1, 2009, at 2:31 PM, Christian M Zmasek wrote:
> But I would feel more comfortable, if I had one or two co-mentors
> with significant Ruby and BioRuby experience
I would echo this and add that from NESCent's perspective as the
mentoring organization having a committed co-mentor as a backup when
the primary mentor for whatever reason is temporarily incommunicado is
highly desirable.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
From bonnalraoul at ingm.it Thu Apr 2 04:43:34 2009
From: bonnalraoul at ingm.it (Raoul JP Bonnal)
Date: Thu, 02 Apr 2009 10:43:34 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D3B067.3050101@burnham.org>
References: <49D2C051.9090002@burnham.org> <49D343E7.1080007@ingm.it>
<49D3B067.3050101@burnham.org>
Message-ID: <49D47AB6.8090304@ingm.it>
Ok, probably was not so clear, I'm available for co-mentoring if you want.
Christian M Zmasek wrote:
> Hi Raoul:
>
> Thank you for your reply.
> While your project sounds interesting and much needed, the project I
> was referring to is to implement phyloXML (www.phyloxml.org) support
> in BioRuby.
> Given that the deadline for Google Summer of Code projects is April
> 3rd, I think it would be almost impossible to propose a new project
> and find a student for it before this deadline.
>
> Sincerely,
>
> Christian
>
>
>
From czmasek at burnham.org Thu Apr 2 19:01:13 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Thu, 2 Apr 2009 16:01:13 -0700
Subject: [BioRuby] [PhyloSoC] Project plan for phyloXML integration with
BioRuby
In-Reply-To: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com>
References: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com>
Message-ID: <49D543B9.2050200@burnham.org>
Diana:
Thank you for your interest in this project!
Indeed, the hour is late, and your proposal still needs significant work
in order to be competitive.
I think you already got some comments from Hilmar (I am at work and I
cannot use IRC).
Besides those, I'd like to suggest:
1. Please make sure that all the studying is done before the coding
begins (May 23) (i.e. your "week 1" should be during the "community
bonding period").
2. You do not need to develop classes for objects already present in
BioRuby (such as phylogenetic trees). Actually, the fewer new classes you
have to introduce, the better -- reuse!
3. I am happy to see that you include unit tests early on, this is good!
You have to make extensive use of BioRuby's test suite.
4. In general, your weekly goals are not described in enough detail. It
might be a good idea to discuss goals, deliverables, anticipated
problems/difficulties (and possible solutions) for each week.
5. Documentation is very important(!)
6. Do you plan to maintain the code after the summer?
Since the abstract is due April 3rd and cannot be changed after that it
is best to concentrate on the abstract first, though.
(The project plan can still be tweaked after April 3rd, I understand.)
In the abstract you have to make clear that you understand the _biology_
behind the project. Why does phyloXML have the elements it has? Why is
it useful? You might have a look at: http://www.phyloxml.org and
http://www.tdwg.org/proceedings/article/view/437.
Can you show that you understand what evolutionary trees are? Where and
why are they used?
Are they only important in 'tree of life' applications (e.g. see:
http://www.liebertonline.com/doi/pdf/10.1089/omi.2006.10.231)?
What is "phylogenomics", what is "comparative genomics"?
You could also go to [http://monochrome-effect.net/publications.html]
and have a quick look at some of the papers there, most are related to
the issues at hand, and some show real world applications of
phylogenetic trees.
How might participating in this project help your career? What do you plan
to learn? Why are you a good candidate for this?
Hope this helps some,
CZ
Diana Jaunzeikare wrote:
> Hi everybody,
>
> I know this is kinda late and I should have contacted you earlier, but
> better late than never. I found out about Phyloinformatics Summer of
> Code just last night when I was doing homework for Bioinformatics lab
> on Phylogenetic reconstruction and Parsimony. I was reading various
> bio-tech related blogs in Google Reader and in the Google Top
> Recommendations bar I saw the blog of The Tree of Life. There I read
> the blog post about Phyloinformatics Summer of Code. This was very
> exciting news for me! I almost jumped off the sofa from the excitement
> :) For two years already I have wanted to participate in Google Summer
> of Code, but it never really fit with what I was doing at the time. When
> I saw the project about integrating phyloXML with BioRuby I knew it
> was for me! I am a big fan of Ruby! Last semester for my Computational
> Biology seminar I wrote a bunch of scripts to deal with the PDB database
> for my final project. Also it has been a pleasure to develop in Ruby on
> Rails. What is even more exciting is that my research interests lie
> in Bioinformatics. In fact, I had thoughts before to develop for
> BioRuby, but I didn't have a good enough reason before.
>
> Here is my project plan for building support for phyloXML for BioRuby.
> I think the emphasis should be on the ease of use for biologists and a
> lot of example code.
>
>
> Week 1:
> * Get familiar with BioRuby, its structure, classes (like Bio::Tree),
> coding conventions, documentation conventions. See other
> implementations of XML parsers in BioRuby (like BLAST XML).
> * Get familiar with phyloXML, its structure, typical uses. Get a data
> set of many different files in phyloXML format for testing.
>
> Week 2:
> * Try to write a program which would use phyloXML data in order to
> understand what would be the easiest way to use it. (Later will be
> used for unit testing).
> * Design the architecture of the phyloXML class, parser and writer,
> interface with other classes (like alignment class).
>
> Week 3: Develop the most essential objects of phyloXML
> (Phylogeny, Clade, Taxonomy).
>
> Weeks 4-5: Develop phyloXML parser.
>
> Weeks 6-7: Develop phyloXML writer.
>
> Week 8: Develop the rest of the objects of phyloXML.
>
> Week 9: Update parser.
>
> Week 10: Update writer.
>
> Week 11: Finish up documentation. Write extensive examples of how to
> use the code.
>
> Week 12: Do write up of the project.
>
>
> What do you think about the project plan? Anything missing?
>
> Thanks,
>
> Diana
>
>
> Diana Jaunzeikare
> Smith College
> Computer Science and Math double major '10
> CS Department Liaison and Master Tutor
> email: djaunzei at email.smith.edu
> cell: 413-387-2083
From czmasek at burnham.org Thu Apr 2 23:15:41 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Thu, 2 Apr 2009 20:15:41 -0700
Subject: [BioRuby] [PhyloSoC] Project plan for phyloXML integration with
BioRuby
In-Reply-To: <4057d3bf0904021957o73caa39bu835b949493fcab85@mail.gmail.com>
References: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com>
<49D543B9.2050200@burnham.org>
<4057d3bf0904021957o73caa39bu835b949493fcab85@mail.gmail.com>
Message-ID: <49D57F5D.7070703@burnham.org>
Hi, Diana:
Looks better.
I think you need to point out how evolutionary trees are used outside of
'tree of life' applications (i.e. phylogenomics, phylogeography, gene
function prediction, ...) as those are important applications for which
phyloXML has been designed.
Also, it is not expected that _you_ benchmark various XML parsers. It's
good enough to rely on published results. The important point is that
you, together with the BioRuby community, determine which one integrates
best with BioRuby (i.e. ideally create no additional dependencies) and
still provides acceptable performance.
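To make the "no additional dependencies" point concrete, here is a minimal sketch that reads a small phyloXML tree with REXML, which ships with Ruby. REXML is only one candidate and this is not a recommendation of it over other parsers; the sample document is hand-written, and the helper names (child_elements, leaf_clades, phyloxml_leaves) are hypothetical, not part of BioRuby.

```ruby
require 'rexml/document'

# A tiny hand-written phyloXML document, for illustration only.
PHYLOXML_SAMPLE = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <name>example tree</name>
      <clade>
        <clade>
          <name>A</name>
          <branch_length>0.102</branch_length>
        </clade>
        <clade>
          <name>B</name>
          <branch_length>0.23</branch_length>
        </clade>
      </clade>
    </phylogeny>
  </phyloxml>
XML

# Hypothetical helper: direct child elements of +node+ with the given tag name.
def child_elements(node, name)
  node.children.select { |c| c.is_a?(REXML::Element) && c.name == name }
end

# Hypothetical helper: walk <clade> elements recursively and collect
# [name, branch_length] pairs for the external nodes (leaves).
def leaf_clades(clade, leaves = [])
  subclades = child_elements(clade, 'clade')
  if subclades.empty?
    name   = child_elements(clade, 'name').first
    length = child_elements(clade, 'branch_length').first
    leaves << [name && name.text, length && length.text.to_f]
  else
    subclades.each { |sub| leaf_clades(sub, leaves) }
  end
  leaves
end

# Hypothetical entry point: leaves of the first phylogeny in the document.
def phyloxml_leaves(xml_string)
  doc = REXML::Document.new(xml_string)
  phylogeny = child_elements(doc.root, 'phylogeny').first
  leaf_clades(child_elements(phylogeny, 'clade').first)
end

phyloxml_leaves(PHYLOXML_SAMPLE).each do |name, length|
  puts "#{name}\t#{length}"
end
```

A tree parser like this loads the whole document into memory, so for trees with thousands of external nodes a stream or pull parser may be the better fit; that trade-off is what the published benchmarks would help decide.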
CZ
Diana Jaunzeikare wrote:
> Hi,
>
> I posted Abstract and new project plan.
>
> http://socghop.appspot.com/student_proposal/show/google/gsoc2009/dianaj/t123872262150
>
> Diana
>
> On Thu, Apr 2, 2009 at 7:01 PM, Christian M Zmasek
> > wrote:
>
> Diana:
>
> Thank you for your interest in this project!
>
> Indeed, the hour is late, and your proposal still needs
> significant work in order to be competitive.
>
> I think you already got some comments from Hilmar (I am at work
> and I cannot use IRC).
>
> Beside those, I'd like to suggest:
>
> 1. please make sure that all the studying is done before the
> coding begins (May 23) (i.e. your "week 1" should be during the
> "community bonding period").
> 2. You do not need to develop classes for objects already present
> in BioRuby (such as phylogenetic trees). Actually, the fewer new
> classes you have to introduce, the better -- reuse!
> 3. I am happy to see that you include unit tests early on, this is
> good! You have to make extensive use of BioRuby's test suite.
> 4. In general, your weekly goals are not described in enough
> detail. It might be a good idea to discuss goals, deliverables,
> anticipated problems/difficulties (and possible solutions) for
> each week.
> 5. Documentation is very important(!)
> 6. Do you plan to maintain the code after the summer?
>
> Since the abstract is due April 3rd and cannot be changed after
> that it is best to concentrate on the abstract first, though.
> (The project plan can still be tweaked after April 3rd, I understand.)
>
> In the abstract you have to make clear that you understand the
> _biology_ behind the project. Why does phyloXML have the elements
> it has? Why is it useful? You might have a look at:
> http://www.phyloxml.org and
> http://www.tdwg.org/proceedings/article/view/437.
>
> Can you show that you understand what evolutionary trees are?
> Where and why are they used?
> Are they only important in 'tree of life' applications (e.g. see:
> http://www.liebertonline.com/doi/pdf/10.1089/omi.2006.10.231)
> What is "phylogenomics", what is "comparative genomics"?
> You could also go to
> [http://monochrome-effect.net/publications.html] and have a quick
> look at some of the papers there, most are related to the issues
> at hand, and some show real world applications of phylogenetic trees.
>
> How might participating in this project help your career? What do
> you plan to learn? Why are you a good candidate for this?
>
> Hope this helps some,
>
> CZ
>
>
> Diana Jaunzeikare wrote:
>
> Hi everybody,
>
> I know this is kinda late and I should have contacted you
> earlier, but better late than never. I found out about
> Phyloinformatics Summer of Code just last night when I was
> doing homework for Bioinformatics lab on Phylogenetic
> reconstruction and Parsimony. I was reading various bio-tech
> related blogs in Google Reader and in the Google Top
> Recommendations bar I saw the blog of The Tree of Life. There I
> read the blog post about Phyloinformatics Summer of Code. This
> was very exciting news for me! I almost jumped off the sofa from
> the excitement :) For two years already I have wanted to
> participate in Google Summer of Code, but it never really fit
> with what I was doing at the time. When I saw the project
> about integrating phyloXML with BioRuby I knew it was for me!
> I am a big fan of Ruby! Last semester for my Computational
> Biology seminar I wrote a bunch of scripts to deal with the PDB
> database for my final project. Also it has been a pleasure to
> develop in Ruby on Rails. What is even more exciting is that
> my research interests lie in Bioinformatics. In fact, I had
> thought before about developing for BioRuby, but I didn't have a
> good enough reason.
>
> Here is my project plan for building support for phyloXML for
> BioRuby. I think the emphasis should be on the ease of use for
> biologists and a lot of example code.
>
>
> Week 1:
> * Get familiar with BioRuby, its structure, classes (like
> Bio::Tree), coding conventions, documentation conventions. See
> other implementations of XML parsers in BioRuby (like BLAST XML).
> * Get familiar with phyloXML, its structure, typical uses.
> Get a data set of many different files in phyloXML format for
> testing.
> Week 2:
> * Try to write a program that uses phyloXML data, in
> order to understand what the easiest way to use it would be.
> (It will later be used for unit testing.)
> * Design the architecture of the phyloXML class, parser and
> writer, and the interface with other classes (like the
> alignment class).
> Week 3: Develop the basic, most essential objects of
> phyloXML (Phylogeny, Clade, Taxonomy).
> Weeks 4-5: Develop phyloXML parser.
> Weeks 6-7: Develop phyloXML writer.
> Week 8: Develop the rest of the objects of phyloXML.
> Week 9: Update parser.
>
> Week 10: Update writer.
>
> Week 11: Finish up documentation. Write extensive examples of
> how to use the code.
>
> Week 12: Write up the project.
>
> What do you think about the project plan? Anything missing?
>
> Thanks,
>
> Diana
>
>
> Diana Jaunzeikare
> Smith College
> Computer Science and Math double major '10
> CS Department Liaison and Master Tutor
> email: djaunzei at email.smith.edu
>
> >
> cell: 413-387-2083
>
>
>
From kenglish at gmail.com Sun Apr 5 20:13:37 2009
From: kenglish at gmail.com (Kevin English)
Date: Sun, 5 Apr 2009 14:13:37 -1000
Subject: [BioRuby] Blast with file as a query option?
Message-ID:
Hello,
I have two very large local FASTA files that I wish to BLAST against one
another, and I want to parse the results in BioRuby. I'm wondering if there
is a way to mimic the behavior of this blast command:
blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2
where Large_list_sequences_2 is a formatted fasta db. My current
implementation opens Large_list_sequences_1.fasta and goes through it
sequence by sequence. It seems to run pretty slow. I'm wondering if I can in
some way do the above blast command and loop through the results and get a
performance gain.
For anyone curious, my code is on github:
http://github.com/kenglishhi/bioflexrails/tree/master
The file that is doing the blasts is under app/model/biodatabase.rb.
I'm trying to write a Rails app that uses a BioSQL db and allows this
biologist to organize his sequences. I'm very new to bioinformatics but
have a lot of experience with Ruby on Rails.
Thanks in advance for your help.
Kevin
From ngoto at gen-info.osaka-u.ac.jp Tue Apr 7 00:03:03 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 7 Apr 2009 13:03:03 +0900
Subject: [BioRuby] Blast with file as a query option?
In-Reply-To:
References:
Message-ID: <20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>
Hi,
On Sun, 5 Apr 2009 14:13:37 -1000
Kevin English wrote:
> Hello,
> I have to very large local fasta files that wish to blast against one
> another and parse the results in bio ruby. I'm wondering if there is a way
> to mimic the behavior of this blast command:
>
> blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2
>
>
> where Large_list_sequences_2 is a formatted fasta db. My current
> implementation opens Large_list_sequences_1.fasta and goes through it
> sequence by sequence. It seems to run pretty slow. I'm wondering if I can in
> some way do the above blast command and loop through the results and get a
> performance gain.
To gain performance, adding options to BLAST is strongly recommended.
-e Expectation value (E) [Real]
default = 10.0
-v Number of database sequences to show one-line descriptions for (V) [Integer]
default = 500
-b Number of database sequence to show alignments for (B) [Integer]
default = 250
Changing the above to smaller values will reduce the output report size,
which means a performance gain.
Executing BLAST with multiple query sequences can also improve performance.
In addition, when you have query sequences in a local file, calling the
blastall command directly without Bio::Blast may be good.
For example,
require 'bio'
require 'tempfile'
# %w( ) builds an Array of arguments, so that "-o" and the output
# path can be appended below and the whole list passed to system().
command = %w( blastall -p blastn -i Large_list_sequences_1.fasta
              -d Large_list_sequences_2 -e 0.0001 -b 20 -v 20 )
tempfile = Tempfile.new('blastout')
tempfile.close(false)
command = command + [ "-o", tempfile.path ]
system(*command)
# After system(), error checks will be needed but skipped.
tempfile.open
ff = Bio::FlatFile.open(tempfile)
ff.each do |report|
  # For example, prints query_def and target_def
  report.each do |hit|
    print report.query_def, "\t", hit.target_def, "\n"
  end
end
ff.close
tempfile.close(true)
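The error check skipped in the example above could look roughly like the following. This is only a sketch: run_checked is a hypothetical helper name, and it relies on the standard behaviour of Kernel#system (nil when the command cannot be started, false on a non-zero exit status). The demonstration runs the Ruby interpreter itself rather than blastall, so that it works without BLAST installed.

```ruby
require 'rbconfig'

# Hypothetical helper: run an external command given as a list of
# arguments, and raise if it could not be started or exited non-zero.
def run_checked(*command)
  ok = system(*command)
  if ok.nil?
    raise "could not execute: #{command.first}"
  elsif !ok
    raise "#{command.first} failed with exit status #{$?.exitstatus}"
  end
  true
end

# Demonstration with the current Ruby interpreter as the external command.
run_checked(RbConfig.ruby, '-e', 'exit 0')
begin
  run_checked(RbConfig.ruby, '-e', 'exit 3')
rescue RuntimeError => e
  puts "caught: #{e.message}"
end
```

In the BLAST example, system(*command) would then become run_checked(*command), so the report is parsed only after blastall has finished successfully.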
> For any curious, my code is on github:
>
> http://github.com/kenglishhi/bioflexrails/tree/master
>
> The file that is doing the blasts is under app/model/biodatabase.rb.
>
> I'm trying to write a rails app uses biosql db and allows this biologist to
> organize his sequences. I'm very new to bioinformatics but have a lot
> experience with Ruby on Rails.
>
> Thanks in advance for you help.
In general, a BLAST search against a very large database takes a
very long time, and using a batch queueing system might be needed.
Thanks,
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
From donttrustben at gmail.com Tue Apr 7 00:30:09 2009
From: donttrustben at gmail.com (Ben Woodcroft)
Date: Tue, 7 Apr 2009 14:30:09 +1000
Subject: [BioRuby] Blast with file as a query option?
In-Reply-To: <20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>
References:
<20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>
Message-ID:
And there is the -a flag, for specifying you want to use multiple CPUs.
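As a rough sketch, the options mentioned in this thread (-e, -b, -v and -a) can be collected into one argument list for system(). The helper name blastall_command and its option keys are hypothetical; only the blastall flags themselves come from the messages above.

```ruby
# Hypothetical helper: build the blastall argument list from a query
# file, a formatted database name and a few tuning options.
def blastall_command(query, db, options = {})
  opts = { program: 'blastn', evalue: '0.0001', alignments: 20,
           descriptions: 20, cpus: 1 }.merge(options)
  [ 'blastall',
    '-p', opts[:program],
    '-i', query,
    '-d', db,
    '-e', opts[:evalue].to_s,
    '-b', opts[:alignments].to_s,
    '-v', opts[:descriptions].to_s,
    '-a', opts[:cpus].to_s ]
end

command = blastall_command('Large_list_sequences_1.fasta',
                           'Large_list_sequences_2', cpus: 4)
puts command.join(' ')
```

Passing the arguments as an Array to system(*command) avoids going through a shell, so file names containing spaces do not need quoting.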
ben
2009/4/7 Naohisa GOTO
> Hi,
>
> On Sun, 5 Apr 2009 14:13:37 -1000
> Kevin English wrote:
>
> > Hello,
> > I have to very large local fasta files that wish to blast against one
> > another and parse the results in bio ruby. I'm wondering if there is a
> way
> > to mimic the behavior of this blast command:
> >
> > blastall -p blastn -i Large_list_sequences_1.fasta -d
> Large_list_sequences_2
> >
> >
> > where Large_list_sequences_2 is a formatted fasta db. My current
> > implementation opens Large_list_sequences_1.fasta and goes through it
> > sequence by sequence. It seems to run pretty slow. I'm wondering if I can
> in
> > some way do the above blast command and loop through the results and get
> a
> > performance gain.
>
> To gain performance, adding options to BLAST is strongly recommended.
> -e Expectation value (E) [Real]
> default = 10.0
> -v Number of database sequences to show one-line descriptions for (V)
> [Integer]
> default = 500
> -b Number of database sequence to show alignments for (B) [Integer]
> default = 250
>
> Changing above to smaller values will reduce output report size
> which means performance gain.
>
> Executing BLAST with multiple query sequences can also gain performance.
> In addition, when you have query sequences in a local file, calling
> blastall command directly without Bio::Blast may be good.
>
> For example,
>
> require 'bio'
> require 'tempfile'
>
> command = %w( blastall -p blastn -i Large_list_sequences_1.fasta
> -d Large_list_sequences_2 -e 0.0001 -b 20 -v 20 )
> tempfile = Tempfile.new('blastout')
> tempfile.close(false)
> command = command + [ "-o", tempfile.path ]
> system(*command)
> # After system(), error checks will be needed but skipped.
> tempfile.open
> ff = Bio::FlatFile.open(tempfile)
> ff.each do |report|
> # For example, prints query_def and target_def
> report.each do |hit|
> print report.query_def, "\t", hit.target_def, "\n"
> end
> end
> ff.close
> tempfile.close(true)
>
> > For any curious, my code is on github:
> >
> > http://github.com/kenglishhi/bioflexrails/tree/master
> >
> > The file that is doing the blasts is under app/model/biodatabase.rb.
> >
> > I'm trying to write a rails app uses biosql db and allows this biologist
> to
> > organize his sequences. I'm very new to bioinformatics but have a lot
> > experience with Ruby on Rails.
> >
> > Thanks in advance for you help.
>
> In general, a BLAST search against a very large database takes
> very long time, and using batch queueing system might be needed.
>
> Thanks,
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>
--
FYI: My email addresses at unimelb, uq and gmail all redirect to the same
place.
From davide.rambaldi at ifom-ieo-campus.it Tue Apr 7 05:05:48 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Tue, 7 Apr 2009 11:05:48 +0200
Subject: [BioRuby] bioruby XML and libruby1.8 (ubuntu/debian)
Message-ID:
Hi, I ran into this ugly bug:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502535
Also reported by Naohisa Goto.
Is there any solution provided by Ubuntu/Debian?
Or is it better to recompile Ruby from source?
Thanks.
Davide Rambaldi,
Bioinformatics PhD student.
-----------------------------------------------------
Bioinformatic Group IFOM-IEO Campus
Via Adamello 16, Milano
I-20139 Italy
[t] +39 02574303 066
[e] davide.rambaldi at ifom-ieo-campus.it
[i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi
(homepage)
[i] http://www.semm.it (PhD school)
[i] http://www.btbs.unimib.it/ (Master)
-----------------------------------------------------
From davide.rambaldi at ifom-ieo-campus.it Tue Apr 7 05:09:46 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Tue, 7 Apr 2009 11:09:46 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D2C051.9090002@burnham.org>
References: <49D2C051.9090002@burnham.org>
Message-ID: <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it>
Hi, I am available to help if there is a need.
I have a little experience with BioRuby, but I am really interested in
participating in a project with BioRuby.
Maybe I can help in testing the code?
Davide
On 1 Apr 2009, at 03:16, Christian M Zmasek wrote:
> Hi:
>
> As you probably read on previous messages, NESCent is again having a
> Google Summer of Code program (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009
> ).
>
> I am serving as the mentor of a project entitled "phyloXML support
> in BioRuby" (see: https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009
> #phyloXML_support_in_BioRuby).
>
> In order to ensure a successful outcome (should the project be
> accepted), as well as to improve the chances of being accepted, I am
> looking for people willing to serve as co-mentors.
>
> Christian
>
>
> PS: Here is the full description of the project:
>
>
> phyloXML support in BioRuby
>
> Rationale
> Evolutionary trees are central to comparative genomics studies.
> Trees used in this context are usually annotated with a variety of
> data elements, such as taxonomic information, genome-related data
> (gene names, functional annotations) and gene duplication events, as
> well as information related to the evolutionary tree itself (branch
> lengths, support values). phyloXML is an XML data exchange standard
> that can represent this data. Trees in phyloXML format can be
> displayed and analyzed with Archaeopteryx (the successor to ATV),
> which also allows manipulation and navigation of the tree. While
> tools exist to convert other formats (such as the widely used Newick
> and Nexus formats) to phyloXML, there is currently support for
> phyloXML in only one of the open source Bio* projects (in BioPerl,
> as a result of Google's Summer of Code 2008).
>
> Approach
> Build phyloXML support in the increasingly popular, dynamic, and
> fully object-oriented language Ruby. More specifically, extend the
> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has
> just been released). This will entail (i) the development of objects
> to represent all the elements of phyloXML (sequences, taxonomic
> data, annotations, etc), (ii) the development of a parser to read in
> phyloXML, and (iii) a phyloXML writer.
>
> Challenges
> Relating the data elements specific to phyloXML to the tree classes
> already in BioRuby while maintaining the standards of the BioRuby
> project. Development of a time and memory efficient phyloXML parser
> (the parser has to be able to process trees with thousands of
> external nodes, at least).
>
> Involved toolkits or projects
> BioRuby, phyloXML
>
> Degree of difficulty and needed skills
> Medium. Requires experience in an object-oriented programming
> language (such as C++, Java, or, ideally, Ruby). Experience in
> genomics or a related biological field is also critical. Knowledge
> of BioRuby will obviously help, as well as familiarity with XML.
>
> Mentors
> Christian Zmasek
>
>
Davide Rambaldi,
Bioinformatics PhD student.
-----------------------------------------------------
Bioinformatic Group IFOM-IEO Campus
Via Adamello 16, Milano
I-20139 Italy
[t] +39 02574303 066
[e] davide.rambaldi at ifom-ieo-campus.it
[i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi
(homepage)
[i] http://www.semm.it (PhD school)
[i] http://www.btbs.unimib.it/ (Master)
-----------------------------------------------------
From ngoto at gen-info.osaka-u.ac.jp Tue Apr 7 07:06:57 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 7 Apr 2009 20:06:57 +0900
Subject: [BioRuby] bioruby XML and libruby1.8 (ubuntu/debian)
In-Reply-To:
References:
Message-ID: <20090407110659.0CD521CBC3F4@idnmail.gen-info.osaka-u.ac.jp>
Hi,
For Debian, the regression has been fixed with DSA-1695-1.
http://www.debian.org/security/2009/dsa-1695
Normally, aptitude update && aptitude upgrade will fix
the problem.
I know nothing about Ubuntu.
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
On Tue, 7 Apr 2009 11:05:48 +0200
Davide Rambaldi wrote:
> Hi, I got this ugly bug:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502535
>
> Reported also by Naohisa Goto.
>
> There is any solution provided by ubuntu/debian?
>
> Or is better to recompile ruby from source?
>
> thanks.
>
> Davide Rambaldi,
> Bioinformatics PhD student.
> -----------------------------------------------------
> Bioinformatic Group IFOM-IEO Campus
> Via Adamello 16, Milano
> I-20139 Italy
>
> [t] +39 02574303 066
> [e] davide.rambaldi at ifom-ieo-campus.it
> [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi
> (homepage)
> [i] http://www.semm.it (PhD school)
> [i] http://www.btbs.unimib.it/ (Master)
>
> -----------------------------------------------------
>
>
>
>
From czmasek at burnham.org Wed Apr 8 19:14:35 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Wed, 8 Apr 2009 16:14:35 -0700
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it>
References: <49D2C051.9090002@burnham.org>
<4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it>
Message-ID: <49DD2FDB.4080909@burnham.org>
Hi, Davide:
Yes, help in testing code is always very welcome!
Hopefully the project will get accepted!
Christian
Davide Rambaldi wrote:
> Hi, I am available to help if there is a need.
>
> I have a little experience with BioRuby, but I am really interested in
> participating in a project with BioRuby.
>
> Maybe I can help in testing the code?
>
> Davide
>
>
>
> Davide Rambaldi,
> Bioinformatics PhD student.
> -----------------------------------------------------
> Bioinformatic Group IFOM-IEO Campus
> Via Adamello 16, Milano
> I-20139 Italy
>
> [t] +39 02574303 066
> [e] davide.rambaldi at ifom-ieo-campus.it
> [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi
> (homepage)
> [i] http://www.semm.it (PhD school)
> [i] http://www.btbs.unimib.it/ (Master)
>
> -----------------------------------------------------
>
>
>
>
>
From ixpata at gmail.com Wed Apr 8 19:23:47 2009
From: ixpata at gmail.com (S M)
Date: Wed, 8 Apr 2009 19:23:47 -0400
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49DD2FDB.4080909@burnham.org>
References: <49D2C051.9090002@burnham.org>
<4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it>
<49DD2FDB.4080909@burnham.org>
Message-ID:
I'd be interested in helping test too, or in any other way I could
help. I'm not experienced enough to program, but maybe I can help in
other ways.
Sent from my iPhone
On Apr 8, 2009, at 7:14 PM, Christian M Zmasek
wrote:
> Hi, Davide:
>
> Yes, help in testing code is always very welcome!
>
> Hopefully the project will get accepted!
>
> Christian
>
>
> Davide Rambaldi wrote:
>> Hi, I am available to help if there is a need.
>>
>> I have a little experience with BioRuby, but I am really interested
>> in participating in a project with BioRuby.
>>
>> Maybe I can help in testing the code?
>>
>> Davide
>>
>>
>>
>> Davide Rambaldi,
>> Bioinformatics PhD student.
>> -----------------------------------------------------
>> Bioinformatic Group IFOM-IEO Campus
>> Via Adamello 16, Milano
>> I-20139 Italy
>>
>> [t] +39 02574303 066
>> [e] davide.rambaldi at ifom-ieo-campus.it
>> [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/
>> DavideRambaldi (homepage)
>> [i] http://www.semm.it (PhD school)
>> [i] http://www.btbs.unimib.it/ (Master)
>>
>> -----------------------------------------------------
>>
>>
>>
>>
>>
>
From mkikkawa at gmail.com Fri Apr 10 03:20:19 2009
From: mkikkawa at gmail.com (Masahide Kikkawa)
Date: Fri, 10 Apr 2009 16:20:19 +0900
Subject: [BioRuby] Bio::PubMed.efetch, bug?
Message-ID: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com>
Hi,
I recently installed bioruby-1.3.0 and encountered a bug.
Here is what I did:
result = Bio::PubMed.efetch([123,456,789])
I expected the result to be an Array of MEDLINE-formatted Strings,
as documented here:
http://bioruby.org/rdoc/classes/Bio/PubMed.html
However, the return value was a single String containing all the results.
In bioruby-1.2.1, it does return an Array of MEDLINE-formatted Strings.
Therefore I would recommend applying the following patch to ncbirest.rb:
<
< result.strip!
< if opts["retmode"] == "text"
< result = result.split(/\n\n+/)
< end
<
< return result
---
> return result.strip
> #return result.strip.split(/\n\n+/)
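Until a fix is released, the same split can be applied on the caller's side. The following is only a sketch of that workaround: split_medline is a hypothetical helper, not part of BioRuby, and the sample text is a shortened, made-up stand-in for what efetch returns in text mode.

```ruby
# Hypothetical helper: split one String holding several MEDLINE records,
# separated by blank lines, into an Array with one String per record.
def split_medline(text)
  text.strip.split(/\n\n+/)
end

# Made-up, shortened sample in MEDLINE layout, for illustration only.
MEDLINE_SAMPLE = "PMID- 123\nTI  - First title\n\n" \
                 "PMID- 456\nTI  - Second title\n\n\n"

records = split_medline(MEDLINE_SAMPLE)
puts records.size
records.each { |record| puts record.lines.first }
```

Each element could then be handed to Bio::MEDLINE.new, as with the Array returned by bioruby-1.2.1.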
---------------------------------------------------------------
Masahide Kikkawa, M.D., Ph. D.
Professor
Structural Biology
Graduate School of Science
Kyoto University
Oiwake, Kitashirakawa, Sakyo-ku, Kyoto, 606-8502
JAPAN
http://structure.biophys.kyoto-u.ac.jp/
Tel: +81-75-753-9421
FAX: +81-75-753-4218
---------------------------------------------------------------
From jan.aerts at gmail.com Fri Apr 10 03:30:32 2009
From: jan.aerts at gmail.com (Jan Aerts)
Date: Fri, 10 Apr 2009 08:30:32 +0100
Subject: [BioRuby] searching through large sets of features
Message-ID: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>
Hi all,
I need to be able to handle millions of features for a visualization and
therefore be able to search through them efficiently. I've been made aware
of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data structure
that could do just that. One of the issues that this approach would solve is
that it doesn't need to load every single feature when the display
resolution is not high enough (no use trying to plot 5 million points next
to each other on a screen 800 pixels wide; you can only plot 800
points...).
I'm now trying to write a Ruby implementation of an R-Tree that is better
suited to loci (i.e. one dimension fewer than an R-Tree, and it has to handle
chromosomes as well) at http://github.com/jandot/locustree. I'm trying to
make it a module that can be mixed in, so anything which has a chromosome,
start and stop can be indexed and searched using this R-Tree adaptation.
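To illustrate the mix-in idea, here is a very rough sketch. Note that it uses plain fixed-size bins per chromosome, which is a much simpler technique than an R-Tree and is not how locustree works; the names Locus, Feature and BinIndex are all hypothetical.

```ruby
# Hypothetical mixin: anything that responds to chromosome, start and
# stop can include it to get an overlap test.
module Locus
  def overlaps?(chromosome, start, stop)
    self.chromosome == chromosome && self.start <= stop && self.stop >= start
  end
end

# Hypothetical feature class, for illustration only.
Feature = Struct.new(:chromosome, :start, :stop) do
  include Locus
end

# Hypothetical index using fixed-size bins (NOT an R-Tree): a feature is
# stored in every bin it touches, so a query only inspects the bins
# covered by the requested range.
class BinIndex
  def initialize(bin_size = 1000)
    @bin_size = bin_size
    @bins = Hash.new { |hash, key| hash[key] = [] }
  end

  def add(feature)
    bins_for(feature.start, feature.stop).each do |bin|
      @bins[[feature.chromosome, bin]] << feature
    end
    self
  end

  def query(chromosome, start, stop)
    candidates = bins_for(start, stop).flat_map do |bin|
      @bins.fetch([chromosome, bin], [])
    end
    # A feature spanning several bins shows up once per bin; keep one copy.
    candidates.uniq { |f| f.object_id }
              .select { |f| f.overlaps?(chromosome, start, stop) }
  end

  private

  def bins_for(start, stop)
    (start / @bin_size)..(stop / @bin_size)
  end
end

index = BinIndex.new(100)
index.add(Feature.new('1', 10, 50)).add(Feature.new('1', 150, 420))
puts index.query('1', 40, 160).size
```

Unlike a tree, a flat binning like this has no coarser levels to fall back on, so it does not by itself solve the "800 pixels" problem of summarising millions of features at low resolution.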
Does anyone know of a Ruby implementation of an R-Tree? Or is anyone
interested in helping me out?
Thanks,
jan.
From mail at michaelbarton.me.uk Fri Apr 10 09:33:30 2009
From: mail at michaelbarton.me.uk (Michael Barton)
Date: Fri, 10 Apr 2009 14:33:30 +0100
Subject: [BioRuby] searching through large sets of features
In-Reply-To:
References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>
Message-ID:
There are also some Ruby search algorithms here:
http://algorithms.rubyforge.org/
I'm happy to try and help if I can
Cheers
Mike
2009/4/10 Michael Barton
> Would a Bloom filter fit this problem?
> http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
>
> 2009/4/10 Jan Aerts
>
> Hi all,
>> I need to be able to handle millions of features for a visualization and
>> therefore be able to search through them efficiently. I've been made aware
>> of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data
>> structure
>> that could do just that. One of the issues that this approach would solve
>> is
>> that it doesn't need to load every single feature when the display
>> resolution is not high enough (no use trying to plot 5 million points next
>> to each other on a screen of 800 pixels wide; you can only plot 800
>> points....)
>>
>> I'm now trying to write a ruby-implementation of an R-Tree that is more
>> fit
>> for loci (i.e. one less dimension than an R-Tree, and has to handle
>> chromosome as well) at http://github.com/jandot/locustree. I'm trying to
>> make it a module that can be mixed in, so anything which has a chromosome,
>> start and stop can be indexed and searched using this R-Tree adaptation.
>>
>> Does anyone know a ruby-implementation of R-Tree? Or is anyone interested
>> in
>> helping me out?
>>
>> Thanks,
>> jan.
>> _______________________________________________
>> BioRuby mailing list
>> BioRuby at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>
>
>
From jan.aerts at gmail.com Fri Apr 10 09:57:20 2009
From: jan.aerts at gmail.com (Jan Aerts)
Date: Fri, 10 Apr 2009 14:57:20 +0100
Subject: [BioRuby] searching through large sets of features
In-Reply-To:
References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>
Message-ID: <4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com>
That is a _really_ good resource. Unfortunately, R-Trees are not in there
(or I didn't look well enough). I was thinking of sending him an email to
ask if he could quickly add that to his library :-), but saw that he's
switching to python. (But that particular announcement was on the 1st of
April, so let's hope it's a joke)
j.
2009/4/10 Michael Barton
> There's also some ruby searching algorithms here
> http://algorithms.rubyforge.org/
>
> I'm happy to try and help if I can
>
> Cheers
>
> Mike
>
> 2009/4/10 Michael Barton
>
> > Would a bloom tree fit this problem?
> >
> http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
> >
> > 2009/4/10 Jan Aerts
> >
> > Hi all,
> >> I need to be able to handle millions of features for a visualization and
> >> therefore be able to search through them efficiently. I've been made
> aware
> >> of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data
> >> structure
> >> that could do just that. One of the issues that this approach would
> solve
> >> is
> >> that it doesn't need to load every single feature when the display
> >> resolution is not high enough (no use trying to plot 5 million points
> next
> >> to each other on a screen of 800 pixels wide; you can only plot 800
> >> points....)
> >>
> >> I'm now trying to write a ruby-implementation of an R-Tree that is more
> >> fit
> >> for loci (i.e. one less dimension than an R-Tree, and has to handle
> >> chromosome as well) at http://github.com/jandot/locustree. I'm trying
> to
> >> make it a module that can be mixed in, so anything which has a
> chromosome,
> >> start and stop can be indexed and searched using this R-Tree adaptation.
> >>
> >> Does anyone know a ruby-implementation of R-Tree? Or is anyone
> interested
> >> in
> >> helping me out?
> >>
> >> Thanks,
> >> jan.
> >> _______________________________________________
> >> BioRuby mailing list
> >> BioRuby at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioruby
> >>
> >
> >
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>
From mail at michaelbarton.me.uk Fri Apr 10 10:51:10 2009
From: mail at michaelbarton.me.uk (Michael Barton)
Date: Fri, 10 Apr 2009 15:51:10 +0100
Subject: [BioRuby] searching through large sets of features
In-Reply-To: <4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com>
References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>
<4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com>
Message-ID:
You could use the kd-tree to narrow the search to the 1000 features on each
side of the middle point, then traverse this set to exclude those outside of
the visualisation range. It's a bit of a hack, but it could save creating an
R-tree if there's no existing implementation.
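
A rough sketch of that narrowing step, with a sorted array and a binary search standing in for the kd-tree (an assumption made to keep the example self-contained). Span and features_in_view are hypothetical names.

```ruby
# Hypothetical sketch: a sorted array plus binary search stands in for the kd-tree.
Span = Struct.new(:name, :start, :stop) do
  def midpoint
    (start + stop) / 2
  end
end

# Returns the features overlapping [from, to], inspecting only `window`
# features on each side of the centre of the requested range.
# `sorted` must already be sorted by midpoint.
def features_in_view(sorted, from, to, window = 1000)
  centre = (from + to) / 2
  # Index of the first feature whose midpoint is at or beyond the centre.
  pivot = sorted.bsearch_index { |f| f.midpoint >= centre } || sorted.size
  low = [pivot - window, 0].max
  candidates = sorted[low, pivot - low + window] || []
  # Exact filter: drop the candidates that fall outside the visible range.
  candidates.select { |f| f.start <= to && f.stop >= from }
end

spans = [
  Span.new('a', 10, 20),
  Span.new('b', 40, 60),
  Span.new('c', 55, 90),
  Span.new('d', 200, 300)
].sort_by(&:midpoint)

puts features_in_view(spans, 50, 100, 2).map(&:name).inspect
```

The hack shows in the fixed window: a very long feature whose midpoint lies far from the centre, or a region holding more features than the window allows, can be missed.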
2009/4/10 Jan Aerts
> That is a _really_ good resource. Unfortunately, R-Trees are not in there
> (or I didn't look well enough). I was thinking of sending him an email to
> ask if he could quickly add that to his library :-), but saw that he's
> switching to python. (But that particular announcement was on the 1st of
> April, so let's hope it's a joke)
> j.
>
>
> 2009/4/10 Michael Barton
>
>> There's also some ruby searching algorithms here
>> http://algorithms.rubyforge.org/
>>
>> I'm happy to try and help if I can
>>
>> Cheers
>>
>> Mike
>>
>> 2009/4/10 Michael Barton
>>
>> > Would a bloom tree fit this problem?
>> >
>> http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
>> >
>> > 2009/4/10 Jan Aerts
>> >
>> > Hi all,
>> >> I need to be able to handle millions of features for a visualization
>> and
>> >> therefore be able to search through them efficiently. I've been made
>> aware
>> >> of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data
>> >> structure
>> >> that could do just that. One of the issues that this approach would
>> solve
>> >> is
>> >> that it doesn't need to load every single feature when the display
>> >> resolution is not high enough (no use trying to plot 5 million points
>> next
>> >> to each other on a screen of 800 pixels wide; you can only plot 800
>> >> points....)
>> >>
>> >> I'm now trying to write a ruby-implementation of an R-Tree that is more
>> >> fit
>> >> for loci (i.e. one less dimension than an R-Tree, and has to handle
>> >> chromosome as well) at http://github.com/jandot/locustree. I'm trying
>> to
>> >> make it a module that can be mixed in, so anything which has a
>> chromosome,
>> >> start and stop can be indexed and searched using this R-Tree
>> adaptation.
>> >>
>> >> Does anyone know a ruby-implementation of R-Tree? Or is anyone
>> interested
>> >> in
>> >> helping me out?
>> >>
>> >> Thanks,
>> >> jan.
>> >> _______________________________________________
>> >> BioRuby mailing list
>> >> BioRuby at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/bioruby
>> >>
>> >
>> >
>> _______________________________________________
>> BioRuby mailing list
>> BioRuby at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>
>
>
From fredjoha at bioreg.kyushu-u.ac.jp Mon Apr 13 00:16:00 2009
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 13 Apr 2009 13:16:00 +0900
Subject: [BioRuby] Parsing MSF alignment file
Message-ID: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>
I tried to parse an alignment file given in the MSF format by using
Bio::GCG::Msf. It turned out though that points/dots (.) were used as a
gap character in the alignment and that can't be handled by
Bio::GCG::Msf. So, for what it's worth, I made these changes to
bio/appl/gcg/msf.rb:
$ diff msf.rb.old msf.rb.new
33,35c33,36
< if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ str[/.*/] then
< @heading = str[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
< str.sub!(/.*/, '')
---
> preamble, @data = str.split(/^\/\/$/)
> if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ preamble[/.*/] then
> @heading = preamble[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
> preamble.sub!(/.*/, '')
37c38
< str.sub!(/.*\.\.$/m, '')
---
> preamble.sub!(/.*\.\.$/m, '')
48,49d48
< str.sub!(/.*\/\/$/m, '')
< a = $&.to_s.split(/^/)
51c50
< a.each do |x|
---
> preamble.split(/^/).each do |x|
59d57
< @data = str
Best regards,
Fredrik Johansson
--
***********************************
Fredrik Johansson, grad. student
Division of Bioinformatics
Medical Institute of Bioregulation
Kyushu University
3-1-1 Maidashi, Higashi-ku
Fukuoka 812-8582, Japan
fredjoha at bioreg.kyushu-u.ac.jp
***********************************
From ngoto at gen-info.osaka-u.ac.jp Mon Apr 13 08:54:54 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Mon, 13 Apr 2009 21:54:54 +0900
Subject: [BioRuby] Parsing MSF alignment file
In-Reply-To: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>
References: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>
Message-ID: <20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp>
Hi,
On Mon, 13 Apr 2009 13:16:00 +0900
Fredrik Johansson wrote:
> I tried to parse an alignment file given in the MSF format by using
> Bio::GCG::Msf. It turned out though that points/dots (.) were used as a
> gap character in the alignment and that can't be handled by
> Bio::GCG::Msf. So, for what it's worth, I made these changes to
> bio/appl/gcg/msf.rb:
Thank you very much. Do you mean that Bio::GCG::Msf fails to
parse an alignment that contains two contiguous gaps (..) at the
end of a line? This is a bug, and the patch will soon be
applied to the git repository.
> $ diff msf.rb.old msf.rb.new
Next time, please use "diff -u" (unified format).
Thanks,
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
From fredjoha at bioreg.kyushu-u.ac.jp Mon Apr 13 09:19:27 2009
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 13 Apr 2009 22:19:27 +0900
Subject: [BioRuby] Parsing MSF alignment file
In-Reply-To: <20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp>
References: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>
<20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <49E33BDF.70200@bioreg.kyushu-u.ac.jp>
Yes, that's what happened. A regular expression matched all the way to
the last occurrence of two dots at the end of a line.
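
The effect can be reproduced on a made-up miniature MSF text. The sketch below is an illustration only: SAMPLE_MSF, greedy_header and split_header are hypothetical names, and only the two regular expressions are taken from the patch.

```ruby
# Made-up miniature MSF text; the first alignment row ends in ".." (two gaps).
SAMPLE_MSF = <<~MSF
  !!AA_MULTIPLE_ALIGNMENT 1.0
   example.msf  MSF: 6  Type: P  Check: 1234  ..

   Name: seq1  Len: 6  Check: 1  Weight: 1.00
   Name: seq2  Len: 6  Check: 2  Weight: 1.00

  //

  seq1  ACDE..
  seq2  AC..FG
MSF

# Old behaviour: with /m, ".*" also crosses newlines, so the greedy match
# runs to the LAST line ending in "..", which here is an alignment row.
def greedy_header(str)
  str[/.*\.\.$/m]
end

# Patched behaviour: split at the "//" separator first, then match only
# inside the preamble.
def split_header(str)
  preamble, _data = str.split(/^\/\/$/)
  preamble[/.*\.\.$/m]
end

puts greedy_header(SAMPLE_MSF).lines.last # an alignment row
puts split_header(SAMPLE_MSF).lines.last  # the "MSF: ... Check: ..." header line
```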
> Thank you very much. Do you mean that Bio::GCG::Msf fails to
> parse an alignment that contain two contiguous gaps (..) in the
> end of a line? This is a bug, and the patch will soon be
> applied to the git repository.
>
>
>> $ diff msf.rb.old msf.rb.new
>>
>
> In the next time, please use "diff -u" (unified context format).
>
>
Ok, I'll attach the output of diff -u here:
--- /usr/lib/ruby/gems/1.8/gems/bio-1.3.0/lib/bio/appl/gcg/msf.rb.old	2009-04-13 11:32:53.000000000 +0900
+++ /usr/lib/ruby/gems/1.8/gems/bio-1.3.0/lib/bio/appl/gcg/msf.rb	2009-04-13 13:36:26.000000000 +0900
@@ -30,11 +30,12 @@
# Creates a new Msf object.
def initialize(str)
str = str.sub(/\A[\r\n]+/, '')
- if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ str[/.*/] then
- @heading = str[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
- str.sub!(/.*/, '')
+ preamble, @data = str.split(/^\/\/$/)
+ if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ preamble[/.*/] then
+ @heading = preamble[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
+ preamble.sub!(/.*/, '')
end
- str.sub!(/.*\.\.$/m, '')
+ preamble.sub!(/.*\.\.$/m, '')
@description = $&.to_s.sub(/^.*\.\.$/, '').to_s
d = $&.to_s
if m = /(.+)\s+MSF\:\s+(\d+)\s+Type\:\s+(\w)\s+(.+)\s+(Comp)?Check\:\s+(\d+)/.match(d) then
@@ -45,10 +46,8 @@
@checksum = (m[6] ? m[6].to_i : nil)
end
- str.sub!(/.*\/\/$/m, '')
- a = $&.to_s.split(/^/)
@seq_info = []
- a.each do |x|
+ preamble.split(/^/).each do |x|
if /Name\: / =~ x then
s = {}
x.scan(/(\S+)\: +(\S*)/) { |y| s[$1] = $2 }
@@ -56,7 +55,6 @@
end
end
- @data = str
@description.sub!(/\A(\r\n|\r|\n)/, '')
@align = nil
end
From andrew.j.grimm at gmail.com Tue Apr 14 05:30:16 2009
From: andrew.j.grimm at gmail.com (Andrew Grimm)
Date: Tue, 14 Apr 2009 19:30:16 +1000
Subject: [BioRuby] Status of bioruby with Debian packages
Message-ID:
According to http://packages.qa.debian.org/b/bioruby.html , the latest
bioruby available as a Debian package is 1.1.0 or 1.1.0-1.
Are there plans to update bioruby's debian package?
This isn't a major issue for me - our ubuntu system already has bioruby
installed for ruby 1.8, and I was merely hoping to use ruby 1.9 for
faster performance. If a debian update isn't likely, then we might try using
rubygems on ubuntu and hope that the package management police don't arrest
us!
Andrew
PS: Is there anything better than just plain web search for searching
bioruby's mailing list archives?
From davide.rambaldi at ifom-ieo-campus.it Wed Apr 15 05:24:03 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Wed, 15 Apr 2009 11:24:03 +0200
Subject: [BioRuby] ACE assembly format
Message-ID: <7d40e94d36b1af5fd01fb9a908512aa2@ifom-ieo-campus.it>
Hi all, is there a parser for the .ace format in bioruby?
Something like Bio::Assembly::IO::ace in BioPerl? Many thanks.
Davide Rambaldi
From ngoto at gen-info.osaka-u.ac.jp Wed Apr 15 07:04:00 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Wed, 15 Apr 2009 20:04:00 +0900
Subject: [BioRuby] Status of bioruby with Debian packages
In-Reply-To:
References:
Message-ID: <20090415110401.744641CBC544@idnmail.gen-info.osaka-u.ac.jp>
Hi,
On Tue, 14 Apr 2009 19:30:16 +1000
Andrew Grimm wrote:
> According to http://packages.qa.debian.org/b/bioruby.html , the latest
> bioruby available as a Debian package is 1.1.0 or 1.1.0-1.
>
> Are there plans to update bioruby's debian package?
I don't know, but I'm also a Debian user, and I've filed a
wishlist request in the Debian BTS.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=524180
In general, Debian-specific issues should be raised on the Debian
mailing lists, in the Bug Tracking System, and/or with the package maintainers.
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
From fredjoha at bioreg.kyushu-u.ac.jp Sun Apr 19 23:35:27 2009
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 20 Apr 2009 12:35:27 +0900
Subject: [BioRuby] PubMed.efetch error
Message-ID: <49EBED7F.4010509@bioreg.kyushu-u.ac.jp>
PubMed.efetch does not give me correct output if I give an array as
argument. I think it is supposed to return an array of MEDLINE entries,
but it returns all entries in one string. I can get an array if I type
Bio::PubMed.efetch(entries).split("\n\n")
But that can't be the intended behaviour, right?
Regards,
Fredrik Johansson
--
***********************************
Fredrik Johansson, grad. student
Division of Bioinformatics
Medical Institute of Bioregulation
Kyushu University
3-1-1 Maidashi, Higashi-ku
Fukuoka 812-8582, Japan
fredjoha at bioreg.kyushu-u.ac.jp
***********************************
From tomoakin at kenroku.kanazawa-u.ac.jp Tue Apr 21 04:29:26 2009
From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA)
Date: Tue, 21 Apr 2009 17:29:26 +0900
Subject: [BioRuby] Read/write of simple fasta file increases the identifier.
Message-ID:
Hi,
With bioruby-1.3.0, reading a FASTA file, converting each entry to
Bio::Sequence with to_seq, and then writing it with output(:fasta)
makes the definition line longer each time the file is processed...
Is there a better interface that keeps the definition line as in the original?
Perhaps this is because some formats have clearly separate
entry_id and definition fields, while the distinction is ambiguous in
FASTA format. However, it would be better to be able to easily recover
the original definition line.
The following is a simple example case.
Of course, in a case this simple the entry need not be converted
to Bio::Sequence, but in fact I would like to manipulate the sequence,
such as getting a subsequence, adding some sequence, translating, etc...
% cat > simple_fasta
>abc
acgttgac
% cat fastacat
#!/usr/local/bin/ruby
require 'bio'
ff = Bio::FlatFile.open(Bio::FastaFormat, ARGF)
while fe = ff.next_entry
seq = fe.to_seq;
puts seq.output(:fasta)
end
% ruby fastacat simple_fasta
>abc abc
acgttgac
% ruby fastacat simple_fasta | ruby fastacat
>abc abc abc
acgttgac
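
One possible workaround, sketched in plain Ruby without any BioRuby calls, is to carry the definition line verbatim next to the sequence and write it back unchanged. FastaRecord and parse_fasta are hypothetical names introduced only for this illustration.

```ruby
# Plain-Ruby sketch (no BioRuby calls): the definition line is stored verbatim.
FastaRecord = Struct.new(:definition, :sequence) do
  # Write the record back, wrapping the sequence at `width` characters.
  def to_fasta(width = 70)
    lines = sequence.scan(/.{1,#{width}}/)
    ([">#{definition}"] + lines).join("\n") + "\n"
  end
end

# Split the text at each ">" that starts a line; the first line of every
# chunk is the definition, the remaining lines are the sequence.
def parse_fasta(text)
  text.split(/^>/).reject(&:empty?).map do |chunk|
    definition, *body = chunk.lines.map(&:chomp)
    FastaRecord.new(definition, body.join)
  end
end

input = ">abc\nacgttgac\n"
once  = parse_fasta(input).map(&:to_fasta).join
twice = parse_fasta(once).map(&:to_fasta).join
puts twice

# The sequence can still be manipulated without touching the definition.
record = parse_fasta(input).first
record.sequence = record.sequence[0, 4]
puts record.to_fasta
```

Round-tripping the file any number of times leaves the definition line as it was, because it is never rebuilt from separate entry_id and definition parts.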
Sincerely yours,
--
Tomoaki NISHIYAMA
Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan
From ngoto at gen-info.osaka-u.ac.jp Tue Apr 21 10:17:26 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 21 Apr 2009 23:17:26 +0900
Subject: [BioRuby] Bio::PubMed.efetch, bug?
In-Reply-To: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com>
References: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com>
Message-ID: <20090421141727.C5B6A1CBC4A8@idnmail.gen-info.osaka-u.ac.jp>
Hi,
On Fri, 10 Apr 2009 16:20:19 +0900
Masahide Kikkawa wrote:
> Hi,
>
>
> I recently installed bioruby-1.3.0 and encountered a bug.
>
> Here is what I did
> result = Bio::PubMed.efetch([123,456,789])
>
> I expected the result to be Array of MEDLINE formatted String
> as documented here:
> http://bioruby.org/rdoc/classes/Bio/PubMed.html
>
> However, the return value was one String containing all the results.
>
> In the bioruby-1.2.1, it does return Array of MEDLINE formatted String.
Thank you for reporting a bug. This is a regression introduced
with the refactoring of Bio::PubMed between 1.2.1 and 1.3.0.
> Therefore I would recommend to apply following patch to ncbirest.rb
>
> <
> < result.strip!
> < if opts["retmode"] == "text"
> < result = result.split(/\n\n+/)
> < end
> <
> < return result
> ---
> > return result.strip
> > #return result.strip.split(/\n\n+/)
>
Bio::NCBI::REST is not only for PubMed but also for many
databases in NCBI, and efetch may return free text,
depending on databases. Thus, the processing of the result
should only be applied to Bio::PubMed.
http://github.com/bioruby/bioruby/commit/a48a9a35b87dead069fe328ba7086977304af995
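
For illustration, the record-splitting step looks roughly like this when done on the caller's side. split_medline is a hypothetical helper, not a BioRuby method, and the sample text is made up; only the split pattern comes from the patch quoted above.

```ruby
# Hypothetical helper, not part of BioRuby: split efetch text output into
# one string per MEDLINE record, using runs of blank lines as separators.
def split_medline(text)
  text.strip.split(/\n\n+/)
end

# Made-up text in the rough shape of MEDLINE output.
raw = <<~TEXT

  PMID- 123
  TI  - First title

  PMID- 456
  TI  - Second title


  PMID- 789
  TI  - Third title
TEXT

records = split_medline(raw)
puts records.size
puts records.first
```

Keeping this step out of the generic REST layer means that free-text results from other databases are returned untouched.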
To confirm the bug fix, tests for Bio::PubMed have also been added.
http://github.com/bioruby/bioruby/commit/bf5ba6d4503f3ddb0ca31673882f5b396a932bbe
In addition, I found and fixed a bug involving "retstart".
http://github.com/bioruby/bioruby/commit/fc0339fe8a42cd00199cfdc938590ae9626551bc
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
From rozziite at gmail.com Thu Apr 30 21:37:07 2009
From: rozziite at gmail.com (Diana Jaunzeikare)
Date: Thu, 30 Apr 2009 21:37:07 -0400
Subject: [BioRuby] Google Summer of Code Intro: PhyloXML support in BioRuby
Message-ID: <4057d3bf0904301837r302bfb2buaa8a644c448267fa@mail.gmail.com>
Hi all,
I would like to introduce myself. My name is Diana and I have been accepted
for Google Summer of Code to implement PhyloXML support for BioRuby. I am a
junior at Smith College double majoring in Computer Science and Math. I am
interested in Bioinformatics, especially protein structure based
phylogenetics.
Here is the project abstract:
===
Phylogenetic trees are used in important applications, including
phylogenomics, phylogeography, gene function prediction, cladistics and the
study of molecular evolution. In order to foster successful analysis,
exchange, storage and reuse of phylogenetic trees and associated data, the
phyloXML format was developed. It can store all necessary information about
the phylogenetic tree, like clade, sequence, name and distance. The goal of
this project is to implement support for phyloXML in BioRuby.
===
Here is the wiki page:
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:PhyloXML_support_in_BioRuby
Any comments are welcome!
Cheers,
Diana
From czmasek at burnham.org Wed Apr 1 01:16:01 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Tue, 31 Mar 2009 18:16:01 -0700
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
Message-ID: <49D2C051.9090002@burnham.org>
Hi:
As you probably read in previous messages, NESCent is again having a
Google Summer of Code program
(http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
I am serving as the mentor of a project entitled "phyloXML support in
BioRuby" (see:
https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
In order to ensure a successful outcome (should the project be
accepted), as well as to improve the chances of being accepted, I am
looking for people willing to serve as co-mentors.
Christian
PS: Here is the full description of the project:
phyloXML support in BioRuby
Rationale
Evolutionary trees are central to comparative genomics studies.
Trees used in this context are usually annotated with a variety of
data elements, such as taxonomic information, genome-related data
(gene names, functional annotations) and gene duplication events, as
well as information related to the evolutionary tree itself (branch
lengths, support values). phyloXML is an XML data exchange standard
that can represent this data. Trees in phyloXML format can be
displayed and analyzed with Archaeopteryx
(the successor to ATV
),
which also allows manipulation and navigation of the tree. While
tools exist to convert other formats (such as the widely used Newick
and Nexus formats) to phyloXML, there is currently support for
phyloXML in only one of the open source Bio* projects (in BioPerl
, as a result of
Google's Summer of Code 2008).
Approach
Build phyloXML support in the increasingly popular, dynamic, and
fully object-oriented language Ruby. More specifically, extend the
open source BioRuby project to support phyloXML (BioRuby 1.3.0 has
just been released). This will entail (i) the development of objects
to represent all the elements of phyloXML (sequences, taxonomic
data, annotations, etc), (ii) the development of a parser to read in
phyloXML, and (iii) a phyloXML writer.
Challenges
Relating the data elements specific to phyloXML to the tree classes
already in BioRuby while maintaining the standards of the BioRuby
project. Development of a time and memory efficient phyloXML parser
(the parser has to be able to process trees with thousands of
external nodes, at least).
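
As a rough illustration of the memory concern, a stream-based reader can build clade objects as the XML is read, without holding a full DOM. The sketch below uses REXML's stream listener interface; Clade, CladeListener, parse_clades and the sample document are hypothetical, only a tiny subset of phyloXML elements is handled, and the eventual BioRuby design may look quite different.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical minimal node type: name, branch length and child clades only.
Clade = Struct.new(:name, :branch_length, :children)

class CladeListener
  include REXML::StreamListener
  attr_reader :root

  def initialize
    @stack = [] # currently open clade elements
    @path = []  # names of all currently open elements
    @root = nil
  end

  def tag_start(name, _attrs)
    @path.push(name)
    return unless name == 'clade'
    clade = Clade.new(nil, nil, [])
    if @stack.empty?
      @root ||= clade
    else
      @stack.last.children << clade
    end
    @stack.push(clade)
  end

  def text(data)
    value = data.strip
    return if value.empty? || @stack.empty?
    return unless @path[-2] == 'clade' # only direct children of a clade
    case @path.last
    when 'name' then @stack.last.name = value
    when 'branch_length' then @stack.last.branch_length = value.to_f
    end
  end

  def tag_end(name)
    @path.pop
    @stack.pop if name == 'clade'
  end
end

# Parse a phyloXML string and return the root clade of its first tree.
def parse_clades(xml)
  listener = CladeListener.new
  REXML::Document.parse_stream(xml, listener)
  listener.root
end

SAMPLE_PHYLOXML = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <name>example</name>
      <clade>
        <clade><name>A</name><branch_length>0.1</branch_length></clade>
        <clade><name>B</name><branch_length>0.2</branch_length></clade>
      </clade>
    </phylogeny>
  </phyloxml>
XML

root = parse_clades(SAMPLE_PHYLOXML)
puts root.children.map { |c| "#{c.name}:#{c.branch_length}" }.join(', ')
```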
Involved toolkits or projects
BioRuby , phyloXML
Degree of difficulty and needed skills
Medium. Requires experience in an object oriented programming
language (such as C++, Java, or, ideally, Ruby). Experience in
genomics or a related biological field is also critical. Knowledge
of BioRuby will obviously help, as well as familiarity with XML.
Mentors
Christian Zmasek
From pjotr.public14 at thebird.nl Wed Apr 1 08:08:08 2009
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Wed, 1 Apr 2009 10:08:08 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking
for co-mentors
In-Reply-To: <49D2C051.9090002@burnham.org>
References: <49D2C051.9090002@burnham.org>
Message-ID: <20090401080808.GA30972@thebird.nl>
Hi Christian,
I am happy to help.
Pj.
On Tue, Mar 31, 2009 at 06:16:01PM -0700, Christian M Zmasek wrote:
> Hi:
>
> As you probably read on previous messages, NESCent is again having a
> Google Summer of Code program
> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>
> I am serving as the mentor of a project entitled "phyloXML support in
> BioRuby" (see:
> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>
> In order to ensure a successful outcome (should the project be
> accepted), as well as to improve the chances of being accepted, I am
> looking for people willing to serve as co-mentors.
>
> Christian
>
>
> PS: Here is the full description of the project:
>
>
> phyloXML support in BioRuby
>
> Rationale Evolutionary trees are central to comparative genomics
> studies.
> Trees used in this context are usually annotated with a variety of
> data elements, such as taxonomic information, genome-related data
> (gene names, functional annotations) and gene duplication events, as
> well as information related to the evolutionary tree itself (branch
> lengths, support values). phyloXML is an XML data exchange standard
> that can represent this data. Trees in phyloXML format can be
> displayed and analyzed with Archaeopteryx
> (the successor to ATV
> ),
> which also allows manipulation and navigation of the tree. While
> tools exist to convert other formats (such as the widely used Newick
> and Nexus formats) to phyloXML, there is currently support for
> phyloXML in only one of the open source Bio* projects (in BioPerl
> , as a result of
> Google's Summer of Code 2008). Approach Build phyloXML support in
> the increasingly popular, dynamic, and
> fully objected oriented language Ruby. More specifically, extend the
> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has
> just been released). This will entail (i) the development of objects
> to represent all the elements of phyloXML (sequences, taxonomic
> data, annotations, etc), (ii) the development of a parser to read in
> phyloXML, and (iii) a phyloXML writer. Challenges Relating the data
> elements specific to phyloXML to the tree classes
> already in BioRuby while maintaining the standards of the BioRuby
> project. Development of a time and memory efficient phyloXML parser
> (the parser has to be able to process trees with thousands of
> external nodes, at least). Involved toolkits or projects BioRuby
> , phyloXML Degree of
> difficulty and needed skills Medium. Requires experience in an object
> oriented programming
> language (such as C++, Java, or, ideally, Ruby). Experience in
> genomics or a related biological field is also critical. Knowledge
> of BioRuby will obviously help, as well as familiarity with XML.
> Mentors Christian Zmasek
>
>
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
From bonnalraoul at ingm.it Wed Apr 1 10:37:27 2009
From: bonnalraoul at ingm.it (Raoul JP Bonnal)
Date: Wed, 01 Apr 2009 12:37:27 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D2C051.9090002@burnham.org>
References: <49D2C051.9090002@burnham.org>
Message-ID: <49D343E7.1080007@ingm.it>
Dear Christian,
one of the outcomes from Biohackathon 09, for me, was to implement Phylo
into BioSQL. Could this be of interest for this proposal?
Let me know.
Christian M Zmasek wrote:
> Hi:
>
> As you probably read on previous messages, NESCent is again having a
> Google Summer of Code program
> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>
> I am serving as the mentor of a project entitled "phyloXML support in
> BioRuby" (see:
> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>
>
> In order to ensure a successful outcome (should the project be
> accepted), as well as to improve the chances of being accepted, I am
> looking for people willing to serve as co-mentors.
>
> Christian
>
>
> PS: Here is the full description of the project:
>
>
> phyloXML support in BioRuby
>
> Rationale Evolutionary trees are central to comparative genomics
> studies.
> Trees used in this context are usually annotated with a variety of
> data elements, such as taxonomic information, genome-related data
> (gene names, functional annotations) and gene duplication events, as
> well as information related to the evolutionary tree itself (branch
> lengths, support values). phyloXML is an XML data exchange standard
> that can represent this data. Trees in phyloXML format can be
> displayed and analyzed with Archaeopteryx
> (the successor to ATV
>
> ),
>
> which also allows manipulation and navigation of the tree. While
> tools exist to convert other formats (such as the widely used Newick
> and Nexus formats) to phyloXML, there is currently support for
> phyloXML in only one of the open source Bio* projects (in BioPerl
> , as a result of
> Google's Summer of Code 2008). Approach Build phyloXML support
> in the increasingly popular, dynamic, and
> fully objected oriented language Ruby. More specifically, extend the
> open source BioRuby project to support phyloXML (BioRuby 1.3.0 has
> just been released). This will entail (i) the development of objects
> to represent all the elements of phyloXML (sequences, taxonomic
> data, annotations, etc), (ii) the development of a parser to read in
> phyloXML, and (iii) a phyloXML writer. Challenges Relating the
> data elements specific to phyloXML to the tree classes
> already in BioRuby while maintaining the standards of the BioRuby
> project. Development of a time and memory efficient phyloXML parser
> (the parser has to be able to process trees with thousands of
> external nodes, at least). Involved toolkits or projects BioRuby
> , phyloXML Degree
> of difficulty and needed skills Medium. Requires experience in an
> object oriented programming
> language (such as C++, Java, or, ideally, Ruby). Experience in
> genomics or a related biological field is also critical. Knowledge
> of BioRuby will obviously help, as well as familiarity with XML.
> Mentors Christian Zmasek
>
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
From czmasek at burnham.org Wed Apr 1 18:20:23 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Wed, 1 Apr 2009 11:20:23 -0700
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D343E7.1080007@ingm.it>
References: <49D2C051.9090002@burnham.org> <49D343E7.1080007@ingm.it>
Message-ID: <49D3B067.3050101@burnham.org>
Hi Raoul:
Thank you for your reply.
While your project sounds interesting and much needed, the project I was
referring to is to implement phyloXML (www.phyloxml.org) support in BioRuby.
Given that the deadline for Google Summer of Code projects is April 3rd,
I think it would be almost impossible to propose a new project and find
a student for it before this deadline.
Sincerely,
Christian
Raoul JP Bonnal wrote:
> Dear Christian,
> one of the outcomes from Biohackathon 09, for me, was to implement Phylo
> into BioSQL, could be of interest for this proposal ?
>
> Let me know.
>
> Christian M Zmasek ha scritto:
>
>
>> Hi:
>>
>> As you probably read on previous messages, NESCent is again having a
>> Google Summer of Code program
>> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>>
>> I am serving as the mentor of a project entitled "phyloXML support in
>> BioRuby" (see:
>> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>>
>>
>> In order to ensure a successful outcome (should the project be
>> accepted), as well as to improve the chances of being accepted, I am
>> looking for people willing to serve as co-mentors.
>>
>> Christian
>>
>>
>> PS: Here is the full description of the project:
>>
>>
>> phyloXML support in BioRuby
>>
>> Rationale: Evolutionary trees are central to comparative genomics
>> studies. Trees used in this context are usually annotated with a
>> variety of data elements, such as taxonomic information,
>> genome-related data (gene names, functional annotations) and gene
>> duplication events, as well as information related to the
>> evolutionary tree itself (branch lengths, support values). phyloXML
>> is an XML data exchange standard that can represent this data. Trees
>> in phyloXML format can be displayed and analyzed with Archaeopteryx
>> (the successor to ATV), which also allows manipulation and
>> navigation of the tree. While tools exist to convert other formats
>> (such as the widely used Newick and Nexus formats) to phyloXML,
>> there is currently support for phyloXML in only one of the open
>> source Bio* projects (in BioPerl, as a result of Google's Summer of
>> Code 2008).
>>
>> Approach: Build phyloXML support in the increasingly popular,
>> dynamic, and fully object-oriented language Ruby. More specifically,
>> extend the open source BioRuby project to support phyloXML (BioRuby
>> 1.3.0 has just been released). This will entail (i) the development
>> of objects to represent all the elements of phyloXML (sequences,
>> taxonomic data, annotations, etc.), (ii) the development of a parser
>> to read in phyloXML, and (iii) a phyloXML writer.
>>
>> Challenges: Relating the data elements specific to phyloXML to the
>> tree classes already in BioRuby while maintaining the standards of
>> the BioRuby project. Development of a time- and memory-efficient
>> phyloXML parser (the parser has to be able to process trees with
>> thousands of external nodes, at least).
>>
>> Involved toolkits or projects: BioRuby, phyloXML
>>
>> Degree of difficulty and needed skills: Medium. Requires experience
>> in an object-oriented programming language (such as C++, Java, or,
>> ideally, Ruby). Experience in genomics or a related biological field
>> is also critical. Knowledge of BioRuby will obviously help, as well
>> as familiarity with XML.
>>
>> Mentors: Christian Zmasek
>>
>>
>> _______________________________________________
>> BioRuby mailing list
>> BioRuby at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>
>
>
>
From czmasek at burnham.org Wed Apr 1 18:31:12 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Wed, 1 Apr 2009 11:31:12 -0700
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu>
References: <49D2C051.9090002@burnham.org>
<2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu>
Message-ID: <49D3B2F0.5030605@burnham.org>
Hi, Joel:
From my own experience as a co-mentor last year, the time requirement
is rather limited.
The project I co-mentored was "phyloXML in BioPerl". Since I am
knowledgeable about phyloXML I mainly helped out with questions related
to it.
Of course, for this year's project, "phyloXML in BioRuby", it will be the
opposite. Since I am the main mentor, I am handling all the issues
related to the application process, the midterm and final evaluations,
and phyloXML. But I would feel more comfortable
if I had one or two co-mentors with significant Ruby and BioRuby
experience who could help out if questions specific to Ruby or BioRuby
arise.
Christian
PS: The project still needs to be accepted by the Nescent organization.
Last year, there were about 15 projects, and only five got accepted
(since Google gave Nescent five slots, Nescent had to decide which
projects they liked the most).
Joel Dudley wrote:
> I am interested. What's involved in being a co-mentor? I am one of the
> authors of MEGA (http://www.megasoftware.net) and I have been
> programming with Ruby daily for about 3 years now.
>
> -- Joel
>
> On Mar 31, 2009, at 6:16 PM, Christian M Zmasek wrote:
>
>
>> Hi:
>>
>> As you probably read on previous messages, NESCent is again having a
>> Google Summer of Code program
>> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>>
>> I am serving as the mentor of a project entitled "phyloXML support in
>> BioRuby" (see:
>> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>>
>> In order to ensure a successful outcome (should the project be
>> accepted), as well as to improve the chances of being accepted, I am
>> looking for people willing to serve as co-mentors.
>>
>> Christian
>>
>>
>> PS: Here is the full description of the project:
>>
>>
>> phyloXML support in BioRuby
>>
>> Rationale: Evolutionary trees are central to comparative genomics
>> studies. Trees used in this context are usually annotated with a
>> variety of data elements, such as taxonomic information,
>> genome-related data (gene names, functional annotations) and gene
>> duplication events, as well as information related to the
>> evolutionary tree itself (branch lengths, support values). phyloXML
>> is an XML data exchange standard that can represent this data. Trees
>> in phyloXML format can be displayed and analyzed with Archaeopteryx
>> (the successor to ATV), which also allows manipulation and
>> navigation of the tree. While tools exist to convert other formats
>> (such as the widely used Newick and Nexus formats) to phyloXML,
>> there is currently support for phyloXML in only one of the open
>> source Bio* projects (in BioPerl, as a result of Google's Summer of
>> Code 2008).
>>
>> Approach: Build phyloXML support in the increasingly popular,
>> dynamic, and fully object-oriented language Ruby. More specifically,
>> extend the open source BioRuby project to support phyloXML (BioRuby
>> 1.3.0 has just been released). This will entail (i) the development
>> of objects to represent all the elements of phyloXML (sequences,
>> taxonomic data, annotations, etc.), (ii) the development of a parser
>> to read in phyloXML, and (iii) a phyloXML writer.
>>
>> Challenges: Relating the data elements specific to phyloXML to the
>> tree classes already in BioRuby while maintaining the standards of
>> the BioRuby project. Development of a time- and memory-efficient
>> phyloXML parser (the parser has to be able to process trees with
>> thousands of external nodes, at least).
>>
>> Involved toolkits or projects: BioRuby, phyloXML
>> (www.phyloxml.org)
>>
>> Degree of difficulty and needed skills: Medium. Requires experience
>> in an object-oriented programming language (such as C++, Java, or,
>> ideally, Ruby). Experience in genomics or a related biological field
>> is also critical. Knowledge of BioRuby will obviously help, as well
>> as familiarity with XML.
>>
>> Mentors: Christian Zmasek
>>
>>
>>
>
>
>
From ngoto at gen-info.osaka-u.ac.jp Thu Apr 2 05:08:38 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Thu, 2 Apr 2009 14:08:38 +0900
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D3B2F0.5030605@burnham.org>
References: <49D2C051.9090002@burnham.org>
<2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu>
<49D3B2F0.5030605@burnham.org>
Message-ID: <20090402050839.73F151CBC3BD@idnmail.gen-info.osaka-u.ac.jp>
Hi Christian,
I'll answer any questions about BioRuby as much as I can.
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
On Wed, 1 Apr 2009 11:31:12 -0700
Christian M Zmasek wrote:
> Hi, Joel:
>
> From my own experience as a co-mentor last year, the time requirement
> is rather limited.
> The project I co-mentored was "phyloXML in BioPerl". Since I am
> knowledgeable about phyloXML I mainly helped out with questions related
> to it.
>
> Of course, for this year's project "phyloXML in BioRuby", it will be the
> opposite. Since I am the main-mentor I will/am handling all the issues
> related to the application process/midterm
> evaluation/final-evaluation/phyloXML. But I would feel more comfortable,
> if I had one or two co-mentors with significant Ruby and BioRuby
> experience who could help out if questions specific to these languages
> arise.
>
> Christian
>
> PS: The project still needs to be accepted by the Nescent organization.
> Last year, there were about 15 projects, and only five got accepted
> (since Google gave Nescent five slots, Nescent had to decide which
> projects they liked the most).
>
>
>
> Joel Dudley wrote:
> > I am interested. What's involved in being a co-mentor? I am one of the
> > authors of MEGA (http://www.megasoftware.net) and I have been
> > programming with Ruby daily for about 3 years now.
> >
> > -- Joel
> >
> > On Mar 31, 2009, at 6:16 PM, Christian M Zmasek wrote:
> >
> >
> >> Hi:
> >>
> >> As you probably read on previous messages, NESCent is again having a
> >> Google Summer of Code program
> >> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
> >>
> >> I am serving as the mentor of a project entitled "phyloXML support in
> >> BioRuby" (see:
> >> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
> >>
> >> In order to ensure a successful outcome (should the project be
> >> accepted), as well as to improve the chances of being accepted, I am
> >> looking for people willing to serve as co-mentors.
> >>
> >> Christian
> >>
> >>
> >> PS: Here is the full description of the project:
> >>
> >>
> >> phyloXML support in BioRuby
> >>
> >> Rationale: Evolutionary trees are central to comparative genomics
> >> studies. Trees used in this context are usually annotated with a
> >> variety of data elements, such as taxonomic information,
> >> genome-related data (gene names, functional annotations) and gene
> >> duplication events, as well as information related to the
> >> evolutionary tree itself (branch lengths, support values). phyloXML
> >> is an XML data exchange standard that can represent this data. Trees
> >> in phyloXML format can be displayed and analyzed with Archaeopteryx
> >> (the successor to ATV), which also allows manipulation and
> >> navigation of the tree. While tools exist to convert other formats
> >> (such as the widely used Newick and Nexus formats) to phyloXML,
> >> there is currently support for phyloXML in only one of the open
> >> source Bio* projects (in BioPerl, as a result of Google's Summer of
> >> Code 2008).
> >>
> >> Approach: Build phyloXML support in the increasingly popular,
> >> dynamic, and fully object-oriented language Ruby. More specifically,
> >> extend the open source BioRuby project to support phyloXML (BioRuby
> >> 1.3.0 has just been released). This will entail (i) the development
> >> of objects to represent all the elements of phyloXML (sequences,
> >> taxonomic data, annotations, etc.), (ii) the development of a parser
> >> to read in phyloXML, and (iii) a phyloXML writer.
> >>
> >> Challenges: Relating the data elements specific to phyloXML to the
> >> tree classes already in BioRuby while maintaining the standards of
> >> the BioRuby project. Development of a time- and memory-efficient
> >> phyloXML parser (the parser has to be able to process trees with
> >> thousands of external nodes, at least).
> >>
> >> Involved toolkits or projects: BioRuby, phyloXML
> >> (www.phyloxml.org)
> >>
> >> Degree of difficulty and needed skills: Medium. Requires experience
> >> in an object-oriented programming language (such as C++, Java, or,
> >> ideally, Ruby). Experience in genomics or a related biological field
> >> is also critical. Knowledge of BioRuby will obviously help, as well
> >> as familiarity with XML.
> >>
> >> Mentors: Christian Zmasek
> >>
> >>
> >>
From hlapp at gmx.net Thu Apr 2 05:25:43 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 2 Apr 2009 01:25:43 -0400
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D3B2F0.5030605@burnham.org>
References: <49D2C051.9090002@burnham.org>
<2527A21F-A983-45CF-A814-C300FDD3C091@stanford.edu>
<49D3B2F0.5030605@burnham.org>
Message-ID:
On Apr 1, 2009, at 2:31 PM, Christian M Zmasek wrote:
> But I would feel more comfortable, if I had one or two co-mentors
> with significant Ruby and BioRuby experience
I would echo this and add that from NESCent's perspective as the
mentoring organization having a committed co-mentor as a backup when
the primary mentor for whatever reason is temporarily incommunicado is
highly desirable.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
From bonnalraoul at ingm.it Thu Apr 2 08:43:34 2009
From: bonnalraoul at ingm.it (Raoul JP Bonnal)
Date: Thu, 02 Apr 2009 10:43:34 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D3B067.3050101@burnham.org>
References: <49D2C051.9090002@burnham.org> <49D343E7.1080007@ingm.it>
<49D3B067.3050101@burnham.org>
Message-ID: <49D47AB6.8090304@ingm.it>
OK, I probably was not so clear: I'm available for co-mentoring if you want.
Christian M Zmasek wrote:
> Hi Raoul:
>
> Thank you for your reply.
> While your project sounds interesting and much needed, the project I
> was referring to is to implement phyloXML (www.phyloxml.org) support
> in BioRuby.
> Given that the deadline for Google Summer of Code projects is April
> 3rd, I think it would be almost impossible to propose a new project
> and find a student for it before this deadline.
>
> Sincerely,
>
> Christian
>
>
>
> Raoul JP Bonnal wrote:
>> Dear Christian,
>> one of the outcomes of BioHackathon 09, for me, was a plan to implement
>> Phylo in BioSQL. Could this be of interest for this proposal?
>>
>> Let me know.
>>
>> Christian M Zmasek wrote:
>>
>>
>>> Hi:
>>>
>>> As you probably read on previous messages, NESCent is again having a
>>> Google Summer of Code program
>>> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>>>
>>> I am serving as the mentor of a project entitled "phyloXML support
>>> in BioRuby" (see:
>>> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>>>
>>>
>>> In order to ensure a successful outcome (should the project be
>>> accepted), as well as to improve the chances of being accepted, I am
>>> looking for people willing to serve as co-mentors.
>>>
>>> Christian
>>>
>>>
>>> PS: Here is the full description of the project:
>>>
>>>
>>> phyloXML support in BioRuby
>>>
>>> Rationale: Evolutionary trees are central to comparative genomics
>>> studies. Trees used in this context are usually annotated with a
>>> variety of data elements, such as taxonomic information,
>>> genome-related data (gene names, functional annotations) and gene
>>> duplication events, as well as information related to the
>>> evolutionary tree itself (branch lengths, support values). phyloXML
>>> is an XML data exchange standard that can represent this data. Trees
>>> in phyloXML format can be displayed and analyzed with Archaeopteryx
>>> (the successor to ATV), which also allows manipulation and
>>> navigation of the tree. While tools exist to convert other formats
>>> (such as the widely used Newick and Nexus formats) to phyloXML,
>>> there is currently support for phyloXML in only one of the open
>>> source Bio* projects (in BioPerl, as a result of Google's Summer of
>>> Code 2008).
>>>
>>> Approach: Build phyloXML support in the increasingly popular,
>>> dynamic, and fully object-oriented language Ruby. More specifically,
>>> extend the open source BioRuby project to support phyloXML (BioRuby
>>> 1.3.0 has just been released). This will entail (i) the development
>>> of objects to represent all the elements of phyloXML (sequences,
>>> taxonomic data, annotations, etc.), (ii) the development of a parser
>>> to read in phyloXML, and (iii) a phyloXML writer.
>>>
>>> Challenges: Relating the data elements specific to phyloXML to the
>>> tree classes already in BioRuby while maintaining the standards of
>>> the BioRuby project. Development of a time- and memory-efficient
>>> phyloXML parser (the parser has to be able to process trees with
>>> thousands of external nodes, at least).
>>>
>>> Involved toolkits or projects: BioRuby, phyloXML
>>>
>>> Degree of difficulty and needed skills: Medium. Requires experience
>>> in an object-oriented programming language (such as C++, Java, or,
>>> ideally, Ruby). Experience in genomics or a related biological field
>>> is also critical. Knowledge of BioRuby will obviously help, as well
>>> as familiarity with XML.
>>>
>>> Mentors: Christian Zmasek
>>>
>>>
>>>
>>
>>
>>
>
From czmasek at burnham.org Thu Apr 2 23:01:13 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Thu, 2 Apr 2009 16:01:13 -0700
Subject: [BioRuby] [PhyloSoC] Project plan for phyloXML integration with
BioRuby
In-Reply-To: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com>
References: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com>
Message-ID: <49D543B9.2050200@burnham.org>
Diana:
Thank you for your interest in this project!
Indeed, the hour is late, and your proposal still needs significant work
in order to be competitive.
I think you already got some comments from Hilmar (I am at work and I
cannot use IRC).
Besides those, I'd like to suggest:
1. Please make sure that all the studying is done before the coding
begins (May 23), i.e. your "week 1" should fall within the "community
bonding period".
2. You do not need to develop classes for objects already present in
BioRuby (such as phylogenetic trees). Actually, the fewer new classes you
have to introduce, the better -- reuse!
3. I am happy to see that you include unit tests early on; this is good!
You have to make extensive use of BioRuby's test suite.
4. In general, your weekly goals are not described in enough detail. It
might be a good idea to discuss goals, deliverables, anticipated
problems/difficulties (and possible solutions) for each week.
5. Documentation is very important(!)
6. Do you plan to maintain the code after the summer?
Since the abstract is due April 3rd and cannot be changed after that it
is best to concentrate on the abstract first, though.
(The project plan can still be tweaked after April 3rd, I understand.)
In the abstract you have to make clear that you understand the _biology_
behind the project. Why does phyloXML have the elements it has? Why is
it useful? You might have a look at http://www.phyloxml.org and
http://www.tdwg.org/proceedings/article/view/437.
Can you show that you understand what evolutionary trees are? Where and
why are they used?
Are they only important in 'tree of life' applications (e.g. see:
http://www.liebertonline.com/doi/pdf/10.1089/omi.2006.10.231)?
What is "phylogenomics"? What is "comparative genomics"?
You could also go to [http://monochrome-effect.net/publications.html]
and have a quick look at some of the papers there; most are related to
the issues at hand, and some show real-world applications of
phylogenetic trees.
How might participating in this project help your career? What do you
plan to learn? Why are you a good candidate for this?
Hope this helps some,
CZ
Diana Jaunzeikare wrote:
> Hi everybody,
>
> I know this is kinda late and I should have contacted you earlier, but
> better late than never. I found out about Phyloinformatics Summer of
> Code just last night when I was doing homework for Bioinformatics lab
> on Phylogenetic reconstruction and Parsimony. I was reading various
> bio-tech related blogs in Google Reader and in the Google Top
> Recommendations bar I saw the blog of The Tree of Life. There I read
> the blog post about Phyloinformatics Summer of Code. This was very
> exciting news for me! I almost jumped off the sofa from excitement
> :) For two years already I have wanted to participate in Google Summer of
> Code, but it never really fit with what I was doing at the time. When
> I saw the project about integrating phyloXML with BioRuby I knew it
> was for me! I am a big fan of Ruby! Last semester for my Computational
> Biology seminar I wrote a bunch of scripts to deal with the PDB database for
> my final project. Also it has been a pleasure to develop in Ruby on
> Rails. What is even more exciting is that my research interests lie
> in Bioinformatics. In fact, I had thought before about developing for
> BioRuby, but I didn't have a good enough reason.
>
> Here is my project plan for building support for phyloXML for BioRuby.
> I think the emphasis should be on the ease of use for biologists and a
> lot of example code.
>
>
> Week 1:
> * Get familiar with BioRuby, its structure, classes (like Bio::Tree),
> coding conventions, documentation conventions. See other
> implementations of XML parsers in BioRuby (like BLAST XML).
> * Get familiar with phyloXML, its structure, typical uses. Get a data
> set of many different files in phyloXML format for testing.
>
> Week 2:
> * Try to write a program which would use phyloXML data in order to
> understand what would be the easiest way to use it. (Later will be
> used for unit testing).
> * Design the architecture of the phyloXML class, parser and writer, and the
> interface with other classes (like the alignment class).
>
> Week 3: Develop the basic, most essential objects of phyloXML
> (Phylogeny, Clade, Taxonomy).
>
> Weeks 4-5: Develop phyloXML parser.
>
> Weeks 6-7: Develop phyloXML writer.
>
> Week 8: Develop the rest of the objects of phyloXML.
>
> Week 9: Update parser.
>
> Week 10: Update writer.
>
> Week 11: Finish up documentation. Write extensive examples of how to
> use the code.
>
> Week 12: Do write up of the project.
>
>
> What do you think about the project plan? Anything missing?
>
> Thanks,
>
> Diana
>
>
> Diana Jaunzeikare
> Smith College
> Computer Science and Math double major '10
> CS Department Liaison and Master Tutor
> email: djaunzei at email.smith.edu
> cell: 413-387-2083
From czmasek at burnham.org Fri Apr 3 03:15:41 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Thu, 2 Apr 2009 20:15:41 -0700
Subject: [BioRuby] [PhyloSoC] Project plan for phyloXML integration with
BioRuby
In-Reply-To: <4057d3bf0904021957o73caa39bu835b949493fcab85@mail.gmail.com>
References: <4057d3bf0904021344t1d7fa4d6nef54867bdfae924@mail.gmail.com>
<49D543B9.2050200@burnham.org>
<4057d3bf0904021957o73caa39bu835b949493fcab85@mail.gmail.com>
Message-ID: <49D57F5D.7070703@burnham.org>
Hi, Diana:
Looks better.
I think you need to point out how evolutionary trees are used outside of
'tree of life' applications (i.e. phylogenomics, phylogeography, gene
function prediction, ...) as those are important applications for which
phyloXML has been designed.
Also, it is not expected that _you_ benchmark various XML parsers. It's
good enough to rely on published results. The important point is that
you, together with the BioRuby community, determine which one integrates
best with BioRuby (i.e. ideally create no additional dependencies) and
still provides acceptable performance.
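As a rough illustration of the "no additional dependencies" point: REXML ships with Ruby (as a bundled gem in recent versions) and offers a stream interface that does not build the whole document in memory. The sketch below only counts clade elements and tracks their nesting depth. CladeCounter and count_clades are hypothetical names, not BioRuby API, and nothing here implies that REXML is the parser the project should choose.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: counts <clade> elements and the deepest nesting
# level seen, without keeping the tree in memory.
class CladeCounter
  include REXML::StreamListener

  attr_reader :clades, :max_depth

  def initialize
    @clades = 0
    @depth = 0
    @max_depth = 0
  end

  # Called by REXML for every opening tag.
  def tag_start(name, _attrs)
    return unless name == 'clade'
    @clades += 1
    @depth += 1
    @max_depth = @depth if @depth > @max_depth
  end

  # Called by REXML for every closing tag.
  def tag_end(name)
    @depth -= 1 if name == 'clade'
  end
end

# Hypothetical helper: accepts a String or an IO containing phyloXML.
def count_clades(source)
  listener = CladeCounter.new
  REXML::Document.parse_stream(source, listener)
  listener
end
```

Passing an open File instead of a String lets REXML read the input incrementally, which is the property that matters for trees with thousands of external nodes.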
CZ
Diana Jaunzeikare wrote:
> Hi,
>
> I posted Abstract and new project plan.
>
> http://socghop.appspot.com/student_proposal/show/google/gsoc2009/dianaj/t123872262150
>
> Diana
>
> On Thu, Apr 2, 2009 at 7:01 PM, Christian M Zmasek wrote:
>
> Diana:
>
> Thank you for your interest in this project!
>
> Indeed, the hour is late, and your proposal still needs
> significant work in order to be competitive.
>
> I think you already got some comments from Hilmar (I am at work
> and I cannot use IRC).
>
> Besides those, I'd like to suggest:
>
> 1. Please make sure that all the studying is done before the
> coding begins (May 23), i.e. your "week 1" should fall within the
> "community bonding period".
> 2. You do not need to develop classes for objects already present
> in BioRuby (such as phylogenetic trees). Actually, the fewer new
> classes you have to introduce, the better -- reuse!
> 3. I am happy to see that you include unit tests early on; this is
> good! You have to make extensive use of BioRuby's test suite.
> 4. In general, your weekly goals are not described in enough
> detail. It might be a good idea to discuss goals, deliverables,
> anticipated problems/difficulties (and possible solutions) for
> each week.
> 5. Documentation is very important(!)
> 6. Do you plan to maintain the code after the summer?
>
> Since the abstract is due April 3rd and cannot be changed after
> that it is best to concentrate on the abstract first, though.
> (The project plan can still be tweaked after April 3rd, I understand.)
>
> In the abstract you have to make clear that you understand the
> _biology_ behind the project. Why does phyloXML have the elements
> it has? Why is it useful? You might have a look at
> http://www.phyloxml.org and
> http://www.tdwg.org/proceedings/article/view/437.
>
> Can you show that you understand what evolutionary trees are?
> Where and why are they used?
> Are they only important in 'tree of life' applications (e.g. see:
> http://www.liebertonline.com/doi/pdf/10.1089/omi.2006.10.231)?
> What is "phylogenomics"? What is "comparative genomics"?
> You could also go to
> [http://monochrome-effect.net/publications.html] and have a quick
> look at some of the papers there; most are related to the issues
> at hand, and some show real-world applications of phylogenetic trees.
>
> How might participating in this project help your career? What do
> you plan to learn? Why are you a good candidate for this?
>
> Hope this helps some,
>
> CZ
>
>
> Diana Jaunzeikare wrote:
>
> Hi everybody,
>
> I know this is kinda late and I should have contacted you
> earlier, but better late than never. I found out about
> Phyloinformatics Summer of Code just last night when I was
> doing homework for Bioinformatics lab on Phylogenetic
> reconstruction and Parsimony. I was reading various bio-tech
> related blogs in Google Reader and in the Google Top
> Recommendations bar I saw the blog of The Tree of Life. There I
> read the blog post about Phyloinformatics Summer of Code. This
> was very exciting news for me! I almost jumped off the sofa
> from excitement :) For two years already I have wanted to
> participate in Google Summer of Code, but it never really fit
> with what I was doing at the time. When I saw the project
> about integrating phyloXML with BioRuby I knew it was for me!
> I am a big fan of Ruby! Last semester for my Computational
> Biology seminar I wrote a bunch of scripts to deal with the PDB
> database for my final project. Also it has been a pleasure to
> develop in Ruby on Rails. What is even more exciting is that
> my research interests lie in Bioinformatics. In fact, I had
> thought before about developing for BioRuby, but I didn't have a
> good enough reason.
>
> Here is my project plan for building support for phyloXML for
> BioRuby. I think the emphasis should be on the ease of use for
> biologists and a lot of example code.
>
>
> Week 1:
> * Get familiar with BioRuby, its structure, classes (like
> Bio::Tree), coding conventions, documentation conventions. See
> other implementations of XML parsers in BioRuby (like BLAST XML).
> * Get familiar with phyloXML, its structure, typical uses.
> Get a data set of many different files in phyloXML format for
> testing.
> Week 2:
> * Try to write a program which would use phyloXML data in
> order to understand what would be the easiest way to use it.
> (Later will be used for unit testing).
> * Design the architecture of the phyloXML class, parser and
> writer, and the interface with other classes (like the alignment class).
> Week 3: Develop the basic, most essential objects of
> phyloXML (Phylogeny, Clade, Taxonomy).
> Weeks 4-5: Develop phyloXML parser.
> Weeks 6-7: Develop phyloXML writer.
> Week 8: Develop the rest of the objects of phyloXML.
> Week 9: Update parser.
>
> Week 10: Update writer.
>
> Week 11: Finish up documentation.
> Write extensive examples of how to use the code.
>
> Week 12: Do write up of the project.
>
> What do you think about the project plan? Anything missing?
>
> Thanks,
>
> Diana
>
>
> Diana Jaunzeikare
> Smith College
> Computer Science and Math double major '10
> CS Department Liaison and Master Tutor
> email: djaunzei at email.smith.edu
> cell: 413-387-2083
>
>
>
From kenglish at gmail.com Mon Apr 6 00:13:37 2009
From: kenglish at gmail.com (Kevin English)
Date: Sun, 5 Apr 2009 14:13:37 -1000
Subject: [BioRuby] Blast with file as a query option?
Message-ID:
Hello,
I have two very large local FASTA files that I wish to BLAST against one
another, parsing the results in BioRuby. I'm wondering if there is a way
to mimic the behavior of this blast command:
blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2
where Large_list_sequences_2 is a formatted FASTA db. My current
implementation opens Large_list_sequences_1.fasta and goes through it
sequence by sequence. It seems to run pretty slowly. I'm wondering if I can in
some way run the above blast command once, loop through the results, and get a
performance gain.
For anyone curious, my code is on GitHub:
http://github.com/kenglishhi/bioflexrails/tree/master
The file that is doing the blasts is under app/model/biodatabase.rb.
I'm trying to write a Rails app that uses a BioSQL db and allows this biologist to
organize his sequences. I'm very new to bioinformatics but have a lot of
experience with Ruby on Rails.
Thanks in advance for your help.
Kevin
From ngoto at gen-info.osaka-u.ac.jp Tue Apr 7 04:03:03 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 7 Apr 2009 13:03:03 +0900
Subject: [BioRuby] Blast with file as a query option?
In-Reply-To:
References:
Message-ID: <20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>
Hi,
On Sun, 5 Apr 2009 14:13:37 -1000
Kevin English wrote:
> Hello,
> I have two very large local FASTA files that I wish to BLAST against one
> another, parsing the results in BioRuby. I'm wondering if there is a way
> to mimic the behavior of this blast command:
>
> blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2
>
>
> where Large_list_sequences_2 is a formatted fasta db. My current
> implementation opens Large_list_sequences_1.fasta and goes through it
> sequence by sequence. It seems to run pretty slow. I'm wondering if I can in
> some way do the above blast command and loop through the results and get a
> performance gain.
To gain performance, adding options to BLAST is strongly recommended.
  -e  Expectation value (E) [Real]
      default = 10.0
  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
      default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
      default = 250
Changing the above to smaller values will reduce the size of the output
report, which should improve performance.
Executing BLAST with multiple query sequences can also improve performance.
In addition, when you have query sequences in a local file, calling the
blastall command directly without Bio::Blast may be good.
For example,
  require 'bio'
  require 'tempfile'

  # %w() builds an array of arguments, so system() runs blastall
  # directly without going through a shell.
  command = %w( blastall -p blastn -i Large_list_sequences_1.fasta
                -d Large_list_sequences_2 -e 0.0001 -b 20 -v 20 )
  tempfile = Tempfile.new('blastout')
  tempfile.close(false)   # close but keep the file so blastall can write to it
  command = command + [ "-o", tempfile.path ]
  system(*command)
  # After system(), error checks will be needed but skipped.
  tempfile.open           # reopen to read the report
  ff = Bio::FlatFile.open(tempfile)
  ff.each do |report|
    # For example, prints query_def and target_def
    report.each do |hit|
      print report.query_def, "\t", hit.target_def, "\n"
    end
  end
  ff.close
  tempfile.close(true)    # close and delete the temporary file
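
The error check that the example above skips could look roughly like the following. This is only a sketch: blastall_command and run_checked are hypothetical helper names, not part of BioRuby; only Kernel#system and $? are standard Ruby.

```ruby
# Hypothetical helper: builds the blastall argument list as an array,
# using the option values from the example above as defaults.
def blastall_command(query, db, out, options = %w(-e 0.0001 -b 20 -v 20))
  %w(blastall -p blastn) + ['-i', query, '-d', db] + options + ['-o', out]
end

# Hypothetical helper: runs a command given as an array (no shell involved)
# and raises if it could not be started or exited with a non-zero status.
def run_checked(command)
  ok = system(*command)
  return true if ok
  if ok.nil?
    # system() returns nil when the program could not be executed at all.
    raise "could not execute #{command.first}"
  else
    # system() returns false when the program ran but reported failure.
    raise "#{command.first} failed with exit status #{$?.exitstatus}"
  end
end
```

With these helpers, the call system(*command) in the example could be replaced by run_checked(command), so that a missing blastall binary or a failed search stops the script before it tries to parse an empty report.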
> For anyone curious, my code is on GitHub:
>
> http://github.com/kenglishhi/bioflexrails/tree/master
>
> The file that is doing the blasts is under app/model/biodatabase.rb.
>
> I'm trying to write a Rails app that uses a BioSQL db and allows this biologist to
> organize his sequences. I'm very new to bioinformatics but have a lot of
> experience with Ruby on Rails.
>
> Thanks in advance for your help.
In general, a BLAST search against a very large database takes a
very long time, and using a batch queueing system might be needed.
Thanks,
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
From donttrustben at gmail.com Tue Apr 7 04:30:09 2009
From: donttrustben at gmail.com (Ben Woodcroft)
Date: Tue, 7 Apr 2009 14:30:09 +1000
Subject: [BioRuby] Blast with file as a query option?
In-Reply-To: <20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>
References:
<20090407040304.2E8391CBC3CD@idnmail.gen-info.osaka-u.ac.jp>
Message-ID:
And there is the -a flag, for specifying you want to use multiple CPUs.
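
A sketch of how the -a option (number of processors to use, in blastall) could be appended to an argument list like the one in the quoted example. with_cpus is a hypothetical helper name, and Etc.nprocessors assumes Ruby 2.2 or later.

```ruby
require 'etc'

# Hypothetical helper: returns a copy of a blastall argument list with
# "-a N" appended; N defaults to the number of processors Ruby detects.
def with_cpus(command, cpus = Etc.nprocessors)
  command + ['-a', cpus.to_s]
end
```

For example, with_cpus(command, 4) leaves the original array untouched and returns a new one ending in "-a", "4".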
ben
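As a rough sketch of how the -a flag could be combined with the argument list from the earlier example: the method name `blastall_argv` and its `cpus:` keyword are illustrative assumptions, and the remaining option values are simply copied from the example in the quoted message.

```ruby
# Build the blastall argument list from the earlier example and append
# "-a" (number of processors to use). The method name and its keyword
# argument are illustrative only.
def blastall_argv(query, db, cpus: 2)
  %w( blastall -p blastn ) +
    [ "-i", query, "-d", db, "-e", "0.0001", "-b", "20", "-v", "20",
      "-a", cpus.to_s ]
end

argv = blastall_argv("Large_list_sequences_1.fasta",
                     "Large_list_sequences_2", cpus: 4)
puts argv.join(" ")
```

The resulting Array can be passed to system(*argv) exactly as in the quoted script.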
2009/4/7 Naohisa GOTO
> Hi,
>
> On Sun, 5 Apr 2009 14:13:37 -1000
> Kevin English wrote:
>
> > Hello,
> > I have to very large local fasta files that wish to blast against one
> > another and parse the results in bio ruby. I'm wondering if there is a
> way
> > to mimic the behavior of this blast command:
> >
> > blastall -p blastn -i Large_list_sequences_1.fasta -d
> Large_list_sequences_2
> >
> >
> > where Large_list_sequences_2 is a formatted fasta db. My current
> > implementation opens Large_list_sequences_1.fasta and goes through it
> > sequence by sequence. It seems to run pretty slow. I'm wondering if I can
> in
> > some way do the above blast command and loop through the results and get
> a
> > performance gain.
>
> To gain performance, adding options to BLAST is strongly recommended.
> -e Expectation value (E) [Real]
> default = 10.0
> -v Number of database sequences to show one-line descriptions for (V)
> [Integer]
> default = 500
> -b Number of database sequence to show alignments for (B) [Integer]
> default = 250
>
> Changing above to smaller values will reduce output report size
> which means performance gain.
>
> Executing BLAST with multiple query sequences can also gain performance.
> In addition, when you have query sequences in a local file, calling
> blastall command directly without Bio::Blast may be good.
>
> For example,
>
> require 'bio'
> require 'tempfile'
>
> command = %w( blastall -p blastn -i Large_list_sequences_1.fasta
> -d Large_list_sequences_2 -e 0.0001 -b 20 -v 20 )
> tempfile = Tempfile.new('blastout')
> tempfile.close(false)
> command = command + [ "-o", tempfile.path ]
> system(*command)
> # After system(), error checks will be needed but skipped.
> tempfile.open
> ff = Bio::FlatFile.open(tempfile)
> ff.each do |report|
> # For example, prints query_def and target_def
> report.each do |hit|
> print report.query_def, "\t", hit.target_def, "\n"
> end
> end
> ff.close
> tempfile.close(true)
>
> > For any curious, my code is on github:
> >
> > http://github.com/kenglishhi/bioflexrails/tree/master
> >
> > The file that is doing the blasts is under app/model/biodatabase.rb.
> >
> > I'm trying to write a rails app uses biosql db and allows this biologist
> to
> > organize his sequences. I'm very new to bioinformatics but have a lot
> > experience with Ruby on Rails.
> >
> > Thanks in advance for you help.
>
> In general, a BLAST search against a very large database takes
> very long time, and using batch queueing system might be needed.
>
> Thanks,
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>
--
FYI: My email addresses at unimelb, uq and gmail all redirect to the same
place.
From davide.rambaldi at ifom-ieo-campus.it Tue Apr 7 09:05:48 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Tue, 7 Apr 2009 11:05:48 +0200
Subject: [BioRuby] bioruby XML and libruby1.8 (ubuntu/debian)
Message-ID:
Hi, I got this ugly bug:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502535
Reported also by Naohisa Goto.
Is there any solution provided by Ubuntu/Debian?
Or is it better to recompile Ruby from source?
thanks.
Davide Rambaldi,
Bioinformatics PhD student.
-----------------------------------------------------
Bioinformatic Group IFOM-IEO Campus
Via Adamello 16, Milano
I-20139 Italy
[t] +39 02574303 066
[e] davide.rambaldi at ifom-ieo-campus.it
[i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi
(homepage)
[i] http://www.semm.it (PhD school)
[i] http://www.btbs.unimib.it/ (Master)
-----------------------------------------------------
From davide.rambaldi at ifom-ieo-campus.it Tue Apr 7 09:09:46 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Tue, 7 Apr 2009 11:09:46 +0200
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49D2C051.9090002@burnham.org>
References: <49D2C051.9090002@burnham.org>
Message-ID: <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it>
Hi, I am available to help if there is a need.
I have only a little experience with BioRuby, but I am really interested in
participating in a project with BioRuby.
Maybe I can help with testing the code?
Davide
On 01 Apr 2009, at 03:16, Christian M Zmasek wrote:
> Hi:
>
> As you probably read on previous messages, NESCent is again having a
> Google Summer of Code program
> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>
> I am serving as the mentor of a project entitled "phyloXML support in
> BioRuby" (see:
> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>
> In order to ensure a successful outcome (should the project be
> accepted), as well as to improve the chances of being accepted, I am
> looking for people willing to serve as co-mentors.
>
> Christian
>
>
> PS: Here is the full description of the project:
>
>
> phyloXML support in BioRuby
>
> Rationale: Evolutionary trees are central to comparative genomics
> studies.
> Trees used in this context are usually annotated with a variety of
> data elements, such as taxonomic information, genome-related data
> (gene names, functional annotations) and gene duplication events, as
> well as information related to the evolutionary tree itself (branch
> lengths, support values). phyloXML is an XML data exchange standard
> that can represent this data. Trees in phyloXML format can be
> displayed and analyzed with Archaeopteryx (the successor to ATV),
> which also allows manipulation and navigation of the tree. While
> tools exist to convert other formats (such as the widely used Newick
> and Nexus formats) to phyloXML, there is currently support for
> phyloXML in only one of the open source Bio* projects (in BioPerl,
> as a result of Google's Summer of Code 2008).
>
> Approach: Build phyloXML support in the increasingly popular,
> dynamic, and fully object-oriented language Ruby. More specifically,
> extend the open source BioRuby project to support phyloXML (BioRuby
> 1.3.0 has just been released). This will entail (i) the development
> of objects to represent all the elements of phyloXML (sequences,
> taxonomic data, annotations, etc), (ii) the development of a parser
> to read in phyloXML, and (iii) a phyloXML writer.
>
> Challenges: Relating the data elements specific to phyloXML to the
> tree classes already in BioRuby while maintaining the standards of
> the BioRuby project. Development of a time and memory efficient
> phyloXML parser (the parser has to be able to process trees with
> thousands of external nodes, at least).
>
> Involved toolkits or projects: BioRuby, phyloXML
>
> Degree of difficulty and needed skills: Medium. Requires experience
> in an object-oriented programming language (such as C++, Java, or,
> ideally, Ruby). Experience in genomics or a related biological field
> is also critical. Knowledge of BioRuby will obviously help, as well
> as familiarity with XML.
>
> Mentors: Christian Zmasek
>
>
Davide Rambaldi,
Bioinformatics PhD student.
-----------------------------------------------------
Bioinformatic Group IFOM-IEO Campus
Via Adamello 16, Milano
I-20139 Italy
[t] +39 02574303 066
[e] davide.rambaldi at ifom-ieo-campus.it
[i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi
(homepage)
[i] http://www.semm.it (PhD school)
[i] http://www.btbs.unimib.it/ (Master)
-----------------------------------------------------
From ngoto at gen-info.osaka-u.ac.jp Tue Apr 7 11:06:57 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 7 Apr 2009 20:06:57 +0900
Subject: [BioRuby] bioruby XML and libruby1.8 (ubuntu/debian)
In-Reply-To:
References:
Message-ID: <20090407110659.0CD521CBC3F4@idnmail.gen-info.osaka-u.ac.jp>
Hi,
For Debian, the regression has been fixed with DSA-1695-1.
http://www.debian.org/security/2009/dsa-1695
Normally, aptitude update && aptitude upgrade will fix
the problem.
I know nothing about Ubuntu.
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
On Tue, 7 Apr 2009 11:05:48 +0200
Davide Rambaldi wrote:
> Hi, I got this ugly bug:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502535
>
> Reported also by Naohisa Goto.
>
> There is any solution provided by ubuntu/debian?
>
> Or is better to recompile ruby from source?
>
> thanks.
>
> Davide Rambaldi,
> Bioinformatics PhD student.
> -----------------------------------------------------
> Bioinformatic Group IFOM-IEO Campus
> Via Adamello 16, Milano
> I-20139 Italy
>
> [t] +39 02574303 066
> [e] davide.rambaldi at ifom-ieo-campus.it
> [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi
> (homepage)
> [i] http://www.semm.it (PhD school)
> [i] http://www.btbs.unimib.it/ (Master)
>
> -----------------------------------------------------
>
>
>
>
From czmasek at burnham.org Wed Apr 8 23:14:35 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Wed, 8 Apr 2009 16:14:35 -0700
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it>
References: <49D2C051.9090002@burnham.org>
<4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it>
Message-ID: <49DD2FDB.4080909@burnham.org>
Hi, Davide:
Yes, help in testing code is always very welcome!
Hopefully the project will get accepted!
Christian
Davide Rambaldi wrote:
> Hi, I am available to help is there is need.
>
> I have a small experience with bioruby but I am really interested in
> partecipate in a project with bioruby.
>
> May be I can help in testing the code?
>
> Davide
>
>
> On 01 Apr 2009, at 03:16, Christian M Zmasek wrote:
>
>
>> Hi:
>>
>> As you probably read on previous messages, NESCent is again having a
>> Google Summer of Code program
>> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>>
>> I am serving as the mentor of a project entitled "phyloXML support
>> in BioRuby" (see:
>> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>>
>> In order to ensure a successful outcome (should the project be
>> accepted), as well as to improve the chances of being accepted, I am
>> looking for people willing to serve as co-mentors.
>>
>> Christian
>>
>>
>> PS: Here is the full description of the project:
>>
>>
>> phyloXML support in BioRuby
>>
>> Rationale: Evolutionary trees are central to comparative genomics
>> studies.
>> Trees used in this context are usually annotated with a variety of
>> data elements, such as taxonomic information, genome-related data
>> (gene names, functional annotations) and gene duplication events, as
>> well as information related to the evolutionary tree itself (branch
>> lengths, support values). phyloXML is an XML data exchange standard
>> that can represent this data. Trees in phyloXML format can be
>> displayed and analyzed with Archaeopteryx (the successor to ATV),
>> which also allows manipulation and navigation of the tree. While
>> tools exist to convert other formats (such as the widely used Newick
>> and Nexus formats) to phyloXML, there is currently support for
>> phyloXML in only one of the open source Bio* projects (in BioPerl,
>> as a result of Google's Summer of Code 2008).
>>
>> Approach: Build phyloXML support in the increasingly popular,
>> dynamic, and fully object-oriented language Ruby. More specifically,
>> extend the open source BioRuby project to support phyloXML (BioRuby
>> 1.3.0 has just been released). This will entail (i) the development
>> of objects to represent all the elements of phyloXML (sequences,
>> taxonomic data, annotations, etc), (ii) the development of a parser
>> to read in phyloXML, and (iii) a phyloXML writer.
>>
>> Challenges: Relating the data elements specific to phyloXML to the
>> tree classes already in BioRuby while maintaining the standards of
>> the BioRuby project. Development of a time and memory efficient
>> phyloXML parser (the parser has to be able to process trees with
>> thousands of external nodes, at least).
>>
>> Involved toolkits or projects: BioRuby, phyloXML
>>
>> Degree of difficulty and needed skills: Medium. Requires experience
>> in an object-oriented programming language (such as C++, Java, or,
>> ideally, Ruby). Experience in genomics or a related biological field
>> is also critical. Knowledge of BioRuby will obviously help, as well
>> as familiarity with XML.
>>
>> Mentors: Christian Zmasek
>>
>>
>>
>
> Davide Rambaldi,
> Bioinformatics PhD student.
> -----------------------------------------------------
> Bioinformatic Group IFOM-IEO Campus
> Via Adamello 16, Milano
> I-20139 Italy
>
> [t] +39 02574303 066
> [e] davide.rambaldi at ifom-ieo-campus.it
> [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi
> (homepage)
> [i] http://www.semm.it (PhD school)
> [i] http://www.btbs.unimib.it/ (Master)
>
> -----------------------------------------------------
>
>
>
>
>
From ixpata at gmail.com Wed Apr 8 23:23:47 2009
From: ixpata at gmail.com (S M)
Date: Wed, 8 Apr 2009 19:23:47 -0400
Subject: [BioRuby] Google Summer of Code BioRuby project: Looking for
co-mentors
In-Reply-To: <49DD2FDB.4080909@burnham.org>
References: <49D2C051.9090002@burnham.org>
<4677BE43-1421-48F1-9ABB-6DAAC98F8CAD@ifom-ieo-campus.it>
<49DD2FDB.4080909@burnham.org>
Message-ID:
I'd be interested in helping with testing too, or in any other way I
can. I'm not experienced enough to program, but maybe I can help in
other ways.
On Apr 8, 2009, at 7:14 PM, Christian M Zmasek
wrote:
> Hi, Davide:
>
> Yes, help in testing code is always very welcome!
>
> Hopefully the project will get accepted!
>
> Christian
>
>
> Davide Rambaldi wrote:
>> Hi, I am available to help is there is need.
>>
>> I have a small experience with bioruby but I am really interested
>> in partecipate in a project with bioruby.
>>
>> May be I can help in testing the code?
>>
>> Davide
>>
>>
>> On 01 Apr 2009, at 03:16, Christian M Zmasek wrote:
>>
>>
>>> Hi:
>>>
>>> As you probably read on previous messages, NESCent is again having
>>> a Google Summer of Code program
>>> (http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
>>>
>>> I am serving as the mentor of a project entitled "phyloXML
>>> support in BioRuby" (see:
>>> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
>>>
>>> In order to ensure a successful outcome (should the project be
>>> accepted), as well as to improve the chances of being accepted, I
>>> am looking for people willing to serve as co-mentors.
>>>
>>> Christian
>>>
>>>
>>> PS: Here is the full description of the project:
>>>
>>>
>>> phyloXML support in BioRuby
>>>
>>> Rationale: Evolutionary trees are central to comparative genomics
>>> studies.
>>> Trees used in this context are usually annotated with a variety of
>>> data elements, such as taxonomic information, genome-related data
>>> (gene names, functional annotations) and gene duplication events,
>>> as well as information related to the evolutionary tree itself
>>> (branch lengths, support values). phyloXML is an XML data exchange
>>> standard that can represent this data. Trees in phyloXML format can
>>> be displayed and analyzed with Archaeopteryx (the successor to ATV),
>>> which also allows manipulation and navigation of the tree. While
>>> tools exist to convert other formats (such as the widely used
>>> Newick and Nexus formats) to phyloXML, there is currently support
>>> for phyloXML in only one of the open source Bio* projects (in
>>> BioPerl, as a result of Google's Summer of Code 2008).
>>>
>>> Approach: Build phyloXML support in the increasingly popular,
>>> dynamic, and fully object-oriented language Ruby. More specifically,
>>> extend the open source BioRuby project to support phyloXML (BioRuby
>>> 1.3.0 has just been released). This will entail (i) the development
>>> of objects to represent all the elements of phyloXML (sequences,
>>> taxonomic data, annotations, etc), (ii) the development of a parser
>>> to read in phyloXML, and (iii) a phyloXML writer.
>>>
>>> Challenges: Relating the data elements specific to phyloXML to the
>>> tree classes already in BioRuby while maintaining the standards of
>>> the BioRuby project. Development of a time and memory efficient
>>> phyloXML parser (the parser has to be able to process trees with
>>> thousands of external nodes, at least).
>>>
>>> Involved toolkits or projects: BioRuby, phyloXML
>>>
>>> Degree of difficulty and needed skills: Medium. Requires experience
>>> in an object-oriented programming language (such as C++, Java, or,
>>> ideally, Ruby). Experience in genomics or a related biological field
>>> is also critical. Knowledge of BioRuby will obviously help, as well
>>> as familiarity with XML.
>>>
>>> Mentors: Christian Zmasek
>>>
>>>
>>>
>>
>> Davide Rambaldi,
>> Bioinformatics PhD student.
>> -----------------------------------------------------
>> Bioinformatic Group IFOM-IEO Campus
>> Via Adamello 16, Milano
>> I-20139 Italy
>>
>> [t] +39 02574303 066
>> [e] davide.rambaldi at ifom-ieo-campus.it
>> [i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/
>> DavideRambaldi (homepage)
>> [i] http://www.semm.it (PhD school)
>> [i] http://www.btbs.unimib.it/ (Master)
>>
>> -----------------------------------------------------
>>
>>
>>
>>
>>
>
From mkikkawa at gmail.com Fri Apr 10 07:20:19 2009
From: mkikkawa at gmail.com (Masahide Kikkawa)
Date: Fri, 10 Apr 2009 16:20:19 +0900
Subject: [BioRuby] Bio::PubMed.efetch, bug?
Message-ID: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com>
Hi,
I recently installed bioruby-1.3.0 and encountered a bug.
Here is what I did
result = Bio::PubMed.efetch([123,456,789])
I expected the result to be an Array of MEDLINE-formatted Strings,
as documented here:
http://bioruby.org/rdoc/classes/Bio/PubMed.html
However, the return value was one String containing all the results.
In bioruby-1.2.1, it does return an Array of MEDLINE-formatted Strings.
Therefore I would recommend applying the following patch to ncbirest.rb:
<
< result.strip!
< if opts["retmode"] == "text"
< result = result.split(/\n\n+/)
< end
<
< return result
---
> return result.strip
> #return result.strip.split(/\n\n+/)
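To illustrate what the split in the patch does, here is a small self-contained sketch. The text below is made up and merely stands in for an efetch response body; the sketch assumes, as the patch does, that records are separated by one or more blank lines and contain no blank lines themselves.

```ruby
# Made-up MEDLINE-style text standing in for an efetch response body;
# records are separated by one or more blank lines.
body = <<~MEDLINE

  PMID- 123
  TI  - First example title

  PMID- 456
  TI  - Second example title


  PMID- 789
  TI  - Third example title

MEDLINE

# Same expression as in the patch: trim both ends, then split on blank lines.
records = body.strip.split(/\n\n+/)

puts records.size
records.each { |record| puts record[/^PMID- (\d+)/, 1] }
```

Each element of `records` is then one MEDLINE-formatted String, which matches the Array return value described above for bioruby-1.2.1.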
---------------------------------------------------------------
Masahide Kikkawa, M.D., Ph. D.
Professor
Structural Biology
Graduate School of Science
Kyoto University
Oiwake, Kitashirakawa, Sakyo-ku, Kyoto, 606-8502
JAPAN
http://structure.biophys.kyoto-u.ac.jp/
Tel: +81-75-753-9421
FAX: +81-75-753-4218
---------------------------------------------------------------
From jan.aerts at gmail.com Fri Apr 10 07:30:32 2009
From: jan.aerts at gmail.com (Jan Aerts)
Date: Fri, 10 Apr 2009 08:30:32 +0100
Subject: [BioRuby] searching through large sets of features
Message-ID: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>
Hi all,
I need to be able to handle millions of features for a visualization and
therefore be able to search through them efficiently. I've been made aware
of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data structure
that could do just that. One of the issues that this approach would solve is
that it doesn't need to load every single feature when the display
resolution is not high enough (no use trying to plot 5 million points next
to each other on a screen of 800 pixels wide; you can only plot 800
points....)
I'm now trying to write a ruby-implementation of an R-Tree that is more fit
for loci (i.e. one less dimension than an R-Tree, and has to handle
chromosome as well) at http://github.com/jandot/locustree. I'm trying to
make it a module that can be mixed in, so anything which has a chromosome,
start and stop can be indexed and searched using this R-Tree adaptation.
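The mix-in idea can be sketched as follows. Note that this is NOT an R-Tree and not the locustree code: it swaps in a much simpler fixed-size bin index purely to show the shape of a module that indexes anything responding to chromosome, start and stop. All names (`LocusIndexed`, `index_locus`, `overlapping`, `BIN_SIZE`) are hypothetical.

```ruby
# Fixed-size bin index for loci. A simple stand-in, not an R-Tree and not
# the locustree implementation; all names here are hypothetical.
module LocusIndexed
  BIN_SIZE = 100_000 # bases per bin (arbitrary choice)

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Maps [chromosome, bin number] to the objects touching that bin.
    def locus_bins
      @locus_bins ||= Hash.new { |hash, key| hash[key] = [] }
    end

    # Registers obj in every bin that its start..stop range touches.
    def index_locus(obj)
      first_bin = obj.start / BIN_SIZE
      last_bin = obj.stop / BIN_SIZE
      (first_bin..last_bin).each do |bin|
        locus_bins[[obj.chromosome, bin]] << obj
      end
      obj
    end

    # Returns indexed objects overlapping start..stop on the chromosome.
    def overlapping(chromosome, start, stop)
      first_bin = start / BIN_SIZE
      last_bin = stop / BIN_SIZE
      hits = (first_bin..last_bin).flat_map do |bin|
        locus_bins.fetch([chromosome, bin], [])
      end
      hits.uniq.select { |o| o.start <= stop && o.stop >= start }
    end
  end
end

Gene = Struct.new(:name, :chromosome, :start, :stop) do
  include LocusIndexed
end

Gene.index_locus(Gene.new("a", "1", 50_000, 150_000))
Gene.index_locus(Gene.new("b", "1", 900_000, 910_000))
Gene.index_locus(Gene.new("c", "2", 60_000, 70_000))

puts Gene.overlapping("1", 120_000, 130_000).map(&:name).inspect
```

A query only visits the bins covered by the requested range, so features elsewhere on the chromosome are never touched. Unlike an R-Tree, this does nothing to summarise features when the display resolution is low.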
Does anyone know a ruby-implementation of R-Tree? Or is anyone interested in
helping me out?
Thanks,
jan.
From mail at michaelbarton.me.uk Fri Apr 10 13:33:30 2009
From: mail at michaelbarton.me.uk (Michael Barton)
Date: Fri, 10 Apr 2009 14:33:30 +0100
Subject: [BioRuby] searching through large sets of features
In-Reply-To:
References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>
Message-ID:
There are also some Ruby search algorithms here:
http://algorithms.rubyforge.org/
I'm happy to try and help if I can
Cheers
Mike
2009/4/10 Michael Barton
> Would a bloom tree fit this problem?
> http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
>
> 2009/4/10 Jan Aerts
>
> Hi all,
>> I need to be able to handle millions of features for a visualization and
>> therefore be able to search through them efficiently. I've been made aware
>> of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data
>> structure
>> that could do just that. One of the issues that this approach would solve
>> is
>> that it doesn't need to load every single feature when the display
>> resolution is not high enough (no use trying to plot 5 million points next
>> to each other on a screen of 800 pixels wide; you can only plot 800
>> points....)
>>
>> I'm now trying to write a ruby-implementation of an R-Tree that is more
>> fit
>> for loci (i.e. one less dimension than an R-Tree, and has to handle
>> chromosome as well) at http://github.com/jandot/locustree. I'm trying to
>> make it a module that can be mixed in, so anything which has a chromosome,
>> start and stop can be indexed and searched using this R-Tree adaptation.
>>
>> Does anyone know a ruby-implementation of R-Tree? Or is anyone interested
>> in
>> helping me out?
>>
>> Thanks,
>> jan.
>>
>
>
From jan.aerts at gmail.com Fri Apr 10 13:57:20 2009
From: jan.aerts at gmail.com (Jan Aerts)
Date: Fri, 10 Apr 2009 14:57:20 +0100
Subject: [BioRuby] searching through large sets of features
In-Reply-To:
References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>
Message-ID: <4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com>
That is a _really_ good resource. Unfortunately, R-Trees are not in there
(or I didn't look well enough). I was thinking of sending him an email to
ask if he could quickly add that to his library :-), but saw that he's
switching to python. (But that particular announcement was on the 1st of
April, so let's hope it's a joke)
j.
2009/4/10 Michael Barton
> There's also some ruby searching algorithms here
> http://algorithms.rubyforge.org/
>
> I'm happy to try and help if I can
>
> Cheers
>
> Mike
>
> 2009/4/10 Michael Barton
>
> > Would a bloom tree fit this problem?
> >
> http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
> >
> > 2009/4/10 Jan Aerts
> >
> > Hi all,
> >> I need to be able to handle millions of features for a visualization and
> >> therefore be able to search through them efficiently. I've been made
> aware
> >> of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data
> >> structure
> >> that could do just that. One of the issues that this approach would
> solve
> >> is
> >> that it doesn't need to load every single feature when the display
> >> resolution is not high enough (no use trying to plot 5 million points
> next
> >> to each other on a screen of 800 pixels wide; you can only plot 800
> >> points....)
> >>
> >> I'm now trying to write a ruby-implementation of an R-Tree that is more
> >> fit
> >> for loci (i.e. one less dimension than an R-Tree, and has to handle
> >> chromosome as well) at http://github.com/jandot/locustree. I'm trying
> to
> >> make it a module that can be mixed in, so anything which has a
> chromosome,
> >> start and stop can be indexed and searched using this R-Tree adaptation.
> >>
> >> Does anyone know a ruby-implementation of R-Tree? Or is anyone
> interested
> >> in
> >> helping me out?
> >>
> >> Thanks,
> >> jan.
> >>
> >
> >
>
From mail at michaelbarton.me.uk Fri Apr 10 14:51:10 2009
From: mail at michaelbarton.me.uk (Michael Barton)
Date: Fri, 10 Apr 2009 15:51:10 +0100
Subject: [BioRuby] searching through large sets of features
In-Reply-To: <4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com>
References: <4c7507a70904100030p26ebbcb5l7e106d20be0c6e21@mail.gmail.com>
<4c7507a70904100657x18d5fbc0oaded0e04a0f127ce@mail.gmail.com>
Message-ID:
You could use the kd-tree to narrow the search to the 1000 features each
side of the middle point, then traverse this set to exclude those outside of
the visualisation range. It's a bit of a hack, but it could save creating an
R-tree if there's no existing implementation.
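The two-step idea (narrow to a window around the middle point, then filter) might look roughly like the sketch below. It deliberately swaps the kd-tree for a plain Array sorted by start plus a binary search, which is enough to show the narrowing step; `Feature`, `features_in_view` and the `window:` keyword are hypothetical names. As with the original suggestion, features that overlap the view but fall outside the window are missed.

```ruby
Feature = Struct.new(:name, :start, :stop)

# Hypothetical helper. `sorted` must already be sorted by start.
# Step 1: binary-search for the middle of the view and take up to
#         `window` features on each side of it.
# Step 2: keep only the candidates that overlap the view.
def features_in_view(sorted, view_start, view_stop, window: 1000)
  mid = (view_start + view_stop) / 2
  i = sorted.bsearch_index { |f| f.start >= mid } || sorted.size
  lo = [i - window, 0].max
  candidates = sorted[lo, 2 * window] || []
  candidates.select { |f| f.start <= view_stop && f.stop >= view_start }
end

# 10,000 made-up features, 50 bases long, spaced 100 bases apart.
features = (0...10_000).map { |n| Feature.new("f#{n}", n * 100, n * 100 + 50) }

visible = features_in_view(features, 500_000, 500_500)
puts visible.map(&:name).join(", ")
```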
2009/4/10 Jan Aerts
> That is a _really_ good resource. Unfortunately, R-Trees are not in there
> (or I didn't look well enough). I was thinking of sending him an email to
> ask if he could quickly add that to his library :-), but saw that he's
> switching to python. (But that particular announcement was on the 1st of
> April, so let's hope it's a joke)
> j.
>
>
> 2009/4/10 Michael Barton
>
>> There's also some ruby searching algorithms here
>> http://algorithms.rubyforge.org/
>>
>> I'm happy to try and help if I can
>>
>> Cheers
>>
>> Mike
>>
>> 2009/4/10 Michael Barton
>>
>> > Would a bloom tree fit this problem?
>> >
>> http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
>> >
>> > 2009/4/10 Jan Aerts
>> >
>> > Hi all,
>> >> I need to be able to handle millions of features for a visualization
>> and
>> >> therefore be able to search through them efficiently. I've been made
>> aware
>> >> of a so-called R-Tree (http://en.wikipedia.org/wiki/R-tree) data
>> >> structure
>> >> that could do just that. One of the issues that this approach would
>> solve
>> >> is
>> >> that it doesn't need to load every single feature when the display
>> >> resolution is not high enough (no use trying to plot 5 million points
>> next
>> >> to each other on a screen of 800 pixels wide; you can only plot 800
>> >> points....)
>> >>
>> >> I'm now trying to write a ruby-implementation of an R-Tree that is more
>> >> fit
>> >> for loci (i.e. one less dimension than an R-Tree, and has to handle
>> >> chromosome as well) at http://github.com/jandot/locustree. I'm trying
>> to
>> >> make it a module that can be mixed in, so anything which has a
>> chromosome,
>> >> start and stop can be indexed and searched using this R-Tree
>> adaptation.
>> >>
>> >> Does anyone know a ruby-implementation of R-Tree? Or is anyone
>> interested
>> >> in
>> >> helping me out?
>> >>
>> >> Thanks,
>> >> jan.
>> >>
>> >
>> >
>>
>
>
From fredjoha at bioreg.kyushu-u.ac.jp Mon Apr 13 04:16:00 2009
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 13 Apr 2009 13:16:00 +0900
Subject: [BioRuby] Parsing MSF alignment file
Message-ID: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>
I tried to parse an alignment file given in the MSF format by using
Bio::GCG::Msf. It turned out though that points/dots (.) were used as a
gap character in the alignment and that can't be handled by
Bio::GCG::Msf. So, for what it's worth, I made these changes to
bio/appl/gcg/msf.rb:
$ diff msf.rb.old msf.rb.new
33,35c33,36
< if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ str[/.*/] then
< @heading = str[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
< str.sub!(/.*/, '')
---
> preamble, @data = str.split(/^\/\/$/)
> if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ preamble[/.*/] then
> @heading = preamble[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
> preamble.sub!(/.*/, '')
37c38
< str.sub!(/.*\.\.$/m, '')
---
> preamble.sub!(/.*\.\.$/m, '')
48,49d48
< str.sub!(/.*\/\/$/m, '')
< a = $&.to_s.split(/^/)
51c50
< a.each do |x|
---
> preamble.split(/^/).each do |x|
59d57
< @data = str
Best regards,
Fredrik Johansson
--
***********************************
Fredrik Johansson, grad. student
Division of Bioinformatics
Medical Institute of Bioregulation
Kyushu University
3-1-1 Maidashi, Higashi-ku
Fukuoka 812-8582, Japan
fredjoha at bioreg.kyushu-u.ac.jp
***********************************
From ngoto at gen-info.osaka-u.ac.jp Mon Apr 13 12:54:54 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Mon, 13 Apr 2009 21:54:54 +0900
Subject: [BioRuby] Parsing MSF alignment file
In-Reply-To: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>
References: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>
Message-ID: <20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp>
Hi,
On Mon, 13 Apr 2009 13:16:00 +0900
Fredrik Johansson wrote:
> I tried to parse an alignment file given in the MSF format by using
> Bio::GCG::Msf. It turned out though that points/dots (.) were used as a
> gap character in the alignment and that can't be handled by
> Bio::GCG::Msf. So, for what it's worth, I made these changes to
> bio/appl/gcg/msf.rb:
Thank you very much. Do you mean that Bio::GCG::Msf fails to
parse an alignment that contains two contiguous gaps (..) at the
end of a line? This is a bug, and the patch will soon be
applied to the git repository.
> $ diff msf.rb.old msf.rb.new
Next time, please use "diff -u" (unified diff format).
Thanks,
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
From fredjoha at bioreg.kyushu-u.ac.jp Mon Apr 13 13:19:27 2009
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 13 Apr 2009 22:19:27 +0900
Subject: [BioRuby] Parsing MSF alignment file
In-Reply-To: <20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp>
References: <49E2BC80.6070502@bioreg.kyushu-u.ac.jp>
<20090413125455.7EDCE1CBC4FD@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <49E33BDF.70200@bioreg.kyushu-u.ac.jp>
Yes, that's what happened. A regular expression matched all the way to
the last occurrence of two dots at the end of a line.
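The behaviour described here can be reproduced with a small made-up MSF-like string (the content below is illustrative, not a real alignment). With the /m option, `.` also matches newlines, so the greedy `.*` runs to the last line that ends in two dots. Splitting at the `//` line first, as the patch does, keeps the match inside the header. The limit of 2 passed to split is an addition in this sketch, not part of the patch.

```ruby
# Made-up MSF-like input: a header that ends in "..", sequence names,
# the "//" divider, then alignment rows that use "." as the gap character.
msf = <<~MSF
  !!AA_MULTIPLE_ALIGNMENT 1.0
  example.msf  MSF: 10  Type: P  Check: 1234 ..

   Name: seq1  Len: 10  Check: 1111  Weight: 1.00
   Name: seq2  Len: 10  Check: 2222  Weight: 1.00

  //

  seq1  ACDEFGHI..
  seq2  ACDE......
MSF

# Old behaviour: the greedy multi-line match swallows everything up to the
# last line ending in "..", including the divider and alignment rows.
greedy = msf[/.*\.\.$/m]
puts greedy.include?("//")

# Patched behaviour: split at the divider first, then match in the preamble.
preamble, data = msf.split(/^\/\/$/, 2)
header = preamble[/.*\.\.$/m]
puts header.include?("//")
puts data.include?("seq1")
```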
> Thank you very much. Do you mean that Bio::GCG::Msf fails to
> parse an alignment that contain two contiguous gaps (..) in the
> end of a line? This is a bug, and the patch will soon be
> applied to the git repository.
>
>
>> $ diff msf.rb.old msf.rb.new
>>
>
> In the next time, please use "diff -u" (unified context format).
>
>
Ok, I'll attach the output of diff -u here:
--- /usr/lib/ruby/gems/1.8/gems/bio-1.3.0/lib/bio/appl/gcg/msf.rb.old 2009-04-13 11:32:53.000000000 +0900
+++ /usr/lib/ruby/gems/1.8/gems/bio-1.3.0/lib/bio/appl/gcg/msf.rb 2009-04-13 13:36:26.000000000 +0900
@@ -30,11 +30,12 @@
# Creates a new Msf object.
def initialize(str)
str = str.sub(/\A[\r\n]+/, '')
- if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ str[/.*/] then
- @heading = str[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
- str.sub!(/.*/, '')
+ preamble, @data = str.split(/^\/\/$/)
+ if /^\!\![A-Z]+\_MULTIPLE\_ALIGNMNENT/ =~ preamble[/.*/] then
+ @heading = preamble[/.*/] # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this
+ preamble.sub!(/.*/, '')
end
- str.sub!(/.*\.\.$/m, '')
+ preamble.sub!(/.*\.\.$/m, '')
@description = $&.to_s.sub(/^.*\.\.$/, '').to_s
d = $&.to_s
if m = /(.+)\s+MSF\:\s+(\d+)\s+Type\:\s+(\w)\s+(.+)\s+(Comp)?Check\:\s+(\d+)/.match(d) then
@@ -45,10 +46,8 @@
@checksum = (m[6] ? m[6].to_i : nil)
end
- str.sub!(/.*\/\/$/m, '')
- a = $&.to_s.split(/^/)
@seq_info = []
- a.each do |x|
+ preamble.split(/^/).each do |x|
if /Name\: / =~ x then
s = {}
x.scan(/(\S+)\: +(\S*)/) { |y| s[$1] = $2 }
@@ -56,7 +55,6 @@
end
end
- @data = str
@description.sub!(/\A(\r\n|\r|\n)/, '')
@align = nil
end
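
The failure mode discussed in this thread can be reproduced in a few
lines of plain Ruby, without BioRuby. This is only a sketch: the sample
MSF text and the variable names are made up for illustration, and only
the two regular expressions are taken from the patch above.

```ruby
# Hypothetical MSF-like input: a header ending in "..", a "//" divider,
# and alignment rows that use "." as the gap character.
msf = <<~MSF
  example.msf  MSF: 8  Type: N  Check: 1234 ..

   Name: seq1  Len: 8  Check: 1111  Weight: 1.00
   Name: seq2  Len: 8  Check: 2222  Weight: 1.00

  //

  seq1  ACGT....
  seq2  AC..GT..
MSF

# Old approach: with /m, "." also matches newlines and ".*" is greedy,
# so the match runs to the LAST line ending in "..", inside the data.
greedy_match = msf[/.*\.\.$/m]
greedy_hits_data = greedy_match.include?("seq2  AC..GT..")

# Patched approach: split at the "//" divider first, then search
# the preamble only.
preamble, data = msf.split(/^\/\/$/)
header_match = preamble[/.*\.\.$/m]
header_only = !header_match.include?("seq1  ACGT")

puts "greedy match reaches alignment rows: #{greedy_hits_data}"
puts "preamble match stays in the header:  #{header_only}"
puts "data rows preserved: #{data.include?('seq2  AC..GT..')}"
```

Splitting on the divider before searching keeps the gap characters in
the alignment rows out of reach of the header pattern, which is the
idea behind the patch.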
From andrew.j.grimm at gmail.com Tue Apr 14 09:30:16 2009
From: andrew.j.grimm at gmail.com (Andrew Grimm)
Date: Tue, 14 Apr 2009 19:30:16 +1000
Subject: [BioRuby] Status of bioruby with Debian packages
Message-ID:
According to http://packages.qa.debian.org/b/bioruby.html , the latest
bioruby available as a Debian package is 1.1.0 or 1.1.0-1.
Are there plans to update bioruby's debian package?
This isn't a major issue for me - our Ubuntu system already has bioruby
installed for ruby 1.8, and I was merely hoping to use ruby 1.9 for
faster performance. If a Debian update isn't likely, then we might try using
rubygems on Ubuntu and hope that the package management police don't arrest
us!
Andrew
PS: Is there anything better than just plain web search for searching
bioruby's mailing list archives?
From davide.rambaldi at ifom-ieo-campus.it Wed Apr 15 09:24:03 2009
From: davide.rambaldi at ifom-ieo-campus.it (Davide Rambaldi)
Date: Wed, 15 Apr 2009 11:24:03 +0200
Subject: [BioRuby] ACE assembly format
Message-ID: <7d40e94d36b1af5fd01fb9a908512aa2@ifom-ieo-campus.it>
Hi all, is there a parser for the .ace format in BioRuby?
Something like Bio::Assembly::IO::ace in BioPerl? Many thanks.
Davide Rambaldi
From ngoto at gen-info.osaka-u.ac.jp Wed Apr 15 11:04:00 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Wed, 15 Apr 2009 20:04:00 +0900
Subject: [BioRuby] Status of bioruby with Debian packages
In-Reply-To:
References:
Message-ID: <20090415110401.744641CBC544@idnmail.gen-info.osaka-u.ac.jp>
Hi,
On Tue, 14 Apr 2009 19:30:16 +1000
Andrew Grimm wrote:
> According to http://packages.qa.debian.org/b/bioruby.html , the latest
> bioruby available as a Debian package is 1.1.0 or 1.1.0-1.
>
> Are there plans to update bioruby's debian package?
I don't know, but I'm also a Debian user, and I've filed a
wishlist report in the Debian BTS.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=524180
In general, Debian-specific issues should be raised on the Debian
mailing lists, in the Bug Tracking System, and/or with the package
maintainers.
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
From fredjoha at bioreg.kyushu-u.ac.jp Mon Apr 20 03:35:27 2009
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 20 Apr 2009 12:35:27 +0900
Subject: [BioRuby] PubMed.efetch error
Message-ID: <49EBED7F.4010509@bioreg.kyushu-u.ac.jp>
PubMed.efetch does not give me the correct output when I pass an array
as the argument. I think it is supposed to return an array of MEDLINE
entries, but it returns all entries in one string. I can get an array
if I type
Bio::PubMed.efetch(entries).split("\n\n")
But that can't be the intended behaviour, right?
Regards,
Fredrik Johansson
--
***********************************
Fredrik Johansson, grad. student
Division of Bioinformatics
Medical Institute of Bioregulation
Kyushu University
3-1-1 Maidashi, Higashi-ku
Fukuoka 812-8582, Japan
fredjoha at bioreg.kyushu-u.ac.jp
***********************************
From tomoakin at kenroku.kanazawa-u.ac.jp Tue Apr 21 08:29:26 2009
From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA)
Date: Tue, 21 Apr 2009 17:29:26 +0900
Subject: [BioRuby] Read/write of simple fasta file increases the identifier.
Message-ID:
Hi,
With bioruby-1.3.0, reading a FASTA file, converting each entry to
Bio::Sequence with to_seq, and then writing it with output(:fasta)
makes the definition line longer each time the file is processed.
Is there a better interface that keeps the definition line as in the
original?
Perhaps this happens because some formats have clearly separate
entry_id and definition fields, while the distinction is ambiguous in
FASTA format. Still, it would be better if the original definition
line could be recovered easily.
The following is a simple example.
In a case this simple the entry need not be converted to
Bio::Sequence at all, but in practice I would like to manipulate the
sequence, for example by taking a subsequence, appending sequence, or
translating.
% cat > simple_fasta
>abc
acgttgac
% cat fastacat
#!/usr/local/bin/ruby
require 'bio'
ff = Bio::FlatFile.open(Bio::FastaFormat, ARGF)
while fe = ff.next_entry
  seq = fe.to_seq
  puts seq.output(:fasta)
end
% ruby fastacat simple_fasta
>abc abc
acgttgac
% ruby fastacat simple_fasta | ruby fastacat
>abc abc abc
acgttgac
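
One way to see why the header grows is a small model in plain Ruby. It
assumes, without checking against the BioRuby source, that the parser
takes the first word of the definition line as the entry ID while
keeping the whole line as the definition, and that the writer emits
both. The helper names parse_header, format_header and
format_header_preserving are hypothetical and not part of BioRuby.

```ruby
# Hypothetical model of the round trip; not BioRuby's actual code.

# Parse a FASTA header: the ID is the first word, the definition is
# the whole line after ">".
def parse_header(line)
  definition = line.sub(/\A>/, '').chomp
  { id: definition[/\S+/].to_s, definition: definition }
end

# Naive writer: always emits "ID definition", duplicating the ID.
def format_header(entry)
  ">#{entry[:id]} #{entry[:definition]}"
end

# Writer that keeps the original line: skip the ID when the
# definition already starts with it.
def format_header_preserving(entry)
  id, definition = entry[:id], entry[:definition]
  starts_with_id = definition == id || definition.start_with?("#{id} ")
  starts_with_id ? ">#{definition}" : ">#{id} #{definition}"
end

once  = format_header(parse_header(">abc"))
twice = format_header(parse_header(once))
kept  = format_header_preserving(parse_header(">abc"))

puts once   # >abc abc
puts twice  # >abc abc abc
puts kept   # >abc
```

Under that assumption, each pass prepends one more copy of the ID,
which matches the output shown above. A writer that checks whether the
definition already begins with the ID leaves the line unchanged.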
Sincerely yours,
--
Tomoaki NISHIYAMA
Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan
From ngoto at gen-info.osaka-u.ac.jp Tue Apr 21 14:17:26 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 21 Apr 2009 23:17:26 +0900
Subject: [BioRuby] Bio::PubMed.efetch, bug?
In-Reply-To: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com>
References: <1A3D4F7C-90BB-400B-B79C-B67BDF255AD8@gmail.com>
Message-ID: <20090421141727.C5B6A1CBC4A8@idnmail.gen-info.osaka-u.ac.jp>
Hi,
On Fri, 10 Apr 2009 16:20:19 +0900
Masahide Kikkawa wrote:
> Hi,
>
>
> I recently installed bioruby-1.3.0 and encountered a bug.
>
> Here is what I did
> result = Bio::PubMed.efetch([123,456,789])
>
> I expected the result to be an Array of MEDLINE-formatted Strings
> as documented here:
> http://bioruby.org/rdoc/classes/Bio/PubMed.html
>
> However, the return value was one String containing all the results.
>
> In bioruby-1.2.1, it does return an Array of MEDLINE-formatted Strings.
Thank you for reporting the bug. This is a regression introduced
by the refactoring of Bio::PubMed between 1.2.1 and 1.3.0.
> Therefore I would recommend applying the following patch to ncbirest.rb
>
> <
> < result.strip!
> < if opts["retmode"] == "text"
> < result = result.split(/\n\n+/)
> < end
> <
> < return result
> ---
> > return result.strip
> > #return result.strip.split(/\n\n+/)
>
Bio::NCBI::REST is used not only for PubMed but also for many other
NCBI databases, and efetch may return free text, depending on the
database. Thus, this processing of the result should be applied
only in Bio::PubMed.
http://github.com/bioruby/bioruby/commit/a48a9a35b87dead069fe328ba7086977304af995
To confirm the fix, tests for Bio::PubMed have also been added.
http://github.com/bioruby/bioruby/commit/bf5ba6d4503f3ddb0ca31673882f5b396a932bbe
In addition, I found and fixed a bug involving "retstart".
http://github.com/bioruby/bioruby/commit/fc0339fe8a42cd00199cfdc938590ae9626551bc
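
The design choice described above, leaving the generic fetch untouched
and splitting only in the PubMed-specific layer, might look roughly
like the following. This is a hedged sketch in plain Ruby, not the code
from the commits linked above: GenericRest, PubMedLike and the canned
MEDLINE-like text are hypothetical stand-ins so that the example runs
without network access. Only the /\n\n+/ pattern comes from the patch
quoted earlier.

```ruby
# Hypothetical stand-in for a generic REST client; it returns the raw
# text and makes no assumptions about the record format.
module GenericRest
  CANNED = "PMID- 123\nTI  - First title\n\n" \
           "PMID- 456\nTI  - Second title\n\n\n" \
           "PMID- 789\nTI  - Third title\n"

  def self.efetch(_ids)
    CANNED.strip
  end
end

# Hypothetical PubMed-specific wrapper: only at this level is it known
# that blank lines separate entries.
module PubMedLike
  def self.efetch(ids)
    GenericRest.efetch(ids).split(/\n\n+/)
  end
end

raw     = GenericRest.efetch([123, 456, 789])
entries = PubMedLike.efetch([123, 456, 789])

puts raw.class       # String
puts entries.length  # 3
puts entries.first
```

Callers that query other databases through the generic layer still get
the text exactly as returned, while PubMed callers get one string per
entry.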
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org