[BioRuby] Google Summer of Code BioRuby project: Looking for co-mentors
Christian M Zmasek
czmasek at burnham.org
Wed Apr 1 01:16:01 UTC 2009
Hi:
As you have probably read in previous messages, NESCent is again
participating in the Google Summer of Code program
(http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009).
I am serving as the mentor of a project entitled "phyloXML support in
BioRuby" (see:
https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby).
To improve the project's chances of being accepted, and to ensure a
successful outcome should it be accepted, I am looking for people
willing to serve as co-mentors.
Christian
PS: Here is the full description of the project:
phyloXML support in BioRuby
Rationale
Evolutionary trees are central to comparative genomics studies.
Trees used in this context are usually annotated with a variety of
data elements, such as taxonomic information, genome-related data
(gene names, functional annotations) and gene duplication events, as
well as information related to the evolutionary tree itself (branch
lengths, support values). phyloXML is an XML data exchange standard
that can represent this data. Trees in phyloXML format can be
displayed and analyzed with Archaeopteryx
<http://www.phylosoft.org/archaeopteryx/> (the successor to ATV
<http://bioinformatics.oxfordjournals.org/cgi/content/abstract/17/4/383>),
which also allows manipulation and navigation of the tree. While
tools exist to convert other formats (such as the widely used Newick
and Nexus formats) to phyloXML, only one of the open source Bio*
projects currently supports phyloXML: BioPerl
<http://www.bioperl.org/wiki/Phyloxml_Project_Demo>, as a result of
Google Summer of Code 2008.
Approach
Build phyloXML support into Ruby, an increasingly popular, dynamic,
and fully object-oriented language. More specifically, extend the
open source BioRuby project to support phyloXML (BioRuby 1.3.0 has
just been released). This will entail (i) the development of objects
to represent all the elements of phyloXML (sequences, taxonomic
data, annotations, etc.), (ii) the development of a parser to read
phyloXML, and (iii) a phyloXML writer.
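To make items (i) and (iii) a little more concrete, here is a minimal Ruby sketch of what phyloXML objects and a writer might look like. It is only an illustration under stated assumptions: the names Taxonomy, Clade, xml_escape and phyloxml_document are hypothetical (they are not BioRuby's API), only a tiny subset of phyloXML is covered (clade, name, branch_length, taxonomy/scientific_name), and the namespace URI http://www.phyloxml.org is assumed.

```ruby
# Hypothetical sketch: a tiny phyloXML object model plus a writer.
# Names (Taxonomy, Clade, xml_escape, phyloxml_document) are illustrative
# assumptions, not BioRuby's API. Only a few phyloXML elements are covered.

# Escape the characters that are special in XML text and attributes.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                            '"' => '&quot;', "'" => '&apos;')
end

# <taxonomy>, reduced here to a single child element.
Taxonomy = Struct.new(:scientific_name)

# A <clade>: optional name, branch length and taxonomy, plus child clades.
class Clade
  attr_accessor :name, :branch_length, :taxonomy, :children

  def initialize(name: nil, branch_length: nil, taxonomy: nil, children: [])
    @name = name
    @branch_length = branch_length
    @taxonomy = taxonomy
    @children = children
  end

  # Serialize this clade and, recursively, its descendants.
  # Child elements are emitted in the order name, branch_length, taxonomy, clade.
  def to_xml(indent = 4)
    pad = ' ' * indent
    out = "#{pad}<clade>\n"
    out << "#{pad}  <name>#{xml_escape(name)}</name>\n" if name
    out << "#{pad}  <branch_length>#{branch_length}</branch_length>\n" if branch_length
    if taxonomy
      out << "#{pad}  <taxonomy><scientific_name>" \
             "#{xml_escape(taxonomy.scientific_name)}" \
             "</scientific_name></taxonomy>\n"
    end
    children.each { |child| out << child.to_xml(indent + 2) }
    out << "#{pad}</clade>\n"
  end
end

# Wrap a root clade in a minimal phyloXML document (namespace URI assumed).
def phyloxml_document(root)
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" \
    "<phyloxml xmlns=\"http://www.phyloxml.org\">\n" \
    "  <phylogeny rooted=\"true\">\n" \
    "#{root.to_xml(4)}" \
    "  </phylogeny>\n" \
    "</phyloxml>\n"
end

# Example: an unnamed root with two named leaves.
tree = Clade.new(children: [
  Clade.new(name: 'A', branch_length: 0.1, taxonomy: Taxonomy.new('Homo sapiens')),
  Clade.new(name: 'B & C', branch_length: 0.2)
])
puts phyloxml_document(tree)
```

Keeping serialization in a to_xml method on each node class is just one possible design; a separate writer class that walks BioRuby's existing tree objects might fit the project's conventions better.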
Challenges
Relating the data elements specific to phyloXML to the tree classes
already in BioRuby while adhering to the standards of the BioRuby
project. Developing a time- and memory-efficient phyloXML parser
(at a minimum, the parser has to be able to process trees with
thousands of external nodes).
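One common way to keep a parser's memory use low is to stream through the document instead of loading it into a DOM. The sketch below assumes Ruby's REXML pull parser (REXML::Parsers::PullParser, bundled with Ruby) and uses a hypothetical method name, external_node_names; it only collects the names of external nodes, whereas a real parser would build the tree objects as it goes.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical sketch: stream through a phyloXML document and return the
# names of external (leaf) clades without building a DOM. Apart from the
# returned list, only a stack of currently open elements/clades is kept,
# so working memory grows with tree depth rather than with tree size.
def external_node_names(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  open_elements = []  # tag names of currently open elements
  clade_stack = []    # one [name, child_clade_count] pair per open <clade>
  leaves = []

  while parser.has_next?
    event = parser.pull
    if event.start_element?
      tag = event[0]
      if tag == 'clade'
        # A new clade is a child of the enclosing clade, if any.
        clade_stack.last[1] += 1 unless clade_stack.empty?
        clade_stack.push([nil, 0])
      end
      open_elements.push(tag)
    elsif event.text?
      # Only a <name> that is a direct child of <clade> names the clade.
      if open_elements.last == 'name' && open_elements[-2] == 'clade'
        clade_stack.last[0] = event[1].strip # event[1]: entity-expanded text
      end
    elsif event.end_element?
      tag = open_elements.pop
      if tag == 'clade'
        name, child_count = clade_stack.pop
        leaves << name if child_count.zero?
      end
    end
  end
  leaves
end
```

For a document whose root clade has the leaves A, B and C, this returns ["A", "B", "C"]. A DOM-based approach (REXML::Document) would be simpler to write but keeps the whole tree in memory; whether a pure-Ruby pull parser is fast enough for very large trees would need benchmarking.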
Involved toolkits or projects
BioRuby <http://www.bioruby.org/>, phyloXML <http://www.phyloxml.org>
Degree of difficulty and needed skills
Medium. Requires experience in an object-oriented programming
language (such as C++, Java, or, ideally, Ruby). Experience in
genomics or a related biological field is also critical. Knowledge
of BioRuby will obviously help, as will familiarity with XML.
Mentors
Christian Zmasek