[BioRuby] GSOC: Bioruby PhyloXML update 12
Diana Jaunzeikare
rozziite at gmail.com
Mon Aug 10 19:54:55 UTC 2009
Hi all,
What was done last week:
* Coding. Made changes so that the code is now completely compatible with
phyloxml schema 1.10
* Testing. Added more unit tests (the writer now has 9 tests, 26 assertions;
the parser has 40 tests, 134 assertions)
* Profiling. I discovered that the writer is really slow. The reason is the
implementation of the Tree#children method, which runs the bfs_shortest_path
algorithm. I had the idea of tracking each node's children inside the node
class as an array, but Naohisa Goto pointed out that I would then also have
to handle node and edge addition, removal, etc. So the better solution seems
to be to leave it as it is for now and improve the Bio::Tree class first. I
am planning to do that after GSOC, since there is only one week left.
* Refactored the parser class and got around a 3-fold speed increase. It can
now parse the 33 MB Metazoa taxonomy file in ~14 seconds (Ubuntu 9.04, ruby
1.8.7 [i486-linux], Intel Core 2 Duo P8600 @2.4GHz)
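To illustrate the trade-off from the profiling item, here is a minimal sketch using a toy tree class. It is not Bio::Tree: ToyTree, children_via_bfs and children_index are hypothetical names made up for this example. It only shows why a lookup that walks from the root on every call gets expensive, and why a cached index has to be rebuilt whenever nodes or edges change.

```ruby
# Toy undirected tree for illustration only; NOT Bio::Tree.
# All class and method names here are hypothetical.
class ToyTree
  attr_reader :root

  def initialize(root)
    @root = root
    @adj = Hash.new { |hash, key| hash[key] = [] }
  end

  def add_edge(a, b)
    @adj[a] << b
    @adj[b] << a
  end

  # Slow path: runs a full BFS from the root on every call, so asking
  # for the children of every node costs O(n^2) overall.
  def children_via_bfs(node)
    parent = bfs_parents[node]
    @adj[node].reject { |neighbour| neighbour == parent }
  end

  # Fast path: a single BFS builds a node => children index.
  # The index goes stale if the tree is modified afterwards.
  def children_index
    index = Hash.new { |hash, key| hash[key] = [] }
    bfs_parents.each do |node, parent|
      index[parent] << node unless parent.nil?
    end
    index
  end

  private

  # Breadth-first search from the root, recording each node's parent.
  def bfs_parents
    parents = { @root => nil }
    queue = [@root]
    until queue.empty?
      current = queue.shift
      @adj[current].each do |neighbour|
        next if parents.key?(neighbour)
        parents[neighbour] = current
        queue << neighbour
      end
    end
    parents
  end
end

tree = ToyTree.new(:root)
tree.add_edge(:root, :a)
tree.add_edge(:root, :b)
tree.add_edge(:a, :c)
tree.add_edge(:a, :d)

index = tree.children_index
```

Both approaches return the same children for a given node; the difference is that the index pays for one traversal up front, which is exactly the bookkeeping that would have to be kept in sync on every addition or removal.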
Next week:
* Create a howto wiki page with code examples and usage.
* Do more testing (Does anybody have more phyloxml xml files for me to test
with, other than those on phyloxml.org?)
* Any other suggestions from you?
Questions/issues:
* Where should the HOWTO and code example documentation go? It seems
reasonable for it to go here:
http://bioruby.open-bio.org/wiki/HOWTO:Trees and/or
http://bioruby.open-bio.org/wiki/Phyloxml_tree_format (which is linked from
the previous link).
* How does integration into the master branch work? Is a pull request on
github all I have to do?
* I have implemented PhyloXML::Sequence#to_biosequence, but it returns
incomplete data: the values for Bio::Sequence#classification,
Bio::Sequence#species and Bio::Sequence#division would come from the
PhyloXML::Taxonomy class, which is not accessible from the Sequence class.
Should there be a PhyloXML::Node#to_biosequence method that gathers
information from both PhyloXML::Sequence and PhyloXML::Taxonomy? Or maybe
Bio::Sequence should not hold taxonomic information?
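To make the Node#to_biosequence option concrete, here is a rough sketch of the idea using stand-in Structs. None of these are the real BioRuby classes: ToySequence, ToyTaxonomy, ToyBioSequence and ToyNode, along with their fields, are hypothetical and much simpler than PhyloXML::Sequence, PhyloXML::Taxonomy and Bio::Sequence. The point is only that the node is the one place that can see both the sequence and the taxonomy.

```ruby
# Stand-ins for illustration only; these are NOT the real BioRuby classes,
# and the field names are assumptions made for this sketch.
ToySequence    = Struct.new(:name, :mol_seq)
ToyTaxonomy    = Struct.new(:scientific_name, :lineage)
ToyBioSequence = Struct.new(:seq, :entry_id, :species, :classification)

class ToyNode
  attr_accessor :sequence, :taxonomy

  def initialize(sequence, taxonomy = nil)
    @sequence = sequence
    @taxonomy = taxonomy
  end

  # Builds the sequence object from the node, so that taxonomic fields
  # can be filled in from the taxonomy when one is present.
  def to_biosequence
    bioseq = ToyBioSequence.new(sequence.mol_seq, sequence.name)
    if taxonomy
      bioseq.species = taxonomy.scientific_name
      bioseq.classification = taxonomy.lineage
    end
    bioseq
  end
end

seq  = ToySequence.new("example_seq", "MKTAYIAK")
tax  = ToyTaxonomy.new("Homo sapiens", ["Eukaryota", "Metazoa"])
full = ToyNode.new(seq, tax).to_biosequence
bare = ToyNode.new(seq).to_biosequence
```

With a taxonomy attached the result carries species and classification; without one those fields stay nil, which is the same incomplete result that a Sequence-only conversion gives.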
You are all welcome to test my code. It is available on
http://github.com/latvianlinuxgirl/bioruby/tree/dev
Thanks,
Diana