[Biopython-dev] GSoC Weekly Update: PhyloXML for Biopython
Eric Talevich
eric.talevich at gmail.com
Mon Jun 22 16:14:19 UTC 2009
Hi folks,
Previously (June 15-19) I:
* Wrote a pretty-printer for displaying a summary of the parsed tree structure
* Made all existing unit tests pass
* Started unit tests for instantiation of each phyloXML object
* Profiled the parser and utilities using the cProfile module on the unit
  test suite. Summarized findings on the Biopython mailing list (nothing
  exciting was discovered)
* Used a custom warning type to indicate noncompliance with the PhyloXML spec
* Separated parsing code (Parser.py) from the phyloXML class definitions
  (Tree.py) -- this should make Nexus/Newick compatibility feasible
* Improved the conversion from PhyloXML.Sequence to Bio.SeqRecord, making
  better use of annotations and using SeqFeature objects to represent
  protein domains
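A custom warning category, like the one mentioned in the list above, lets callers choose how strict to be through the standard `warnings` filters. A minimal sketch, assuming a hypothetical `PhyloXMLWarning` class and a hypothetical `check_branch_length` helper with an illustrative validation rule; the project's actual names and checks may differ:

```python
import warnings


class PhyloXMLWarning(Warning):
    """Hypothetical warning category for input that violates the phyloXML spec."""


def check_branch_length(value):
    """Parse a branch length, warning (rather than failing) on suspect input."""
    length = float(value)
    if length < 0:
        # Hypothetical rule, for illustration only: flag negative lengths
        # but still return the parsed value so parsing can continue.
        warnings.warn(
            "negative branch length %r may not conform to the spec" % value,
            PhyloXMLWarning,
            stacklevel=2,
        )
    return length
```

With `warnings.simplefilter("error", PhyloXMLWarning)` a validating caller turns noncompliance into an exception, while `"ignore"` silences it, without affecting any other warning category.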
This week (June 22-26) I will:
Work on the backlog:
* Finish unit tests for parsing and instantiating core elements
* Compare parser performance with Bioperl and Archaeopteryx
* Document results of parser testing and performance (on wiki or here)
* Document basic usage and performance characteristics of the parser on
  the Biopython wiki
Then, serialize phyloXML trees and write back to file:
* Write unit tests for serialization
* Write serialization methods for each class
* Write a top-level function for triggering serialization of the whole
  hierarchy
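The serialization plan above (one method per class plus a top-level function) might look roughly like this with the standard library's `xml.etree.ElementTree`. This is only a sketch: the `Clade`, `Phylogeny`, `to_element` and `write` names are hypothetical, only a few phyloXML elements are covered, and XML namespace handling is omitted:

```python
import xml.etree.ElementTree as ET


class Clade:
    """Hypothetical, stripped-down stand-in for a phyloXML clade."""

    def __init__(self, name=None, branch_length=None, clades=None):
        self.name = name
        self.branch_length = branch_length
        # None sentinel so instances never share one default list.
        self.clades = clades if clades is not None else []

    def to_element(self):
        """Serialize this clade and, recursively, all of its children."""
        elem = ET.Element("clade")
        if self.name is not None:
            ET.SubElement(elem, "name").text = self.name
        if self.branch_length is not None:
            ET.SubElement(elem, "branch_length").text = str(self.branch_length)
        for child in self.clades:
            elem.append(child.to_element())
        return elem


class Phylogeny:
    """Hypothetical container for a single tree."""

    def __init__(self, root, rooted=True):
        self.root = root
        self.rooted = rooted

    def to_element(self):
        """Serialize the tree, delegating the clade hierarchy to its root."""
        elem = ET.Element("phylogeny", rooted=str(self.rooted).lower())
        elem.append(self.root.to_element())
        return elem


def write(phylogenies, handle):
    """Top-level entry point: serialize every tree to a text handle."""
    root = ET.Element("phyloxml")
    for tree in phylogenies:
        root.append(tree.to_element())
    handle.write(ET.tostring(root, encoding="unicode"))
```

Because each class serializes only itself and delegates to its children, the top-level function stays trivial; for example, writing `Phylogeny(Clade(clades=[Clade("A", 0.1), Clade("B", 0.2)]))` produces one `<phyloxml>` element holding a single `<phylogeny rooted="true">`.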
Question:
Biopython has a couple of core objects that I'm reusing in my project. Their
classes had a quirk (the shared-default-argument problem described here:
http://effbot.org/pyfaq/why-are-default-values-shared-between-objects.htm)
that made the objects slightly more awkward to instantiate, but the issues
were recently fixed upstream. I'd like to merge these fixes soon.
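For readers unfamiliar with the quirk in the linked FAQ: a default argument is evaluated once, when the function is defined, so a mutable default is shared by every call that omits it. A minimal sketch with hypothetical class names (not the actual Biopython classes that were fixed):

```python
class Buggy:
    """Hypothetical class showing the shared-default pitfall."""

    def __init__(self, features=[]):
        # This list is created once, when ``def`` runs, and then reused
        # by every call that omits ``features``.
        self.features = features


class Fixed:
    """The same class with the usual None-sentinel fix."""

    def __init__(self, features=None):
        # A fresh list is built on each call that omits ``features``.
        self.features = features if features is not None else []
```

Here `Buggy().features` and a second `Buggy().features` are the same list object, so appending to one shows up in the other. `features or []` would look shorter, but it would silently replace a caller's own empty list with a new one; the explicit `is not None` test avoids that.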
So: at the end of the summer, GSoC requires a tarball of the code we've written.
Merging from upstream would bring code that I didn't write into my
development tree -- which I could probably filter out with the right
arguments to git-diff, but nonetheless, my project history would no longer
be entirely clean. Does Google care about this? Or is it safe to go ahead
and pull from the next stable release of Biopython (coming soon)?
Cheers,
Eric
http://github.com/etal/biopython/tree/phyloxml/Bio/PhyloXML
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:Biopython_support_for_parsing_and_writing_phyloXML