[Biopython-dev] GSoC code+documentation review: PhyloXML for Biopython

Sat Jul 4 15:38:43 EDT 2009

Hi Eric;
Great stuff as always. You are rocking on this; I was digging
through your code at the end of the week and am really happy with what
you've put together.

> user-oriented documentation for my project to the Biopython wiki:
> http://www.biopython.org/wiki/PhyloXML
> 
> What do you think? Any missing information, unclear wording, or outright
> lies?

What you have looks very good. A couple of thoughts on other things
that would be useful:

- In the usage section where you introduce clades, it might help to have
a high-level diagram of a simple tree and the corresponding PhyloXML
representation in terms of phylogeny and the clade parent/child
relationship. Understanding this representation is important for newcomers
and might ease them into using the classes.

- The examples in 'Using PhyloXML objects' are very good, and to the
extent you have time to expand this section, more of them would be very
useful. Real-life examples like these are the best way to help users
discover the features of PhyloXML. Based on Christian's highlighted
features on the PhyloXML page, a little brainstorming on some things
to tackle:

- Providing annotation data on a node of the tree.
- Adding orthology relationships to the tree; generally providing
  high level node data.

These would expose more of the extensive markup elements built into
PhyloXML and help users discover them.
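
To make the first and third suggestions a bit more concrete, here is a
rough sketch of the kind of thing I have in mind. It uses only the
standard library's xml.etree.ElementTree rather than your Bio.PhyloXML
classes, so treat it as an illustration of the clade parent/child
nesting and of a node annotation, not as a proposal for the API. The toy
tree, the clade names, the species assignment, and the helper functions
outline() and annotate() are all made up for the example.

```python
import xml.etree.ElementTree as ET

NS = {"phy": "http://www.phyloxml.org"}  # PhyloXML namespace

# Toy tree ((A, B), C); the names are invented for illustration.
EXAMPLE = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <name>toy tree</name>
    <clade>
      <clade>
        <name>AB</name>
        <clade><name>A</name></clade>
        <clade><name>B</name></clade>
      </clade>
      <clade><name>C</name></clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def outline(clade, depth=0):
    """Return indented lines, one per clade, children nested under parents."""
    name = clade.findtext("phy:name", default="(unnamed)", namespaces=NS)
    lines = ["  " * depth + name]
    for child in clade.findall("phy:clade", NS):
        lines.extend(outline(child, depth + 1))
    return lines


def annotate(clade, scientific_name):
    """Attach a <taxonomy><scientific_name> annotation to a leaf clade."""
    tax = ET.SubElement(clade, "{%s}taxonomy" % NS["phy"])
    sci = ET.SubElement(tax, "{%s}scientific_name" % NS["phy"])
    sci.text = scientific_name


root = ET.fromstring(EXAMPLE)
# A phylogeny holds one root clade; every other clade is nested inside it.
top = root.find("phy:phylogeny", NS).find("phy:clade", NS)
tree_lines = outline(top)
print("\n".join(tree_lines))

# Annotate the leaf named "A" (hypothetical species assignment).
leaf_a = next(c for c in top.iter("{%s}clade" % NS["phy"])
              if c.findtext("phy:name", namespaces=NS) == "A")
annotate(leaf_a, "Escherichia coli")
print(ET.tostring(leaf_a, encoding="unicode"))
```

An indented outline like the one printed here, shown next to the raw
XML, is roughly the diagram I was picturing for the wiki.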

> I also updated the project plan with some ideas for filling up the rest of
> July:
> http://github.com/etal/biopython/tree/phyloxml/Bio/PhyloXML

I really like the idea of exploring interoperability with other
Biopython tree representations and generalizing there. In addition to
the Tree class in Bio.Nexus, the PyCogent tree representation looks
generalized:

http://pycogent.svn.sourceforge.net/viewvc/pycogent/trunk/cogent/core/tree.py?view=markup

Combining this with the PhyloXML examples above, maybe it would be
worthwhile to think through and document a more complicated
pipeline. Something like starting with a protein, identifying
homologs, building a tree, adding annotation data, and outputting to
PhyloXML. This would be a great starting place for showing how to
interoperate, and would also give users a jumping-off point for providing
more phylogenies in PhyloXML. Similarly, a PhyloXML to networkx (or
other) display would also give a nice interoperable use case for
others to build off of.
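
For the display idea, a minimal sketch of the conversion step might look
like the following. It flattens the clade nesting into (parent, child)
edges, which is the shape a graph library would want (networkx, for
example, accepts such a list through add_edges_from). I'm only using the
standard library here and not your classes; the clade_edges() helper,
the toy tree, and the "internal_N" labels for unnamed clades are
assumptions for the sake of the example.

```python
import xml.etree.ElementTree as ET
from itertools import count

PHYLOXML_NS = "http://www.phyloxml.org"
PHYLOGENY = "{%s}phylogeny" % PHYLOXML_NS
CLADE = "{%s}clade" % PHYLOXML_NS
NAME = "{%s}name" % PHYLOXML_NS

# Toy tree ((A, B), C) with unnamed internal nodes; invented for illustration.
TOY = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <clade>
        <clade><name>A</name></clade>
        <clade><name>B</name></clade>
      </clade>
      <clade><name>C</name></clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def clade_edges(phyloxml_text):
    """Return (parent, child) label pairs for the first phylogeny in the text."""
    root = ET.fromstring(phyloxml_text)
    top = root.find(PHYLOGENY).find(CLADE)
    ids = count()
    edges = []

    def label(clade):
        # Unnamed clades get a generated placeholder so every node is distinct.
        name = clade.findtext(NAME)
        return name if name else "internal_%d" % next(ids)

    def walk(clade, parent_label):
        for child in clade.findall(CLADE):
            child_label = label(child)
            edges.append((parent_label, child_label))
            walk(child, child_label)

    walk(top, label(top))
    return edges


edges = clade_edges(TOY)
for parent, child in edges:
    print(parent, "->", child)
```

From there, drawing the tree is the graph library's job, which keeps
the PhyloXML side of the example small.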

Thanks for all your hard work on this,
Brad