[Biopython-dev] GSoC Weekly Update 11: PhyloXML for Biopython
Eric Talevich
eric.talevich at gmail.com
Mon Aug 3 14:57:59 UTC 2009
Hi all,
Previously (July 27-31) I:
- Added the remaining checks for restricted tokens
- Modified the tree, parser and writer for phyloXML 1.10 support -- it
validates now, and unit tests pass. PhyloXML 1.00 validation breaks, but
that won't affect anyone except BioPerl, and they said they can deal with
it on their end
- Changed how the Parser and Writer classes work to resemble other
Biopython parser classes more closely
- Picked standard attributes for BaseTree's Tree and Node objects (informed
by PhyloDB, though the names are slightly different); added properties to
PhyloXML's Clade to mimic both types
- Made SeqRecord conversion actually work (with reasonable round-tripping
capability); added a unit test
- Changed __str__ methods to not include the object's class name if there's
another representative label to use (e.g. name) -- that's easy enough to
add in the caller
- Sorted out the TreeIO read/parse/write API and added some support for
the Newick format, as recommended by Peter on biopython-dev
- Split some "plumbing" (depth_first_search) off from the Tree.find()
method. Since there are a lot of potentially useful methods to have on
phylogenetic tree objects, I think it's best to distinguish between
"porcelain" (specific, easy-to-use methods for common operations) and
"plumbing" (generalized or low-level methods/algorithms that porcelain
can rely on) in the Tree class in Bio.Tree.BaseTree.
- Started a function for networkx export. The edges are screwy right
now, so I haven't checked it in yet.
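
To illustrate the porcelain/plumbing split described above, here is a
minimal sketch. The Node class and the get_leaves() helper are hypothetical
stand-ins for illustration, not the actual Bio.Tree.BaseTree code; only the
names depth_first_search and find come from the update itself, and their
signatures here are assumptions.

```python
# Hypothetical stand-in for a tree node; not the real BaseTree classes.
class Node:
    def __init__(self, name=None, children=None):
        self.name = name
        self.children = list(children or [])


def depth_first_search(root, predicate):
    """Plumbing: yield each node at or below root that satisfies
    predicate, in pre-order."""
    stack = [root]
    while stack:
        node = stack.pop()
        if predicate(node):
            yield node
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))


def find(root, name):
    """Porcelain: return the first node with the given name, or None."""
    return next(depth_first_search(root, lambda n: n.name == name), None)


def get_leaves(root):
    """Porcelain (hypothetical): list the terminal nodes, left to right."""
    return list(depth_first_search(root, lambda n: not n.children))


tree = Node("root", [Node("A", [Node("A1"), Node("A2")]), Node("B")])
leaf_names = [leaf.name for leaf in get_leaves(tree)]
```

Both porcelain functions are one-liners over the same generator, so a new
convenience method only has to supply a predicate.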
This week (Aug. 3-7) I will:
Scan the code base for lingering TODO/ENH/XXX comments
Discuss merging back upstream
Work on enhancements (time permitting):
- Clean up the Parser class a bit more, to resemble Writer
- Finish networkx export
- Port common methods to Bio.Tree.BaseTree (from Bio.Nexus.Trees and other
packages)
Run automated testing:
- Re-run performance benchmarks
- Run tests and benchmarks on alternate platforms
- Check epydoc's generated API documentation and fix docstrings
Update wiki documentation with new features:
- Tree: base classes, find(), etc.
- TreeIO: 'phyloxml', 'nexus', 'newick' wrappers; PhyloXMLIO extras; warn
that Nexus/Newick wrappers don't return Bio.Tree objects yet
- PhyloXML: singular properties, improved str()
Remarks:
- Most of the work done this week and last, shuffling base classes and
adding various checks, actually made the I/O functions a little slower.
I don't think this will be a big deal, and the changes were necessary,
but it's still a little disappointing.
- The networkx export will look pretty cool. After exporting a Biopython
tree to a networkx graph, it takes a couple more imports and commands to
draw the tree to the screen or a file. Would anyone find it handy to have
a short function in Bio.Tree or Bio.Graphics to go straight from a tree
to a PNG or PDF? (Dependencies: networkx, matplotlib or maybe graphviz)
- I have to admit this: I don't know anything about BioSQL. How would I use
and test the PhyloDB extension, and what's involved in writing a
Biopython interface for it?
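
For the networkx export mentioned in the remarks above, the core step is
turning a tree into a list of parent-child edges. The following is only a
sketch of that step under stated assumptions: the Node class and the
tree_to_edges() function are hypothetical, node names are assumed to be
unique, and this is not the unfinished export function itself.

```python
# Hypothetical stand-in for a tree node; not the real BaseTree classes.
class Node:
    def __init__(self, name=None, children=None):
        self.name = name
        self.children = list(children or [])


def tree_to_edges(root):
    """Return one (parent_name, child_name) pair per branch of the tree.

    Assumes every node has a unique name; nodes sharing a name would
    collapse into a single graph node.
    """
    edges = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            edges.append((parent.name, child.name))
        # Push children reversed so the leftmost subtree is walked first.
        stack.extend(reversed(parent.children))
    return edges


tree = Node("root", [Node("A", [Node("A1"), Node("A2")]), Node("B")])
edges = tree_to_edges(tree)
```

If networkx and matplotlib are installed, networkx.Graph(edges) builds a
graph from such a list, and networkx.draw(graph) followed by
matplotlib.pyplot.savefig("tree.png") should write an image. Using the node
objects themselves as graph nodes, rather than their names, would avoid the
unique-name assumption.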
Cheers,
Eric
http://github.com/etal/biopython/tree/phyloxml/Bio/PhyloXML
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:Biopython_support_for_parsing_and_writing_phyloXML