[Biopython-dev] [Wg-phyloinformatics] GSoC Weekly Update 10: PhyloXML for Biopython
Christian Zmasek
czmasek at burnham.org
Wed Jul 29 21:12:52 UTC 2009
Hi, Eric:
Looks good!
Remarks:
- Bioperl's phyloXML driver was written for version 1.00 and might hurl if
given a v1.10 file -- so that's a potential problem if Biopython defaults
to writing v1.10 files. Should Writer take an option to specify the file
format version number? Right now it only writes valid phyloXML v1.00.
This is a nice thought, but to be honest, I would not do it, especially since there will likely be more versions in the future (although, hopefully, these will only extend 1.10 rather than remove or change elements).
- PhyloXMLIO also always writes branch_length as an XML node, not an
attribute. This validates and will be handled safely by any sane parser,
and fits better with the idea of an implicit root node in each clade
object, I think. (The parser still handles an attribute properly.) Any
objections?
This is fine!
- Above, I've listed more enhancements than I'll probably be able to finish
this week. Which should have higher priority? I know merging Bio.Nexus
and Bio.Tree would be the most useful, but since (1) Biopython
development still happens on CVS, not Git, and (2) another Tree-based
GSoC project is expected to land around the same time as mine, I think
doing the integration right now would be kind of painful. So I can focus
either on laying the groundwork in Bio.Tree.BaseTree, copying rather than
moving the relevant Nexus code, or else work mainly on exporting to other
useful object representations like networkx graphs, or any Biopython
classes I've missed (e.g. alignments). Suggestions?
Time permitting, I would concentrate on exporting to other useful object representations and on Bio.Tree.BaseTree compatibility with BioSQL's PhyloDB extension.
Christian
________________________________________
From: wg-phyloinformatics-bounces at nescent.org [wg-phyloinformatics-bounces at nescent.org] On Behalf Of Eric Talevich [eric.talevich at gmail.com]
Sent: Monday, July 27, 2009 10:56 AM
To: Phyloinformatics Group; BioPython-Dev Mailing List
Subject: [Wg-phyloinformatics] GSoC Weekly Update 10: PhyloXML for Biopython
Hi folks,
Previously (July 20-24) I:
Finished implementing I/O methods, Tree classes and tests for all phyloXML
elements.
Changed Writer to preserve node order in the XML; output now validates
under the phyloXML 1.00 schema (but not yet under 1.10)
Did some drastic code reorganization.
- Bio.Tree:
- Moved Clade.find() and PhyloElement.__repr__ methods to BaseTree
classes
- Made Clade inherit from BaseTree.Tree in addition to BaseTree.Node,
and added the corresponding attributes
- Moved Bio.PhyloXML.Tree to Bio.Tree.PhyloXML
- Bio.TreeIO:
- Merged PhyloXML's Parser and Writer into PhyloXMLIO under the new
Bio.TreeIO module, and updated imports everywhere
- Added wrappers for Nexus read/write; doesn't return Bio.Tree objects
yet though
Added/updated unit tests for all of this.
Documented the code reorg on the Biopython wiki, adding Tree and TreeIO
pages and fixing the examples on the PhyloXML page.
Scrubbed docstrings and enabled epydoc processing.
This week (July 27-31) I will:
Finish implementing the phyloXML spec:
- Scan "simple types" for restricted tokens; check strings in constructors
- Take a stab at phyloXML 1.10 support (need a 'version' arg to Writer?)
- Clean up and reorganize any code that needs it
Enhancements (time permitting):
- Improve the SeqRecord conversion
- Work on Bio.Tree.BaseTree compatibility with BioSQL's PhyloDB extension
- Port common methods to Bio.Tree.BaseTree -- see Bio.Nexus.Tree, Bioperl
node objects, PyCogent, p4-phylogenetics
- Tree method: build_index (set left_idx, right_idx on all nodes):
- calculate left/right indexes for nested-set representation
- see http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html
- Export to networkx (http://networkx.lanl.gov/) -- also get graphviz export
for free, via networkx.to_agraph()
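The left/right indexing mentioned for build_index can be sketched roughly as follows. This is only an illustration of the nested-set idea, not the Bio.Tree.BaseTree code: the Node class, its children attribute, and the is_ancestor helper are hypothetical stand-ins, and only the names build_index, left_idx and right_idx come from the list above.

```python
class Node:
    """Hypothetical stand-in for a tree node (not the Bio.Tree.BaseTree class)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.left_idx = None
        self.right_idx = None


def build_index(node, start=1):
    """Set left_idx/right_idx on node and all descendants in one depth-first pass.

    Returns the next unused index, so the caller can continue numbering.
    """
    node.left_idx = start
    counter = start + 1
    for child in node.children:
        counter = build_index(child, counter)
    node.right_idx = counter
    return counter + 1


def is_ancestor(a, b):
    """In a nested-set layout, a is an ancestor of b if a's interval encloses b's."""
    return a.left_idx < b.left_idx and b.right_idx < a.right_idx


# Small example tree: root -> (A -> (A1, A2), B)
tree = Node("root", [Node("A", [Node("A1"), Node("A2")]), Node("B")])
build_index(tree)
```

Once the indexes are set, ancestor and subtree queries reduce to interval comparisons, which is what makes this layout convenient for relational storage.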
Remarks:
- Bioperl's phyloXML driver was written for version 1.00 and might hurl if
given a v1.10 file -- so that's a potential problem if Biopython defaults
to writing v1.10 files. Should Writer take an option to specify the file
format version number? Right now it only writes valid phyloXML v1.00.
- PhyloXMLIO also always writes branch_length as an XML node, not an
attribute. This validates and will be handled safely by any sane parser,
and fits better with the idea of an implicit root node in each clade
object, I think. (The parser still handles an attribute properly.) Any
objections?
- Above, I've listed more enhancements than I'll probably be able to finish
this week. Which should have higher priority? I know merging Bio.Nexus
and Bio.Tree would be the most useful, but since (1) Biopython
development still happens on CVS, not Git, and (2) another Tree-based
GSoC project is expected to land around the same time as mine, I think
doing the integration right now would be kind of painful. So I can focus
either on laying the groundwork in Bio.Tree.BaseTree, copying rather than
moving the relevant Nexus code, or else work mainly on exporting to other
useful object representations like networkx graphs, or any Biopython
classes I've missed (e.g. alignments). Suggestions?
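To illustrate the branch_length remark above, here is a minimal sketch using the standard library's xml.etree.ElementTree rather than the actual PhyloXMLIO code. The function names read_branch_length and write_clade are hypothetical, and the phyloXML namespace is left out to keep the example short.

```python
import xml.etree.ElementTree as ET


def read_branch_length(clade):
    """Return the branch length as a float, or None if the clade has none.

    Accepts both forms: a <branch_length> child element or a
    branch_length attribute on the <clade> element itself.
    """
    node = clade.find("branch_length")
    if node is not None and node.text:
        return float(node.text)
    attr = clade.get("branch_length")
    if attr is not None:
        return float(attr)
    return None


def write_clade(name, branch_length):
    """Build a <clade> element, always writing branch_length as a child element."""
    clade = ET.Element("clade")
    ET.SubElement(clade, "name").text = name
    ET.SubElement(clade, "branch_length").text = repr(branch_length)
    return clade


as_attr = ET.fromstring('<clade branch_length="0.25"/>')
as_node = ET.fromstring("<clade><branch_length>0.25</branch_length></clade>")
written = write_clade("A", 0.5)
```

Reading stays tolerant of either form while writing sticks to one, so files that use the attribute still round-trip without losing the value.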
Cheers,
Eric
http://github.com/etal/biopython/tree/phyloxml/Bio/PhyloXML
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:Biopython_support_for_parsing_and_writing_phyloXML