[Biopython-dev] GSoC Weekly Update 8: PhyloXML for Biopython
Eric Talevich
eric.talevich at gmail.com
Mon Jul 13 15:21:20 UTC 2009
Hi all,
Previously (July 6-10) I:
- Addressed some comments from last week's code/doc review
- Enabled Pythonic syntax sugar (dictionary emulation, specialized
__str__ methods, singular properties for some plural attributes),
plus tests
- Wrote Clade.find() for flexible searching
- Checked Py2.4 compatibility (it's slower, but it works)
- Started Bio.Tree, Bio.TreeIO modules (integration)
This week (July 13-17) I will:
Extend the core to the rest of the spec:
- Add unit tests and classes to support the remaining (non-core)
phyloXML elements
- Implement collapse_whitespace -- see the spec glossary
- Make Writer use the correct namespace prefixes
- "other" objects: assert the namespace is not phyloxml
- Use the schema document to validate the input file
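For the collapse_whitespace item above, here is a rough sketch of what I have in mind. It assumes the spec glossary means the XML Schema "collapse" rule: turn tabs and line breaks into spaces, squeeze runs of spaces down to one, and trim both ends. The function name comes from the to-do item; the body is only an illustration, not the code in the branch.

```python
import re

# XML defines whitespace as space, tab, CR and LF only, so match
# exactly those characters instead of relying on str.split().
_XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def collapse_whitespace(text):
    """Collapse XML whitespace in text (assumed XML Schema 'collapse' rule)."""
    # Replace each run of XML whitespace with a single space,
    # then drop any leading or trailing space that remains.
    return _XML_WHITESPACE.sub(" ", text).strip(" ")


collapsed = collapse_whitespace("  Homo \n\t sapiens  ")
```

Matching only the four XML whitespace characters leaves other characters, such as a non-breaking space, untouched.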
Integrate with Biopython:
- Extract a Bio.Tree.BaseTree module from PhyloXML's tree classes
- Improve the SeqRecord conversion
Improve/revise documentation:
- Address remaining comments from code/doc review
- Revisit docstrings for all classes, functions, methods; consider
enabling epydoc formatting
Questions:
- My serializer uses XML entity codes instead of Unicode characters in
the output -- is that OK? It still round-trips successfully with the parser.
- Is there anything to do for BioSQL compatibility, besides extracting
sequences?
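To make the first question concrete, here is a small illustration using the standard library's ElementTree in place of the PhyloXML serializer (so the element name and sample text are made up for the example). Writing with an ASCII encoding turns non-ASCII characters into numeric character references, and parsing the result gives back the original text.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and text, chosen only to include a non-ASCII character.
original_text = "Caf\u00e9"
elem = ET.Element("name")
elem.text = original_text

# With an ASCII output encoding, characters outside ASCII are written
# as numeric character references rather than raw bytes.
serialized = ET.tostring(elem, encoding="us-ascii")

# Parsing the serialized bytes resolves the references again.
parsed = ET.fromstring(serialized)
round_trips = parsed.text == original_text
```

Any conforming XML parser resolves these references, so the two forms carry the same content; the difference is mainly how readable the file is in a text editor.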
Cheers,
Eric
http://github.com/etal/biopython/tree/phyloxml/Bio/PhyloXML
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:Biopython_support_for_parsing_and_writing_phyloXML