[Biopython-dev] PhyloXML helper functions
eric.talevich at gmail.com
Wed Jul 8 18:58:52 UTC 2009
On Tue, Jul 7, 2009 at 8:51 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> > 2. A find() method on Clade and maybe Phylogeny objects
> > Enhancements:
> > - The keyword argument could be a regular expression. Would that be
> This seems useful. Often people use crazy naming convention hacks,
> and might want to pull out something like all proteins from a
> particular organism based on a common prefix in the name.
> > To handle numbers, I'd have to convert every sub-node attribute value to
> > string, and that would be weird -- or else find() would have to skip
> > numerical attributes.
> Is this if you support regular expressions or either way? For the
> find, I think it's sufficient to define what you support and leave
> it at that set: any subset of searching will help people get their
> work done.
I implemented it. Here's the signature and docstring:
def find(self, cls=None, **kwargs):
"""Find all sub-nodes matching the given attributes.

The 'cls' argument specifies the class of the sub-node. Nodes that inherit
from this type will also match. (The default, Tree.PhyloElement, matches any
standard phyloXML type.)

The arbitrary keyword arguments indicate the attribute name of the sub-node
and the value to match: string, integer or boolean. Strings are evaluated as
regular expression matches; integers are compared directly for equality, and
booleans evaluate the attribute's truth value (True or False) before
comparing.

To handle nonzero floats, search with a boolean argument, then filter the
result manually.

If no keyword arguments are given, then just the class type is used for
matching.

The result is an iterable through all matching objects, by depth-first
search. (Not necessarily the same order as the elements appear in the
source file.)

>>> tree = PhyloXML.read('phyloxml_examples.xml').phylogenies
>>> matches = tree.clade.find(code='OCTVU')
>>> next(matches)
Taxonomy(code='OCTVU', scientific_name='Octopus vulgaris')
"""

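
To make the matching rules in the docstring concrete, here is a rough, self-contained sketch. It is not the actual Bio.Tree code: the PhyloElement, Taxonomy and Clade classes are minimal stand-ins, attr_matches is a hypothetical helper, find is written as a plain function rather than a method, and letting the starting node itself match is an assumption. The last lines show the float workaround: search with a boolean, then filter by hand.

```python
import re


class PhyloElement:
    """Stand-in for Tree.PhyloElement (assumption, not the real class)."""


class Taxonomy(PhyloElement):
    def __init__(self, code=None, scientific_name=None):
        self.code = code
        self.scientific_name = scientific_name


class Clade(PhyloElement):
    def __init__(self, name=None, branch_length=None, taxonomy=None, clades=None):
        self.name = name
        self.branch_length = branch_length
        self.taxonomy = taxonomy
        self.clades = clades or []


def attr_matches(node, attr, target):
    """Apply the docstring's rule for a single keyword argument."""
    if not hasattr(node, attr):
        return False
    value = getattr(node, attr)
    # Test bool before int: bool is a subclass of int in Python.
    if isinstance(target, bool):
        return bool(value) == target
    if isinstance(target, int):
        return value == target
    if isinstance(target, str):
        return isinstance(value, str) and re.match(target, value) is not None
    return False


def find(node, cls=PhyloElement, **kwargs):
    """Yield matching nodes depth-first, a parent before its sub-nodes."""
    if isinstance(node, cls) and all(
        attr_matches(node, key, val) for key, val in kwargs.items()
    ):
        yield node
    # Walk the instance dictionary, as described in the notes on find().
    for value in vars(node).values():
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, PhyloElement):
                yield from find(item, cls, **kwargs)


# A tiny made-up tree, for illustration only.
root = Clade(name="root", clades=[
    Clade(name="A", branch_length=0.5,
          taxonomy=Taxonomy(code="OCTVU", scientific_name="Octopus vulgaris")),
    Clade(name="B", branch_length=0.1,
          taxonomy=Taxonomy(code="HUMAN", scientific_name="Homo sapiens")),
])

# Float workaround: boolean search first, then filter the result manually.
long_branches = [c for c in find(root, cls=Clade, branch_length=True)
                 if c.branch_length > 0.3]
```

Checking for bool before int matters in the sketch because True and False are themselves integers in Python, so the order of the isinstance tests decides which rule applies.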
- Phylogeny.find just directly calls self.clade.find and returns the result.
- I still use PhyloElement instead of object for the default class. The
recursive function uses __dict__ to walk the tree, so allowing any object to
be searched leads to chaos (e.g. int.__dict__ has 55 keys). Restricting the
search to Tree-related nodes still accommodates most use cases, I think.
- Depth-first search - if a node that matches has subnodes that also match, the
higher node will be yielded first, then the first matching subnode, and so
on. But: since the object dictionary doesn't keep XML node order, the order
the matches are returned in isn't always what you'd expect. I think I can
mitigate this somewhat, but still -- documented weirdness.
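
As an illustration of the ordering point above, one possible way to make traversal order predictable is to walk a declared list of attribute names instead of __dict__. This is only a sketch under assumptions, not necessarily the mitigation that will be used: Node, child_order, children and walk are invented names, and the classes are stand-ins for the real Tree types.

```python
class Node:
    """Hypothetical base class; child_order is invented for this sketch."""

    # Attribute names to visit, in the order they should be traversed.
    child_order = ()

    def children(self):
        # Follow the declared order rather than the instance dictionary.
        for attr in self.child_order:
            value = getattr(self, attr, None)
            if value is None:
                continue
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, Node):
                    yield item


class Taxonomy(Node):
    def __init__(self, code):
        self.code = code


class Clade(Node):
    child_order = ("taxonomy", "clades")

    def __init__(self, name, taxonomy=None, clades=None):
        # Assigned out of document order on purpose: traversal should not care.
        self.clades = clades or []
        self.taxonomy = taxonomy
        self.name = name


def walk(node):
    """Depth-first traversal, parent first, children in declared order."""
    yield node
    for child in node.children():
        yield from walk(child)
```

With this layout the traversal order depends only on child_order, whatever order the attributes were assigned in.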