[Biopython-dev] PhyloXML helper functions

Eric Talevich eric.talevich at gmail.com
Mon Jul 6 23:34:30 UTC 2009


Hey all,

I've been mulling over a couple of helper methods for PhyloXML objects that
I think deserve some discussion.

1. Singular properties for some plural attributes

This goes back to the "confidences" issue: When I'm drilling down through a
phyloXML-derived tree, I keep expecting certain attributes to be singular
values when they're actually plural. Auto-completion catches it, of course,
but the resulting code would seem more obvious if I used the singular name
when I know the attribute consists of a list of one element.

The attributes I had in mind for this are taxonomies (Clade class) and
confidences (Clade and Phylogeny classes). Should any other attributes get
this treatment? Here's an example getter method -- Rubyists may ignore the
first line:

@property
def confidence(self):
    # Return the sole confidence item; raise if there are zero or several.
    if len(self.confidences) > 1:
        raise RuntimeError("More than one confidence item is available! "
                           "Use foo.confidences")
    elif len(self.confidences) == 0:
        raise RuntimeError("No confidence item is available! You fail")
    else:
        return self.confidences[0]

Then foo.confidence works as expected when there is exactly one item and
fails loudly otherwise, similar to the way certain IO read() functions
(e.g. SeqIO.read()) work elsewhere in Biopython.
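
To make the behavior concrete, here is a minimal self-contained sketch. The
Clade class below is a stripped-down stand-in I made up for illustration, not
the real phyloXML class, and the error messages are placeholders:

```python
class Clade(object):
    """Hypothetical stand-in for the phyloXML Clade class."""

    def __init__(self, confidences=None):
        # The plural list stays the source of truth.
        self.confidences = list(confidences or [])

    @property
    def confidence(self):
        """Return the only confidence item; raise unless exactly one exists."""
        if len(self.confidences) > 1:
            raise RuntimeError("More than one confidence item is available; "
                               "use the confidences attribute instead")
        if not self.confidences:
            raise RuntimeError("No confidence item is available")
        return self.confidences[0]


single = Clade(confidences=[0.95])
print(single.confidence)
```

Calling .confidence on a clade with zero or several items raises RuntimeError
instead of silently picking one, which is the point of the singular name.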


2. A find() method on Clade and maybe Phylogeny objects

The function definition and docstring would look like this:

def find(self, cls, **kwargs):
    """Find all sub-nodes matching the given attributes.

    The first argument specifies the class of the sub-node. (Use
    Tree.PhyloElement to match any standard phyloXML type.) The arbitrary
    keyword arguments indicate the attribute name of the sub-node and the
    value to match. The result is an iterable over all matching objects.

    Example:
    >>> tree = PhyloXML.read('phyloxml_examples.xml').phylogenies[5]
    >>> matches = tree.clade.find(Taxonomy, code='OCTVU')
    >>> matches.next()
    Taxonomy(code='OCTVU', scientific_name='Octopus vulgaris')
    """

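One way the body of find() could work, as a rough sketch only: walk every
attribute of every node, and yield the sub-nodes whose class and attribute
values match exactly. The PhyloElement, Taxonomy and Clade classes here are
simplified stand-ins invented for the example, and the traversal order is
whatever the stack produces, not document order:

```python
class PhyloElement(object):
    """Hypothetical base class for phyloXML node types."""


class Taxonomy(PhyloElement):
    def __init__(self, code=None, scientific_name=None):
        self.code = code
        self.scientific_name = scientific_name


class Clade(PhyloElement):
    def __init__(self, taxonomies=None, clades=None):
        self.taxonomies = list(taxonomies or [])
        self.clades = list(clades or [])

    def find(self, cls=PhyloElement, **kwargs):
        """Yield every sub-node of type cls whose attributes equal kwargs."""
        stack = [self]
        while stack:
            node = stack.pop()
            for value in vars(node).values():
                # Treat single-valued and list-valued attributes alike.
                children = value if isinstance(value, list) else [value]
                for child in children:
                    if not isinstance(child, PhyloElement):
                        continue  # plain strings, numbers, None, ...
                    if isinstance(child, cls) and all(
                            getattr(child, key, None) == val
                            for key, val in kwargs.items()):
                        yield child
                    stack.append(child)  # keep descending


tree = Clade(clades=[
    Clade(taxonomies=[Taxonomy(code='OCTVU',
                               scientific_name='Octopus vulgaris')]),
    Clade(taxonomies=[Taxonomy(code='HUMAN',
                               scientific_name='Homo sapiens')]),
])
matches = list(tree.find(Taxonomy, code='OCTVU'))
print(matches[0].scientific_name)
```

Because it is a generator, nothing is traversed until the caller starts
iterating, so grabbing only the first match stays cheap.
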
Enhancements:
- The keyword argument could be a regular expression. Would that be useful?
To handle numbers, I'd have to convert every sub-node attribute value to a
string, and that would be weird -- or else find() would have to skip
numerical attributes.
- Non-keyword arguments (*args) could name attributes that merely have to
exist, i.e. be not None. Allowing regexes would make this unnecessary (e.g.
name='.*').
- If no regular arguments are needed, cls could default to PhyloElement or
even "object" to match everything.
- To enable arbitrary hairiness, find() could accept a function as the value
of a keyword argument and treat any truthy return value as a match. But at
that point, the user could probably just roll their own find_node() function.
However, it could still be useful for filtering on numerical values.
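
To show how these options might fit together, here is a sketch of the
per-attribute test alone. attribute_matches() is a hypothetical helper name,
and the rules are just one possible reading of the ideas above: strings are
treated as regexes and applied only to string values (so numbers are skipped
rather than converted), callables are called, and anything else is compared
by equality.

```python
import re


def attribute_matches(value, spec):
    """Decide whether one attribute value satisfies one keyword spec.

    Hypothetical helper; the rules are an assumption, not settled behavior.
    """
    if value is None:
        # A missing attribute never matches, whatever the spec is.
        return False
    if callable(spec):
        # Arbitrary user test, e.g. a numeric threshold.
        return bool(spec(value))
    if isinstance(spec, str):
        # Regex, matched from the start of the string; non-string values
        # are skipped instead of being converted to strings.
        return isinstance(value, str) and re.match(spec, value) is not None
    # Numbers and other plain values fall back to equality.
    return value == spec


print(attribute_matches('Octopus vulgaris', 'Octopus .*'))  # regex on a string
print(attribute_matches(0.95, lambda x: x > 0.9))           # numeric filter
print(attribute_matches(0.95, '.*'))                        # regex vs. number
```

With this reading, name='.*' doubles as the "attribute exists" test for
string attributes, while numeric attributes need either an exact value or a
function.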

What do you think?

Thanks,
Eric


