[Biopython-dev] [Bug 3045] New: TreeMixin, please define enumerator and other convenience methods

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Apr 6 21:28:27 UTC 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3045

           Summary: TreeMixin, please define enumerator and other
                    convenience methods
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P4
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: joelb at lanl.gov


Hi again,

I frequently find the need to go back and forth between tree objects and
sequences defined over either the internal or the terminal nodes. Ideally these
should be done in concise list comprehensions for performance and readability
reasons. These list comprehensions necessarily mix indices into arrays and
objects from generators, and the enumerate() pattern is the most convenient
because of this mix. I suspect that many others have the same needs.   

The usage patterns for, say, setting a phyloXML property from an array prop_arr
should look something like:

[node.set_property(prop_arr[i], *prop_params,  **prop_keywords) 
            for i, node in tree.enumerate_internals()]

The three issues that frustrate such concision are (1) internal nodes, terminal
nodes, and all nodes are not currently on an equal footing with respect to
methods, (2) there are no enumerator methods, and (3) the get/set methods for
phyloXML are very awkward at the moment.  I deal with (3) in the next feature
request.

Here I give some convenience methods that I wish were defined in TreeMixin.  I
have tested them as standalone methods.  I hope you'll see fit to include them
at some point.

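def count_internals(self):
    """Counts the number of non-terminal (internal) nodes within this tree."""
    # Count lazily; this also returns 0 (rather than raising IndexError)
    # when there are no internal nodes.
    return sum(1 for clade in self.find_clades(terminal=False))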

def enumerate_internals(self):
    """Returns an enumerator of non-terminal clades."""
    return enumerate(self.find_clades(terminal=False))

def enumerate_terminals(self):
    """Returns an enumerator of terminal clades."""
    return enumerate(self.find_clades(terminal=True))

def enumerate_all(self):
    """Returns an enumerator of all clades."""
    return enumerate(self.find_clades())
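
As a rough illustration of the enumerate pattern, here is a minimal, self-contained sketch. It does not import Biopython: the Clade class and its find_clades(terminal=None) method are simplified stand-ins that are only assumed to behave like the real ones (preorder traversal with an optional terminal filter), and label and prop_arr are hypothetical names used for the example.

```python
class Clade(object):
    """Simplified stand-in for a tree node (an assumption, not Bio.Phylo)."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []
        self.label = None  # hypothetical attribute filled in from an array

    def is_terminal(self):
        return not self.clades

    def find_clades(self, terminal=None):
        """Yield clades in preorder, optionally filtered by terminal status."""
        if terminal is None or terminal == self.is_terminal():
            yield self
        for child in self.clades:
            for clade in child.find_clades(terminal=terminal):
                yield clade


def enumerate_internals(tree):
    return enumerate(tree.find_clades(terminal=False))


def count_internals(tree):
    # Counts lazily and returns 0 when there are no internal nodes.
    return sum(1 for _ in tree.find_clades(terminal=False))


# ((A, B), C): two internal nodes (root and the A/B parent), three terminals.
tree = Clade("root", [Clade("AB", [Clade("A"), Clade("B")]), Clade("C")])

prop_arr = ["first", "second"]  # one value per internal node, in preorder
for i, node in enumerate_internals(tree):
    node.label = prop_arr[i]

labels = [(node.name, node.label) for node in tree.find_clades(terminal=False)]
print(labels)
print(count_internals(tree))
```

The array index and the node arrive together, so nothing has to be materialized into a list just to line the two up.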

Less critical but still useful are the following two methods (and one private
utility) that I find useful for operations on trees:

def is_semipreterminal(self):
    """True if any direct descendant is terminal."""
    if self.root.is_terminal():
        return False
    for clade in self.clades:
        if clade.is_terminal():
            return True
    # Only reached after every direct child has been checked.
    return False
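
The behavior described in the docstring can also be written with any(). The sketch below is standalone and uses a simplified stand-in Clade class (an assumption, not the Bio.Phylo class); it is meant only to show the expected results on a few small trees.

```python
class Clade(object):
    """Simplified stand-in for a tree node (an assumption, not Bio.Phylo)."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []

    def is_terminal(self):
        return not self.clades


def is_semipreterminal(clade):
    """True if any direct descendant is terminal."""
    # any() over an empty child list is False, so a terminal clade
    # needs no special case.
    return any(child.is_terminal() for child in clade.clades)


leaf = Clade("A")
cherry = Clade("AB", [Clade("A"), Clade("B")])   # both children terminal
mixed = Clade("root", [cherry, Clade("C")])      # one terminal child
deep = Clade("top", [
    Clade("X", [Clade("x1"), Clade("x2")]),
    Clade("Y", [Clade("y1"), Clade("y2")]),
])                                               # no terminal children

results = [is_semipreterminal(c) for c in (leaf, cherry, mixed, deep)]
print(results)
```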

def terminal_neighbor_dists(self):
    """Return a list of distances between adjacent terminals."""
    return [self.distance(*i) for i in
            _generate_pairs(self.find_clades(terminal=True))]

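def _generate_pairs(self):
    """Yield (item, next item) pairs from an iterable."""
    import itertools
    first, second = itertools.tee(self)
    # Advance the second copy by one; the default avoids StopIteration
    # on an empty iterable.
    next(second, None)
    # zip() is lazy on Python 3; on Python 2 use itertools.izip instead.
    return zip(first, second)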
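
To make the pairing step concrete, here is a small standalone sketch of the same tee-and-advance idea, run on plain strings instead of clades. The name generate_pairs is hypothetical; in terminal_neighbor_dists each pair would be passed to distance().

```python
import itertools


def generate_pairs(iterable):
    """Return an iterator of (item, next_item) for each adjacent pair."""
    first, second = itertools.tee(iterable)
    next(second, None)  # advance one copy by one step; safe on empty input
    return zip(first, second)


pairs = list(generate_pairs(iter("ABCD")))
print(pairs)  # [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

An input of n items yields n - 1 pairs, so a tree with n terminals gives n - 1 neighbor distances, and empty or single-item inputs give no pairs.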

