[Biopython-dev] Bio.Phylo: the home stretch

Peter biopython at maubp.freeserve.co.uk
Mon Apr 26 06:59:25 EDT 2010


On Sat, Apr 17, 2010 at 2:35 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> Hi all,
>
> There are two more decisions in Bio.Phylo that I'd like to settle on before
> the release of Biopython 1.54. They're holding open Bug 3045:
> http://bugzilla.open-bio.org/show_bug.cgi?id=3045

Sorry I didn't get round to this last week.

> 1. *Do we need a get_all_clades() method on trees and clades?*
>
> Bio.Nexus has get_terminals(); I added the same to Bio.Phylo early on, and
> then get_nonterminals() to satisfy some demand for the opposite method:
>
>    def get_terminals(self, order='preorder'):
>        """Get a list of all of this tree's terminal (leaf) nodes."""
>        return list(self.find_clades(terminal=True, order=order))
>
>    def get_nonterminals(self, order='preorder'):
>        """Get a list of all of this tree's nonterminal (internal) nodes."""
>        return list(self.find_clades(terminal=False, order=order))
>
> They're both trivial, but the idea is to make the module easy to jump into
> without reading the docs first. (find_clades() is a generator function that
> several other functions use internally; to do useful things in Bio.Phylo you
> still need to learn how to use it eventually.)
>
> So (a) do we need yet another sugar function that retrieves all tree nodes,
> both internal and external? (b) if so, what should it be called?
>
> The implementation would be:    list(self.find_clades(order=order))
> Also accomplished as:    tree.get_terminals() + tree.get_nonterminals()

I'd say no, we don't need it. You can always add it later, but removing
something from the API is complicated with deprecations etc.
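
For anyone skimming the thread, here is a rough sketch of why the extra
method would be pure sugar. The Clade class below is a toy stand-in written
for this example, not the Bio.Phylo code; only the method names
find_clades(), get_terminals() and get_nonterminals() come from Eric's
message, and the preorder traversal is simplified.

```python
# Toy stand-in for a tree node; NOT the Bio.Phylo implementation.
class Clade:
    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []

    def is_terminal(self):
        # A clade with no children is a leaf.
        return not self.clades

    def find_clades(self, terminal=None):
        """Yield this clade and its descendants in preorder.

        terminal=None yields every clade; True/False filters on leaf status.
        """
        if terminal is None or terminal == self.is_terminal():
            yield self
        for child in self.clades:
            yield from child.find_clades(terminal=terminal)

    def get_terminals(self):
        return list(self.find_clades(terminal=True))

    def get_nonterminals(self):
        return list(self.find_clades(terminal=False))


tree = Clade("root", [Clade("A"), Clade("inner", [Clade("B"), Clade("C")])])

# The proposed get_all_clades() would just be this one-liner:
all_names = [c.name for c in tree.find_clades()]

# The alternative spelling returns the same clades, grouped differently:
combined_names = [c.name for c in tree.get_terminals() + tree.get_nonterminals()]
```

In this toy version the two spellings return the same clades but not in the
same order: the concatenation lists all the leaves first, while the single
find_clades() call keeps traversal order.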

> 2. *Rename find_clades() to find(), or something else?*
>
> I've previously renamed:
>
> find() => find_any()
> -- given the same parameters as find_clades(), return the first match found,
> or else None (useful in an if statement)
>
> find_all() => find_elements()
> -- phyloXML trees have some complex objects as tree attributes, containing
> other objects. This function searches for those directly, and for trees
> without such attributes (e.g. all Newick trees), this happens to be the same
> as find_clades()
>
> So: find_clades() can search inside complex objects attached to trees, but
> yields the corresponding clade object rather than the non-clade element
> itself. This lets you search clades by e.g. clade.taxonomy.scientific_name,
> or clade.sequence.type. It should be the first "find_*" function users reach
> for. Should we give it a shorter name to encourage that, and shorten the
> code that uses it?

Hmm. I think find_clades() is sensible.
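
To make the find_clades() / find_any() relationship concrete, here is a
minimal sketch under stated assumptions. Again, Clade is a toy class made up
for this example, and the keyword matching is a deliberately simplified
assumption: it compares plain attributes only and does not search inside
nested objects the way Eric describes for phyloXML trees.

```python
# Toy stand-in for a tree node; NOT the Bio.Phylo implementation.
class Clade:
    def __init__(self, name=None, clades=None, scientific_name=None):
        self.name = name
        self.clades = clades or []
        self.scientific_name = scientific_name

    def find_clades(self, **criteria):
        """Yield, in preorder, each clade whose attributes equal all criteria."""
        if all(getattr(self, key, None) == value
               for key, value in criteria.items()):
            yield self
        for child in self.clades:
            yield from child.find_clades(**criteria)

    def find_any(self, **criteria):
        """Return the first match from find_clades(), or None if there is none."""
        return next(self.find_clades(**criteria), None)


tree = Clade("root", [
    Clade("A", scientific_name="Homo sapiens"),
    Clade("inner", [
        Clade("B", scientific_name="Mus musculus"),
        Clade("C", scientific_name="Homo sapiens"),
    ]),
])

# Generator form: iterate over every matching clade.
matches = [c.name for c in tree.find_clades(scientific_name="Homo sapiens")]

# First-match form: handy in an if statement, since a miss gives None.
first = tree.find_any(scientific_name="Homo sapiens")
missing = tree.find_any(scientific_name="Danio rerio")
```

Because find_any() is only a thin wrapper around the generator in this
sketch, the two names cannot drift apart in what they accept.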

> Here's a first crack at documentation:
> http://github.com/etal/biopython/commit/8056a198804a08e3e03ac943c45744ad020dd53f

The Alignment chapter's section on ClustalW has a very short tree example
using Bio.Nexus.Trees. We should just replace that with a pointer ("See
Chapter X") to the chapter on loading and manipulating trees.

Peter


