[Biopython] deleting in-group paralogs from newick trees
Eric Talevich
eric.talevich at gmail.com
Mon Aug 8 19:33:52 UTC 2011
On Mon, Aug 8, 2011 at 2:08 PM, Jessica Grant <jgrant at smith.edu> wrote:
> Hello,
>
> I am looking at large phylogenetic trees that have many paralogs. I would
> like to simplify my trees so that all monophyletic paralog groups are
> collapsed--or all sequences except the shortest branch are deleted. Is
> there a Biopython module that can help? I started looking at Phylo, but
> couldn't see an obvious way.
>
Hi Jessica,
Yes, Phylo is the right module to use. If I understand your problem
correctly, the tree methods you want are is_monophyletic() and
collapse_all(). Both operate on a clade within the tree. You'd traverse the
tree with get_nonterminals(), check if a paralog group under a clade is
monophyletic, and if so, collapse it.
Do you have a list of paralogs already? And, do you know which groups might
be monophyletic?
If you have groups/clades already, it's simple:
>>> from Bio import Phylo
>>> tree = Phylo.read('mytree.nwk', 'newick')
>>> # SOME_PARALOG_GROUP is a placeholder for the terminals of one group
>>> for clade in tree.get_nonterminals(order='postorder'):
...     mono_parent = clade.is_monophyletic(SOME_PARALOG_GROUP)
...     if mono_parent:
...         mono_parent.collapse_all()
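To make the known-group case concrete, here is a minimal sketch on a made-up five-tip tree. The Newick string and the tip names (a1, a2, a3, b1, c1) are invented for illustration. As far as I understand Bio.Phylo, is_monophyletic() compares against the terminal clade objects themselves, so the sketch looks the tips up by name first. It also assumes every branch has a length, since collapsing adds the removed branch's length to its children.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical tree: a1, a2 and a3 are paralogs forming one clade.
NEWICK = "(((a1:1,a2:3):2,a3:4):5,(b1:2,c1:3):4);"
tree = Phylo.read(StringIO(NEWICK), "newick")

# Map tip names to the terminal clade objects in this tree.
by_name = {tip.name: tip for tip in tree.get_terminals()}
group = [by_name[name] for name in ("a1", "a2", "a3")]

# Returns the group's common ancestor if the group is exactly one clade,
# otherwise False.
mono_parent = tree.is_monophyletic(group)
if mono_parent:
    # Flatten the clade so all three paralogs hang directly off mono_parent.
    mono_parent.collapse_all()

Phylo.draw_ascii(tree)
```

Calling is_monophyletic() once on the whole tree is enough here, because it searches down from the root for a clade whose tips match the group exactly.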
If you don't know the groups yet, then the test inside the loop is a little
more elaborate. You can look for overlaps between a clade's tips and the
paralog list using sets:
>>> paralogs = set(PARALOG_LIST)
# Inside the loop:
>>> tips = set([str(t) for t in clade.get_terminals()])
>>> overlap = tips.intersection(paralogs)
>>> if len(overlap) >= 2:
# The rest of the loop...
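One way the rest of the loop could go, as a sketch only: treat a clade as a monophyletic paralog group when every one of its tips is in the paralog list, and collapse it. That subset test is my assumption about what you want, and it is stricter than the overlap check above. The helper name collapse_paralog_clades, the tree, and the tip names are all made up for the example, and every branch is assumed to have a length.

```python
from io import StringIO

from Bio import Phylo


def collapse_paralog_clades(tree, paralog_names):
    """Collapse, in place, every clade whose tips are all known paralogs."""
    paralogs = set(paralog_names)
    for clade in tree.get_nonterminals(order="postorder"):
        tips = {tip.name for tip in clade.get_terminals()}
        # All tips under this clade are paralogs: flatten it to a polytomy.
        if tips <= paralogs:
            clade.collapse_all()


# Hypothetical tree: the a* tips form a nested clade, the b* tips a pair.
NEWICK = "(((a1:1,a2:3):2,a3:4):5,((b1:2,b2:1):1,c1:3):4);"
tree = Phylo.read(StringIO(NEWICK), "newick")
collapse_paralog_clades(tree, ["a1", "a2", "a3", "b1", "b2"])
Phylo.draw_ascii(tree)
```

The nested (a1, a2) clade disappears because its parent contains only paralogs. The (b1, b2) pair is already flat, and its parent is left alone because c1 is not in the list.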
Hope that helps,
Eric