[Biopython-dev] gsoc weekly update

Mark Holder mtholder at gmail.com
Mon Jul 1 15:15:47 UTC 2013

Hi Yanbo,

It looks like you are making nice progress.

1. A comment on tests:
I noticed that the upgma and nj tests (from last week) just verify
that the trees produced are of the right class and can be written as
newick. It is probably worth strengthening those tests to make them
check that the branch lengths are correct.
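
To make the suggestion concrete, here is a minimal sketch of a branch-length check. It does not use the project's code: `tiny_upgma` and its return format (a dict mapping each non-root node to its parent and branch length) are hypothetical stand-ins, and the three-taxon matrix is made up so that the expected lengths can be worked out by hand.

```python
from itertools import combinations


def tiny_upgma(names, dist):
    """Hypothetical stand-in UPGMA, only here to give the test something to check.

    `dist` maps frozenset({a, b}) -> distance.
    Returns {node: (parent, branch_length)} for every non-root node.
    """
    d = dict(dist)
    height = {n: 0.0 for n in names}  # height of each node above the tips
    size = {n: 1 for n in names}      # number of tips below each node
    active = list(names)
    edges = {}
    while len(active) > 1:
        # Join the closest pair of clusters.
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        new = "(" + a + "," + b + ")"
        height[new] = d[frozenset((a, b))] / 2.0
        size[new] = size[a] + size[b]
        edges[a] = (new, height[new] - height[a])
        edges[b] = (new, height[new] - height[b])
        active = [n for n in active if n not in (a, b)]
        for c in active:
            # Size-weighted average of the distances to the two merged clusters.
            d[frozenset((new, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / size[new]
        active.append(new)
    return edges


def test_upgma_branch_lengths():
    # Ultrametric toy matrix: A and B are 2 apart, both are 6 from C.
    dist = {frozenset("AB"): 2.0, frozenset("AC"): 6.0, frozenset("BC"): 6.0}
    edges = tiny_upgma(["A", "B", "C"], dist)
    # Expected lengths worked out by hand, not just "is it a tree?".
    expected = {"A": 1.0, "B": 1.0, "(A,B)": 2.0, "C": 3.0}
    for node, length in expected.items():
        assert abs(edges[node][1] - length) < 1e-9, node


test_upgma_branch_lengths()
```

The same pattern (small hand-checkable matrix, explicit expected lengths, a float tolerance) should carry over to the real upgma and nj tests.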

2. A thought on character weighting:
You might think about adding support for tree construction from a
"compressed" input character matrix. By compressed, I mean one in
which you store each unique data pattern (unique column in an alignment)
once, together with a weight for that pattern, rather than storing every
character separately.  The pattern weight is typically the number of
times that the pattern was observed in the original ("raw") character
matrix (but it is nice to support floats as weights). Richer
implementations of a compressed matrix also store the complete mapping
of data patterns to original character indices to enable the
recreation of the original matrix, but that feature is rarely used in
tree inference.
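
A minimal sketch of the compression described above, in plain Python. The function name `compress_columns` and its return format are assumptions made for illustration; this is not an existing biopython API.

```python
from collections import Counter


def compress_columns(sequences):
    """Collapse identical alignment columns into patterns plus weights.

    Returns (patterns, weights, column_to_pattern):
      patterns          - unique columns, in order of first appearance
      weights           - how often each pattern occurs (floats, so that
                          non-integer weights can be supported later)
      column_to_pattern - for each original column, the index of its
                          pattern (enough to rebuild the raw matrix)
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all sequences must have the same length")
    columns = list(zip(*sequences))  # one tuple of states per column
    counts = Counter(columns)        # keeps first-seen order (Python 3.7+)
    patterns = list(counts)
    weights = [float(counts[p]) for p in patterns]
    index_of = {p: i for i, p in enumerate(patterns)}
    column_to_pattern = [index_of[c] for c in columns]
    return patterns, weights, column_to_pattern


patterns, weights, mapping = compress_columns(["AACGA", "AACTA", "ATCGA"])
# Columns 1 and 5 are identical (A, A, A), so five columns become four patterns.
print(patterns)  # [('A', 'A', 'A'), ('A', 'A', 'T'), ('C', 'C', 'C'), ('G', 'T', 'G')]
print(weights)   # [2.0, 1.0, 1.0, 1.0]
print(mapping)   # [0, 1, 2, 3, 0]
```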

I don't know if biopython has this form of data compression
implemented, but it is very widely used in phylogenetic inference. It
can be used in any inference technique that treats characters as
independent and identically distributed.

If biopython does not support this form of compression, then it may be
worth writing the TreeConstruction code to work with character weights
in the event that someone else implements this feature.  Or you could
at least add a # TODO comment in the code wherever character
weighting would be used (so that it would be easy to fix later).
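
As a rough illustration of what "working with character weights" could look like, here is a sketch of a weight-aware pairwise distance. The function `weighted_p_distance` is hypothetical and is not taken from the TreeConstruction code; with no weights supplied it behaves like an ordinary proportion-of-differences distance.

```python
def weighted_p_distance(seq_a, seq_b, weights=None):
    """Proportion of differing characters, with optional per-character weights.

    With weights=None every character counts once, which is the usual
    unweighted behaviour on a raw alignment.
    """
    if weights is None:
        weights = [1.0] * len(seq_a)
    if not (len(seq_a) == len(seq_b) == len(weights)):
        raise ValueError("sequences and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    differing = sum(w for x, y, w in zip(seq_a, seq_b, weights) if x != y)
    return differing / total


# Raw alignment rows: one difference in five columns.
raw = weighted_p_distance("AACGA", "AACTA")

# The same rows over the four unique patterns; the repeated all-A column
# carries weight 2.
compressed = weighted_p_distance("AACG", "AACT", weights=[2.0, 1.0, 1.0, 1.0])

assert abs(raw - compressed) < 1e-12  # both are 0.2
```

Because each character contributes independently, the compressed input gives the same distance as the raw alignment while visiting fewer columns.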

all the best,

PS: I've been travelling, but I'm back in Lawrence now. I'm happy to
chat with you this week about parsimony algorithms if you have

On Mon, Jul 1, 2013 at 4:29 AM, Yanbo Ye <yeyanbo289 at gmail.com> wrote:
> Hi all,
> I post an update for the project 'Phylogenetics in Biopython: Filling in the
> gaps'.
> http://blog.yeyanbo.com/posts/google-summer-of-code-4.html
> Best,
> Yanbo
> --
> 叶彦波
> 中科院武汉病毒所生物信息学课题组
> Yanbo Ye
> Bioinformatics Group, Wuhan Institute Of Virology, Chinese Academy of
> Sciences

Mark Holder

mtholder at gmail.com
mtholder at ku.edu

Department of Ecology and Evolutionary Biology
University of Kansas
6031 Haworth Hall
1200 Sunnyside Avenue
Lawrence, Kansas 66045

lab phone:  785.864.5789
fax (shared): 785.864.5860
