[Biopython-dev] Plan for Bio.CodonAlign development

Wed May 14 03:15:18 UTC 2014

Hi all,

This summer, I would like to further enhance the CodonAlign
module that I developed last year. Here are a few things I have
in mind. Any suggestions are greatly appreciated.

1) Right now, the most awkward step in building a codon alignment
with Bio.CodonAlign is accurately matching protein sequences to
nucleotide sequences. If the nucleotide sequence contains multiple
insertion (frameshift) events, the current code will not work. A
third-party program such as exonerate can help here. I would like
to add an option so that the code accepts a file, produced by
exonerate, describing the amino acid to nucleotide correspondence,
or alternatively lets the program call exonerate internally to
obtain that correspondence.
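As a rough illustration of the file-based option, the sketch below reads one line of exonerate's "vulgar" output and turns it into an amino acid to nucleotide map. It assumes the usual vulgar layout (nine header fields after the "vulgar:" tag, then label/query-length/target-length triples) and a protein-to-DNA model in which a match block "M q t" has t = 3 * q. The function names and the sample line are made up for this example and are not part of Bio.CodonAlign.

```python
def parse_vulgar(line):
    """Split one 'vulgar:' line into a header dict and (label, q_len, t_len) blocks."""
    fields = line.split()
    if len(fields) < 10 or fields[0] != "vulgar:":
        raise ValueError("not a vulgar line: %r" % line)
    header = {
        "query_id": fields[1],
        "query_start": int(fields[2]),
        "query_end": int(fields[3]),
        "target_id": fields[5],
        "target_start": int(fields[6]),
        "target_end": int(fields[7]),
        "target_strand": fields[8],
    }
    triples = fields[10:]  # fields[9] is the alignment score
    if len(triples) % 3 != 0:
        raise ValueError("incomplete label/length triple")
    blocks = [
        (triples[i], int(triples[i + 1]), int(triples[i + 2]))
        for i in range(0, len(triples), 3)
    ]
    return header, blocks


def aa_to_nt_spans(header, blocks):
    """Map each matched amino acid index to its (start, end) nucleotide span.

    Assumption: forward-strand target only; anything else is rejected.
    """
    if header["target_strand"] != "+":
        raise ValueError("only forward-strand targets are handled in this sketch")
    q_pos = header["query_start"]
    t_pos = header["target_start"]
    spans = {}
    for label, q_len, t_len in blocks:
        if label == "M":
            if t_len != 3 * q_len:
                raise ValueError("match block is not in whole codons")
            for k in range(q_len):
                spans[q_pos + k] = (t_pos + 3 * k, t_pos + 3 * k + 3)
        # Gaps and frameshifts (e.g. 'G', 'F') only advance the coordinates.
        q_pos += q_len
        t_pos += t_len
    return spans


# Made-up example: 2 codons, a 1-nt frameshift, then 3 more codons.
sample = "vulgar: prot1 0 5 . nuc1 0 16 + 20 M 2 6 F 0 1 M 3 9"
header, blocks = parse_vulgar(sample)
spans = aa_to_nt_spans(header, blocks)
```

In the sample, the extra nucleotide at position 6 is skipped, so the third amino acid maps to nucleotides 7 to 10 rather than 6 to 9. Lines of this kind are what exonerate prints when vulgar output is enabled (`--showvulgar yes`).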

2) The Bio.CodonAlign module currently contains three
counting-based methods and one ML method for dN and dS estimation.
I noticed that the results produced by the code differ slightly
from what PAML gives. I will dig into this and figure out the
reason. More dN and dS estimation methods will also be added.
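For readers unfamiliar with counting-based estimation, here is a minimal sketch of the basic ingredients in the style of Nei and Gojobori: counting synonymous sites in a codon, classifying a single-nucleotide difference, and applying the Jukes-Cantor correction. It uses the standard genetic code and treats changes to stop codons as nonsynonymous, which is an assumption of this sketch. It is not the Bio.CodonAlign code, and the function names are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    same = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Assumption: a change to a stop codon counts as nonsynonymous.
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                same += 1
    # Each position has 3 possible changes, so each change weighs 1/3 site.
    return same / 3.0


def classify_single_diff(codon1, codon2):
    """Label a one-nucleotide codon difference as synonymous or nonsynonymous."""
    n_diff = sum(a != b for a, b in zip(codon1, codon2))
    if n_diff != 1:
        raise ValueError("codons must differ at exactly one position")
    if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
        return "synonymous"
    return "nonsynonymous"


def jukes_cantor(p):
    """Convert a proportion of differences p into a distance, d = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, `syn_sites("TTT")` counts only the TTT to TTC change, giving one third of a site. Small choices such as how stop codons or multi-step codon differences are handled are exactly the kind of detail that can make two implementations disagree slightly.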

3) The code for the chi-square test in MKtest is borrowed from
Eric. I will look into Biopython's own version of chi2
(Bio.Phylo.PAML.chi2,
http://web.archiveorange.com/archive/v/5dAwXsd7pIljyMSmtWeb)
and make it work for my purpose. The correction of counts in
MKtest will also be implemented.
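To make the test concrete, the following is a small pure-Python sketch of a chi-square test on the 2x2 table used in a McDonald-Kreitman test. It is neither Eric's code nor Bio.Phylo.PAML.chi2. It relies on the fact that, for one degree of freedom, the chi-square upper tail probability equals erfc(sqrt(x / 2)). The function name and the counts are illustrative, and no continuity or count correction is applied.

```python
import math


def mk_chi2(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn):
    """Chi-square statistic and p-value (1 degree of freedom) for a 2x2 MK table.

    Rows are fixed vs. polymorphic differences, columns are
    nonsynonymous vs. synonymous changes.
    """
    a, b, c, d = fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    chi2 = n * (a * d - b * c) ** 2 / float(denom)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative counts only.
chi2, p_value = mk_chi2(7, 17, 2, 42)
```

With small counts in any cell, an exact test or a continuity correction is usually preferable to the plain chi-square approximation shown here.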

4) If time permits, I want to implement the BEB (Bayes Empirical
Bayes) approach to infer sites under positive selection. I'm not
sure whether the algorithm is suitable for a Python
implementation, since it is slow even in pure C (codeml). But
I'll at least give it a try.

Thank you!

Zheng Ruan