[Biopython-dev] Plan for Bio.CodonAlign development
Zheng Ruan
zruan1991 at gmail.com
Tue May 13 23:15:18 EDT 2014
Hi all,
This summer, I would like to further enhance the CodonAlign
module that I developed last year. Here are a few things I have
in mind. Any suggestions are greatly appreciated.
1) Right now, the most awkward step in building a codon alignment
with Bio.CodonAlign is accurately matching protein sequences to
nucleotide sequences. If the nucleotide sequence contains
multiple insertion (frameshift) events, the current code does not
work. A third-party program such as exonerate can help here. I
would like to add an option so that the code accepts a file,
produced by exonerate, describing the amino acid to nucleotide
correspondence, or alternatively lets the program call exonerate
internally to obtain that correspondence.
2) The Bio.CodonAlign module currently contains three
counting-based methods and one ML method for dN and dS
estimation. I noticed that the results produced by the code
differ slightly from what PAML gives. I will dig into this and
figure out the reason. More dN and dS estimation methods will
also be added.
3) The code for the chi-square test in the MK test is borrowed
from Eric. I will look into Biopython's own implementation of
chi2 (Bio.Phylo.PAML.chi2,
http://web.archiveorange.com/archive/v/5dAwXsd7pIljyMSmtWeb)
and make it work for my purpose. The correction of counts in
the MK test will also be implemented.
4) If time permits, I want to implement the BEB (Bayes
Empirical Bayes) approach to infer sites under positive
selection. I'm not sure whether the algorithm is suitable for a
Python implementation, since it is slow even in pure C (codeml).
But I'll at least give it a try.
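
To illustrate item 3, here is a minimal sketch of a chi-square
test on the 2x2 table used by the MK test. It is not the code in
Bio.CodonAlign and it does not call Bio.Phylo.PAML.chi2: the
function name mk_chisq, its argument names and the example counts
are assumptions made for illustration only. It uses the plain
Pearson statistic with 1 degree of freedom and no correction of
counts.

```python
import math


def mk_chisq(syn_fixed, nonsyn_fixed, syn_poly, nonsyn_poly):
    """Pearson chi-square test on a 2x2 MK-style table (hypothetical helper).

    Rows are fixed vs. polymorphic differences, columns are
    synonymous vs. nonsynonymous changes. Returns (statistic, p_value).
    """
    table = [[syn_fixed, nonsyn_fixed], [syn_poly, nonsyn_poly]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a nonzero total")

    # Sum of (observed - expected)^2 / expected over the four cells.
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom the chi-square upper tail probability
    # has a closed form: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Illustrative counts only, not taken from any result in this post.
    stat, p_value = mk_chisq(17, 7, 42, 2)
    print("chi2 = %.3f, p = %.4f" % (stat, p_value))
```

The closed-form tail probability only holds for 1 degree of
freedom, which is all a 2x2 table needs. A general chi2 routine
would be required for larger tables.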
Thank you!
Zheng Ruan