[Biopython-dev] Plan for Bio.CodonAlign development

Wed May 14 07:55:46 UTC 2014

On Wed, May 14, 2014 at 4:15 AM, Zheng Ruan <zruan1991 at gmail.com> wrote:
> Hi all,
>
>
> This summer, I would like to further enhance the CodonAlign
> module that I developed last year. Here are a few things
> I have in mind. Any suggestions are greatly appreciated.
>

That is a great idea :)

>
> 1) Right now, the most awkward step in building a codon
> alignment with Bio.CodonAlign is accurately matching protein
> sequences to nucleotide sequences. If there are multiple
> insertion (frameshift) events in a nucleotide sequence, the
> current code will not work. A third-party program such as
> exonerate can help here. I would like to add an option so that
> the code accepts a file of amino acid to nucleotide
> correspondences produced by exonerate, or alternatively lets
> the program call exonerate internally to get the
> correspondence information.
>

Would Bio.SearchIO work nicely here?

How easy is it at the moment for the simple case where you
have an unaligned set of CDS sequences and matching
aligned proteins? e.g. does this "back translation" threading
help:

https://github.com/peterjc/pico_galaxy/tree/master/tools/align_back_trans
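
For the simple case, the threading idea can be sketched roughly as
below. This is only an illustration under stated assumptions, not the
code in the linked tool or in Bio.CodonAlign: the function name
thread_codons is made up here, the standard stop codons are assumed,
and the sketch does not check that the CDS actually translates to the
protein, nor does it handle frameshifts.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Thread an unaligned CDS onto a gapped protein sequence.

    Each residue consumes the next three nucleotides of the CDS and
    each protein gap becomes a three-character codon gap.
    Hypothetical helper for illustration only.
    """
    residues = len(aligned_protein) - aligned_protein.count(gap)

    # Assumption: standard genetic code stop codons. A single trailing
    # stop codon with no matching residue is dropped.
    stop_codons = {"TAA", "TAG", "TGA"}
    if len(cds) == 3 * (residues + 1) and cds[-3:].upper() in stop_codons:
        cds = cds[:-3]

    if len(cds) != 3 * residues:
        raise ValueError(
            "CDS length %i does not match %i residues" % (len(cds), residues)
        )

    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


if __name__ == "__main__":
    print(thread_codons("M-KV", "ATGAAAGTT"))
```

The length check is what fails first when the nucleotide sequence
has a frameshift, which is where something like exonerate would have
to take over.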

>
> 2) The Bio.CodonAlign module now contains three counting-based
> methods and one ML method for dN and dS estimation. I noticed
> that the results produced by the code differ slightly from what
> PAML gives. I will dig into this and figure out the reason.
> More dN and dS estimation methods will also be added.
>
>
> 3) The code for the chi-square test in MKtest is borrowed from
> Eric. I will look into Biopython's own version of chi2
> (Bio.Phylo.PAML.chi2
> http://web.archiveorange.com/archive/v/5dAwXsd7pIljyMSmtWeb)
> and make it work for my purpose. The correction of counts in
> MKtest will also be implemented.
>
>
> 4) If time permits, I want to implement the BEB (Bayes
> Empirical Bayes) approach to infer sites under positive
> selection. I'm not sure whether the algorithm is suitable for a
> Python implementation, since it is slow even in pure C (codeml),
> but I'll at least give it a try.
>
>
> Thank you!
>
> Zheng Ruan
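
On point 3, the chi-square test on the 2x2 table of counts needs very
little code when there is only one degree of freedom. The following is
a rough sketch using only the standard library. It is not the
Bio.Phylo.PAML.chi2 code and not what Bio.CodonAlign currently does;
the function names and the argument order are invented for
illustration, and no continuity or count correction is applied.

```python
import math


def chi2_sf_1df(stat):
    """Upper tail probability of a chi-square variable with 1 d.f.

    With one degree of freedom the variable is a squared standard
    normal, so the tail is erfc(sqrt(stat / 2)).
    """
    if stat < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(stat / 2.0))


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test on the 2x2 table [[a, b], [c, d]].

    Hypothetical layout for an MK-style table: a, b could be
    nonsynonymous and synonymous fixed differences, and c, d the
    nonsynonymous and synonymous polymorphisms.
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column of the table sums to zero")
    stat = n * (a * d - b * c) ** 2 / float(margins)
    return stat, chi2_sf_1df(stat)


if __name__ == "__main__":
    stat, p_value = chi2_2x2(10, 20, 30, 40)
    print("chi2 = %.4f, p = %.4f" % (stat, p_value))
```

For small counts an exact test is usually a better fit than the
chi-square approximation, so this would only be a starting point.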

5) Should we try to find a good lowercase module name,
to follow PEP 8 more closely?

Thanks,

Peter