[Biopython-dev] Python_MKT
Juraj Bergman
jurajbergman at hotmail.com
Sat Sep 7 14:14:59 UTC 2013
Hi,
I've made some improvements to my MKT module, mainly by using Kruskal's algorithm to rewrite the multi_short_path() function (thanks for the suggestion, Zheng!), and I added some new functions as well (pathway_a(), pathways_n()).
Links:
https://www.dropbox.com/s/zgnz8xwlcsispzf/Python_MKT.pdf
https://www.dropbox.com/s/1z3opj4rbb0ms14/Python_MKT.py
Regards,
Juraj
Date: Fri, 6 Sep 2013 00:00:06 -0400
Subject: Fwd: [Biopython-dev] Python_MKT
From: zruan1991 at gmail.com
To: biopython-dev at biopython.org; jurajbergman at hotmail.com
Hi Juraj,
I am also planning to implement the MK test in my GSoC framework. I just went through your code and it is entirely standalone. Would you be able to modify it to use Biopython's MultipleSeqAlignment, Alphabet and CodonTable modules so that it is more extensible?
As for the multi_short_path() function, it really confused me. Is your implementation guaranteed to find the shortest path? In graph-theoretic terms, this problem can be framed as finding a minimum spanning tree, for which good algorithms are known (Prim's algorithm or Kruskal's algorithm). My idea for linking multiple codons is to first build a codon-by-codon matrix giving the number of synonymous and nonsynonymous substitutions needed to change each codon into every other codon, and then to find the minimum spanning tree that connects all the nodes in the matrix with minimum length (least synonymous substitutions). I plan to implement this, and you may have more insight into my suggestions. Thanks!
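To make the minimum spanning tree idea concrete, here is a minimal sketch of Kruskal's algorithm over a set of codons. It is not taken from Python_MKT.py: the names kruskal_mst() and hamming() are hypothetical, and the edge weight used here (the number of differing nucleotides) is only a stand-in assumption for whatever synonymous/nonsynonymous cost the codon-by-codon matrix would actually hold.

```python
from itertools import combinations


def hamming(a, b):
    """Number of nucleotide positions at which two codons differ.

    Stand-in edge weight (an assumption); a real MK test would plug in
    a cost derived from synonymous/nonsynonymous substitution counts.
    """
    return sum(x != y for x, y in zip(a, b))


def kruskal_mst(nodes, weight):
    """Return MST edges as (node_a, node_b, weight) using Kruskal's algorithm."""
    nodes = list(dict.fromkeys(nodes))  # drop duplicate codons, keep order
    parent = {n: n for n in nodes}      # union-find forest

    def find(n):
        # Follow parent links to the set representative, halving the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Consider every codon pair, cheapest first.
    edges = sorted(combinations(nodes, 2), key=lambda e: weight(*e))
    tree = []
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:            # edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, weight(a, b)))
    return tree


codons = ["TTT", "TTC", "TTA", "CTA"]
tree = kruskal_mst(codons, hamming)
```

For n distinct codons the tree has n - 1 edges, and swapping in a different weight function changes which network counts as "shortest" without touching the algorithm itself.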
Best,
Zheng Ruan
On Thu, Sep 5, 2013 at 10:33 AM, Juraj Bergman <jurajbergman at hotmail.com> wrote:
Dear all,
I'm resending my implementation of the McDonald-Kreitman test.
Link to the description of the module: https://www.dropbox.com/s/zgnz8xwlcsispzf/Python_MKT.pdf
Link to the code: https://www.dropbox.com/s/1z3opj4rbb0ms14/Python_MKT.py
I apologise for the initial mistake of sending attachments instead of links.
Kind regards,
Juraj Bergman
P.S. Regarding the multi_short_path() function: I realize that it is very, very repetitive, but I have not (yet) managed to find a suitable loop construction that would replace the current code. The multi_short_path() function is by far the most complex function of the module because its purpose is to find the codon network with the fewest overall nucleotide substitutions and the fewest non-synonymous nucleotide substitutions (given any combination of codons). Each codon is represented as multiple lists of two integers (how many depends on the overall number of codons being processed). The first integer specifies the number of synonymous substitutions and the second specifies the number of non-synonymous substitutions. For example, if 10 codons are being fitted in a network, then there are 10x10 = 100 combinations of codon-codon pathways, each represented by a two-integer list, and out of these 100 lists, the 'best' 10 have to be chosen to get the optimal codon network (the repetitiveness of the function mainly arises from this process). This is, in short, a description of the function, and I would appreciate any pointers that would help to make the code more succinct :)
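As one possible pointer, the two-integer representation described above can be computed with a loop over mutation orders rather than written out case by case. The following is only a sketch under stated assumptions, not the code in Python_MKT.py: pair_cost() and CODON_TABLE are hypothetical names, the table is the standard genetic code built by hand, a change to or from a stop codon is simply counted as non-synonymous, and among the possible pathways the one with the fewest non-synonymous steps is kept.

```python
from itertools import permutations, product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def pair_cost(a, b):
    """Return (synonymous, non_synonymous) counts for the pathway from
    codon a to codon b that has the fewest non-synonymous steps.

    Assumption: one nucleotide changes per step, and every order of the
    differing positions is a candidate pathway.
    """
    differing = [i for i in range(3) if a[i] != b[i]]
    best = None
    for order in permutations(differing):
        current, syn, non = a, 0, 0
        for i in order:
            # Mutate position i of the current codon towards b.
            nxt = current[:i] + b[i] + current[i + 1:]
            if CODON_TABLE[current] == CODON_TABLE[nxt]:
                syn += 1
            else:
                non += 1
            current = nxt
        if best is None or non < best[1]:
            best = (syn, non)
    return best


# All codon-codon pathways for a small set, as a dict keyed by codon pair.
codons = ["TTT", "TTC", "TTA", "CTG"]
matrix = {(a, b): pair_cost(a, b) for a in codons for b in codons}
```

With the pairwise costs held in one dictionary, choosing the best edges for the network becomes a generic selection step (for instance a minimum spanning tree) instead of per-case code.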
_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev