[Biopython] Is there any Biopython tool to degenerate a nucleotide sequence

Jeremy Jeremy.molbio at gmail.com
Sat Aug 1 23:23:23 UTC 2015


Sampson, Jared <Jared.Sampson <at> nyumc.org> writes:

> 
> 
> Hi Jeremy - 
> 
> Nice work, thanks for sharing.
> 
> 
> However (and someone please correct me if I'm wrong here!), it looks like
> the current Leucine substitution, "YTN", would also match both of Phe's
> codons ("TTT" and "TTC"), and the current Arginine substitution, "MGN",
> also matches two of Serine's codons ("AGT" and "AGC").
> FWIW, the PhyloTools script also produces the same erroneous degenerate
> codons. I've sent a bug report to the contact address on that site.
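Jared's point can be checked mechanically: expand an IUPAC degenerate codon into every concrete codon it stands for, then translate each one. The sketch below hard-codes the standard genetic code (NCBI table 1) and the IUPAC nucleotide codes so it runs without Biopython; in Biopython the same data lives in `Bio.Data.CodonTable` and `Bio.Data.IUPACData`. The helper names `expand_codon` and `translations` are hypothetical, not part of Biopython or the gist.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_codon(degenerate):
    """Return every concrete codon matched by a (possibly degenerate) codon."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate.upper()))]


def translations(degenerate):
    """Return the set of amino acids ('*' = stop) encoded by a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand_codon(degenerate)}


for codon in ("AAY", "YTN", "MGN", "MGR"):
    print(codon, sorted(translations(codon)))
```

A degenerate codon is safe only when `translations()` returns a single amino acid: `YTN` gives `['F', 'L']` and `MGN` gives `['R', 'S']`, while `MGR` (AGA, AGG, CGA, CGG) gives only `['R']`.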
> 
> I've updated a fork of your original gist to implement fixes for these
> residues, along with a couple of stylistic changes (hope you don't mind).
> Please feel free to incorporate them into your original. If you want to
> double-check the rest of the degen_dict, there's a nice table on Wikipedia.
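One way to double-check a degen_dict is to rebuild the naive version of it (the per-position union of each amino acid's codons) and list the entries whose degenerate codon also matches codons for other amino acids or stops. This is a minimal sketch assuming the standard genetic code; `union_codon` and `audit` are hypothetical helpers, and the output is meant to be compared against a reference table such as the one Jared mentions, not to reproduce the gist's dictionary.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC letter (all 15 non-empty subsets are covered)
CODE_FOR_BASES = {frozenset(bases): code for code, bases in IUPAC.items()}


def union_codon(codons):
    """Collapse codons into one degenerate codon by taking the union at each position."""
    return "".join(
        CODE_FOR_BASES[frozenset(codon[i] for codon in codons)] for i in range(3)
    )


def audit():
    """Yield (amino_acid, union_codon, amino_acids_it_actually_encodes)."""
    for aa in sorted(set(AMINO_ACIDS)):
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        degenerate = union_codon(codons)
        encoded = {
            CODON_TABLE["".join(p)] for p in product(*(IUPAC[b] for b in degenerate))
        }
        yield aa, degenerate, encoded


for aa, degenerate, encoded in audit():
    if encoded != {aa}:
        print(aa, degenerate, sorted(encoded))
```

Under the standard code this flags exactly Leu (YTN), Arg (MGN), Ser (WSN) and the stop codons (TRR, which also matches TGG/Trp); for every other amino acid the union codon encodes only that amino acid.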
> 
> Also, if you're looking to make other improvements, it might be nice to
> add a "frame=1" argument to degenerate_sequence() to optionally accommodate
> the other two reading frames rather than chopping leftover bases.
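A sketch of what a frame option might involve, assuming frames are numbered 1 to 3 to match the suggested `frame=1` default. The helper `split_codons` is hypothetical (not from the gist): it returns the bases before the frame start, the complete codons, and any incomplete trailing codon, so nothing is silently dropped.

```python
def split_codons(seq, frame=1):
    """Split seq into (leading, codons, trailing) for reading frame 1, 2 or 3.

    Bases before the frame start and any incomplete final codon are returned
    separately instead of being discarded.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    offset = frame - 1
    leading, body = seq[:offset], seq[offset:]
    usable = len(body) - len(body) % 3
    codons = [body[i:i + 3] for i in range(0, usable, 3)]
    trailing = body[usable:]
    return leading, codons, trailing


for frame in (1, 2, 3):
    print(frame, split_codons("GAAACATT", frame))
```

A degeneration function built on this could degenerate only the codons and then rejoin `leading + "".join(new_codons) + trailing`, so the output keeps the length of the input.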
> 
> Cheers,
> Jared
> 
> -- 
> Jared Sampson
> Xiangpeng Kong Lab
> NYU Langone Medical Center
> http://kong.med.nyu.edu/
> 
> On Jul 31, 2015, at 1:39 AM, Jeremy <Jeremy.molbio <at> gmail.com> wrote:
> 
> 
> Carlos Pena <mycalesis <at> gmail.com> writes:
> 
> Dear Biopython members,
> I want to take a nucleotide string and degenerate those bases that can
> undergo synonymous change.
> For example, a string of just one codon.
> * Input:  AAC
> * Output:  AAY
> Since both AAC and AAT are translated to Asparagine (N), we can
> degenerate this codon to AAY (because a change at the third position
> would be synonymous).
> This is already solved in the Perl library Degen
> (http://www.phylotools.com/ptdegendocumentation.htm).
> I could use some glue code to execute this Perl code from Python, but
> I cannot include this library in my project because it is released under
> the GPL, while my project uses the BSD license.
> So I thought I would ask around before writing a Python script to do this
> myself.
> 
> thanks for any pointers,
> carlos
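Carlos's AAC to AAY example can be generalised with a small greedy algorithm, sketched here as one possible approach (it is not the Degen algorithm, and `degenerate_codon` / `degenerate_nucleotides` are hypothetical names). Degenerate one codon position at a time and, at each position, keep only the bases for which every codon the pattern now matches still encodes the original amino acid. Because the check covers the whole pattern, the result cannot pick up codons for another amino acid, which avoids the YTN/MGN problem. It assumes the standard genetic code and unambiguous A/C/G/T input in frame 1.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Set of allowed bases -> IUPAC letter
IUPAC_CODE = {
    frozenset(bases): code
    for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    }.items()
}


def degenerate_codon(codon, order=(2, 0, 1)):
    """Degenerate a codon one position at a time, keeping it synonymous.

    At each position (third, then first, then second by default) keep every
    base for which *all* codons matched so far still encode the same amino acid.
    """
    aa = CODON_TABLE[codon]
    allowed = [{base} for base in codon]  # bases allowed at each position
    for pos in order:
        keep = set()
        for base in BASES:
            trial = allowed[:pos] + [{base}] + allowed[pos + 1:]
            if all(CODON_TABLE["".join(c)] == aa for c in product(*trial)):
                keep.add(base)
        allowed[pos] = keep  # never empty: the original base always passes
    return "".join(IUPAC_CODE[frozenset(bases)] for bases in allowed)


def degenerate_nucleotides(seq):
    """Degenerate each complete codon of seq (frame 1); leftover bases are kept as-is."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [degenerate_codon(seq[i:i + 3]) for i in range(0, usable, 3)]
    return "".join(codons) + seq[usable:]


print(degenerate_nucleotides("AACTTAAGA"))
```

The position order matters for the six-codon amino acids: with the default order (third, first, second), CTG becomes CTN while TTA becomes YTR, so different Leu codons map to different degenerate codons.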
> 
> 
> Hi Carlos,
> I hacked up something that should return the same output as the Degen 1.4
> Perl web tool.
> The gist can be found here:
> https://gist.github.com/biojerm/6242381eb4ad3ef18ac6
> I am pretty new to both Python and Biopython, so please let me know if
> you have any feedback on form, style, and/or function.
> I know the method is currently quite fragile. Below are a few thoughts on
> the method's weaknesses:
> 1) The method does not handle sequences whose length is not evenly
> divisible by 3.
> 2) I think the method would be a lot more useful if you could call it on a
> single FASTA or GenBank file, or on a set of them. But I have not learned
> how to program that yet.
> 3) I probably should return the degenerate sequences as Seq objects, but at
> the moment they are simple strings.
> 4) Tests... I need to figure those out too.
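For point 2, the usual Biopython route is `Bio.SeqIO.parse(filename, "fasta")` (or `"genbank"`), which yields `SeqRecord` objects whose `.seq` is a `Seq`, with `Bio.SeqIO.write()` for output; wrapping a result string in `Seq(...)` would also cover point 3. The dependency-free sketch below only illustrates the shape of the per-record loop. `read_fasta` and `transform_fasta` are hypothetical helpers, and `str.lower` stands in for a real degeneration function.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def transform_fasta(in_handle, out_handle, transform, width=60):
    """Apply transform() to every sequence and write the results as FASTA."""
    for header, seq in read_fasta(in_handle):
        new_seq = transform(seq)
        out_handle.write(">" + header + "\n")
        for i in range(0, len(new_seq), width):
            out_handle.write(new_seq[i:i + width] + "\n")


example = io.StringIO(">seq1 test\nAAC\nAAT\n>seq2\nGGA\n")
out = io.StringIO()
transform_fasta(example, out, str.lower)  # str.lower is a placeholder transform
print(out.getvalue())
```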
> Please let me know if you find this useful and if there are any
> must-have features for your purposes.
> Thanks,
> Jeremy
> _______________________________________________
> Biopython mailing list  -  Biopython <at> mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython


Hi Jared,
Thanks for editing the code. Your improvements in both style and function
are greatly appreciated. I was originally trying to mimic the output of
the Degen Perl tool. However, I think your changes do a better job of
preserving the amino acid sequence encoded by the original. I will
incorporate your method into the final function. I will also try to add
support for different reading frames and possibly different codon tables.
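Supporting other codon tables means the degenerate codon chosen for an amino acid can no longer be fixed: a codon that is synonymous under one table may not be under another. A minimal sketch, assuming NCBI table numbering and the 64-letter amino acid strings NCBI publishes (Biopython exposes the same tables via `Bio.Data.CodonTable.unambiguous_dna_by_id`); `codon_table` and `translations` are hypothetical helpers, and only tables 1 and 2 are included here.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation tables as amino acid strings (codon order TTT, TTC, TTA, TTG, ...)
NCBI_TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # Standard
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",  # Vertebrate mito
}
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def codon_table(table_id):
    """Build a codon -> amino acid dict ('*' = stop) for an NCBI table id."""
    return {
        "".join(c): aa for c, aa in zip(product(BASES, repeat=3), NCBI_TABLES[table_id])
    }


def translations(degenerate, table):
    """Amino acids encoded by a degenerate codon under the given table."""
    return {table["".join(c)] for c in product(*(IUPAC[b] for b in degenerate))}


standard, vert_mito = codon_table(1), codon_table(2)
for name, table in (("standard", standard), ("vertebrate mito", vert_mito)):
    print(name, sorted(translations("MGR", table)), sorted(translations("ATH", table)))
```

Under the standard code MGR is pure Arg and ATH is pure Ile, but under the vertebrate mitochondrial code AGA/AGG are stops and ATA is Met, so both degenerate codons become ambiguous. Any per-amino-acid lookup would need to be rebuilt (or checked) for each table.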


