[Biopython] Back translation support in Biopython

Peter Cock p.j.a.cock at googlemail.com
Sun Jul 22 12:51:12 UTC 2012


On Sat, Jul 21, 2012 at 10:44 PM, Igor Rodrigues da Costa
<igorrcosta at hotmail.com> wrote:
>
>
> Hi Peter,
> I would eliminate the problem of ID mapping (or at least
> pass it to the user) by using only the function that uses
> one sequence pair.

Making the single-sequence-pair function part of the
public API seems sensible then.
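
As a rough illustration of what a single-pair function could look
like, here is a minimal sketch on plain strings. The name
back_translate_pair and its signature are hypothetical, not an
existing Biopython API, and it assumes the nucleotide sequence is
ungapped, in frame, and has no trailing stop codon:

```python
def back_translate_pair(gapped_protein, nucleotide, gap="-"):
    """Thread an ungapped nucleotide sequence onto a gapped protein.

    Hypothetical helper working on plain strings. Each protein letter
    consumes one codon (three bases); each protein gap becomes three
    gap characters in the output.
    """
    ungapped_length = len(gapped_protein) - gapped_protein.count(gap)
    if len(nucleotide) != 3 * ungapped_length:
        raise ValueError(
            "Expected %i bases for %i residues, got %i"
            % (3 * ungapped_length, ungapped_length, len(nucleotide))
        )
    pieces = []
    position = 0  # index of the next unused base in the nucleotide string
    for letter in gapped_protein:
        if letter == gap:
            pieces.append(gap * 3)
        else:
            pieces.append(nucleotide[position:position + 3])
            position += 3
    return "".join(pieces)


print(back_translate_pair("M-K", "ATGAAA"))
```

Because the caller supplies the pair directly, the question of how
to match protein and nucleotide identifiers stays outside the
function.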

> The other option is to check if the codon and the amino
> acid are equivalent at run time, using a given genetic
> code. I did this in my program that back translated
> using only the aligned protein sequence and the
> Uniprot/GI accession numbers (I did the search using
> Bio.Entrez), but in my case the nucleotide dictionary
> was only some different ways the nucleotide sequence
> could be imported from NCBI, each of them returning
> a different sequence.

Certainly optionally checking the translation seems wise.
There are potential complications with things like
ambiguous bases, but in general this is useful.
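
One possible shape for such an optional check, again only as a
sketch: the function name check_codons is hypothetical, the table is
the standard genetic code built by hand rather than taken from
Biopython, the protein is assumed to be ungapped, and skipping any
codon that is not in the table (for example one containing N) is
just one of several policies for ambiguous bases:

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def check_codons(protein, nucleotide, table=STANDARD_TABLE):
    """Return a list of (index, codon, amino_acid) mismatches.

    Hypothetical helper: protein is ungapped, and codons missing from
    the table (ambiguous or incomplete) are skipped, not reported.
    """
    mismatches = []
    for index, amino_acid in enumerate(protein):
        codon = nucleotide[3 * index:3 * index + 3].upper()
        translated = table.get(codon)
        if translated is None:
            continue  # cannot verify this codon with an unambiguous table
        if translated != amino_acid.upper():
            mismatches.append((index, codon, amino_acid))
    return mismatches


print(check_codons("MK", "ATGAAC"))
```

Passing a different dictionary as table would cover other genetic
codes; a stricter version could raise an error on ambiguous codons
instead of skipping them.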

> I can't see any need for different gap characters
> between both alignments, and I feel there can be both
> a Bio.SeqIO (using a pair of sequences only) and a
> Bio.AlignIO (using multiple sequences, probably slower
> if checking at run time) versions of this function.

I agree that both an alignment-based function and a
single-sequence-based function make sense, but
probably under Bio.Align rather than Bio.SeqIO and
Bio.AlignIO, which are specifically for input/output
functionality.

Thanks for your thoughts,

Peter


