[Biopython] Translation of partial codons

Martin Mokrejs mmokrejs at fold.natur.cuni.cz
Thu Mar 21 16:24:45 UTC 2013
Peter Cock wrote:
> Hi all,
> 
> I was prompted by a recent BioPerl thread to check out how Biopython
> handles translation of partial codons:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2013-March/037085.html
> 
> Here's a tiny example, a partial sequence ending "CC". If we assume
> this is an incomplete codon, i.e. "CCN", we can translate this into an
> amino acid - in this case with the standard table it is translated
> unambiguously as proline, "P".
> 
>>>> from Bio.Seq import translate
>>>> translate("AAACCC")
> 'KP'
>>>> translate("AAACC")
> 'K'
>>>> translate("AAAC")
> 'K'
>>>> translate("AAA")
> 'K'
>>>> translate("CCN")
> 'P'
>>>> translate("CC")
> ''
>>>> translate("C")
> ''
>>>> translate("")
> ''
> 
> This behaviour surprised me, and as far as I recall this Biopython
> behaviour is undocumented. Since I rewrote the current translation
> code, I am partly to blame for not considering this corner case.
> Whatever we agree should happen will need some unit tests.
> 
> Personally I think Biopython should be raising an exception on these
> partial codons - or at least a warning, rather than as it does now
> silently ignoring them. I don't think we need yet another option here.
> 
> If the user knows they are dealing with incomplete sequences (e.g.
> partial CDS from an EST assembly or PCR product), then they can
> explicitly check the length and add "N" or "NN" to round it up to a
> whole number of codons (ensure the length is a multiple of three).
> 
> Any thoughts?
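The padding workaround described in the quoted message can be sketched as a small helper. This is a minimal sketch, assuming plain Python strings as input; the name `pad_to_whole_codons` is hypothetical and not part of Biopython.

```python
def pad_to_whole_codons(seq: str, pad_char: str = "N") -> str:
    """Hypothetical helper: append pad_char until len(seq) is a multiple of three.

    A trailing partial codon such as "CC" becomes "CCN", which a
    translator can then treat as a full (possibly ambiguous) codon.
    """
    remainder = len(seq) % 3
    if remainder == 0:
        # Already a whole number of codons; nothing to add.
        return seq
    # Add one "N" for a two-base leftover, two for a one-base leftover.
    return seq + pad_char * (3 - remainder)


if __name__ == "__main__":
    for s in ["AAACCC", "AAACC", "AAAC", "CC", "C", ""]:
        print(repr(s), "->", repr(pad_to_whole_codons(s)))
```

With this helper, `translate(pad_to_whole_codons("AAACC"))` would translate "AAACCN", so the final "CCN" codon gives proline as in the `translate("CCN")` example above. A single leftover base becomes "CNN", which is genuinely ambiguous.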

I agree that Biopython should raise an error when the sequence length is
not divisible by 3, i.e. when there is a leftover partial codon.

Martin
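The stricter behaviour argued for in this thread can be approximated on the caller's side by checking the length before translating. This is a minimal sketch, assuming the check is done by a hypothetical wrapper `check_whole_codons` (not a Biopython function) that raises `ValueError` on a partial codon.

```python
def check_whole_codons(seq: str) -> str:
    """Hypothetical check: raise ValueError if seq ends in a partial codon."""
    leftover = len(seq) % 3
    if leftover:
        raise ValueError(
            f"Sequence length {len(seq)} is not a multiple of three "
            f"({leftover} trailing base(s) form a partial codon)"
        )
    # Whole number of codons: return the sequence unchanged for translation.
    return seq


if __name__ == "__main__":
    for s in ["AAACCC", "AAACC"]:
        try:
            print(repr(s), "ok:", repr(check_whole_codons(s)))
        except ValueError as err:
            print(repr(s), "error:", err)
```

A caller would then write `translate(check_whole_codons(seq))`, so that "AAACC" fails loudly instead of silently translating to 'K'.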