[Biopython] Translation of partial codons

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 21 15:58:22 UTC 2013


Hi all,

I was prompted by a recent BioPerl thread to check out how Biopython
handles translation of partial codons:

http://lists.open-bio.org/pipermail/bioperl-l/2013-March/037085.html

Here's a tiny example: a partial sequence ending in "CC". If we treat
this as an incomplete codon, i.e. "CCN", it can still be translated
into an amino acid, because under the standard table all four CCN
codons encode proline, "P".

>>> from Bio.Seq import translate
>>> translate("AAACCC")
'KP'
>>> translate("AAACC")
'K'
>>> translate("AAAC")
'K'
>>> translate("AAA")
'K'
>>> translate("CCN")
'P'
>>> translate("CC")
''
>>> translate("C")
''
>>> translate("")
''

This behaviour surprised me: any trailing partial codon is silently
dropped, and as far as I recall this is undocumented. Since I rewrote
the current translation code, I am partly to blame for not considering
this corner case. Whatever we agree should happen will need some unit
tests.

Personally I think Biopython should raise an exception on these
partial codons, or at least issue a warning, rather than silently
ignoring them as it does now. I don't think we need yet another
option here.
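
As a rough sketch of what the stricter behaviour could look like from
the caller's side (this is not existing Biopython code, and the helper
name require_whole_codons is made up purely for illustration), a length
check run before calling translate might be:

```python
def require_whole_codons(seq):
    """Return seq unchanged, or raise ValueError on a trailing partial codon.

    Hypothetical helper for illustration only, not part of Biopython.
    """
    extra = len(seq) % 3  # number of bases left over after the last full codon
    if extra:
        raise ValueError(
            "Partial codon: %i trailing base(s), length %i is not a "
            "multiple of three" % (extra, len(seq))
        )
    return seq
```

Here "AAACCC" and "" pass through untouched, while "AAACC" and "AAAC"
raise a ValueError instead of quietly losing their last one or two bases.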

If the user knows they are dealing with incomplete sequences (e.g.
partial CDS from an EST assembly or PCR product), then they can
explicitly check the length and add "N" or "NN" to round it up to a
whole number of codons (ensure the length is a multiple of three).
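
A minimal sketch of that padding step, assuming the sequence is a plain
Python string (pad_to_codons is a hypothetical helper name, not
something Biopython provides):

```python
def pad_to_codons(seq, pad="N"):
    """Return seq extended with pad so its length is a multiple of three.

    Hypothetical helper for illustration only, not part of Biopython.
    """
    missing = -len(seq) % 3  # 0, 1 or 2 bases needed to complete the last codon
    return seq + pad * missing
```

With this, pad_to_codons("AAACC") gives "AAACCN", which should translate
as 'KP' given that "CCN" translates to 'P' in the session above. A
sequence whose length is already a multiple of three is returned
unchanged.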

Any thoughts?

Thanks,

Peter


