[Biopython] Translation of partial codons

Ivan Gregoretti ivangreg at gmail.com
Thu Mar 28 14:55:23 UTC 2013


Hello Peter,

Given that CCC is one of the four codons for proline:

In [132]: Seq('CCC').translate(table=1)
Out[132]: Seq('P', ExtendedIUPACProtein())

Would you please tell us which of the next two cases will raise an
exception in the future?

In [133]: Seq('CC').translate(table=1)
Out[133]: Seq('', ExtendedIUPACProtein())

In [134]: Seq('CCN').translate(table=1)
Out[134]: Seq('P', ExtendedIUPACProtein())


Thank you,

Ivan


Ivan Gregoretti, PhD




On Sat, Mar 23, 2013 at 10:55 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Mar 21, 2013 at 5:56 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Mar 21, 2013, at 12:19 PM, Peter Cock <p.j.a.cock at googlemail.com>
>>  wrote:
>>
>>> On Thu, Mar 21, 2013 at 5:10 PM, Iddo Friedberg <idoerg at gmail.com> wrote:
>>>> Suggestions so far:
>>>> 1. Raise an exception. This may cause code running on existing data to
>>>> change behavior. I.e. it ran fine before on bad-length sequences, but as of
>>>> the new code installation, things will break.
>>>
>>> Yes, but in most cases this will be a good thing. The minority of people
>>> knowingly dealing with partial sequences can make this explicit by first
>>> ensuring their sequence is a multiple of three in length (by padding or
>>> cropping as most appropriate to their use case).
>>>
>>>> 2. Add a default length_check=True to the translate method. Again, this may
>>>> cause existing code to behave differently with the same data once the user
>>>> upgrades, unless the user explicitly changes the call to
>>>> myseq.translate(length_check=False)
>>>
>>> A sensible approach to making likely errors explicit, with an easy work-around
>>> for the old implicit truncation. The downside is yet another argument to the
>>> translate functions/methods, which are already pretty complicated. I prefer (1).
>>>
>>>> 3. My suggestion: use length_check=False as default. Code behaves the same
>>>> as before, so no data-induced breakages. If the user wants to check length,
>>>> they explicitly pass a True value. So we give the option of checking length,
>>>> while retaining legacy code behavior.
>>>>
>>>> length_check, being an argument, does not need to be passed explicitly.
>>>
>>> I don't like this, even though it is backwards compatible for the corner
>>> case. I think the old behaviour is a bug.
>>>
>>> Regards,
>>>
>>> Peter
>>
>> That's basically the approach we took on the bioperl end, e.g. the
>> old behavior was an unintended bug (it was a little more complex
>> than that in reality, but in essence it boils down to that).  It was
>> too magic, and the old behavior can be regained with a parameter
>> setting.  I don't think we throw an exception, but maybe we should...
>>
>> Anyway, I would think going with something that follows
>> the tenet of least surprise would be very python-esque :)
>>
>> chris
>
> I started work on making this an exception, and from our test suite
> realised that simple ORF finding is an example where this change
> is likely to be noticed. I have therefore for now just added a new
> warning if translating partial codons, which can be upgraded to a
> full exception in future (or removed depending on how people react).
>
> https://github.com/biopython/biopython/commit/c0112a7b79a61eabe0adea78bb70d572f1950cde
>
> Peter
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
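
The options discussed in this thread (warn on a partial codon, let users
opt in to a hard failure, or crop/pad explicitly before translating) can be
sketched in plain Python. This is only an illustration under stated
assumptions, not the code from the commit linked above: PartialCodonWarning,
check_codon_length, crop_to_codons and pad_to_codons are hypothetical names
made up for this example, and the helpers operate on plain strings.

```python
import warnings


class PartialCodonWarning(UserWarning):
    """Hypothetical warning category, named here for illustration only."""


def check_codon_length(sequence):
    """Warn if len(sequence) is not a multiple of three; return the remainder."""
    remainder = len(sequence) % 3
    if remainder:
        warnings.warn(
            f"Partial codon: length {len(sequence)} is not a multiple of three",
            PartialCodonWarning,
        )
    return remainder


def crop_to_codons(sequence):
    """Drop any trailing partial codon, making the old truncation explicit."""
    return sequence[: len(sequence) - len(sequence) % 3]


def pad_to_codons(sequence, pad="N"):
    """Pad with an ambiguous base up to the next multiple of three."""
    return sequence + pad * (-len(sequence) % 3)


if __name__ == "__main__":
    # Explicit cropping or padding leaves nothing to warn about.
    print(repr(crop_to_codons("CCCA")))
    print(repr(pad_to_codons("CC")))

    # A caller who wants a hard failure can promote the warning to an error.
    with warnings.catch_warnings():
        warnings.simplefilter("error", PartialCodonWarning)
        try:
            check_codon_length("CC")
        except PartialCodonWarning as err:
            print("rejected:", err)
```

The warnings filter is what makes a warning a reasonable middle ground:
callers who consider partial codons an error can already turn the warning
into an exception on their side, while existing scripts keep running.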


