[Biopython-dev] [Bug 2783] Using alternative start codons in Bio.Seq translate method/function

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Wed Jun 10 17:51:48 EDT 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2783


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-06-10 17:51 EST -------
(In reply to comment #5)
> 
> On Bug 2381, comment #51, Leighton wrote:
> > In terms of nomenclature:
> > 
> > The default behaviour of translate() as Peter proposed: read through
> > in-frame and translate with the appropriate codon table - is fine in
> > nearly all circumstances.  Most other circumstances are covered by
> > stopping at the first in-frame stop codon, which Peter has implemented,
> > and is an option we all seem to agree on.
> > 
> > Biologically-speaking, this behaviour is not always correct for CDS in
> > prokaryotes, where alternative start codons may occur a significant
> > minority of the time.  These will be mistranslated if no provision is
> > made for them.  I think a useful biological sequence object should at
> > least try to mimic actual biology, so we should provide an option to
> > handle this.
> > 
> > We should not assume that a sequence is a CDS unless it is specified by
> > the user.  It seems reasonable to me that the term 'cds' should occur in
> > any such argument from the user.
> > 
> > We have at least two options for how to proceed with a CDS: i) we can
> > provide a strict CDS-type translation, which requires confirmation that
> > the sequence is, in fact, a CDS; ii) we can provide a weak CDS-type
> > translation, which only modifies the way the start codon is translated.
> > In both cases, behaviour is specific to CDS, and so having 'cds' in the
> > argument name *somewhere* seems obvious, and entirely reasonable.
> 
> Leighton's option (ii) is start codon only modification.  This is what
> I implemented in the patch on comment 1 (attachment 1259 [details]).
> We haven't agreed on a good name for this - which is partly why I went
> back to revisit the alternative:
> 
> Leighton's option (i) is strict CDS-type translation.  As Leighton suggests,
> having "cds" in the argument name here makes sense. ...

After some reflection I have decided to check in code doing what Leighton
called option (i), strict CDS-type translation (as provided in BioPerl via
their "complete" argument). This code is based on the above patch (attachment
1298), but adds the check for an extra in-frame stop codon (which the
docstrings described but the patch was missing).
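
For anyone following along, here is a rough standalone sketch of the kind of
checks a strict CDS-type translation implies. It is not the code that was
checked in. The function name translate_cds and the START_CODONS set are
made up for illustration, and the start codon set is only an assumed subset
of what a prokaryotic table allows.

```python
from itertools import product

# Standard genetic code, built in the usual NCBI ordering (bases T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Assumption for illustration only: a small set of start codons.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate_cds(seq, table=CODON_TABLE, starts=START_CODONS):
    """Translate seq as a complete CDS, or raise ValueError (hypothetical helper)."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if len(codons) < 2:
        raise ValueError("need at least a start codon and a stop codon")
    if codons[0] not in starts:
        raise ValueError("first codon %r is not a start codon" % codons[0])
    if table.get(codons[-1]) != "*":
        raise ValueError("final codon %r is not a stop codon" % codons[-1])
    # Any valid start codon is translated as methionine.
    protein = ["M"]
    for codon in codons[1:-1]:
        amino = table.get(codon)
        if amino is None:
            raise ValueError("codon %r is not in the table" % codon)
        if amino == "*":
            raise ValueError("extra in-frame stop codon found")
        protein.append(amino)
    # The terminal stop codon is not included in the returned protein.
    return "".join(protein)
```

With this sketch, translate_cds("GTGGCCATTGTAATGGGCCGCTGA") gives "MAIVMGR"
(the leading GTG becomes M rather than V), while a sequence with a second
in-frame stop raises ValueError.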

I also went with the shorter argument name, just "cds" rather than
"complete_cds", but (until the next release) I am open to changing this new
option name. Please bring this up on the mailing list if you don't like "cds"
or think it is unclear. Thanks.

Marking this bug as fixed.

