[Biopython-dev] [Bug 2783] Using alternative start codons in Bio.Seq translate method/function
bugzilla-daemon at portal.open-bio.org
Mon May 11 08:40:49 EDT 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2783
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-05-11 08:40 EST -------
Created an attachment (id=1298)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=1298&action=view)
Patch for Bio/Seq.py to support complete CDS translation with non-standard
start codons
I've recently been doing CDS translations for viral and bacterial genes with
alternative start codons, and would like to fix this limitation in Biopython
rather than having to hack around it.
On Bug 2381, comment #14, I wrote:
> For comparison, the following is copied from the BioPerl documentation about
> their sequence object's translate method. It would be nice to follow some of
> the same naming conventions for any optional arguments.
>
> http://www.bioperl.org/Core/Latest/bptutorial.html#iii_3_1_manipulating_sequence_data_with_seq_methods
>
> If we want to translate full coding regions (CDS) the way major nucleotide
> databanks EMBL, GenBank and DDBJ do it, the translate() method has to perform
> more checks. Specifically, translate() needs to confirm that the sequence has
> appropriate start and terminator codons at the very beginning and the very end
> of the sequence and that there are no terminator codons present within the
> sequence in frame 0. In addition, if the genetic code being used has an
> atypical (non-ATG) start codon, the translate() method needs to convert the
> initial amino acid to methionine. These checks and conversions are triggered
> by setting "complete" to 1:
>
> $prot_obj = $my_seq_object->translate(-complete => 1);
>
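The checks BioPerl describes can be sketched in Python as below. This is a minimal, self-contained sketch, not the attached patch or the Bio.Seq API. The function name `translate_complete_cds` and the `TranslationError` class are hypothetical. The genetic code is hard-wired to NCBI table 11, the bacterial code, whose alternative starts include GTG and TTG. Only unambiguous A/C/G/T codons are handled. It also assumes the terminal stop codon is dropped from the returned protein.

```python
from itertools import product

# Hard-coded NCBI genetic code table 11 (bacterial, archaeal and plant plastid).
# Codons are listed with bases ordered T, C, A, G, first position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


class TranslationError(ValueError):
    """Raised when a sequence fails one of the complete-CDS checks (hypothetical)."""


def translate_complete_cds(seq):
    """Translate a complete CDS, roughly like BioPerl's translate(-complete => 1)."""
    seq = seq.upper()
    if len(seq) % 3:
        raise TranslationError(f"length {len(seq)} is not a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    unknown = [c for c in codons if c not in CODON_TABLE]
    if unknown:
        raise TranslationError(f"unsupported codon(s): {unknown}")
    if len(codons) < 2:
        raise TranslationError("a CDS needs at least a start and a stop codon")
    if codons[0] not in START_CODONS:
        raise TranslationError(f"first codon {codons[0]} is not a start codon")
    if codons[-1] not in STOP_CODONS:
        raise TranslationError(f"final codon {codons[-1]} is not a stop codon")
    for position, codon in enumerate(codons[1:-1], start=1):
        if codon in STOP_CODONS:
            raise TranslationError(f"in-frame stop codon {codon} at codon {position}")
    # The start codon is always read as methionine (so GTG gives M, not V),
    # and the terminal stop codon is left out of the protein.
    return "M" + "".join(CODON_TABLE[codon] for codon in codons[1:-1])


print(translate_complete_cds("GTGAAATAA"))  # MK
```

Each check raises instead of silently translating, so a caller asking for CDS behaviour learns immediately when the sequence is not really a complete CDS.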
On Bug 2381, comment #51, Leighton wrote:
> In terms of nomenclature:
>
> The default behaviour of translate() as Peter proposed: read through in-frame
> and translate with the appropriate codon table - is fine in nearly all
> circumstances. Most other circumstances are covered by stopping at the first
> in-frame stop codon, which Peter has implemented, and is an option we all seem
> to agree on.
>
> Biologically-speaking, this behaviour is not always correct for CDS in
> prokaryotes, where alternative start codons may occur a significant minority
> of the time. These will be mistranslated if no provision is made for them. I
> think a useful biological sequence object should at least try to mimic actual
> biology, so we should provide an option to handle this.
>
> We should not assume that a sequence is a CDS unless it is specified by the
> user. It seems reasonable to me that the term 'cds' should occur in any such
> argument from the user.
>
> We have at least two options for how to proceed with a CDS: i) we can provide
> a strict CDS-type translation, which requires confirmation that the sequence
> is, in fact, a CDS; ii) we can provide a weak CDS-type translation, which only
> modifies the way the start codon is translated. In both cases, behaviour is
> specific to CDS, and so having 'cds' in the argument name *somewhere* seems
> obvious, and entirely reasonable.
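To make Leighton's options concrete, the sketch below contrasts three behaviours: the default read-through, stopping at the first in-frame stop codon, and option (ii), where only the translation of the first codon changes. Option (i) would add the start, stop and internal-stop checks from the BioPerl description on top of this. The function and its `to_stop` and `cds_start` arguments are illustrative names only. They are not the Bio.Seq signature and not an agreed name. The codon data is hard-wired to NCBI table 11.

```python
from itertools import product

# Hard-coded NCBI table 11 data, standing in for a real codon table object.
# Bases ordered T, C, A, G, first codon position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate(seq, to_stop=False, cds_start=False):
    """Translate frame 0 of seq; cds_start is a hypothetical flag for option (ii)."""
    seq = seq.upper()
    # Any trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    protein = []
    for index, codon in enumerate(codons):
        if index == 0 and cds_start and codon in START_CODONS:
            # Weak CDS mode: only the first codon is reinterpreted, as methionine.
            aa = "M"
        else:
            aa = CODON_TABLE[codon]  # KeyError for ambiguous bases such as N
        if aa == "*" and to_stop:
            break  # stop at the first in-frame stop codon
        protein.append(aa)
    return "".join(protein)


seq = "GTGAAATAAGGG"
print(translate(seq))                                # VK*G
print(translate(seq, to_stop=True))                  # VK
print(translate(seq, to_stop=True, cds_start=True))  # MK
```

In this sketch an unrecognised first codon is simply translated normally under `cds_start=True`. Raising an error instead would move the weak mode part of the way towards the strict option (i).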
Leighton's option (ii) is a start-codon-only modification. This is what I
implemented in the patch on comment 1 (attachment 1259). We haven't agreed on
a good name for it, which is partly why I went back to revisit the
alternative:
Leighton's option (i) is strict CDS-type translation. As Leighton suggests,
having "cds" in the argument name here makes sense. Regarding the BioPerl
argument name for this functionality, "complete", on Bug 2381 comment 19,
Martin wrote:
> The "complete" is a cryptic naming, I wouldn't be fond of it.
>
I think you are both right about the naming. Would complete_cds=True be
clear? In fact, I quite like the idea of using cds=True, which is short and
also fairly clear. This patch adds a complete_cds=Boolean argument to the
Bio.Seq translate methods and function, which should act like the BioPerl
equivalent. It includes doctests showing the new functionality.
I would like to use one of these two approaches in Biopython, but not both ;)