[Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq)

bugzilla-daemon at portal.open-bio.org
Tue Nov 4 16:11:49 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2381





------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-04 11:11 EST -------
(In reply to comment #32)
> > In which of these examples do you understand that the first position is
> > being forced to a Methionine?

With my suggested code, you would not just be forcing the first codon to be a
methionine.  You would also be asking for the first codon to be validated as a
start codon (initialisation codon).
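
As a rough sketch of that behaviour (this is not the code from the attachment;
the function name translate_sketch, the hard-coded standard table and the set
of start codons taken from NCBI table 1 are assumptions for illustration only):

```python
from itertools import product

# Standard genetic code (NCBI table 1), bases in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Initiation codons of the standard table (assumed here for the sketch).
START_CODONS = {"ATG", "CTG", "TTG"}


def translate_sketch(seq, init=False, to_stop=False):
    """Hypothetical stand-in for the proposed translate(..., init=, to_stop=)."""
    seq = seq.upper().replace("U", "T")
    # Any trailing partial codon is silently ignored in this sketch.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    protein = []
    for index, codon in enumerate(codons):
        if index == 0 and init:
            # Validate the first codon as a start codon, then emit M for it.
            if codon not in START_CODONS:
                raise ValueError("%r is not a start codon" % codon)
            protein.append("M")
            continue
        amino = CODON_TABLE[codon]
        if amino == "*" and to_stop:
            break  # stop at the first in-frame stop codon
        protein.append(amino)
    return "".join(protein)


print(translate_sketch("TTGAAACCCTAG"))                          # LKP*
print(translate_sketch("TTGAAACCCTAG", init=True, to_stop=True))  # MKP
```

So TTG is read as a leucine in a plain translation, but with init=True it is
checked against the start codons and then reported as a methionine.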

> None are particularly clear, but only one of them doesn't give me the wrong
> idea...

In some cases I seem to have guessed different possible meanings for some of
these suggested names - so those are probably unclear.

> > >>> translate("TTGAAACCCTAG", init=True, to_stop=True)
> 
> Because I've read this thread (or looked at the docs) - I understand this one
> ;)

To me this suggests something special is happening with the initialisation of
the translation - but I agree it's not clear what without checking the
documentation.

> > >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True)
> 
> I don't intuitively understand this.  Does it mean that the sequence should be
> translatable?

Ditto - an argument called force_as_translating means nothing to me.  You're
calling a translation method so what can forcing a translation mean?

> > >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True)
> 
> Does this mean that the sequence will be translated from the first methionine
> the method finds?

I would have guessed force_methionine would ignore the value of the first three
nucleotides in order to treat them as a methionine (even if they are not a
start codon).

> > >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True)
> 
> As above, and does force_stop mean that you add a '*' to the end of the
> translation?  Or that you stop at a stop codon?

Like Leighton, I would be confused by "force_stop".  It could mean add a stop
symbol to the end of the amino acid sequence even if there isn't one there
already.

> > >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True)
> 
> 'alt_start' I would think referred to allowing translation from alternative
> start codons.  I don't know what alt_stop would mean...

I think "alt_start" would be misleading for the intended dual functionality.
Consider the typical use case for this option - translating a CDS, which most
of the time will use the typical start codon AUG / ATG (but not always).
We'd want the start codon validated - and it often won't be an alternative
start codon.  So calling the argument "alt_start" is confusing.

> > Also, I don't think this option will be used very often. 
> 
> Maybe not.  The first use case that comes to mind is QA on CDS-finding:
> 
> # Check if sequence is CDS:
> assert candidate_cds.translate(init=True)
> # Check if reported CDS start is valid
> assert est[37:].translate(init=True)
> 
> A second use case is slower in presenting itself...

I think translating a CDS is quite a common task - so a very long argument
name would be bad.

Instead of the "init" start codon option in attachment 1032, I'd also be happy
with a single boolean argument which does start codon validation, treats this
as a methionine, checks the sequence is a multiple of three in length, checks
for a final stop codon, and checks for no additional stop codons.  We'd ruled
out calling this "complete", but maybe "cds" would be better?
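
To make that list of checks concrete, here is a minimal sketch of what such a
single "cds" style option might do.  The function name translate_cds, the
hard-coded standard table, the start codon set (NCBI table 1) and the error
messages are all assumptions for illustration, not an agreed design:

```python
from itertools import product

# Standard genetic code (NCBI table 1), bases in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START_CODONS = {"ATG", "CTG", "TTG"}  # assumed: standard table only


def translate_cds(seq):
    """Translate a complete CDS, raising ValueError if any check fails."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if not codons or codons[0] not in START_CODONS:
        raise ValueError("first codon is not a start codon")
    if CODON_TABLE.get(codons[-1]) != "*":
        raise ValueError("final codon is not a stop codon")
    # Unambiguous codons only; anything else raises KeyError in this sketch.
    body = "".join(CODON_TABLE[codon] for codon in codons[1:-1])
    if "*" in body:
        raise ValueError("extra in-frame stop codon found")
    # The validated start codon is always reported as methionine,
    # and the terminal stop symbol is dropped.
    return "M" + body


print(translate_cds("TTGAAACCCTAG"))  # MKP
```

With something like this, the QA use case quoted above reduces to calling the
function and catching ValueError for sequences that are not a valid CDS.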

> > So, it shouldn't be a problem if its name is too long to type, and it would
> > be better if it is easy to understand.
> 
> That's a fair argument, I think.  On the whole, though, I would favour a
> short, unambiguous, slightly cryptic name over a very long, unambiguous
> name, over an ambiguous name of any length.

There is a lot of subjectivity in argument naming - clearly we have not come
up with a perfect suggestion yet.

Unfortunately "init" can be misunderstood (I'm not 100% sure what you were
trying to say in comment 31, but I think you took the name "init" to mean
some sort of optional optimisation or initialisation step).

How about "cds_start" instead of "init"?




