[Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq)

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Mon Oct 20 05:14:42 EDT 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2381





------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk  2008-10-20 05:14 EST -------
Martin wrote in comment #19:
> Peter wrote in comment #18:
> 
> > e.g. As I wrote in comment 17,
> > > I'm thinking we could also support "start" and "end" optional arguments 
> > > (named after those used in the python string methods, and behaving in
> > > the same way) for specifying a sub-sequence to be translated.  Using
> > > start=0, 1 or 2 would give the three forward reading frames.
> > 
> > This would give an alternative to:
> > 
> > my_seq[i:j].translate(table)
> > 
> > as:
> > 
> > my_seq.translate(table, start=i, end=j)
> > 
> > As with the python string methods, potentially the implementation could
> > be slightly faster as a new Seq object doesn't need to be created for
> > the slice. On the other hand, it does then offer two ways of doing the
> > same thing.
> 
> I think the second approach would often be handy.

If we did add this, then arguably we should do the same for all the other
methods too (transcribe, reverse_complement, etc.).  I'm not convinced this adds
any value.  Martin, why do you prefer the second approach (using start and end
arguments) to the first (slicing the sequence before translation)?
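For comparison, here is a minimal pure-Python sketch of the two styles. The
function translate_sketch and its start/end keywords are hypothetical (they are
not part of Bio.Seq). It assumes the standard genetic code and silently drops
any trailing partial codon.  Because start and end are simply applied as a
slice, the two spellings give identical results by construction, which is
exactly the "two ways of doing the same thing" concern:

```python
# Standard genetic code (NCBI table 1), built from the usual compact
# 64-letter string with codons ordered T, C, A, G at each position.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate_sketch(seq, start=0, end=None):
    """Translate seq[start:end] codon by codon (hypothetical helper).

    start/end behave like the optional arguments of Python's str methods,
    i.e. they are applied exactly as a slice would be.
    """
    region = seq[start:end].upper()
    # Step through complete codons only; a trailing partial codon is dropped.
    usable = len(region) - len(region) % 3
    return "".join(STANDARD_TABLE[region[i:i + 3]] for i in range(0, usable, 3))


dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
# Slicing first versus passing start/end gives the same protein.
assert translate_sketch(dna[3:15]) == translate_sketch(dna, start=3, end=15)
# start=0, 1 or 2 gives the three forward reading frames.
frames = [translate_sketch(dna, start=frame) for frame in range(3)]
```

The only potential gain from the keyword version is avoiding the intermediate
slice object; the behaviour is the same.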

------------------------------------------------------

BioPerl's idea of a boolean "complete" argument isn't popular:

Martin wrote in comment #19:
>>
>> The "complete" is a cryptic naming, I wouldn't be fond of it...
>>

Leighton wrote in comment #21:
>
> Ditto the 'complete' naming - it's not clear at all.
>

The "complete" argument was meant to control two related features:

(a) Validate that the first codon is a valid start codon, and translate it as M
(even if, according to the genetic code, it would normally be translated as
something else, say L).  This should be a boolean argument defaulting to False;
possible names are "start", "check_start", "from_start", ...

Variations on this, like "find the first in-frame start codon", are getting into
gene/ORF finding, and I don't see this as part of the remit of a translate
method.
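Option (a) might look something like the sketch below.  The check_start name is
just one of the candidates listed above, not an existing Biopython argument, and
the set of start codons (ATG plus the alternative TTG and CTG) is an assumption
taken from the standard genetic code; a real implementation would read the
start codons from whichever codon table was requested.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = dict(zip(CODONS, AMINO_ACIDS))
# Assumed start codons for the standard code (NCBI table 1).
STANDARD_STARTS = {"ATG", "TTG", "CTG"}


def translate_sketch(seq, check_start=False):
    """Translate complete codons; optionally validate the start codon (hypothetical)."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    protein = [STANDARD_TABLE[codon] for codon in codons]
    if check_start:
        if not codons or codons[0] not in STANDARD_STARTS:
            raise ValueError("First codon is not a valid start codon")
        # An alternative start codon such as TTG (normally L) is read as M.
        protein[0] = "M"
    return "".join(protein)
```

With the default, "TTGGCCTAA" translates as "LA*"; with check_start=True it
gives "MA*", and a sequence starting with GCC raises ValueError.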

(b) Stop translating at the first in frame stop codon (see my comment 18). 
Again, a boolean argument, and for compatibility with previous Biopython
conventions, defaulting to False (i.e. read through).  Possible names "stop",
"to_stop", "auto_stop", "terminate", ...

In this case, how should the method behave if there is no final stop codon:
raise an error or not?  Also, should the stop codon be included in the returned
sequence?  (Note that the Bio.Translate module did not include the stop symbol.)
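A sketch of option (b), answering the open questions one particular way purely
for illustration: the hypothetical to_stop flag defaults to False (read
through), stops at the first in-frame stop codon when True, omits the stop
symbol (as Bio.Translate did), and simply tolerates a missing stop codon rather
than raising an error.  It again assumes the standard genetic code, and the
names are not an existing API.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate_sketch(seq, to_stop=False):
    """Translate complete codons; optionally stop at the first in-frame stop (hypothetical)."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = STANDARD_TABLE[seq[i:i + 3]]
        if to_stop and aa == "*":
            # Terminate here; the stop symbol itself is not included.
            break
        protein.append(aa)
    # If no stop codon is found, the whole sequence is translated without error.
    return "".join(protein)


dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
read_through = translate_sketch(dna)            # default: read through stops
stopped = translate_sketch(dna, to_stop=True)   # stop at first in-frame stop
```

Raising an error when no stop codon is found would be a one-line change at the
end of the loop, which is why the question above is a policy choice rather than
an implementation difficulty.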

You might want to control these two options independently, so having them as
two arguments is more flexible.

------------------------------------------------------

This bug has started discussing ORF/gene finding, which I see as separate from
the translate method.  Could we continue that discussion on the mailing list or
in a separate bug, please?

------------------------------------------------------

Peter

