[Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq)

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Thu Nov 6 20:34:36 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #50 from bsouthey at gmail.com  2008-11-06 15:34 EST -------
(In reply to comment #48)
> (In reply to comment #47)
> > (In reply to comment #46)
> 
> > > [seq.translate() for seq in seqlist if seq.is_cds()]
> > > 
> > > I prefer the second option, for readability, but YMMV.
> > 
> > Note the above wouldn't give you translations starting with methionine, you'd
> > need something like:
> > 
> > [seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()]
> > 
> > (assuming we call the "init" option "cds_start")
> 
> Fair point... my focus was on putting that filter into the list comprehension.
> 
> > Or, going with the complete_cds option you could build a list of translations
> > of valid CDSs like this:
> > 
> > proteins = []
> > for seq in seqlist :
> >     try :
> >         proteins.append(seq.translate(complete_cds=True))
> >     except ValueError :
> >         #Not a valid CDS, excluded
> >         pass
> > 
> > Not a one liner, but I think in a real situation you'd want to do something
> > with the invalid CDSs anyway (even if just logging them).
> 
> True enough.  It comes down in part to a preference of style, as the same could
> be achieved with
> 
> proteins = []
> for seq in seqlist :
>     if seq.is_cds():
>         proteins.append(seq.translate(complete_cds=True))
>     else:
>         #Not a valid CDS, excluded
>         pass
> 
> I think the clarity of this arrangement to my eyes comes from 'is/is not a cds'
> being - naturally-speaking - a property or attribute of the sequence itself. 
> The 'cds_start' argument in your example is then an instruction to treat the
> translation as though you have a CDS, and implement some specialised behaviour
> that is appropriate under that circumstance, rather than to implement a test
> that raises an error if it is failed.  By separating the 'is_cds()' call from
> the 'cds_start' argument, you gain the ability to translate the sequence with
> either the methionine or the coded amino acid, without losing the test of the
> sequence being a CDS.
> 
> Of course, using the 'cds_start=True' argument could force a call to
> self.is_cds(), anyway.  Your non-one-liner could then be as you originally
> wrote:
> 
> proteins = []
> for seq in seqlist :
>     try:
>         proteins.append(seq.translate(complete_cds=True))
>     except ValueError:
>         #Not a valid CDS, excluded
>         pass
> 
> The two advantages I see to having the is_cds() method as a separate call are
> that it separates determining the CDS status of the sequence from translating
> it, and that it provides a filter that is more readable than attempting to
> translate the sequence to find out if it's a valid CDS.  If the 'cds_start'
> argument forces a self.is_cds() test, then the usage can be - I think - exactly
> as you've been proposing throughout the thread.
> 

The use of 'cds' alone is wrong because a CDS refers to DNA, not to translation
and not to protein sequences. The name is confusing, or at least vague, until
you work out what the option does. It could also give the wrong answer: a
sequence can be a valid CDS (see the GUG initiation in the mammalian NAT1
example at the NCBI link) whose start codon is simply not allowed by the table
in Bio.Data.CodonTable.
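
As a rough illustration of that last point (a sketch only, not Biopython code):
the start codon sets below are hard-coded from NCBI translation tables 1 and 11
rather than read from Bio.Data.CodonTable, and the names START_CODONS and
is_table_start_codon are made up for the example. GUG in the mRNA is GTG in the
DNA, so a check against the standard table would reject it, while another table
would accept it:

```python
# Start codons copied by hand from NCBI translation tables 1 and 11
# (assumption: hard-coded so the sketch does not need Biopython).
START_CODONS = {
    1: {"TTG", "CTG", "ATG"},                               # table 1 (Standard)
    11: {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"},  # table 11 (Bacterial)
}


def is_table_start_codon(codon, table_id=1):
    """Return True if codon is listed as a start codon in the given table."""
    # Accept RNA or DNA spelling, in any case.
    codon = codon.upper().replace("U", "T")
    return codon in START_CODONS[table_id]


print(is_table_start_codon("ATG"))      # listed in the standard table
print(is_table_start_codon("GUG"))      # not listed in the standard table
print(is_table_start_codon("GUG", 11))  # listed in table 11
```

So the answer depends on which table is selected, not on whether the sequence
really is a CDS.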

I don't object to the purpose; I object to the name. My overriding issue is
that 'cds_start' does not convey the purpose of this argument, and the name is
likely to remain in the API for some time. Another interpretation that comes to
mind is that it gives the location of the start of the CDS in the sequence
(cds start at...).

I really feel that the name must clearly reflect that it invokes a test that
the first codon is in the 'start_codons' list (defined by the selected table
from Bio.Data.CodonTable). This is not a check that it is the start of a CDS;
rather, it is a check for a possible open reading frame (as not all open
reading frames are CDSs).
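
To make the distinction concrete, here is a minimal sketch in plain Python. The
function names and the hard-coded standard-table codons are only for
illustration; neither function is part of Bio.Seq. The first function is the
test under discussion (is the first codon in the table's start codon list?).
The second adds further conditions one might expect of a complete CDS, which
the first does not imply:

```python
# Standard table (NCBI table 1) start and stop codons, hard-coded
# for this sketch rather than taken from Bio.Data.CodonTable.
START_CODONS = {"TTG", "CTG", "ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_codon_is_start(dna):
    """The narrow test: is the first codon in the start codon list?"""
    return dna[:3].upper() in START_CODONS


def looks_like_complete_cds(dna):
    """A stricter test: start codon, whole codons, a single terminal stop."""
    dna = dna.upper()
    if len(dna) < 6 or len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return (
        codons[0] in START_CODONS
        and codons[-1] in STOP_CODONS
        and not any(c in STOP_CODONS for c in codons[1:-1])
    )


orf_fragment = "ATGAAATTT"   # starts with ATG, no stop codon
complete = "ATGAAATTTTAA"    # starts with ATG, ends with TAA

print(first_codon_is_start(orf_fragment), looks_like_complete_cds(orf_fragment))
print(first_codon_is_start(complete), looks_like_complete_cds(complete))
```

Both sequences pass the first test, but only the second passes the stricter
one, which is why a name suggesting "this is a CDS" oversells what the start
codon check establishes.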

