[Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq)

bugzilla-daemon at portal.open-bio.org
Thu Nov 6 17:32:52 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2381





------- Comment #48 from lpritc at scri.sari.ac.uk  2008-11-06 12:32 EST -------
(In reply to comment #47)
> (In reply to comment #46)

> > [seq.translate() for seq in seqlist if seq.is_cds()]
> > 
> > I prefer the second option, for readability, but YMMV.
> 
> Note the above wouldn't give you translations starting with methionine, you'd
> need something like:
> 
> [seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()]
> 
> (assuming we call the "init" option "cds_start")

Fair point... my focus was on putting that filter into the list comprehension.

> Or, going with the complete_cds option you could build a list of translations
> of valid CDSs like this:
> 
> proteins = []
> for seq in seqlist :
>     try :
>         proteins.append(seq.translate(complete_cds=True))
>     except ValueError :
>         #Not a valid CDS, excluded
>         pass
> 
> Not a one liner, but I think in a real situation you'd want to do something
> with the invalid CDSs anyway (even if just logging them).

True enough. It comes down partly to a matter of style, as the same result
could be achieved with:

proteins = []
for seq in seqlist:
    if seq.is_cds():
        proteins.append(seq.translate(complete_cds=True))
    else:
        # Not a valid CDS, excluded
        pass

To my eyes, the clarity of this arrangement comes from 'is/is not a CDS' being,
naturally speaking, a property or attribute of the sequence itself. The
'cds_start' argument in your example is then an instruction to treat the
sequence as a CDS during translation and apply the specialised behaviour
appropriate to that case, rather than a test that raises an error if it fails.
By separating the 'is_cds()' call from the 'cds_start' argument, you gain the
ability to translate the sequence starting with either methionine or the amino
acid its first codon actually encodes, without losing the test of whether the
sequence is a CDS.
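A minimal sketch of that separation, under loud assumptions: `SketchSeq` is a hypothetical stand-in for the Seq object in Bio.Seq, and `is_cds()` and the `cds_start` argument are the proposals under discussion, not existing Biopython API. The input is assumed to be uppercase, unambiguous DNA, and the start codons are assumed to be those of the standard code (NCBI table 1: ATG, CTG, TTG). It shows a TTG-initiated CDS translating to leucine or methionine at position one depending on `cds_start`, while `is_cds()` stays a separate, readable filter.

```python
import itertools

# Standard genetic code (NCBI table 1) in the usual TCAG x TCAG x TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(_BASES, repeat=3), _AMINO_ACIDS)
}
START_CODONS = {"ATG", "CTG", "TTG"}  # assumption: NCBI table 1 start codons
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


class SketchSeq:
    """Hypothetical stand-in for Bio.Seq.Seq (not Biopython's API).

    Assumes uppercase, unambiguous DNA.
    """

    def __init__(self, data):
        self.data = data

    def codons(self):
        # Complete codons only; a trailing partial codon is ignored.
        usable = len(self.data) - len(self.data) % 3
        return [self.data[i:i + 3] for i in range(0, usable, 3)]

    def is_cds(self):
        # A property of the sequence: whole codons, start codon first,
        # stop codon last, and no internal stop codons.
        if len(self.data) < 6 or len(self.data) % 3:
            return False
        codons = self.codons()
        return (codons[0] in START_CODONS
                and codons[-1] in STOP_CODONS
                and not any(c in STOP_CODONS for c in codons[1:-1]))

    def translate(self, cds_start=False):
        # cds_start only changes how the first codon is read (as methionine);
        # it performs no CDS test, so callers filter with is_cds() separately.
        protein = "".join(CODON_TABLE[c] for c in self.codons())
        if cds_start and protein:
            protein = "M" + protein[1:]
        return protein
```

With `seqlist = [SketchSeq("TTGAAATAA"), SketchSeq("ATGAAA")]`, the one-liner `[s.translate(cds_start=True) for s in seqlist if s.is_cds()]` gives `['MK*']`, whereas `SketchSeq("TTGAAATAA").translate()` gives `'LK*'`. This sketch keeps the terminal stop symbol; whether a real implementation should strip it is a separate design choice.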

Of course, the 'cds_start=True' argument could force a call to self.is_cds()
anyway, in which case your non-one-liner could stay as you originally wrote it:

proteins = []
for seq in seqlist:
    try:
        proteins.append(seq.translate(complete_cds=True))
    except ValueError:
        # Not a valid CDS, excluded
        pass

The two advantages I see in having is_cds() as a separate method are that it
separates determining the CDS status of the sequence from translating it, and
that it provides a filter that is more readable than attempting to translate
the sequence to find out whether it is a valid CDS. If the 'cds_start' argument
forces a self.is_cds() test, then I think the usage can be exactly as you've
been proposing throughout the thread.
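A sketch of that combined option, in which `cds_start=True` runs the CDS test itself and raises `ValueError` on failure, so the try/except loop from the thread works unchanged and rejects can be logged rather than silently passed over. `is_cds()`, `translate()` and `translate_cds_list()` are hypothetical plain functions standing in for the proposed Seq methods, not Biopython API; the input is assumed to be uppercase, unambiguous DNA, and the start codons are assumed to be NCBI table 1's (ATG, CTG, TTG).

```python
import itertools
import logging

# Standard genetic code (NCBI table 1) in the usual TCAG x TCAG x TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(_BASES, repeat=3), _AMINO_ACIDS)
}
START_CODONS = {"ATG", "CTG", "TTG"}  # assumption: NCBI table 1 start codons
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def is_cds(dna):
    """Hypothetical check: start codon, stop codon, whole codons, no internal stops."""
    if len(dna) < 6 or len(dna) % 3:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return (codons[0] in START_CODONS
            and codons[-1] in STOP_CODONS
            and not any(c in STOP_CODONS for c in codons[1:-1]))


def translate(dna, cds_start=False):
    """Hypothetical translate(): cds_start=True first runs is_cds() and raises
    ValueError if it fails, then reports the first residue as methionine."""
    if cds_start and not is_cds(dna):
        raise ValueError(f"not a valid CDS: {dna!r}")
    usable = len(dna) - len(dna) % 3
    protein = "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))
    return "M" + protein[1:] if cds_start else protein


def translate_cds_list(seqlist):
    """Collect translations of valid CDSs, logging the rejects instead of
    discarding them silently."""
    proteins = []
    for seq in seqlist:
        try:
            proteins.append(translate(seq, cds_start=True))
        except ValueError as err:
            logging.warning("Excluded: %s", err)
    return proteins
```

Raising from inside `translate()` keeps the single-call usage at the cost of running the CDS test on every `cds_start=True` call; an explicit `is_cds()` filter remains available to anyone who prefers the test to be visible at the call site.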

