[Biopython-dev] derive from Seq

Eric Talevich eric.talevich at gmail.com
Sun Feb 21 16:36:13 UTC 2010


On Sun, Feb 21, 2010 at 7:03 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Sat, Feb 20, 2010 at 7:01 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > I've seen a technique like this used to good effect:
> >
> > ...
> >
> > The same functionality is then available in a functional or OO style, with
> > minimal code duplication. And for interactive sessions, where converting
> > strings to Seqs is a bit more of an inconvenience, "from Bio.Seq import *"
> > becomes quick and handy.
>
> Doesn't that describe the Bio.Seq module as it is pretty well?
> In addition to the Seq object methods, there are several functions
> which can be used on strings or Seq (like) objects.
>
> Peter
>

I'm not fully up to speed on the debate or the use cases that triggered it,
but I'm guessing the goal is better code flexibility without sacrificing
performance. Here's some code to consider:

# Assumes this lives in Bio/Seq.py, where Seq and MutableSeq are defined.
from Bio import Alphabet
from Bio.Alphabet import IUPAC

def transcribe(dna, alphabet=None):
    """Transcribe a DNA sequence into RNA. Returns a string."""
    if isinstance(dna, (Seq, MutableSeq)):
        # At first, maybe issue a warning here.
        # A Seq carries its own alphabet, which overrides the argument.
        alphabet = dna.alphabet
        dna = str(dna)
    if alphabet is not None:
        # Validate only when an alphabet is known
        base = Alphabet._get_base_alphabet(alphabet)
        if isinstance(base, Alphabet.ProteinAlphabet):
            raise ValueError("Proteins cannot be transcribed!")
        if isinstance(base, Alphabet.RNAAlphabet):
            raise ValueError("RNA cannot be transcribed!")
    # Handle both upper- and lower-case thymine
    return dna.replace('T', 'U').replace('t', 'u')

class Seq:
    # ...
    def transcribe(self):
        # Pass the alphabet along so the function still validates it
        transcript = transcribe(self._data, self.alphabet)
        # Rebuild the Seq object with the matching RNA alphabet
        if self.alphabet == IUPAC.unambiguous_dna:
            alphabet = IUPAC.unambiguous_rna
        elif self.alphabet == IUPAC.ambiguous_dna:
            alphabet = IUPAC.ambiguous_rna
        else:
            alphabet = Alphabet.generic_rna
        return Seq(transcript, alphabet)
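
If you want to poke at the pattern without patching Bio.Seq, here's a
stripped-down, self-contained sketch. ToySeq and its plain-string alphabet
tags are stand-ins I made up for illustration; they are not the real Seq or
Alphabet API, and the validation is deliberately cruder than the real thing.

```python
def transcribe(dna, alphabet=None):
    """Transcribe a DNA string into RNA. Always returns a str."""
    if isinstance(dna, ToySeq):
        # The object carries its own alphabet tag; unwrap to a plain str.
        alphabet = dna.alphabet
        dna = str(dna)
    if alphabet in ("protein", "rna"):
        raise ValueError("%s cannot be transcribed!" % alphabet)
    return dna.replace("T", "U").replace("t", "u")


class ToySeq(object):
    """Hypothetical stand-in for Seq; the alphabet is just a string tag."""

    def __init__(self, data, alphabet="dna"):
        self._data = data
        self.alphabet = alphabet

    def __str__(self):
        return self._data

    def transcribe(self):
        # Dispatch to the function, then rebuild the object.
        return ToySeq(transcribe(self._data, self.alphabet), "rna")


print(transcribe("ACGTacgt"))            # functional style: plain str
print(ToySeq("ACGTacgt").transcribe())   # OO style: a ToySeq
```

Both calls go through the same string-level code path; the method only adds
the wrapping and the alphabet bookkeeping.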


Notes:
 - The standalone function takes an optional 'alphabet' argument, and
performs validation only when an alphabet is given (or when it is handed a
Seq, which carries its own).
 - Since the standalone function now has the same functionality as the Seq
method, Seq can dispatch to the function -- rather than the other way
around, as it is currently -- and then just rebuild a Seq object.
 - The standalone function now always returns the same type (str). Since
this might break some existing code, a little shim and deprecation dance may
be needed in real life. But I think returning a plain string is the Right
Thing: there's "one obvious way" to work with Seq objects or plain strings.
 - If the grand proposal is to eventually move the alphabet attribute to
SeqRecord, this provides an intermediate step and a more convenient
foundation for testing the idea.
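
For the shim and deprecation dance, one possible shape (only a sketch, with
hypothetical names throughout) is to keep returning the old type for Seq
input during a transition period, while warning that the return type will
become str. ToySeq again stands in for the real Seq class:

```python
import warnings


def _transcribe_str(dna):
    """String-level workhorse shared by the function and the method."""
    return dna.replace("T", "U").replace("t", "u")


class ToySeq(object):
    """Hypothetical stand-in for Seq."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data

    def transcribe(self):
        return ToySeq(_transcribe_str(self._data))


def transcribe(dna):
    """Transitional version: str in, str out; Seq in, Seq out plus a warning."""
    if isinstance(dna, ToySeq):
        warnings.warn(
            "transcribe() will return a plain str for Seq input in a future "
            "release; call the .transcribe() method to keep getting a Seq",
            DeprecationWarning,
            stacklevel=2,
        )
        return dna.transcribe()  # old behaviour, kept for now
    return _transcribe_str(dna)


print(transcribe("ACGT"))  # plain strings are unaffected by the shim
```

Once the warning has been out for a release or two, the Seq branch could
switch to returning str(dna.transcribe()) and the warning could go away.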

Best,
Eric


