[Biopython-dev] derive from Seq
Eric Talevich
eric.talevich at gmail.com
Sun Feb 21 11:36:13 EST 2010
On Sun, Feb 21, 2010 at 7:03 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Sat, Feb 20, 2010 at 7:01 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > I've seen a technique like this used to good effect:
> >
> > ...
> >
> > The same functionality is then available in a functional or OO style, with
> > minimal code duplication. And for interactive sessions, where converting
> > strings to Seqs is a bit more of an inconvenience, "from Bio.Seq import *"
> > becomes quick and handy.
>
> Doesn't that describe the Bio.Seq module as it is pretty well?
> In addition to the Seq object methods, there are several functions
> which can be used on strings or Seq (like) objects.
>
> Peter
>
I'm not fully up to speed on the debate or the use cases that triggered it,
but I'm guessing the goal is better code flexibility without sacrificing
performance. Here's some code to consider:

def transcribe(dna, alphabet=None):
    """Transcribe a DNA sequence into RNA. Returns a string."""
    if isinstance(dna, (Seq, MutableSeq)):
        # At first, maybe issue a warning here
        alphabet = dna.alphabet
        dna = str(dna)
    if alphabet is not None:
        # Validate
        base = Alphabet._get_base_alphabet(alphabet)
        if isinstance(base, Alphabet.ProteinAlphabet):
            raise ValueError("Proteins cannot be transcribed!")
        if isinstance(base, Alphabet.RNAAlphabet):
            raise ValueError("RNA cannot be transcribed!")
    return dna.replace('T','U').replace('t','u')

class Seq:
    # ...
    def transcribe(self):
        # Pass the alphabet along so the standalone function validates it
        transcript = transcribe(self._data, self.alphabet)
        # Rebuild the Seq object
        if self.alphabet == IUPAC.unambiguous_dna:
            alphabet = IUPAC.unambiguous_rna
        elif self.alphabet == IUPAC.ambiguous_dna:
            alphabet = IUPAC.ambiguous_rna
        else:
            alphabet = Alphabet.generic_rna
        return Seq(transcript, alphabet)
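
For anyone who wants to try the dispatch pattern without the Biopython internals, here is a minimal self-contained sketch. ToySeq and its plain-string alphabet tags ("dna", "rna", "protein") are hypothetical stand-ins that I made up for illustration; they are not the real Seq or Alphabet classes.

```python
def transcribe(dna, alphabet=None):
    """Transcribe DNA into RNA. Always returns a plain str."""
    if isinstance(dna, ToySeq):
        # Pull the alphabet off the object, then work on the raw string
        alphabet = dna.alphabet
        dna = str(dna)
    if alphabet in ("protein", "rna"):
        # Validation only happens when an alphabet is known
        raise ValueError("%s cannot be transcribed!" % alphabet)
    return dna.replace("T", "U").replace("t", "u")


class ToySeq:
    """Hypothetical stand-in for Seq: a string plus an alphabet tag."""

    def __init__(self, data, alphabet="dna"):
        self._data = data
        self.alphabet = alphabet

    def __str__(self):
        return self._data

    def transcribe(self):
        # The method dispatches to the function, then rebuilds the object
        return ToySeq(transcribe(self._data, self.alphabet), "rna")


print(transcribe("ACGTacgt"))             # plain string in, plain string out
print(str(ToySeq("ACGT").transcribe()))   # object in, object out
```

The function is the single place where the string manipulation and the validation live; the method only adds the bookkeeping needed to hand back an object.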

Notes:

- The standalone function takes an optional 'alphabet' argument, and
  performs validation if requested.
- Since the standalone function now has the same functionality as the Seq
  method, Seq can dispatch to the function -- rather than the other way
  around, as it is currently -- and then just rebuild a Seq object.
- The standalone function now always returns the same type (str). Since
  this might break some existing code, a little shim and deprecation dance
  may be needed in real life. But I think returning a plain string is the
  Right Thing: there's "one obvious way" to work with Seq objects or plain
  strings.
- If the grand proposal is to eventually move the alphabet attribute to
  SeqRecord, this provides an intermediate step and a more convenient
  foundation for testing the idea.

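
The "shim and deprecation dance" could look roughly like the sketch below. This is only one possible transition, under my own assumptions: during the deprecation period the function keeps returning an object when it is given one, but warns that a later release will return str. ToySeq and _transcribe_str are hypothetical names used for illustration, not existing Biopython code.

```python
import warnings


def _transcribe_str(dna):
    """Hypothetical helper: the string-only core of transcription."""
    return dna.replace("T", "U").replace("t", "u")


class ToySeq:
    """Hypothetical stand-in for Seq, holding just the sequence string."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data

    def transcribe(self):
        return ToySeq(_transcribe_str(self._data))


def transcribe(dna):
    """Transitional shim: old return type for objects, plus a warning."""
    if isinstance(dna, ToySeq):
        warnings.warn(
            "transcribe() will return a plain str in a future release; "
            "use the .transcribe() method to keep getting an object back",
            DeprecationWarning,
            stacklevel=2,
        )
        # Old behaviour preserved for now, so existing callers keep working
        return dna.transcribe()
    return _transcribe_str(dna)
```

Once the warning has been out for a release or two, the isinstance branch can switch to returning str(dna) transcribed, and callers who want an object will already have moved to the method.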
Best,
Eric