[BioPython] Seq

Andrew Dalke dalke@bioreason.com
Wed, 15 Sep 1999 11:15:00 -0600


Bradley Marshall <bradbioperl@yahoo.com>:
> I hacked a little at a sequence object.  I need all the feedback you
> can give me.  I'm tyring to follow the LSR IDL.  Whaddayathink?

Jeff answered some already.  Here's my response.

I'll start with a comment about the LSR.  It is designed for read-only
use, except for the ability to add annotations.  So, do you
want sequences to be mutable or immutable?  That is, should something
like

  seq = Bio.Seq("EKALDWERDNA")
  seq[2:4] = "LA"
  print seq.to_str()
EKLADWERDNA

be allowed?

I would prefer an immutable design, where the underlying sequence
of the Seq object cannot be changed.  My reasons are pretty much the
same as why Python doesn't have mutable strings (e.g., so you can
use them as keys in a dict/hash table).
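
To make the dict point concrete, here is a minimal sketch in current
Python.  The Seq class below is a hypothetical stand-in, not the class
under discussion; it only assumes that the sequence is stored as a
plain string and that equality and hashing follow that string.

```python
class Seq:
    """Hypothetical immutable sequence wrapper (illustration only)."""

    def __init__(self, data):
        # The residues are kept as an ordinary, immutable Python string.
        self._data = str(data)

    def to_str(self):
        return self._data

    def __eq__(self, other):
        return isinstance(other, Seq) and self._data == other._data

    def __hash__(self):
        # Safe only because the contents never change after creation.
        return hash(self._data)


seq = Seq("EKALDWERDNA")
counts = {seq: 1}
counts[Seq("EKALDWERDNA")] += 1   # equal contents find the same key

try:
    seq[2:4] = "LA"               # no __setitem__, so this is rejected
    mutated = True
except TypeError:
    mutated = False
```

If the object could be changed in place, its hash would change with it
and the dict entry could no longer be found.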

Having said that, this is also why I like having very light-weight
objects.  If you have to create many different objects to do something,
then you want to avoid copying information like the id, desc, etc.
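
One way to read "light-weight" is sketched below, assuming the
annotation is held beside the sequence instead of inside it.  Both
class names and the id/desc values are made up for the example.

```python
class Seq:
    """Bare sequence data: no id, no description (hypothetical)."""

    def __init__(self, data):
        self.data = data


class AnnotatedSeq:
    """Keeps id and desc next to a Seq instead of inside it (hypothetical)."""

    def __init__(self, seq, id, desc):
        self.seq = seq      # a reference, not a copy
        self.id = id
        self.desc = desc


seq = Seq("EKALDWERDNA")
record = AnnotatedSeq(seq, id="example1", desc="made-up example entry")

# Intermediate results are plain Seq objects, so creating many of them
# never touches or copies the id and desc.
fragments = [Seq(seq.data[i:i + 3]) for i in range(0, len(seq.data), 3)]
```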


Now, about your implementation of it.  There are some things you
can do to make it more Pythonic.  Take your pyDNA1 class, which
implements a DNA object.  You could also implement the methods
__len__, __getitem__ and __getslice__, like this:

    def __len__(self):
        return self.length
    def __getitem__(self, i):
        return self.seq[i]
    def __getslice__(self, i, j):
        return self.seq[i:j]
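
For anyone trying this on a current Python: __getslice__ no longer
exists there, and slices arrive at __getitem__ as slice objects.  A
minimal sketch under that assumption follows.  The DNA class is only a
stand-in for pyDNA1, and it computes the length from the string rather
than from a stored length attribute.

```python
class DNA:
    """Stand-in for the pyDNA1 class; its real attributes are not shown here."""

    def __init__(self, seq):
        self.seq = seq

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        # index is an int for dna[0] and a slice object for dna[1:4];
        # the underlying string handles both cases.
        return self.seq[index]


dna = DNA("GATTACA")
length = len(dna)
first = dna[0]
middle = dna[1:4]
bases = list(dna)   # iteration falls back to __getitem__ with 0, 1, 2, ...
```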

This lets the sequence imitate a string.  Another question is,
would you want the regions to be their own objects, like

    def __getslice__(self, i, j):
        return Bio.Seq(self.seq[i:j])

If you support this, you'll need to start making up fake id and desc
values for them (again, you can see that I'm pushing a lighter-weight
design).
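
A sketch of what that choice looks like, again in current Python where
slicing goes through __getitem__.  The Seq class and its id/desc
keyword arguments are assumptions for illustration; here the region
simply gets None for both instead of invented values.

```python
class Seq:
    """Hypothetical sequence class that carries an id and a description."""

    def __init__(self, data, id=None, desc=None):
        self.data = data
        self.id = id
        self.desc = desc

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A region becomes a new Seq.  It has no natural id or
            # description, so this sketch leaves both as None.
            return Seq(self.data[index])
        return self.data[index]


whole = Seq("EKALDWERDNA", id="example1", desc="made-up example entry")
region = whole[2:6]
```

The region is a full object, but the question of what its id and desc
should be is left unanswered, which is the cost being pointed out.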

Finally, on variable names.  I prefer the following naming
scheme:
  CONSTANTS are in UPPER CASE
  ModuleNames and ClassNames are in MixedCase
  variable_names, function_names and method_names are
        underscore_separated_lower_case

There are, of course, exceptions.  Still, it is nice to be consistent,
and you mix styles in places like:
>                self.SeqRegion = region
>                self.name = name
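
Under the scheme above, those two attributes might look like the
following sketch.  The class name NamedRegion, the constant and the
to_str method are invented for the example; only the two assignments
come from your code.

```python
MAX_LINE_WIDTH = 60                  # constant: UPPER_CASE


class NamedRegion:                   # class name: MixedCase
    def __init__(self, region, name):
        self.seq_region = region     # attribute: lower_case_with_underscores
        self.name = name

    def to_str(self):                # method name: lower_case_with_underscores
        return "%s: %s" % (self.name, self.seq_region)


example = NamedRegion("EKAL", "demo")
label = example.to_str()
```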

						Andrew
						dalke@bioreason.com