[Biopython-dev] Circular sequences
Antony Lee
antony.lee at berkeley.edu
Tue Jan 15 21:45:19 UTC 2013
Hi all,
While working on a (more sane?) rewrite of the Restriction library
(https://github.com/biopython/biopython/pull/148), I found the need
to add a circular/linear attribute to sequence objects (just as the
currently existing Restriction library does). So I quickly added such
a class, independently of whatever Biopython currently provides. But
it seems like the module would be better integrated in the rest of
Biopython if it used Bio.Seq.Seq instead.
I saw that CircularSeqs have already been discussed on the mailing
list, and the main issue was with indexing and slicing. So here are my
thoughts about how such an object should behave. Assume a circular seq
s of length 10. Simple indexing works modulo 10 (and negative indices
work identically). Methods that return one or more indices return the
indices modulo 10. Slicing with both ends defined (i.e. s[x:y(:z)])
wraps around the sequence as many times as needed if y >= x, and makes
at most one complete cycle if y < x (i.e. add len(s) to y as many times
as needed to make it bigger than x, and stop there). Slicing with one
or both ends undefined (i.e. s[:], s[x:], s[:y]) raises an IndexError
(because, well, I read s[x:] as "return the elements of s starting from
the x'th until the end"... but there is no such end). (A second option
would be to return an infinite iterable for s[x:], but that doesn't take
care of s[:y] anyway, not to mention the bugs that may appear from
that.)
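
To make the indexing and slicing rules above concrete, here is a minimal
sketch in plain Python. The class name CircularSeq and its find() helper
are hypothetical and not part of Biopython; the sketch stores a plain
string instead of building on Bio.Seq.Seq, returns linear slices as str,
and only handles positive steps (all of these are assumptions made for
illustration).

```python
class CircularSeq:
    """Hypothetical circular sequence; an illustration, not Biopython API."""

    def __init__(self, data):
        if not data:
            raise ValueError("a circular sequence cannot be empty")
        self._data = str(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        n = len(self._data)
        if isinstance(key, int):
            # Simple indexing works modulo the length (negatives included).
            return self._data[key % n]
        if isinstance(key, slice):
            start, stop, step = key.start, key.stop, key.step
            if start is None or stop is None:
                # s[:], s[x:] and s[:y] have no well-defined end on a circle.
                raise IndexError("circular slices need both a start and a stop")
            step = 1 if step is None else step
            if step <= 0:
                raise ValueError("this sketch only handles positive steps")
            if stop < start:
                # y < x: add len(s) to y until it is bigger than x, then stop,
                # which yields at most one complete cycle.
                while stop <= start:
                    stop += n
            # y >= x: wrap around as many times as the range requires.
            return "".join(self._data[i % n] for i in range(start, stop, step))
        raise TypeError("indices must be integers or slices")

    def find(self, sub):
        # Search across the origin too; the returned index is already in
        # range(len(self)), i.e. reduced modulo the length. Returns -1 if
        # sub does not occur.
        extended = self._data + self._data[:max(len(sub) - 1, 0)]
        return extended.find(sub)
```

With s = CircularSeq("ABCDEFGHIJ"), s[12] gives "C", s[8:2] gives "IJAB",
s[12:2] gives one full cycle starting at index 2, and s[:3] raises an
IndexError.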
A few other issues were addressed in the previous thread. I think that
adding (i.e. concatenating) CircularSeqs does not make sense at all (so
__add__ raises a ValueError), and translation can either check for the
presence of a stop codon and raise ValueError otherwise, or return an
infinite iterator.
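
As a rough illustration of the first translation option (check for a
stop codon, raise ValueError otherwise), the sketch below walks codons
around a circular DNA string. The function name circular_orf is
hypothetical, it works on plain str instead of Bio.Seq.Seq, it assumes
the stop codons of the standard genetic code, and it returns the codons
themselves instead of amino acids to avoid needing a codon table.

```python
from math import gcd

# Stop codons of the standard genetic code (DNA alphabet); an assumption,
# since other translation tables use different stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def circular_orf(seq, start=0):
    """Return the codons read from `start` up to and including the first stop.

    Raises ValueError if the reading frame contains no stop codon, since
    translation around the circle would then never terminate.
    """
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    # Codon start positions are start, start+3, ... modulo n; they begin to
    # repeat after n // gcd(n, 3) codons, so that many checks are enough.
    distinct_codons = n // gcd(n, 3)
    codons = []
    for k in range(distinct_codons):
        pos = start + 3 * k
        codon = "".join(seq[(pos + j) % n] for j in range(3))
        codons.append(codon)
        if codon in STOP_CODONS:
            return codons
    raise ValueError("no stop codon in this reading frame")
```

For example, circular_orf("AATGAAAT", 1) reads ATG, AAA, and then TAA
across the origin, while circular_orf("ACGT") raises ValueError because
none of the four codons reachable in that frame is a stop.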
Another thing that may be useful for a restriction analysis library is a
good way to represent a dsDNA sequence with some overhangs. Any
thoughts?
Antony