[Biopython] define circular DNA (?)

Peter Cock p.j.a.cock at googlemail.com
Tue Mar 8 13:24:07 UTC 2011


On Tue, Mar 8, 2011 at 12:12 PM, Peter Cherepanov
<p.cherepanov at imperial.ac.uk> wrote:
> ideally, it would be an object where the last letter is hard-linked to the first. For example, we should be able to define:
>
> c = CircularSeq('ATGCGGGGA')
>
> where:
>
> c[1:9]  equals  ATGCGGGGA   (or, more awkwardly, c[0:9], if the original
> Python string numbering must be retained for some reasons)
> c[8:7]  equals  GAATGCGGG
> c[1:1] equals A  (on a python string it is c[0:1]  =  A, of course)
>
> Ideally, we would want to number such sequences from 1, after all these
> are the kind of objects we deal in biology.

Absolutely not - it would put the circular sequence completely out of
sync with the existing sequence objects in Biopython and with Python
strings. Don't worry - you'll get used to zero-based counting, and
Python slicing is very beautiful once you understand it.
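
To illustrate, here is a minimal sketch of how wrap-around slicing could
look with zero-based, half-open indices on a plain Python string. The
helper name circular_slice is hypothetical and is not part of Biopython.
It also assumes that start == end means one full turn around the circle
rather than an empty slice, which is a design choice that would need
discussion.

```python
def circular_slice(seq, start, end):
    """Zero-based, half-open slice of a circular sequence.

    Hypothetical helper, not a Biopython API. Assumption: when
    start >= end (after reducing both modulo the length) the slice
    wraps over the origin, so start == end yields a full rotation.
    """
    n = len(seq)
    if n == 0:
        return seq
    start %= n
    end %= n
    if start < end:
        # Ordinary case: no wrap needed, same as a normal slice.
        return seq[start:end]
    # Wrap-around case: run to the end, then continue from the origin.
    return seq[start:] + seq[:end]


print(circular_slice("ATGCGGGGA", 0, 9))  # whole sequence from the origin
print(circular_slice("ATGCGGGGA", 7, 2))  # crosses the origin
```

With this convention the familiar slices behave exactly as they do on a
Python string whenever no wrap is involved.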

> And, most importantly of all, it must be able to:
> c.find('GGAATG') to return "7"
>

Well, 6 in zero-based counting, but yes, that would be the expected
result for find (and similarly for rfind). The split and rsplit
methods would also need to handle matches that span the origin.
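
As a rough sketch of the find/rfind behaviour on a plain Python string:
append the first len(sub) - 1 letters to the end, so that a match
crossing the origin becomes an ordinary substring match. The function
names circular_find and circular_rfind are hypothetical, not existing
Biopython methods, and the sketch assumes a pattern longer than the
sequence should simply not match.

```python
def _extended(seq, sub):
    # Append just enough of the start to expose matches over the origin.
    return seq + seq[:max(len(sub) - 1, 0)]


def circular_find(seq, sub):
    """Lowest zero-based index of sub in circular seq, or -1.

    Hypothetical helper. Assumption: a pattern longer than the
    sequence never matches.
    """
    if len(sub) > len(seq):
        return -1
    return _extended(seq, sub).find(sub)


def circular_rfind(seq, sub):
    """Highest zero-based index of sub in circular seq, or -1."""
    if len(sub) > len(seq):
        return -1
    return _extended(seq, sub).rfind(sub)


print(circular_find("ATGCGGGGA", "GGAATG"))  # 6, the match wraps the origin
```

Because only len(sub) - 1 extra letters are appended, any match found
starts within the original sequence, so the returned index never needs
adjusting.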

Peter



