[BioPython] The count method of a Seq (or MutableSeq) object

Peter biopython at maubp.freeserve.co.uk
Fri Mar 6 14:13:42 UTC 2009


On Fri, Mar 6, 2009 at 1:14 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hey all;
> Great discussion on this. My preference is for a new function,
> and I like Leighton's naming suggestion.

Yes, "overlapping_count" is a reasonable choice.  It's a bit long, but
it is clear.

> Also, unless someone has a use case for the current count()
> function, we should deprecate and eventually remove it. Overriding
> the string API where it makes sense is good, but here it seems to be
> creating confusion and not solving a problem. If someone needs the
> real string count, they can always do str(your_seq).count("GG").

There is the very common use case of my_seq.count("A"), or similar,
with single character search strings, and lots of code does this (both
in Biopython and, I'm sure, in users' scripts).  For single letters, of
course, a non-overlapping count and an overlapping count give the same
result, so deprecating the count method would cause a lot of
unnecessary upheaval.
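
To make the distinction concrete, here is a minimal sketch on plain
Python strings.  It is not the Seq implementation; the helper simply
borrows the proposed name "overlapping_count" for illustration.

```python
def overlapping_count(text, sub):
    """Count occurrences of sub in text, allowing overlaps.

    Hypothetical helper for illustration only, not a Biopython API.
    """
    if not sub:
        raise ValueError("search string must not be empty")
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Step forward by one character (not len(sub)) so that
        # overlapping matches are found as well.
        start = text.find(sub, start + 1)
    return count


# Multi-character search: the two counts differ.
print("GGG".count("GG"))               # built-in, non-overlapping
print(overlapping_count("GGG", "GG"))  # overlapping

# Single-character search: the two counts agree.
print("AACAA".count("A") == overlapping_count("AACAA", "A"))
```

With a single-letter search string a match cannot overlap itself, which
is why existing my_seq.count("A") style code is unaffected either way.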

Even ignoring that, given we want the Seq to generally behave like a
Python string, I think removing the count method would still be a bad idea.

[As a compromise, assuming we add an overlapping_count method and do a
Biopython 1.50 beta release, the beta release could include a warning
in the count method when used with a multi-character search string,
suggesting the user might in fact need an overlapping count.  Or is
this a bit too crazy?]
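
Roughly what I have in mind for the warning, as a sketch only.  The
class name SketchSeq, the warning text and the use of UserWarning are
all assumptions for illustration; this is not the actual Seq code.

```python
import warnings


class SketchSeq:
    """Toy stand-in for Seq (hypothetical, for illustration only)."""

    def __init__(self, data):
        self._data = str(data)

    def __str__(self):
        return self._data

    def count(self, sub, start=0, end=None):
        """Non-overlapping count, like the Python string method."""
        sub = str(sub)
        if len(sub) > 1:
            # Only multi-character searches can give a different answer
            # under an overlapping count, so only those trigger a warning.
            warnings.warn(
                "count() is non-overlapping; you may want "
                "overlapping_count() instead",
                UserWarning,
                stacklevel=2,
            )
        if end is None:
            end = len(self._data)
        return self._data.count(sub, start, end)


print(SketchSeq("GGG").count("G"))   # single letter, no warning
print(SketchSeq("GGG").count("GG"))  # multi-letter, emits a warning
```

Single-letter calls stay silent, so the common my_seq.count("A") case
would not be disturbed.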

Peter


