[BioPython] The count method of a Seq (or MutableSeq) object

Peter biopython at maubp.freeserve.co.uk
Thu Mar 5 12:26:13 UTC 2009


Hi All,

As the following examples show, and as its docstring clearly states,
the Python string's count method uses a non-overlapping search:

>>> "AAA".count("A")
3
>>> "AAA".count("AA") # you might expect 2
1
>>> "BBBB".count("BB") # you might expect 3
2

Up until Biopython 1.44, the Seq object's count method only worked for
single characters.  From Biopython 1.45 onwards it accepted longer
strings and followed the built-in Python string count behaviour.
However, as Noel pointed out on Bug 2779, our docstring does not make
it clear that this does a non-overlapping search.  In fact, as
Leighton suggests, one might expect the Seq object's count method to
use an overlapping search:
http://bugzilla.open-bio.org/show_bug.cgi?id=2779
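
For illustration only, here is a minimal sketch of what an overlapping
count could look like on plain Python strings.  The helper name
count_overlapping is hypothetical; it is not part of Biopython or the
Seq class, and it is not a proposal for the actual implementation.

```python
def count_overlapping(seq, sub):
    """Count occurrences of sub in seq, allowing overlapping matches.

    Hypothetical helper for illustration; not a Biopython function.
    """
    seq, sub = str(seq), str(sub)
    if not sub:
        raise ValueError("search string must not be empty")
    count = 0
    start = seq.find(sub)
    while start != -1:
        count += 1
        # Step forward by one character rather than len(sub),
        # so that matches overlapping the previous hit are still found.
        start = seq.find(sub, start + 1)
    return count
```

On the examples above, this would give 2 for "AA" in "AAA" and 3 for
"BB" in "BBBB", whereas the built-in str.count gives 1 and 2.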

We should either:

(a) stick with the Python string compatible behaviour (which has been
a general principle for the Seq class), but document this issue more
clearly, since a non-overlapping search does run counter to some
potential biological uses.

or,

(b) change the behaviour as Leighton suggests to do an overlapping
search.  This could break any code relying on the old python
string-like behaviour.

What do people here think?  Any preferences?

[I don't want to get into details about the implementation here on the
main list]

Peter


