[BioPython] The count method of a Seq (or MutableSeq) object

Brad Chapman chapmanb at 50mail.com
Fri Mar 6 22:46:39 UTC 2009


Me:
> > Also, unless someone has a use case for the current count()
> > function, we should deprecate and eventually remove it. Overriding
> > the string API where it makes sense is good, but here it seems to be
> > creating confusion and not solving a problem. If someone needs the
> > real string count, they can always do str(your_seq).count("GG").

Bruce:
> I have already given one use case where overlapping counts are totally 
> inappropriate! Unique codon counting.

Sorry, I was a bit terse in my previous e-mail. My thought on
deprecation was actually based on your and Noel's emails; both of
you presented cases where you had biological expectations for count
which are not met by the standard string count behaviour. 
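To make the mismatch concrete, here is a minimal sketch using plain Python strings (a Seq can be turned into one with str(my_seq)). The overlapping_count helper below is hypothetical and written only for illustration; it is not the implementation being discussed for Biopython.

```python
def overlapping_count(sequence: str, sub: str) -> int:
    """Count occurrences of sub in sequence, allowing matches to overlap.

    Hypothetical helper for illustration only; not a Biopython API.
    """
    if not sub:
        raise ValueError("search string must not be empty")
    count = 0
    start = sequence.find(sub)
    while start != -1:
        count += 1
        # Restart the search one base later so overlapping hits are found.
        start = sequence.find(sub, start + 1)
    return count


seq = "GGGG"
print(seq.count("GG"))               # standard str.count, non-overlapping: 2
print(overlapping_count(seq, "GG"))  # overlapping: 3
```

The standard string method skips past each match before searching again, which is why it reports 2 where a biologist scanning for GG dinucleotides would expect 3.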

For Noel, this is handled by the proposed overlapping_count
function. For your example, I think it would be better handled by
functionality that returned a list of codons, like:

Seq("ATGGAACAT").codon_list(phase=0)
["ATG", "GAA", "CAT"]
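A rough sketch of what such a codon list could look like, written as a standalone function on plain strings. The name codon_list and its phase argument mirror the suggestion above but are hypothetical; this is not an existing Biopython method. Incomplete trailing codons are simply dropped, which is one possible design choice.

```python
from collections import Counter


def codon_list(sequence: str, phase: int = 0) -> list[str]:
    """Split sequence into consecutive, non-overlapping codons.

    Hypothetical helper: reading starts at `phase` (0, 1 or 2) and any
    trailing bases that do not form a complete codon are discarded.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    return [sequence[i:i + 3] for i in range(phase, len(sequence) - 2, 3)]


print(codon_list("ATGGAACAT"))           # ['ATG', 'GAA', 'CAT']
print(codon_list("ATGGAACAT", phase=1))  # ['TGG', 'AAC']

# Codon counting: a plain string search also finds the out-of-frame ATG at
# index 1, whereas counting in-frame codons finds only one.
seq = "CATGAAATG"
print(seq.count("ATG"))                   # 2
print(Counter(codon_list(seq))["ATG"])    # 1
```

Counting codons this way sidesteps the overlapping versus non-overlapping question entirely, because only in-frame triplets are ever considered.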

Bruce:
> I just do not understand your logic of requiring a conversion when the 
> Seq object is designed to 'behave like a python string'.

The Seq object represents a biological sequence, so where a biologist's
intuition conflicts with what a standard Python string does, I think we
should consider an option that is more in line with those expectations.
My point about the string was just that if you are thinking as a Python
programmer and really want Python string behavior, it is pretty easy to get.

Peter:
> There is the very common use case of my_seq.count("A"), or similar,
> with single character search strings, and lots of code does this (both
> in Biopython and I'm sure users' scripts).  For single letters of
> course, a non-overlapping count and an overlapping count do the same
> thing - deprecating the count method would cause a lot of unnecessary
> upheaval.
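A quick check of the single-letter case, again on plain strings. The overlapping_count helper here is a hypothetical illustration (a different but equivalent formulation using str.startswith), not a Biopython function. A one-character match can never overlap another, so both counts agree for every base.

```python
def overlapping_count(sequence: str, sub: str) -> int:
    """Hypothetical helper: count matches of sub starting at every position."""
    if not sub:
        raise ValueError("search string must not be empty")
    return sum(
        1 for i in range(len(sequence) - len(sub) + 1)
        if sequence.startswith(sub, i)
    )


seq = "GATTACA"
for base in "ACGT":
    # Single-character matches cannot overlap, so the two methods agree.
    assert seq.count(base) == overlapping_count(seq, base)

print({base: seq.count(base) for base in "ACGT"})  # {'A': 3, 'C': 1, 'G': 1, 'T': 2}
```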

Good point; I totally overlooked that. I retract my suggestion. I do
like your warning idea, but maybe we can get by here with
documentation and by highlighting the alternative functions.
It looked like you're already all over the documentation, so
hopefully the new functionality will clear up any confusion.
Thanks all,
Brad
