[BioPython] The count method of a Seq (or MutableSeq) object

Bruce Southey bsouthey at gmail.com
Fri Mar 6 15:06:07 UTC 2009


Peter wrote:
> On Fri, Mar 6, 2009 at 1:14 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>   
>> Hey all;
>> Great discussion on this. My preference is for a new function,
>> and I like Leighton's naming suggestion.
>>     
>
> Yes, "overlapping_count" is a reasonable choice.  It's a bit long, but
> it is clear.
>
>   
>> Also, unless someone has a use case for the current count()
>> function, we should deprecate and eventually remove it. Overriding
>> the string API where it makes sense is good, but here it seems to be
>> creating confusion and not solving a problem. If someone needs the
>> real string count, they can always do str(your_seq).count("GG").
>>     
I have already given one use case where overlapping counts are totally
inappropriate! Unique (non-overlapping) codon counting is extremely
important in many areas, including gene prediction (possible splicing
sites) and molecular evolution (such as codon usage).
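
To make the codon case concrete, here is a minimal sketch using plain
Python strings rather than a Seq object. The helper name codon_counts and
the example sequence are made up for illustration; neither is part of
Biopython.

```python
from collections import Counter


def codon_counts(seq, frame=0):
    """Count in-frame codons, i.e. non-overlapping triplets.

    Hypothetical helper for illustration only, not a Biopython function.
    """
    s = str(seq).upper()
    # Step through the sequence three letters at a time from the given frame,
    # stopping before any incomplete trailing codon.
    codons = (s[i:i + 3] for i in range(frame, len(s) - 2, 3))
    return Counter(codons)


counts = codon_counts("ATGAAAAAATAG")
print(counts["AAA"])  # AAA as an in-frame codon
```

In this sequence AAA appears twice as an in-frame codon, even though the
run of six A's contains four overlapping AAA windows. An overlapping count
would therefore overstate the codon usage.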

Another valid case given was DNA restriction sites, where you may want
both overlapping and unique counts. For example, DNA may be digested by
one enzyme that has unique sites in the sequence, and then by a second
enzyme that has unique sites in the digested product but possibly
duplicates in the original sequence.

I just do not understand your logic of requiring a conversion when the
Seq object is designed to 'behave like a python string'.
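
For comparison, Python's built-in str.count counts non-overlapping
occurrences. The sketch below contrasts it with an overlapping count. The
function is a stand-alone helper that only borrows the name proposed in
this thread; it is an assumption for illustration, not an existing
Biopython method.

```python
def overlapping_count(seq, sub):
    """Count occurrences of sub in seq, allowing overlaps.

    Hypothetical stand-alone helper; works on anything str() can convert.
    """
    s, sub = str(seq), str(sub)
    if not sub:
        raise ValueError("search string must not be empty")
    count = 0
    start = s.find(sub)
    while start != -1:
        count += 1
        # Resume one position after the last hit so overlaps are found.
        start = s.find(sub, start + 1)
    return count


print("GGGG".count("GG"))               # built-in, non-overlapping
print(overlapping_count("GGGG", "GG"))  # overlapping
```

For "GGGG" and "GG" the built-in method gives 2 and the overlapping
version gives 3. For a single-letter search string the two always agree.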

>
> There is the very common use case of my_seq.count("A"), or similar,
> with single character search strings, and lots of code does this (both
> in Biopython and I'm sure users' scripts).  For single letters of
> course, a non-overlapping count and an overlapping count do the same
> thing - deprecating the count method would cause a lot of unnecessary
> upheaval.
>
> Ignoring that, given we want the Seq to generally behave like a python
> string, I think removing the count method would still be a bad idea.
>   
I agree.
> [As a compromise, assuming we add an overlapping_count method and do a
> Biopython 1.50 beta release, the beta release could include a warning
> in the count method when used with a multi-character search string,
> suggesting the user might in fact need a non-overlapping count.  Or is
> this a bit too crazy?]
>   
Yes, it is too crazy, and it does not fit the currently established
behavior of Biopython.

Bruce


