[BioPython] The count method of a Seq (or MutableSeq) object
Bruce Southey
bsouthey at gmail.com
Fri Mar 6 15:06:07 UTC 2009
Peter wrote:
> On Fri, Mar 6, 2009 at 1:14 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
>> Hey all;
>> Great discussion on this. My preference is for a new function,
>> and I like Leighton's naming suggestion.
>>
>
> Yes, "overlapping_count" is a reasonable choice. It's a bit long, but
> it is clear.
>
>
>> Also, unless someone has a use case for the current count()
>> function, we should deprecate and eventually remove it. Overriding
>> the string API where it makes sense is good, but here it seems to be
>> creating confusion and not solving a problem. If someone needs the
>> real string count, they can always do str(your_seq).count("GG").
>>
I have already given one use case where overlapping counts are totally
inappropriate! Unique (non-overlapping) codon counting is extremely
important in many areas, including gene prediction (possible splicing
sites) and molecular evolution (like codon usage).
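
As a rough illustration of the codon usage case, here is a minimal sketch
that tallies codons in a fixed reading frame. It uses a plain Python string
rather than a Seq object, and the helper name codon_usage and its frame
parameter are hypothetical, not part of Biopython. Stepping through the
sequence three letters at a time means codons can never overlap.

```python
from collections import Counter


def codon_usage(dna, frame=0):
    """Tally codons in one reading frame (hypothetical helper).

    Steps through the sequence three letters at a time starting at
    `frame`, so no two counted codons overlap. A trailing partial
    codon is ignored.
    """
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return Counter(codons)


# Frame 0 splits this into ATG, GCC, GCC, TAA.
usage = codon_usage("ATGGCCGCCTAA")
```
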
Another valid case given was DNA restriction sites, where you may want
both overlapping and unique counts. For example, the DNA may be digested
by one enzyme that has unique sites in the sequence, followed by a
second enzyme that has unique sites in the digested product but possibly
duplicates in the original sequence.
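
To make the difference between the two kinds of count concrete, here is a
small sketch using plain Python strings. The overlapping_count function
below is only an illustration named after the suggestion in this thread; it
is an assumption, not an existing Biopython method.

```python
def overlapping_count(text, sub):
    """Count occurrences of sub in text, allowing overlaps.

    Illustrative helper only; the name follows the suggestion made in
    this thread and is not an existing library function.
    """
    if not sub:
        raise ValueError("search string must not be empty")
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are found too.
        start = text.find(sub, start + 1)
    return count


seq = "GGGG"
non_overlapping = seq.count("GG")            # str.count skips past each match
overlapping = overlapping_count(seq, "GG")   # matches at positions 0, 1 and 2
```

For a single-letter search string the two counts always agree, since a
one-character match cannot overlap with another.
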
I just do not understand your logic of requiring a conversion when the
Seq object is designed to 'behave like a python string'.
>
> There is the very common use case of my_seq.count("A"), or similar,
> with single character search strings, and lots of code does this (both
> in Biopython and I'm sure users' scripts). For single letters of
> course, a non-overlapping count and an overlapping count do the same
> thing - deprecating the count method would cause a lot of unnecessary
> upheaval.
>
> Ignoring that, given we want the Seq to generally behave like a python
> string, I think removing the count method would still be a bad idea.
>
I agree.
> [As a compromise, assuming we add an overlapping_count method and do a
> Biopython 1.50 beta release, the beta release could include a warning
> in the count method when used with a multi-character search string,
> suggesting the user might in fact need a non-overlapping count. Or is
> this a bit too crazy?]
>
Yes it is too crazy and does not fit into the current established
behavior of Biopython.
Bruce