[BioPython] Question about Seq.count()

Thu Oct 18 12:48:41 UTC 2007

Peter
Well after a day of not thinking very hard I found my problem and it didn't
have anything to do with strings at all. That was just my best guess at the
time of writing this e-mail. Sorry about that =(

On 10/18/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Jimmy Musselwhite wrote:
> > Just kidding, it didn't work great. It only "fixed" it because I was
> > printing out the output of count() and so it was just executing 100
> > times slower and thus eating RAM 100 times slower :(
> >
> > It doesn't seem like there is a good way for me to fix this.
>
> Both of these are using the python string method to count "GG", the only
> difference is the tostring() method has the additional small overhead of
> an extra function call:
>
> my_seq.data.count("GG")
> my_seq.tostring().count("GG")
>
> However, comparing these:
>
> my_seq.data.count("G")         # using python's string count method
> my_seq.tostring().count("G")   # using python's string count method
> my_seq.count("G")              # using an iterator internally
>
> It could be that the Seq record's current single letter search is simply
> very memory efficient compared with the python string's more flexible
> multi-letter search.
>
> How are you measuring the RAM?  I'd like to see memory usage figures for
> the five simple examples above on a large sequence - plus doing this
> directly on the equivalent string.
>
> Are you using Linux or Windows or Mac OS, and what version of python?  I
> know there have been some string optimisations in Python 2.5 (although I
> don't know if any are relevant to the count method).
>
> Peter
>
>
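
For anyone who wants to produce the kind of memory figures asked for above, here is a minimal sketch of one way to do it. It is not from the original thread: it assumes a modern Python 3 with the standard library tracemalloc module (which did not exist in 2007), and it uses a plain string as a stand-in for the sequence data rather than a real Seq object. The helper names (peak_memory, count_with_str, count_with_iterator) are hypothetical, and the letter-by-letter loop is only a guess at what an iterator-based single letter count might look like.

```python
import tracemalloc


def peak_memory(func):
    """Run func() and return (result, peak bytes allocated while it ran)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Hypothetical test data: a plain string standing in for the sequence.
sequence = "ACGGT" * 100_000  # 500,000 letters


def count_with_str(pattern="G"):
    # Python's built-in string count (non-overlapping matches).
    return sequence.count(pattern)


def count_with_iterator(letter="G"):
    # Letter-by-letter count; an assumed stand-in for an
    # iterator-based single letter search.
    return sum(1 for c in sequence if c == letter)


if __name__ == "__main__":
    checks = [
        ("str.count('G')", lambda: count_with_str("G")),
        ("str.count('GG')", lambda: count_with_str("GG")),
        ("iterator count 'G'", lambda: count_with_iterator("G")),
    ]
    for name, func in checks:
        result, peak = peak_memory(func)
        print("%-20s count=%d peak=%d bytes" % (name, result, peak))
```

tracemalloc only sees allocations made through Python's memory allocator, so the figures are a rough guide rather than the process's total RAM use. To test a real sequence, replace the string with the data from your own Seq object.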