[BioPython] Question about Seq.count()
Peter
biopython at maubp.freeserve.co.uk
Thu Oct 18 10:22:22 UTC 2007
Jimmy Musselwhite wrote:
> Just kidding, it didn't work great. It only "fixed" it because I was
> printing out the output of count() and so it was just executing 100 times
> slower and thus eating RAM 100 times slower :(
>
> It doesn't seem like there is a good way for me to fix this.
Both of these use Python's string count method to count "GG"; the only
difference is that the tostring() method adds the small overhead of an
extra function call:
my_seq.data.count("GG")
my_seq.tostring().count("GG")
However, comparing these:
my_seq.data.count("G") # using python's string count method
my_seq.tostring().count("G") # using python's string count method
my_seq.count("G") # using an iterator internally
It could be that the Seq record's current single letter search is simply
very memory efficient compared to the Python string's more flexible
multi-letter search.
How are you measuring the RAM? I'd like to see memory usage figures for
the five simple examples above on a large sequence, plus the same counts
done directly on the equivalent string.
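
As a rough sketch of how such figures could be collected, the following
uses the standard library's tracemalloc module (an assumption: this needs
Python 3.4 or later) on a plain string standing in for the sequence. The
helper names peak_memory and count_by_iteration are made up for this
illustration, and count_by_iteration is only a stand-in for a
letter-by-letter count, not Biopython's actual code:

```python
import tracemalloc


def peak_memory(func, *args):
    """Run func(*args) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def count_by_iteration(text, letter):
    """Hypothetical letter-by-letter count (not Biopython's implementation)."""
    total = 0
    for ch in text:
        if ch == letter:
            total += 1
    return total


# Assumption: a one-million-letter string stands in for a large sequence.
sequence = "ACGGT" * 200000

cases = [
    ("str.count('G')", str.count, "G"),
    ("str.count('GG')", str.count, "GG"),
    ("iteration over 'G'", count_by_iteration, "G"),
]

for label, func, pattern in cases:
    result, peak = peak_memory(func, sequence, pattern)
    print("%-20s count=%d peak=%d bytes" % (label, result, peak))
```

Note that tracemalloc only sees allocations made through Python's memory
allocator, so figures from the operating system (top, Task Manager) may
differ.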
Are you using Linux, Windows or Mac OS, and what version of Python? I
know there have been some string optimisations in Python 2.5 (although I
don't know if any are relevant to the count method).
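
If it helps, a short snippet like this reports both in one go. It is only
a convenience sketch using the standard platform module; the function name
environment_report is made up for this example:

```python
import platform


def environment_report():
    """Return a one-line summary of the operating system and Python version."""
    return "%s %s, Python %s" % (
        platform.system(),          # e.g. 'Linux', 'Windows' or 'Darwin'
        platform.release(),         # operating system release string
        platform.python_version(),  # e.g. '2.5.1'
    )


print(environment_report())
```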
Peter