[BioPython] Question about Seq.count()
Peter
biopython at maubp.freeserve.co.uk
Wed Oct 17 22:03:51 UTC 2007
Jimmy Musselwhite wrote:
> Now the code I want to write is
> record.seq.count(search)
>
> but what I am forced to do is
> record.seq.tostring().count(search)
>
> The problem here is that when I am forced to use .tostring() on every single
> seq object it devastates my memory usage in a BIG way. It eats up about
> 1.2 GB and then crashes. If I remove the .tostring() and just tell it to
> search for 'A', it will run fine and use memory at about 1/100th the rate.
In the short term, try record.seq.data.count(search) which is what the
tostring() method is doing anyway (the Seq object stores the sequence
internally as a string). Does that help?
We might be tweaking the Seq object after the next release to act a bit
more like a string - at which point the .data property might go away.
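Since the .data property may not be around forever, here is a minimal sketch of a helper that uses .data when it is there and falls back to str() otherwise. The names count_substring and FakeSeq are made up for illustration and are not part of Biopython, and the fallback assumes that str() on the object returns the full sequence, which you should check against your Biopython version.

```python
def count_substring(seq, search):
    """Count non-overlapping occurrences of search in a sequence-like object.

    Hypothetical helper, not a Biopython function.
    """
    # Prefer the internal string if the object exposes one as .data.
    text = getattr(seq, "data", None)
    if not isinstance(text, str):
        # Assumption: str(seq) gives the whole sequence, not a truncated repr.
        text = str(seq)
    return text.count(search)


class FakeSeq:
    """Stand-in for a Seq-like object that stores its letters in .data."""

    def __init__(self, data):
        self.data = data


with_data = count_substring(FakeSeq("AAACACACGGTTTT"), "GG")
plain_str = count_substring("AAACACACGGTTTT", "G")
```

Called on a record, this would look like count_substring(record.seq, search).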
> So my question boils down to, is there any way to make .count() search
> for strings and not just characters?
I'd never noticed that - I would call it a bug...
>>> from Bio.Seq import Seq
>>> my_seq = Seq("AAACACACGGTTTT")
>>> my_seq.data.count("GG")
1
>>> my_seq.data.count("G")
2
>>> my_seq.tostring().count("G")
2
>>> my_seq.tostring().count("GG")
1
>>> my_seq.count("G")
2
>>> my_seq.count("GG")
0
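One way to get exactly this behaviour is a count() that compares each single letter against the search term, so a two-letter term can never match. The sketch below shows that with a stand-in class; CharCountSeq is hypothetical, and this is only a guess at the cause, not the actual Biopython source.

```python
class CharCountSeq:
    """Hypothetical stand-in for a Seq-like class; not Biopython's Seq."""

    def __init__(self, data):
        self.data = data

    def count(self, item):
        # Each letter is compared to item on its own, so a
        # multi-letter item such as "GG" never equals any letter.
        return sum(1 for letter in self.data if letter == item)


my_seq = CharCountSeq("AAACACACGGTTTT")
single = my_seq.count("G")         # per-letter comparison, one-letter term
double = my_seq.count("GG")        # per-letter comparison, two-letter term
via_str = my_seq.data.count("GG")  # plain str.count on the underlying string
```

Here single is 2 and double is 0, while via_str is 1 because str.count looks for the whole substring (counting non-overlapping matches).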
Peter