[BioPython] Question about Seq.count()

Peter biopython at maubp.freeserve.co.uk
Wed Oct 17 22:03:51 UTC 2007


Jimmy Musselwhite wrote:
> Now the code I want to do is
> record.seq.count(search)
> 
> but what I am forced to do is
> record.seq.tostring().count(search)
>
> The problem here is that when I am forced to use .tostring() on every single
> seq object it devastates my memory usage in a BIG way. It eats up about
> 1.2 GB and then crashes. If I remove the .tostring() and just tell it to
> search for 'A', it will run fine and use memory at about 1/100th the rate

In the short term, try record.seq.data.count(search) which is what the 
tostring() method is doing anyway (the Seq object stores the sequence 
internally as a string).  Does that help?

We might be tweaking the Seq object after the next release to act a bit 
more like a string - at which point the .data property might go away.

> So my question boils down to, is there any way to make .count() be able to
> search for strings and not just characters?

I'd never noticed that - I would call it a bug...

 >>> from Bio.Seq import Seq
 >>> my_seq = Seq("AAACACACGGTTTT")
 >>> my_seq.data.count("GG")
1
 >>> my_seq.data.count("G")
2
 >>> my_seq.tostring().count("G")
2
 >>> my_seq.tostring().count("GG")
1
 >>> my_seq.count("G")
2
 >>> my_seq.count("GG")
0
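
Until that is fixed, a small helper along these lines might be a tidier workaround. This is only a sketch: count_subsequence and FakeSeq are hypothetical names made up for illustration, and the helper assumes the object keeps its sequence as a plain string in a .data attribute, as the Seq object above does. FakeSeq is there only so the example runs without Biopython installed.

```python
def count_subsequence(seq, sub):
    """Count non-overlapping occurrences of sub in seq.

    seq may be a plain str, or an object that exposes its sequence as a
    string in a .data attribute (an assumption based on the Seq object
    discussed in this thread).
    """
    # Fall back to seq itself when there is no .data attribute.
    text = getattr(seq, "data", seq)
    return text.count(sub)


class FakeSeq:
    """Hypothetical stand-in for Seq, used only to make the sketch runnable."""

    def __init__(self, data):
        self.data = data


my_seq = FakeSeq("AAACACACGGTTTT")
print(count_subsequence(my_seq, "G"))
print(count_subsequence(my_seq, "GG"))
print(count_subsequence("AAACACACGGTTTT", "AC"))
```

Note that str.count() only counts non-overlapping matches, so "AAAA".count("AA") gives 2 rather than 3.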

Peter



