[BioPython] Question about Seq.count()

Jimmy Musselwhite jimmy.musselwhite at gmail.com
Wed Oct 17 22:52:07 UTC 2007


Just kidding, it didn't work great. It only seemed to "fix" it because I was
printing out the output of count(), so the loop was running about 100 times
slower and thus eating RAM about 100 times more slowly :(

It doesn't seem like there is a good way for me to fix this.
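One pattern that may help with the memory side is a minimal sketch that assumes the records come from a FASTA file and that only the per-record counts are needed afterwards. It reads one record at a time, counts the motif, and keeps only the integer result. Memory then grows with the number of counts kept, not with the total sequence length. `iter_fasta` and `count_motif_per_record` are hypothetical helpers written for illustration, not Biopython API. Biopython's `Bio.SeqIO.parse` also hands back records lazily, one at a time, in the same spirit.

```python
import io
from typing import Iterator, TextIO


def iter_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (title, sequence) pairs one record at a time (hypothetical helper)."""
    title = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)  # emit the previous record
            title = line[1:]
            chunks = []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if title is not None:
        yield title, "".join(chunks)  # emit the last record


def count_motif_per_record(handle: TextIO, motif: str) -> dict[str, int]:
    """Count non-overlapping occurrences of motif in each record.

    Only the integer counts are stored; each sequence string can be
    garbage-collected once the next record has been read.
    """
    return {title: seq.count(motif) for title, seq in iter_fasta(handle)}


fasta = io.StringIO(">rec1\nAAACACAC\nGGTTTT\n>rec2\nGGGG\n")
print(count_motif_per_record(fasta, "GG"))  # {'rec1': 1, 'rec2': 2}
```

Note that `rec2` gives 2 rather than 3 because `str.count` does not count overlapping matches.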

On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
>
> Thanks guys! That worked great.
>
> On 10/17/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
> >
> > Jimmy Musselwhite wrote:
> > > Now the code I want to do is
> > > record.seq.count(search)
> > >
> > > but what I am forced to do is
> > > record.seq.tostring().count(search)
> > >
> > > The problem here is that when I am forced to use .tostring() on every
> > > single seq object it devastates my memory usage in a BIG way. It eats
> > > up about 1.2 gigs and then crashes. If I remove the .tostring() and just
> > > tell it to search for 'A', it will run fine and use memory at about
> > > 1/100th the rate.
> >
> > In the short term, try record.seq.data.count(search), which is what the
> > tostring() method is doing anyway (the Seq object stores the sequence
> > internally as a string). Does that help?
> >
> > We might be tweaking the Seq object after the next release to act a bit
> > more like a string - at which point the .data property might go away.
> >
> > > So my question boils down to this: is there any way to make .count()
> > > search for strings and not just characters?
> >
> > I'd never noticed that; I would call it a bug...
> >
> > >>> from Bio.Seq import Seq
> > >>> my_seq = Seq("AAACACACGGTTTT")
> > >>> my_seq.data.count("GG")
> > 1
> > >>> my_seq.data.count("G")
> > 2
> > >>> my_seq.tostring().count("G")
> > 2
> > >>> my_seq.tostring().count("GG")
> > 1
> > >>> my_seq.count("G")
> > 2
> > >>> my_seq.count("GG")
> > 0
> >
> > Peter
> >
> >
>
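In the session above, `my_seq.count("GG")` returns 0 even though the underlying string contains "GG" once. One behaviour that would produce that output is a count that compares the sequence one letter at a time against the argument. This is an assumption consistent with the output, not a reading of the Biopython source. The sketch below contrasts that with Python's own `str.count`, which is what `.data.count()` and `.tostring().count()` end up calling. It also shows a detail worth knowing when counting motifs: `str.count` counts non-overlapping matches only. For that reason a hypothetical `count_overlapping` helper is included for cases where every overlapping hit (e.g. "GG" inside "GGGG") should be counted. `count_per_character` is likewise a hypothetical illustration.

```python
def count_per_character(seq: str, item: str) -> int:
    """Compare each single letter with `item` (hypothetical, illustrative only).

    A multi-letter `item` such as "GG" can never equal one letter,
    so the result is always 0 for it.
    """
    return sum(1 for letter in seq if letter == item)


def count_overlapping(seq: str, item: str) -> int:
    """Count occurrences of a non-empty `item`, allowing matches to overlap."""
    if not item:
        raise ValueError("item must be a non-empty string")
    count = 0
    start = seq.find(item)
    while start != -1:
        count += 1
        start = seq.find(item, start + 1)  # restart one position later
    return count


seq = "AAACACACGGTTTT"
print(count_per_character(seq, "G"), count_per_character(seq, "GG"))  # 2 0
print(seq.count("G"), seq.count("GG"))                                # 2 1
print("GGGG".count("GG"), count_overlapping("GGGG", "GG"))            # 2 3
```

Which of the two substring counts is "right" depends on the question being asked. Non-overlapping counts match Python string semantics, while overlapping counts are often what motif searches in DNA actually want.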


