[BioPython] Question about Seq.count()

Jimmy Musselwhite jimmy.musselwhite at gmail.com
Wed Oct 17 23:06:03 UTC 2007


Man, I'm sorry, I didn't read that well enough. It doesn't work for you
either. I'm gonna stop responding to this e-mail now :) I'm clearly tired or
something.


On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
>
> In response to the first reply you gave me, where you said this:
>
> I'd never noticed that - I would call it a bug...
>
>  >>> from Bio.Seq import Seq
>  >>> my_seq = Seq("AAACACACGGTTTT")
>  >>> my_seq.data.count("GG")
> 1
>  >>> my_seq.data.count("G")
> 2
>  >>> my_seq.tostring().count("G")
> 2
>  >>> my_seq.tostring().count("GG")
> 1
>  >>> my_seq.count("G")
> 2
>  >>> my_seq.count("GG")
> 0
>
>
> I've tried that many many times and I always get 0 when I do
> my_seq.count("GG")
> I just rebuilt biopython from the latest CVS tarball and it still does not
> work. I have no idea why yours works and mine doesn't.
>
> On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
> >
> > Just kidding, it didn't work great. It only "fixed" it because I was
> > printing out the output of count() and so it was just executing 100 times
> > slower and thus eating RAM 100 times slower :(
> >
> > It doesn't seem like there is a good way for me to fix this.
> >
> > On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
> > >
> > > Thanks guys! That worked great.
> > >
> > > On 10/17/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > > >
> > > > Jimmy Musselwhite wrote:
> > > > > Now the code I want to do is
> > > > > record.seq.count(search)
> > > > >
> > > > > but what I am forced to do is
> > > > > record.seq.tostring().count(search)
> > > > >
> > > > > The problem here is that when I am forced to use .tostring() on every
> > > > > single seq object it devastates my memory usage in a BIG way. It eats
> > > > > up about 1.2 GB and then crashes. If I remove the .tostring() and just
> > > > > tell it to search for 'A', it will run fine and use memory at about
> > > > > 1/100th the rate
> > > >
> > > > In the short term, try record.seq.data.count(search), which is what the
> > > > tostring() method is doing anyway (the Seq object stores the sequence
> > > > internally as a string).  Does that help?
> > > >
> > > > We might be tweaking the Seq object after the next release to act a bit
> > > > more like a string - at which point the .data property might go away.
> > > >
> > > > > So my question boils down to: is there any way to make .count() able
> > > > > to search for strings and not just characters?
> > > >
> > > > I'd never noticed that - I would call it a bug...
> > > >
> > > > >>> from Bio.Seq import Seq
> > > > >>> my_seq = Seq("AAACACACGGTTTT")
> > > > >>> my_seq.data.count("GG")
> > > > 1
> > > > >>> my_seq.data.count("G")
> > > > 2
> > > > >>> my_seq.tostring().count("G")
> > > > 2
> > > > >>> my_seq.tostring().count("GG")
> > > > 1
> > > > >>> my_seq.count("G")
> > > > 2
> > > > >>> my_seq.count("GG")
> > > > 0
> > > >
> > > > Peter
> > > >
> > > >
> > >
> >
>
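
For reference, the substring counting that str.count() does in the session
quoted above can be sketched in plain Python. This is only a minimal sketch,
not Biopython's implementation: count_substring is a hypothetical helper name,
and it assumes the sequence is available as an ordinary Python string (for
example the value held in .data). It counts non-overlapping matches, the same
way str.count() does.

```python
def count_substring(sequence, pattern):
    """Count non-overlapping occurrences of pattern in a plain string.

    Hypothetical helper for illustration only; it is not part of Biopython.
    """
    if not pattern:
        raise ValueError("pattern must be a non-empty string")
    total = 0
    # str.find returns -1 once there are no further matches.
    start = sequence.find(pattern)
    while start != -1:
        total += 1
        # Resume after the end of this match so matches never overlap.
        start = sequence.find(pattern, start + len(pattern))
    return total


if __name__ == "__main__":
    seq = "AAACACACGGTTTT"  # same example sequence as in the quoted session
    print(count_substring(seq, "G"))
    print(count_substring(seq, "GG"))
```

With the example sequence this should agree with the .data.count() results in
the quoted session, for single characters and for longer search strings.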


