[BioPython] Question about Seq.count()

Jimmy Musselwhite jimmy.musselwhite at gmail.com
Wed Oct 17 22:48:09 UTC 2007


Thanks guys! That worked great.

On 10/17/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Jimmy Musselwhite wrote:
> > Now the code I want to write is
> > record.seq.count(search)
> >
> > but what I am forced to do is
> > record.seq.tostring().count(search)
> >
> > The problem here is that when I am forced to use .tostring() on every
> > single seq object, it devastates my memory usage in a BIG way. It eats up
> > about 1.2 GB and then crashes. If I remove the .tostring() and just tell
> > it to search for 'A', it will run fine and use memory at about 1/100th
> > the rate.
>
> In the short term, try record.seq.data.count(search) which is what the
> tostring() method is doing anyway (the Seq object stores the sequence
> internally as a string).  Does that help?
>
> We might be tweaking the Seq object after the next release to act a bit
> more like a string - at which point the .data property might go away.
>
> > So my question boils down to: is there any way to make .count() search
> > for strings and not just characters?
>
> I'd never noticed that - I would call it a bug...
>
> >>> from Bio.Seq import Seq
> >>> my_seq = Seq("AAACACACGGTTTT")
> >>> my_seq.data.count("GG")
> 1
> >>> my_seq.data.count("G")
> 2
> >>> my_seq.tostring().count("G")
> 2
> >>> my_seq.tostring().count("GG")
> 1
> >>> my_seq.count("G")
> 2
> >>> my_seq.count("GG")
> 0
>
> Peter
>
>
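
For reference, here is a minimal sketch of the behaviour shown in the quoted
session. It uses only plain Python, and MiniSeq is a hypothetical stand-in,
not Biopython's Seq class. The per-character method is only an assumption
about why a multi-letter search could return 0; the other method simply hands
the search to the stored string, as the .data suggestion does.

```python
class MiniSeq:
    """Hypothetical stand-in for a sequence object; not Biopython's Seq."""

    def __init__(self, data):
        # The sequence is kept internally as a plain Python string.
        self.data = data

    def count_per_character(self, item):
        # Assumed behaviour: compare every single letter to item.
        # A multi-letter item can never equal one letter, so this gives 0.
        return sum(1 for letter in self.data if letter == item)

    def count(self, sub):
        # Delegate to str.count on the stored string (non-overlapping matches).
        return self.data.count(sub)


my_seq = MiniSeq("AAACACACGGTTTT")
print(my_seq.count_per_character("G"))   # 2
print(my_seq.count_per_character("GG"))  # 0
print(my_seq.count("GG"))                # 1
print(MiniSeq("GGGG").count("GG"))       # 2, because matches do not overlap
```

Note that str.count counts non-overlapping occurrences, so overlapping
motifs would need a different approach.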
