[BioPython] The count method of a Seq (or MutableSeq) object

Bartek Wilczynski bartek at rezolwenta.eu.org
Thu Mar 5 08:28:14 EST 2009


On Thu, Mar 5, 2009 at 1:26 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> (a) stick with the python string compatible behaviour (which has been
> a general principle for the Seq class), but document this issue more
> clearly as a non-overlapping search does run counter to some potential
> biological uses.
>
> or,
>
> (b) Or change the behaviour as Leighton suggests to do an overlapping
> search.  This could break any code relying on the old python
> string-like behaviour.
>
> What do people here think?  Any preferences?
>
> [I don't want to get into details about the implementation here on the
> main list]
>

I don't use the count method much, so I don't have a strong opinion on that.

As Leighton pointed out, searching for sequences looks like a good
job for Bio.Motif.

It's currently doable, but since Bio.Motif mostly deals with motifs
more complex than a single sequence, the interface for this case is
not polished and it's not optimized for performance.

Currently the code to do this would look like this:

import Bio.Motif
from Bio.Seq import Seq

m = Bio.Motif.Motif()
m.add_instance(Seq("GG", m.alphabet))
# your_long_sequence is the Seq object to search
for i in m.search_instances(your_long_sequence):
    print "found GG at position", i

If there is a need to keep backwards compatibility for .count(), I can
make changes to Bio.Motif to make it easier for people to use it.
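
For reference, the difference between the two behaviours under
discussion can be sketched on plain Python strings. This is only an
illustration, not how Seq.count() is or would be implemented;
count_overlapping is a hypothetical helper name, and a Seq object
would first have to be converted to a string:

```python
def count_overlapping(sequence, sub):
    """Count occurrences of sub in sequence, allowing overlaps."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    count = 0
    start = sequence.find(sub)
    while start != -1:
        count += 1
        # Advance by one position only, so overlapping hits are found.
        start = sequence.find(sub, start + 1)
    return count


text = "GGGG"
# str.count resumes after the end of each match (non-overlapping): 2
non_overlapping = text.count("GG")
# Hits at positions 0, 1 and 2 are all counted here: 3
overlapping = count_overlapping(text, "GG")
```

So for "GG" in "GGGG" option (a) gives 2 and option (b) would give 3.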

-- 
Bartek


