[Biopython] Bio.Motif search_pwm

Peter Cock p.j.a.cock at googlemail.com
Wed Aug 1 08:31:15 UTC 2012


On Wed, Aug 1, 2012 at 6:14 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> I was using the search_pwm method in Bio.Motif (which btw
> is very useful, thanks Bartek) to search for motif instances
> on both strands of a sequence. If the motif starts at position
> and is located on the forward strand, this function returns
> +position; if it is located on the reverse strand, it returns
> -position. So for position==0, we cannot deduce from the
> sign whether the motif is located on the forward or on the
> backward strand.

That is a problem :(
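
To make the ambiguity concrete, here is a minimal sketch of the sign
convention as described above. The encode_hit helper is hypothetical
and only mimics the encoding; it is not the search_pwm code itself.

```python
def encode_hit(position, strand):
    """Hypothetical helper: +position for forward, -position for reverse."""
    return position if strand == "+" else -position


forward_hit = encode_hit(0, "+")
reverse_hit = encode_hit(0, "-")

# In Python -0 == 0, so both hits encode to the same integer and the
# strand cannot be recovered from the sign.
print(forward_hit == reverse_hit)
```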

> How about using Python-style negative indices to indicate
> the strand? For example, +20 means that the motif is
> located at [20:20+motif_length] on the forward strand,
> while -20 means that the motif is located at [-20:-20+motif_length].
>
> Alternatively, we could return the strand explicitly.

Either makes sense. Both would break backwards compatibility,
but that break is probably necessary.
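
As a rough sketch of how the two proposals relate, a Python-style
index could be decoded into an explicit start, end and strand. The
locate function and the +1/-1 strand values are assumptions for
illustration only, not an existing Bio.Motif API.

```python
def locate(sequence, index, motif_length):
    """Hypothetical decoder for a Python-style hit index.

    A non-negative index is read as a forward-strand hit, a negative
    index as a reverse-strand hit counted from the end of the sequence.
    Returns (start, end, strand) with an absolute start.
    """
    if index < 0:
        # Converting to an absolute start avoids the empty slice
        # sequence[-m:0] for a hit at the very end of the sequence.
        start = len(sequence) + index
        strand = -1
    else:
        start = index
        strand = +1
    return start, start + motif_length, strand


seq = "ACGTACGT"
print(locate(seq, 0, 3))          # forward-strand hit at the start
print(locate(seq, -len(seq), 3))  # reverse-strand hit at the start
```

With this reading, a reverse-strand hit at the start of the sequence
is encoded as -len(sequence), so it no longer collides with 0.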

> In the same function, I wish we could get rid of this line:
>
> sequence=sequence.tostring().upper()
>
> since this assumes that sequence is a Biopython Seq
> object, and not a plain string.

Allowing a plain string makes good sense. +1

> We could either use str(sequence) instead of sequence.tostring()
> to cover both cases,

That would also accept other objects accidentally, e.g. a list,
and probably lead to some obscure errors downstream.
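
For example, this runs without complaint in plain Python, and the
"sequence" that would then be searched is the list's repr:

```python
not_a_sequence = ["A", "C", "G", "T"]

# str() happily converts the list, so no error is raised here.
converted = str(not_a_sequence).upper()

# The result contains brackets, quotes and commas, not "ACGT".
print(converted)
```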

> or have the Seq class inherit from strings (which we have
> been discussing for some time; see
> https://redmine.open-bio.org/issues/2351).

Or perhaps the Seq is already string-like enough for this function
(it supports upper()), so no casting is needed? That would be
simpler, although likely not as fast.

Or, we could follow the pattern used in Bio.SeqUtils and try
the tostring() method, catching any AttributeError and then
treating it like a string (since real strings don't have this).
The advantage of this route is low risk.
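
Something along these lines, as a sketch only. The as_upper_string
name is made up here, and FakeSeq is just a stand-in for a Seq object
with a tostring() method so the snippet runs without Biopython.

```python
class FakeSeq(object):
    """Stand-in for a Seq-like object exposing tostring() (assumption)."""

    def __init__(self, data):
        self._data = data

    def tostring(self):
        return self._data


def as_upper_string(sequence):
    """Hypothetical helper: accept a Seq-like object or a plain string."""
    try:
        # Seq-like objects provide tostring()
        sequence = sequence.tostring()
    except AttributeError:
        # Plain strings have no tostring(), so use them as they are
        pass
    return sequence.upper()


print(as_upper_string(FakeSeq("acgt")))
print(as_upper_string("acgt"))
```

A list has neither tostring() nor upper(), so it would still fail
with an AttributeError right away rather than somewhere downstream.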

Peter


