[Biopython-dev] Bio.Motif Suggestions

Bartek Wilczynski bartek at rezolwenta.eu.org
Mon Apr 20 11:04:44 EDT 2009


On Mon, Apr 20, 2009 at 4:35 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Mon, Apr 20, 2009 at 2:55 PM, Dave Bridges <dave.bridges at gmail.com> wrote:
>>
>>> > Is there an alphabet that accepts spaces which might be necessary for
>>> > correct alignment of a motif, and if so will that work with the rest of
>>> > motif.py?
>>>
>>
>> That's a tougher one. It wasn't really needed so far (DNA motifs
>> rarely have spaces), but I guess that for protein motifs it's a very
>> important thing.
>> I have some code for doing that, but I will need to find it. I'll
>> write you later about it.
>>
>
> What would a space in a motif mean?  Clearly something different from
> a wildcard like N or X in nucleotide or protein sequences.  Does it
> mean a gap of variable length?  If it means a gap of one character
> then surely just using a "-" would be sensible (as used in multiple
> sequence alignments), for which we have a gapped alphabet system
> setup.
>

I think that once we start talking about gapped motifs, we are really
talking about multiple alignments on steroids. This hasn't been done so
far because you don't really need it for DNA motifs, but in the case of
protein motifs we need to make it compatible with multiple alignments. I
think it would be great to be able to easily convert multiple alignments
into motifs. This would allow us to use the power of Bio.AlignIO for IO
and Bio.Motif for searching and comparisons. The question is how to
design the API for these functions. What about:

align = Bio.AlignIO.read(....)
motif = Bio.Motif.from_alignment(align)
...
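
To make the conversion idea a bit more concrete, here is a rough sketch in
plain Python of what such a function might do internally. It is only an
illustration under assumptions: the alignment is taken to be a list of
equal-length strings, "-" is assumed to be the gap character, and
counts_from_alignment is a hypothetical helper name, not existing Bio.Motif
code.

```python
from collections import Counter

GAP = "-"  # assumed gap character, as used in multiple sequence alignments


def counts_from_alignment(rows, gap=GAP):
    """Return one Counter per alignment column, skipping gap characters.

    `rows` is a list of equal-length strings (the aligned sequences).
    Hypothetical helper for illustration, not part of Bio.Motif.
    """
    if not rows:
        raise ValueError("alignment is empty")
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all rows must have the same length")
    columns = []
    for i in range(length):
        # Count every non-gap letter observed in column i.
        columns.append(Counter(row[i] for row in rows if row[i] != gap))
    return columns


rows = ["AC-GT", "ACAGT", "AC-GA"]
counts = counts_from_alignment(rows)
```

Here gaps are simply left out of the column counts; whether a gap should
instead be counted as a letter of its own is exactly the kind of design
question that would need to be settled.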

> Note that there are some issues with the current Bio.Motif code and
> alphabets, which should be addressed.  For example, generic alphabets
> don't have a letters property giving the list of expected letters, so
> using set() on the sequences themselves might be more appropriate in
> places.


Yes, I was using Bio.Motif only for DNA motifs myself, so there was not
much consideration given to proper handling of alphabets. I'll need to
clear it up now.
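
One possible shape for the set() fallback mentioned above, as a sketch
only: letters_for and the DnaLike stand-in class are hypothetical names
made up for this example, and the code assumes that an alphabet without
usable letters either lacks the attribute or has it set to None.

```python
def letters_for(sequences, alphabet=None):
    """Return the sorted letters to use when building a motif.

    Uses `alphabet.letters` when it is present and non-empty; otherwise
    falls back to the set of characters seen in `sequences`.
    Hypothetical helper for illustration, not part of Bio.Motif.
    """
    letters = getattr(alphabet, "letters", None)
    if letters:
        return sorted(letters)
    # Fallback: collect the letters from the sequences themselves.
    return sorted(set("".join(str(s) for s in sequences)))


class DnaLike(object):
    """Stand-in for an alphabet object that defines its letters."""
    letters = "GATC"


from_alphabet = letters_for(["ACGT"], DnaLike())
from_sequences = letters_for(["AC-GT", "ACAGT"])
```

Note that with the fallback the gap character shows up as a letter if it
occurs in the sequences, so the caller has to decide whether to filter
it out.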


cheers
 Bartek


