[BioPython] [Biopython-dev] Refactoring motif analysis code

Bartek Wilczynski bartek at rezolwenta.eu.org
Mon Dec 1 20:53:59 UTC 2008


Hi all,

I've done some work on the motif analysis code in Biopython.
Specifically, I have:
- refactored Bio.AlignAce and Bio.MEME to use one common motif object
- put all of the refactored code in the Bio.Motif directory
- added more code (from my attic) for motif comparisons and for
  computing thresholds (this was actually written by my colleague
  Norbert Dojer, but I adapted it and I have his permission to
  contribute the code)
- written a short tutorial on the usage of Bio.Motif (that's where I'd put it)
- written a basic test suite for the new motif code

I haven't added it to CVS yet, but I have posted it as an attachment to
the enhancement proposal in Bugzilla:
http://bugzilla.open-bio.org/show_bug.cgi?id=2694

I have CVS access, so I can commit the changes myself, but I'd like to
wait for an "OK" from someone more involved in the release process.

Since Giovanni and Bruce have responded to my previous call for comments,
I'll try to answer them below:

On Mon, Nov 24, 2008 at 4:54 PM, Bruce Southey <bsouthey at gmail.com> wrote:

>
> Actually I am not that thrilled with the licenses for these packages and
> similar packages because these are free only for academic use. To me this
> clashes with the spirit of an open-sourced project especially a BSD-licensed
> one. But if there is a need for such modules then these modules should be
> included.
>

I have similar feelings about the "academic-use-only" licenses. On the
other hand, since most Biopython users are in academia, I don't see it
as a big problem. Also, since I don't have any truly open and free
replacement for these programs, I think it's better to keep them. In
fact, the new Bio.Motif package provides some methods for motif
comparisons, which can at least to some extent be used as a
replacement for the respective functions of CompareACE and MAST.
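
To give a rough idea of what such a motif comparison involves, here is
a minimal sketch in plain Python. It is not the Bio.Motif code: the
function names, the dict-per-column matrix layout, and the choice of a
Pearson correlation over two equal-length frequency matrices are all
assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a completely flat matrix has nothing to correlate
    return cov / math.sqrt(vx * vy)

def motif_similarity(freqs_a, freqs_b):
    """Compare two frequency matrices of the same length.

    Hypothetical layout: each matrix is a list of columns, and each
    column is a dict mapping 'A', 'C', 'G', 'T' to a frequency.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("this sketch only handles motifs of equal length")
    flat_a = [col[b] for col in freqs_a for b in "ACGT"]
    flat_b = [col[b] for col in freqs_b for b in "ACGT"]
    return pearson(flat_a, flat_b)

# Two made-up two-column motifs, for illustration only
m1 = [{"A": 0.8, "C": 0.1, "G": 0.05, "T": 0.05},
      {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}]
m2 = [{"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
      {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}]

same = motif_similarity(m1, m1)
different = motif_similarity(m1, m2)
```

A real comparison would also have to try different offsets between
motifs of unequal length and the reverse complement; this sketch leaves
both out.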

As a side note, I think that there is no point in providing parsers
for every single motif finder that comes out, and I don't think that
AlignAce and MEME are the best or the most representative ones. It
just happened that these parsers were written "to scratch someone's
itch". I think that the other functionality (motif searching,
comparisons, weblogo) might be more useful to people.

> While it is only free for academic use, have you seen TAMO?
> *TAMO: a flexible, object-oriented framework for analyzing transcriptional
> regulation using DNA-sequence motifs. *
> Bioinformatics. 2005 Jul 15;21(14):3164-5.
> <http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/14/3164>
>
> http://fraenkel.mit.edu/TAMO/

Yes, I've seen it, and I even recommended it on the Biopython mailing
list when there was no replacement in Biopython. However, their
library is free only for academia and AFAIK it does not use Biopython
data structures, so it takes some work to integrate TAMO if you are
using Biopython. Bio.Motif is meant to provide free software for motif
analysis.

> Well, I am not sure how many used Bio.AlignAce given the Parser.py bug :-)
> Based on the CVS, both have been untouched for about three years.
>
Well, I've not used it myself for a while... I'm no longer doing
de-novo motif discovery. However, it still works, so it's potentially
useful. I think the low usage is largely due to the lack of
documentation for the Bio.AlignAce and Bio.MEME tools (partially my
fault). Hopefully people will start using this if they read the
tutorial.

> Also, what species are these used for?
> One of the papers of AlignAce indicate that the base composition was set for
> yeast.
>
They're both general purpose: you can set the GC content for AlignAce
and even an HMM for MEME.

>
> Personally I would be interested in a general protein motif finding module
> because of my current research. However, I do have a different view with
> respect to the Biopython community as indicated above with the licenses.

Both MEME and AlignAce can be used to find motifs in proteins, but
that has little to do with Bio.Motif, since Bio.Motif does not provide
any motif-finding capabilities by itself. In general, Bio.Motif should
be able to deal with protein motifs, but I've never tested that (I'm
mostly using it for DNA motifs), so I'll be happy to help if you find
bugs.

On Mon, Nov 24, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
>
> I would just like to tell you that I have tried the TAMO framework you
> suggested me, and found it very useful.

Yes, I remember, but the problem is with the TAMO license. I think
that the Motif object might still be useful since it is free and lets
you read motifs from databases like JASPAR, scan sequences with them,
and/or compare them with "your" motifs.
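
As an illustration of what scanning a sequence with a motif means, here
is a minimal sketch in plain Python. It does not use the Bio.Motif API:
the function names, the pseudocount, the uniform default background,
and the threshold value are assumptions chosen for the example.

```python
import math

def log_odds_matrix(freqs, background=None, pseudo=0.01):
    """Turn a frequency matrix into per-position log2-odds scores.

    freqs is a list of columns, each a dict over 'A', 'C', 'G', 'T'.
    background maps each base to its expected frequency (uniform by
    default; a GC-rich genome would use a different one).
    """
    if background is None:
        background = {b: 0.25 for b in "ACGT"}
    lom = []
    for col in freqs:
        scores = {}
        for b in "ACGT":
            smoothed = (col[b] + pseudo) / (1 + 4 * pseudo)  # avoids log(0)
            scores[b] = math.log2(smoothed / background[b])
        lom.append(scores)
    return lom

def scan(sequence, lom, threshold):
    """Yield (position, score) for every window scoring >= threshold.

    Assumes the sequence contains only uppercase A, C, G and T.
    """
    width = len(lom)
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(lom[j][base] for j, base in enumerate(window))
        if score >= threshold:
            yield i, score

# A made-up two-column motif that prefers "AC"
freqs = [{"A": 0.8, "C": 0.1, "G": 0.05, "T": 0.05},
         {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}]
lom = log_odds_matrix(freqs)
hits = list(scan("TTACGG", lom, threshold=2.0))
hit_positions = [pos for pos, _ in hits]
```

Here the only window above the threshold is the "AC" at position 2
(0-based). Choosing a sensible threshold is the hard part in practice
and is not addressed by this sketch.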


> I am not using it anymore because I don't need it, but I remember that I liked:
> - the methods to represent motifs as matrixes of frequencies/occurrencies etc..
done
> - the fact that it was easy to create a motif from an alignment of sequences
depending on your definition of easy, it's there
> - the integration it had with this website:
> http://weblogo.berkeley.edu/logo.cgi.
done
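
For readers unfamiliar with the first two points above (count and
frequency matrices, and building a motif from aligned sequences), the
idea can be sketched in plain Python as follows. This is not the
Bio.Motif interface; the function names and the example instances are
made up for illustration.

```python
def count_matrix(instances, alphabet="ACGT"):
    """Count each letter at each column of equal-length aligned instances."""
    if not instances:
        raise ValueError("need at least one instance")
    width = len(instances[0])
    if any(len(s) != width for s in instances):
        raise ValueError("all instances must have the same length")
    counts = [{letter: 0 for letter in alphabet} for _ in range(width)]
    for seq in instances:
        for j, letter in enumerate(seq.upper()):
            counts[j][letter] += 1
    return counts

def frequency_matrix(counts):
    """Normalise each column of a count matrix so that it sums to 1."""
    freqs = []
    for col in counts:
        total = sum(col.values())
        freqs.append({letter: n / total for letter, n in col.items()})
    return freqs

# Hypothetical aligned motif instances
instances = ["TACAA", "TACGC", "TACAC", "TACCC"]
counts = count_matrix(instances)
freqs = frequency_matrix(counts)
```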

> I would suggest you to provide integration with this other web
> service, which enable to plot the difference between two sequence
> logos: http://www.twosamplelogo.org/examples.html.

This I haven't done yet, but I'll try to provide functionality for
that (shouldn't take too long).

-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


