[BioPython] biopython integration with make-like tools (e.g. waf, paver)

Giovanni Marco Dall'Olio dalloliogm at gmail.com
Mon Nov 17 11:26:21 EST 2008


On Mon, Nov 17, 2008 at 12:53 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Giovanni Marco Dall'Olio wrote:
>> Hi,
>> a general question.
>>
>> Are you used to organize your python/biopython scripts in pipelines or
>> workflows?
>> For example, many people use automatic build tools like 'make' to
>> organize their scientific scripts.
>> Let's say you want to study the structure of a protein from pdb. I
>> would create a script to download it from pdb.org, one to parse its
>> format, and others to do the analysis; then, I would write a Makefile
>> to put everything together.
>
> Personally in this situation I tend to just write a wrapper python
> script (or sometimes a shell script or batch file) to call the sub
> scripts.  i.e. the KISS principle.

Wrapper scripts are often not the best solution.
- Over time, they tend to become very complex and full of commented-out
statements.
When you complete a part of your experiment (e.g. you download your
input sequences from NCBI), you will likely comment out the statement
that you used to download them.
If you then discover that the sequences you downloaded were wrong, you
have to uncomment the same statement, and this is where mistakes creep
in.
It is very difficult to remember which statements you commented out
because they were wrong, and when. The wrapper script becomes messy
very quickly, and it always takes a lot of time to maintain.
I used wrapper scripts for a year during my master's project and I
think that's not really KISS. It seems very difficult to reproduce an
analysis done without a pipeline.
- make can have a nasty syntax, but it is a standard. By convention,
'make help' prints help, and 'make all' will usually carry out the
whole analysis, without having to worry about which scripts are run
in particular.
- There are build systems other than make, and some of them are written
in Python and/or for Python.
That means you won't necessarily have to learn a new programming
syntax. Have a look at rake (for Ruby): all the examples I've seen are
very clean. I'll let you know once I have learnt waf or paver.
- Make-like tools usually already support running jobs in parallel. If I
want to run a program on a cluster, the easiest thing for me is to
write a makefile, and it already works.
- A makefile allows you to easily re-execute parts of your analysis when
your input files or your scripts change.
This is very useful: I don't want to write a wrapper script that
checks whether a file has been modified since the last time I used it
to calculate some results, because make tools already do that.
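
To make the point about parallel runs concrete: this is roughly what a
wrapper script has to do by hand to get what 'make -j 2' gives you for
independent targets. It is only a sketch, assuming the steps do not
depend on each other; run_step and run_in_parallel are names I made up,
and it relies on concurrent.futures from the Python 3 standard library.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_step(command):
    """Run one pipeline step as a subprocess and return its exit code."""
    return subprocess.call(command)


def run_in_parallel(commands, jobs=2):
    """Run independent steps concurrently, at most `jobs` at a time.

    Exit codes are returned in the same order as `commands`.
    This does no dependency tracking at all: the caller must make sure
    the steps really are independent, which make works out by itself.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_step, commands))
```

You would call it with something like
run_in_parallel([["python", "parse_a.py"], ["python", "parse_b.py"]]),
where the script names are hypothetical.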

>
> I really don't think Makefiles are a sensible solution to this problem
> - although it is possible.  A Makefile lets you deal with simple
> dependencies (e.g. building an index file, or running a BLAST search
> and saving it to disk) but I prefer to just deal with this within my
> python scripts (e.g. if the index is missing, build it; if the BLAST
> output is missing, call BLAST).

Wouldn't you prefer something like:
- if the BLAST output doesn't exist, OR it exists but is older than
the script used to launch it, or older than the input sequence, then
run it again?
That's the kind of thing that make-like tools can already do for you,
without having to write complicated Python conditions.
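
Just to show what that condition looks like when written by hand, here
is a rough sketch in plain Python. It compares file modification times,
which is the same idea make uses; the function name is_outdated is made
up for the example, and it is not how any particular build tool is
implemented.

```python
import os


def is_outdated(target, *dependencies):
    """Return True if `target` must be rebuilt.

    The target is outdated when it does not exist, or when at least one
    dependency (an input file or the script that produces the target)
    was modified more recently than the target itself.
    A missing dependency raises OSError from os.path.getmtime.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

With hypothetical file names, the wrapper script would then say
"if is_outdated('blast_output.xml', 'run_blast.py', 'input.fasta'):"
before calling BLAST, and you need one such test for every step of the
analysis.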


> Why do you think you need a Makefile? Are you intending to provide the
> workflow to other people?  Using a complicated Makefile means the
> project is harder for a new developer to understand (they need to
> learn a whole new programming language/tool).

The best thing would be to learn how to write workflows, like the ones
from Taverna and similar tools.
But it takes time, and I think it is better to know both.
As I was saying before, make has the worst syntax, but maybe there are
other build tools which are better.

> This may also hinder
> cross platform deployment (the average Windows machine won't have make
> installed).
>
> Peter
>



-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it

