[Biopython-dev] run_tests.py rewrite

Wed Feb 4 00:35:07 UTC 2009

On 2/3/09, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> > However, this way, test_docstring will be difficult to
> > maintain in the future.
> > A better solution would be to have run_tests.py go through
> > all of Biopython's modules, and then execute every doctest it
> > encounters.
> > You can do this with doctest.DocTestFinder (have a look at
> > nose's code, which does it already:
>
>
> Can doctest.DocTestFinder handle missing external dependencies? For example, if a user installed Biopython without NumPy, then the NumPy-dependent modules should be skipped and not flagged as errors.

mmm no idea, sorry :(
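That said, one possible approach is sketched below. DocTestFinder works on modules that have already been imported, so a missing dependency such as NumPy would show up earlier, as an ImportError when the module is imported. The sketch walks a package with pkgutil, imports each module, records modules that fail to import as skipped instead of failed, and collects the doctests of the rest. The function name collect_doctests is hypothetical; this is not what nose or run_tests.py actually does.

```python
import doctest
import importlib
import pkgutil
import unittest


def collect_doctests(package_name):
    """Build a unittest suite from every doctest found in a package.

    Hypothetical helper: modules that fail to import (e.g. because an
    optional dependency like NumPy is missing) are reported as skipped
    instead of being counted as test failures.
    """
    package = importlib.import_module(package_name)
    # The package itself plus every submodule/subpackage below it.
    names = [package_name] + [
        info.name
        for info in pkgutil.walk_packages(package.__path__, package_name + ".")
    ]
    finder = doctest.DocTestFinder()
    suite = unittest.TestSuite()
    skipped = []
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError as err:
            # Missing optional dependency: skip this module, don't fail.
            skipped.append((name, str(err)))
            continue
        # Turn the module's doctests into unittest test cases.
        suite.addTest(doctest.DocTestSuite(module, test_finder=finder))
    return suite, skipped
```

The skipped list could then be printed as a summary, so users see which modules were not tested and why.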

>
> > Moreover, why should the typical user be running
> > Biopython's tests?
>
>
> To make sure that it works. Biopython interacts with and therefore depends more on 3rd party software, web servers, and file formats than most other Python modules. Things are more likely to break than for example for a more self-contained library such as NumPy. I always run the Biopython tests, and I would advise every user to do so too. In addition, the tests can function as example scripts showing how to use Biopython. It is important that all users can run those scripts.

I think that all the tests which check whether Biopython can run
correctly on a given computer should be separated from all the others.
Why should I have to test whether Biopython correctly translates the
sequence ACTAGCT into a protein sequence when I install it? That should
already have been checked by the developers/volunteers. If I want to
install Biopython on my computer, I want to run only the tests needed
to make sure it works on my configuration, not all of them.

As an example, take PyTables, a library to handle HDF5 files with Python.
The authors claim that they have written more than 10^6 tests for it.
However, when you install PyTables from source, you don't have to run
all of these tests, but only a subset of them: the ones required to
check that it can run correctly on your computer. Consider that some of
the tests in PyTables take hours or days to complete, because they
check the handling of big binary files.
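A minimal sketch of how such a split could look with plain unittest, assuming a hypothetical convention where each TestCase class carries a `category` attribute ("install" for configuration checks, "code" for library logic). A runner could then build a suite containing only the "install" tests for ordinary users. The class names and the `category` attribute are illustrative, not an existing Biopython or unittest feature.

```python
import sys
import unittest


class InstallCheck(unittest.TestCase):
    # Hypothetical convention: "install" tests only check the local setup.
    category = "install"

    def test_python_version(self):
        self.assertGreaterEqual(sys.version_info[:2], (3, 6))


class SequenceCheck(unittest.TestCase):
    # "code" tests exercise the library logic; developers run these.
    category = "code"

    def test_reverse_complement(self):
        # Pure-Python stand-in for a real Biopython sequence test.
        table = str.maketrans("ACGT", "TGCA")
        self.assertEqual("ACTAGCT".translate(table)[::-1], "AGCTAGT")


def iter_tests(suite):
    """Flatten nested TestSuites into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def select(suite, category):
    """Return a new suite holding only the tests tagged with `category`."""
    return unittest.TestSuite(
        test for test in iter_tests(suite)
        if getattr(test, "category", None) == category
    )
```

Tagging at class level keeps each test file independent: a contributor only has to set one attribute, without learning anything else about the framework.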

The idea is that, if we separate the tests of the code from the ones
of the configuration, we will be able to expand Biopython's test suite
a lot.
For example, at the moment there are not many tests that check
Biopython's behaviour with big sequence files (e.g. 1 GB). It would be
useful to have such tests, because handling big files is becoming
common in bioinformatics, and they would also let us profile Biopython
on such files.

With that strategy, it would make sense to adopt a tool like nose,
which enhances the test framework a lot.
For example, it will be very difficult to write tests on big files
without using global fixtures (which the basic unittest module doesn't
support).
This means that if you want to write a test which studies the handling
of a 1 GB sequence file with Biopython using the basic Python testing
framework, you are forced to open the file before every test (in the
setUp method), while with a global fixture you can do it once, in a
much more elegant way.
nose has many other interesting features: it supports fixtures for
doctests, it can be used to profile the execution of all tests, and it
supports many plugins.
For example, have a look at these:
http://darcs.idyll.org/~t/projects/pinocchio/doc/#stopwatch-selecting-tests-based-on-execution-time
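To illustrate the idea behind selecting tests by execution time, here is a stdlib-only sketch, not the pinocchio stopwatch plugin itself: a custom TestResult records how long each test takes, and a hypothetical quick_tests helper keeps only tests that previously ran under a time limit. The names TimingResult and quick_tests are illustrative.

```python
import time
import unittest


class TimingResult(unittest.TestResult):
    """TestResult that records the wall-clock duration of every test."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.durations = {}  # test id -> seconds
        self._started = {}

    def startTest(self, test):
        super().startTest(test)
        self._started[test.id()] = time.perf_counter()

    def stopTest(self, test):
        elapsed = time.perf_counter() - self._started.pop(test.id())
        self.durations[test.id()] = elapsed
        super().stopTest(test)


def iter_tests(suite):
    """Flatten nested TestSuites into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def quick_tests(suite, durations, limit):
    """Select tests that previously ran in under `limit` seconds.

    Tests with no recorded time are kept, since we cannot know yet.
    Pass a freshly loaded suite: recent unittest versions drop a
    suite's tests after running it.
    """
    return unittest.TestSuite(
        test for test in iter_tests(suite)
        if durations.get(test.id(), 0.0) < limit
    )
```

In practice the durations would be saved to a file after a full run, so later runs can skip the slow big-file tests.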

>
>
> > What about having support for global fixtures?
> > For example, many test scripts begin in the same way: they
> > 'import numpy', check Python's version, etc. All of this
> > could be moved to a global fixture and then executed only
> > once for all the tests.
>
>
> Hmm... currently the Biopython tests can be written essentially independently of each other, without knowing much about the testing overall framework. I think that that makes it easier for new users/developers to add tests. I think we should avoid the situation that somebody first has to study Biopython's testing framework to be able to add a test.

You could write a skeleton for Biopython's tests, and it would be very
useful (e.g. have a look at this recipe for Elixir:
http://elixir.ematia.de/trac/wiki/Recipes/Testing)
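As a rough idea of what such a skeleton could provide, here is a sketch of two small decorators, `requires` and `requires_python` (hypothetical names, not part of Biopython), that skip a test class when an optional dependency such as NumPy is missing or the Python version is too old. A new contributor would only need to add one decorator line, and missing dependencies show up as skips rather than errors. It assumes top-level module names; `importlib.util.find_spec` checks availability without importing the module.

```python
import importlib.util
import sys
import unittest


def requires(*module_names):
    """Skip the decorated test/class unless every module is importable.

    Hypothetical skeleton helper; expects top-level module names.
    """
    missing = [name for name in module_names
               if importlib.util.find_spec(name) is None]
    return unittest.skipIf(missing, "missing dependency: %s" % ", ".join(missing))


def requires_python(major, minor):
    """Skip the decorated test/class on Python versions below major.minor."""
    return unittest.skipIf(sys.version_info[:2] < (major, minor),
                           "requires Python %d.%d or later" % (major, minor))


@requires("numpy")
class NumpyDependentTests(unittest.TestCase):
    def test_zeros_shape(self):
        import numpy
        self.assertEqual(numpy.zeros(3).shape, (3,))


@requires("a_module_that_does_not_exist")
class AlwaysSkipped(unittest.TestCase):
    def test_never_runs(self):
        self.fail("should have been skipped")
```

When every dependency is present the decorators return the class unchanged, so the tests run normally.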
>
>
>  --Michiel.

-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it