[Biopython-dev] run_tests.py rewrite

Wed Feb 4 05:22:05 EST 2009

>>  > Moreover, why should the typical user be running
>>  > biopython's tests?
>>
>>
>> To make sure that it works. Biopython interacts with and therefore
>> depends more on 3rd party software, web servers, and file formats
>> than most other Python modules. Things are more likely to break
>> than in, for example, a more self-contained library such as NumPy.
>> I always run the Biopython tests, and I would advise every user to
>> do so too. In addition, the tests can function as example scripts
>> showing how to use Biopython. It is important that all users can run
>> those scripts.
>
> I think that all the tests which check if biopython can run correctly
> on a computer should be separated from all the others.
> Why do I have to test whether biopython correctly translates the
> sequence ACTAGCT to a protein code when I install biopython? It should
> have been already checked by the developers/volunteers. If I want to
> install biopython on my computer, I want to run only the tests needed
> to make it sure it can work fine on my configuration, not all of them.

As an end user, I would still prefer to know that even simple things
like translation have been checked as working on my machine.  With a
very simple example like this it is unlikely to break on some setups
and not others, but for many test cases it is very hard to make this
judgement call.  The only real way to "make it sure it can work
fine on my configuration" is to just test everything - and it doesn't
take that long anyway.
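
To illustrate how cheap such a check is, here is a minimal sketch of a
translation test.  It is not Biopython's own code: the two-codon
CODON_TABLE, translate() and run_checks() are hypothetical names made up
for this example, and a real table would cover all 64 codons.

```python
import unittest

# Hypothetical two-entry excerpt of the standard genetic code; a real
# table has 64 codons. This is NOT Biopython's implementation.
CODON_TABLE = {"ACT": "T", "AGC": "S"}


def translate(dna):
    """Translate complete codons, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


class TranslationTest(unittest.TestCase):
    def test_short_sequence(self):
        # ACT -> Thr (T), AGC -> Ser (S); the final T is an incomplete codon.
        self.assertEqual(translate("ACTAGCT"), "TS")


def run_checks():
    """Run the test case and report whether everything passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TranslationTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A check of this size runs in a fraction of a second, so including it
in every user's test run costs next to nothing.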

> As an example, take PyTables, a library for handling HDF5 files with Python.
> The authors claim that they have written more than 10^6 tests for it.
> However, when you install pytables from source, you don't have to run
> all of these tests, only a subset of them: the ones required to
> check if it can run correctly on your computer. Consider that some of
> the tests on pytables take hours or days to complete, because they
> check the handling of big binary files.

OK, this is a little different - simply because of the time taken.  If
the full test suite takes hours or more, then I can see why the
pytables people only distribute a subset of the tests.

> The idea is that, if we separate the tests on the code from the ones
> on the configuration, we will be able to enhance the test section of
> biopython a lot.
> For example, at the moment there are not many tests to check
> biopython's behaviour with big sequence files (e.g. 1 GB). It would be
> useful to have such tests, because now it is becoming common to handle
> big files in bioinformatics, and it would be possible to do some
> profiling on that.

If you want developers to download 1 GB files as part of building and
testing Biopython, it will be a hurdle/barrier to development.  Even
for existing developers, it would make setting up a new machine that
much more complicated.  Other than looking at performance
(speed/memory), we can check most features of large multi-record files
with much smaller examples.
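
As a rough sketch of what I mean, a small multi-record file can be
built in memory and parsed one record at a time, with no download
needed.  This is plain Python rather than Biopython's parser:
make_fasta() and iter_fasta() are hypothetical helpers written only for
this illustration, and the record count of 1000 is an arbitrary choice.

```python
import io


def make_fasta(n_records, seq="ACGT" * 5):
    """Build a small multi-record FASTA text in memory (hypothetical helper)."""
    return "".join(">seq%d\n%s\n" % (i, seq) for i in range(n_records))


def iter_fasta(handle):
    """Yield (name, sequence) pairs one record at a time from a FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if name is not None:
        yield name, "".join(chunks)


# Parse 1000 tiny records; the same record-by-record code path would be
# exercised by a much larger file.
records = list(iter_fasta(io.StringIO(make_fasta(1000))))
```

The record boundaries, ordering and iteration logic are all exercised
here; only the speed and memory behaviour would need a genuinely large
file.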

Peter