[Biopython-dev] run_tests.py rewrite

Wed Feb 4 06:37:56 EST 2009

On Wed, Feb 4, 2009 at 11:22 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>>  > Moreover, why should the typical user be running
>>>  > biopython's tests?
>>>
>>>
>>> To make sure that it works. Biopython interacts with and therefore
>>> depends more on 3rd party software, web servers, and file formats
>>> than most other Python modules. Things are more likely to break
>>> than for example for a more self-contained library such as NumPy.
>>> I always run the Biopython tests, and I would advise every user to
>>> do so too. In addition, the tests can function as example scripts
>>> showing how to use Biopython. It is important that all users can
>>> run those scripts.
>>
>> I think that all the tests which check if biopython can run correctly
>> on a computer should be separated from all the others.
>> Why do I have to test whether biopython correctly translates the
>> sequence ACTAGCT to a protein code when I install biopython? It should
>> have already been checked by the developers/volunteers. If I want to
>> install biopython on my computer, I want to run only the tests needed
>> to make it sure it can work fine on my configuration, not all of them.
>
> As an end user, I would still prefer to know that even simple things
> like translation have been checked as working on my machine.  With a
> very simple example like this it is unlikely to break on some setups
> and not others, but for many test cases it is very hard to make this
> judgement call.  The only real way to "to make it sure it can work
> fine on my configuration" is to just test everything - and it doesn't
> take that long anyway.

It doesn't take long, but only because the developers are forced to
write tests which don't take long.
However, this doesn't mean that big tests are not necessary.
Many libraries I have installed have two separate commands, 'setup.py
test' and 'setup.py test_all'.
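
Just to give an idea of the split, here is a rough sketch using plain
unittest. The names QuickTests, SlowTests and build_suite are made up
for illustration; they are not part of Biopython or of run_tests.py,
and the tests themselves are only placeholders.

```python
import unittest


class QuickTests(unittest.TestCase):
    """Cheap checks that every user could run at install time."""

    def test_codon_lookup(self):
        # Placeholder for a fast check, e.g. a tiny translation test.
        codon_table = {"ATG": "M", "TAA": "*"}
        self.assertEqual(codon_table["ATG"], "M")


class SlowTests(unittest.TestCase):
    """Expensive checks that only developers would normally run."""

    def test_many_records(self):
        # Placeholder for a long-running test on a big data set.
        total = sum(len("ACGT" * 25) for _ in range(100000))
        self.assertEqual(total, 10000000)


def build_suite(include_slow=False):
    """Return the quick tests, plus the slow ones if requested."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(QuickTests))
    if include_slow:
        suite.addTests(loader.loadTestsFromTestCase(SlowTests))
    return suite
```

A 'test' command could then run build_suite(), and a 'test_all'
command build_suite(include_slow=True), for example through
unittest.TextTestRunner().run(...).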

>> As an example, take PyTables, a library to handle HDF5 files with Python.
>> The authors claim that they have written more than 10^6 tests for it.
>> However, when you install PyTables from source, you don't have to run
>> all of these tests, but only a subset of them, the ones required to
>> check if it can run correctly on your computer. Consider that some of
>> the tests on PyTables take hours or days to complete, because they
>> check the handling of big binary files.
>
> OK, this is a little different - simply because of the time taken.  If
> the full test suite takes hours or more, then I can see why the
> pytables people only distribute a subset of the tests.
>
>> The idea is that, if we separate the tests on the code from the ones
>> on the configuration, we will be able to enhance the test section of
>> biopython a lot.
>> For example, at the moment there are not many tests to check
>> biopython's behaviour with big sequence files (e.g. 1 GB). It would be
>> useful to have such tests, because now it is becoming common to handle
>> big files in bioinformatics, and it would be possible to do some
>> profiling on that.
>
> If you want developers to download 1 GB files as part of building and
> testing Biopython, it will be a hurdle/barrier to development.  Even
> for existing developers, it would make setting up a new machine that
> much more complicated.  Other than looking at performance
> speed/memory, we can check most features of large multi-record files
> with much smaller examples.

Well, it is not necessary to put a 1 GB file in the repo. We could
generate it with the random or hmm module, always using the same seed
:). It would be a 'package' global fixture.
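
Something like the following, as a minimal sketch with the standard
random module only. The function name write_random_fasta and its
parameters are hypothetical, and the output should only be expected to
be identical between runs on the same Python version, since the
behaviour of random for a given seed can differ between versions.

```python
import random


def write_random_fasta(path, n_records, seq_length, seed=42, line_width=60):
    """Write n_records random DNA sequences to path in FASTA format."""
    # A private generator with a fixed seed: the same arguments give
    # the same file, and the global random state is left untouched.
    rng = random.Random(seed)
    with open(path, "w") as handle:
        for i in range(n_records):
            handle.write(">random_seq_%d\n" % i)
            seq = "".join(rng.choice("ACGT") for _ in range(seq_length))
            # Wrap the sequence to a fixed line width.
            for start in range(0, seq_length, line_width):
                handle.write(seq[start:start + line_width] + "\n")
```

The file size is then controlled by n_records and seq_length, so the
same function could produce a few kB for a quick test or about 1 GB
for a profiling run, without storing anything in the repository.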

>
> Peter
>

-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it