[Biopython-dev] Rethinking Biopython's testing framework

Tue Dec 30 23:33:16 UTC 2008

Brad wrote:
>>> Agreed with the distinction between the unit tests and the "dump
>>> lots of text and compare" approach. I've written both and do think
>>> the unit testing/assertion model is more robust since you can go
>>> back and actually get some insight into what someone was thinking
>>> when they wrote an assertion.

Peter wrote:
>> I have probably written more of the "dump lots of text and compare"
>> style tests.  I think these have a number of advantages:
>> (1) Easier for beginners to write a test, you can almost take any
>> example script and use that.  You don't have to learn the unit test
>> framework.
>> ...

Giovanni wrote:
> I agree with what you say, but I think that all the 'dump and compare'
> tests should be organized in various functions.
> This will make easier to use and understand them, and they will be
> compatible with the nose framework.

If we organise the "dump and compare" tests into various functions
(e.g. using the unittest framework), and turn print statements into
asserts etc, then yes they would become nose compatible.  However,
this is a lot of work for relatively little gain.  Also, in doing so
we would lose the simplicity (see my earlier points) and make it
harder for newcomers to write further tests.

Nevertheless, we could regard Michiel's plan of 24 Dec as a step
towards this, since it simplifies writing unittest based tests (they
won't need an expected output file, which must also be kept in
CVS/SVN).

I'm not sure what you meant by "This will make easier to use and
understand them, ...".  Switching the unit test coding style makes no
difference from the end user's point of view.  They run the test suite
using "python setup.py test" (typically as part of installation from
source) or from the Tests directory using "python run_tests.py", and
won't see any difference in how the tests work internally.

In terms of understanding the unit tests: If you are a beginner
wanting to look at a unit test to get a feel for how to use the code,
then frankly those of our unit tests which simply do some imports and
print some output are MUCH easier to understand.  By their nature they
are essentially example Biopython scripts.  On the other hand, those
of our unit tests using the unittest framework have all these test
classes defined, and split up the setup/clean up into separate
methods etc.  In some senses this is "clutter" which is not helpful if
you want to regard the unit test also as a usage example.
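
To make the contrast concrete, here is a minimal sketch of the same
check written in both styles.  The reverse_complement function is a
made-up stand-in defined inline (not Biopython code), and the test
names are purely illustrative.

```python
import unittest


def reverse_complement(seq):
    """Hypothetical stand-in for the library code being tested."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


# Style 1: "dump and compare".  The script just prints results, and the
# test runner compares the printed text against a stored output file.
def dump_style():
    for seq in ["ACGT", "AAGC"]:
        print(seq, "->", reverse_complement(seq))


# Style 2: unittest.  The expected values live in the assertions, so no
# separate expected output file is needed.
class ReverseComplementTest(unittest.TestCase):
    def test_palindrome(self):
        self.assertEqual(reverse_complement("ACGT"), "ACGT")

    def test_simple(self):
        self.assertEqual(reverse_complement("AAGC"), "GCTT")
```

The first style reads like an ordinary example script, while the
second records the intent of each check at the cost of the extra
class and method scaffolding.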

>> (2) Debugging a failing test in IDLE is much easier - using unit tests
>> you have all that framework between you and the local scope where the
>> error happens.
>
>> (3) For many broad tests, manually setting up the expected output for
>> an assert is extremely tedious (e.g. parsing sequences and checking
>> their checksums).
>
> This is an interesting discussion if you want to talk about it a bit.

It could be, but I don't want to get sidetracked from pressing ahead
with Michiel's plan (the email of 24th Dec, or something similar),
which seems to be a worthwhile small improvement to the current
situation.

> An advantage of unittest is the pair of setUp and tearDown methods (fixtures).
> With those, you are sure that all the tests are run with the right
> environment and that all variables are dropped before executing a new
> test.

For some tests, yes, this is useful - in particular where there are
lots of independent small things you want to test.  In other
situations you want to test a work flow, with a series of cumulative
steps each building on the previous one.  This would end up as a
single large test function/method.
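
As a rough sketch of the two situations (all class names, method names
and file contents below are invented for illustration): the first class
uses setUp/tearDown so each small test gets a fresh temporary file,
while the second is a work flow where each step depends on the last and
so naturally stays in one method.

```python
import os
import tempfile
import unittest


class IndependentChecks(unittest.TestCase):
    """Small independent tests sharing a fixture."""

    def setUp(self):
        # Runs before every test method: create a fresh input file.
        fd, self.path = tempfile.mkstemp()
        with os.fdopen(fd, "w") as handle:
            handle.write(">seq1\nACGT\n")

    def tearDown(self):
        # Runs after every test method: remove the file again.
        os.remove(self.path)

    def test_header(self):
        with open(self.path) as handle:
            self.assertTrue(handle.readline().startswith(">"))

    def test_sequence(self):
        with open(self.path) as handle:
            handle.readline()
            self.assertEqual(handle.readline().strip(), "ACGT")


class WorkflowCheck(unittest.TestCase):
    """Cumulative steps, each building on the previous one."""

    def test_workflow(self):
        record = ">seq1\nACGT\n"
        title, sequence = record.strip().split("\n")
        self.assertEqual(title, ">seq1")
        reverse = sequence[::-1]
        self.assertEqual(reverse, "TGCA")
        self.assertEqual(reverse[::-1], sequence)
```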

> Also, if you want to do a lot of dump and compare tests, consider
> writing some big doctest scripts.
> It will require a bit more work to write them, but they will be
> easier to understand, and they will also become good tutorials for the
> users.

Certainly some of the current simple "dump and compare" tests might be
converted into doctests (and we could do this within the current
Biopython framework).  However, the requirements for good
documentation and good test coverage differ - you'd want to include
tests for atypical code which you would not want to encourage as good
coding practice.  I'm quite keen for further usage of doctests - but I
see them primarily as an improvement to our documentation.
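
For anyone who has not used doctest, a minimal sketch follows.  The
gc_fraction function is hypothetical (written here only as an example,
not an existing Biopython function).  The first two examples read well
as documentation; the empty string case is the kind of corner case you
would want covered by a test but might prefer not to show in a
tutorial.

```python
import doctest


def gc_fraction(seq):
    """Return the fraction of G and C letters in a sequence.

    >>> gc_fraction("GGCC")
    1.0
    >>> gc_fraction("ATGC")
    0.5

    Corner case, useful as a test but of little value as documentation:

    >>> gc_fraction("")
    0.0
    """
    if not seq:
        return 0.0
    return sum(seq.count(base) for base in "GC") / float(len(seq))


if __name__ == "__main__":
    # Runs every docstring example in this module and reports failures.
    doctest.testmod()
```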

Peter wrote:
>> We could discuss a modification to run_tests.py so that if there is no
>> expected output file output/test_XXX for test_XXX.py we just run
>> test_XXX.py and check its return value (I think Michiel had previously
>> suggested something like this).

Note that Michiel's email of 24th Dec is another approach to this
topic - either would work, but his plan makes the division between the
two test types much more explicit.
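
For reference, the run_tests.py modification quoted above might look
roughly like this.  This is only a sketch under my own assumptions: the
helper name run_one_test and its arguments are made up, and it is not
the actual run_tests.py code.

```python
import os
import subprocess
import sys


def run_one_test(script_path, expected_path):
    """Return True if the test script passes (hypothetical helper).

    If an expected output file exists, the captured output must match
    it exactly.  Otherwise the script's return value decides.
    """
    process = subprocess.Popen(
        [sys.executable, script_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    output, _ = process.communicate()
    if os.path.isfile(expected_path):
        # "Dump and compare" style test.
        with open(expected_path) as handle:
            return output == handle.read()
    # No expected output file: rely on the exit status alone.
    return process.returncode == 0
```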

Giovanni wrote:
> I think this should be done inside the test itself.
> All the tests should return only a boolean value (passed or not) and a
> description of the error.
> The tests that make use of an expected output file should open
> it and do the comparison themselves, not in run_tests.py.

Your plan would work, but it means the simplicity of this style of
unit test is lost.  Rather than making this change (which would be a
moderate amount of tedious work), I would rather go all the way and
make them unittest based like the rest of our test suite.

>> Perhaps for more robustness, capture
>> the output and compare it to a predefined list of regular expressions
>> covering the typical outputs.  For example, looking at
>> output/test_Cluster, the first line is the test name, but the rest
>> follows the pattern "test_... ok". I imagine only a few output styles
>> exist.  With such a change, half the unit tests (e.g. test_Cluster.py)
>> wouldn't need their output file in CVS (output/test_Cluster).
>
> mmm have you changed this file in the cvs recently? I can't find what
> you are referring to.

For this example, the unit test Tests/test_Cluster.py is here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Tests/test_Cluster.py?cvsroot=biopython

Its expected output file Tests/output/test_Cluster is here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Tests/output/test_Cluster?cvsroot=biopython
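
The regular expression idea quoted above could be sketched along these
lines.  The exact line format is an assumption on my part (a test name
on the first line, then lines of the form "test_something ... ok"), and
the function name is invented for illustration.

```python
import re

# Assumed shape of a passing line, e.g. "test_example ... ok".
OK_LINE = re.compile(r"^test_\w+ \.\.\. ok$")


def output_looks_ok(output, test_name):
    """Return True if captured output matches the assumed pattern."""
    lines = output.splitlines()
    # The first line should be the name of the test itself.
    if not lines or lines[0] != test_name:
        return False
    # Every remaining line should report a passing test method.
    return all(OK_LINE.match(line) for line in lines[1:])
```

A test whose output matches would then not need its own expected
output file kept under version control.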

Peter