[BioPython] Congrats (and code contribution)!

Andrew Dalke dalke@bioreason.com
Sun, 19 Sep 1999 18:32:19 -0600


> Or I could help with the testing since I'm a test engineer.

Cool!  In my experience it's been hardest to get QA-type
people to work on these sorts of projects, but they are very
necessary.

On that topic, I'm planning to contribute the regression test
driver we use in-house.  It is derived from the standard Python
regrtest code, though with the ability to fake out the module
location (I'll get to that in a moment).  Think of the rest of
this email as a proposal directed at biopython readers in
general.

Regression tests are very important.  As one book says, they
are the heartbeat of a project.  With a good regression suite,
run regularly, you get a nice feeling that things are
actually working.  It also becomes much more comfortable to make
changes, because it's easier to find out whether you made any
stupid mistakes or broke backwards compatibility.

(This is especially true when all you are doing is making
performance changes, which shouldn't affect anything at all.)

We break regression tests up into two types: simple ones and
complex ones.  The complex ones are those which take a
considerable time (say, over 5 minutes) and which you don't
want to run very often.

The simple tests take about a minute or two in total and can
(and should) be run before committing changes to version
control.


The simple regression tests are placed in the subdirectory
called "tests", as is the driver.  When "make check" is run
in the top-level, it descends into the tests directory and
runs the regression code.

The driver finds all files in the tests directory whose names
start with "test_" (or you can give it a list of files) and
exec's them.  Each test's output is compared to a gold standard
in tests/output.  A test fails if the code raises an uncaught
exception or if the output differs.
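For anyone who hasn't looked at regrtest, here is a minimal sketch of the find/exec/compare loop described above.  It is written for a current Python 3 and is not the driver we actually use; the names find_tests, run_test and check_tests are made up for illustration, and it assumes the gold file for tests/test_foo.py is tests/output/test_foo.

```python
import contextlib
import io
import os
import traceback


def find_tests(test_dir):
    """Return the sorted file names in test_dir that look like test_*.py."""
    return sorted(
        name for name in os.listdir(test_dir)
        if name.startswith("test_") and name.endswith(".py")
    )


def run_test(path):
    """exec one test file and return everything it printed to stdout."""
    with open(path) as f:
        source = f.read()
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Fresh globals for every test so tests can't see each other's names.
        exec(compile(source, path, "exec"), {"__name__": "__main__"})
    return buffer.getvalue()


def check_tests(test_dir, names=None):
    """Map each test name to "ok", "exception", "no gold file" or "output differs"."""
    results = {}
    for name in names or find_tests(test_dir):
        # Assumed convention: output/test_foo is the gold file for test_foo.py.
        test_name = os.path.splitext(name)[0]
        try:
            actual = run_test(os.path.join(test_dir, name))
        except Exception:
            traceback.print_exc()  # an uncaught exception fails the test
            results[test_name] = "exception"
            continue
        gold_path = os.path.join(test_dir, "output", test_name)
        if not os.path.exists(gold_path):
            results[test_name] = "no gold file"
            continue
        with open(gold_path) as gold:
            expected = gold.read()
        results[test_name] = "ok" if actual == expected else "output differs"
    return results
```

Calling check_tests("tests") returns a dict such as {"test_foo": "ok"}; any value other than "ok" counts as a failure.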

(BTW, there is also an option to generate the gold standard
output.)
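The generate option could look roughly like the following sketch (again hypothetical Python 3, not our in-house code; capture_output and generate_gold are illustrative names): run a test once and save whatever it prints as its new gold file.  Since the saved output becomes the definition of "correct", it's worth reading it over by hand before committing it.

```python
import contextlib
import io
import os


def capture_output(path):
    """exec a test file and return what it printed to stdout."""
    with open(path) as f:
        source = f.read()
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, path, "exec"), {"__name__": "__main__"})
    return buffer.getvalue()


def generate_gold(test_dir, name):
    """Run <test_dir>/<name>.py and store its output as <test_dir>/output/<name>."""
    output_dir = os.path.join(test_dir, "output")
    os.makedirs(output_dir, exist_ok=True)
    text = capture_output(os.path.join(test_dir, name + ".py"))
    gold_path = os.path.join(output_dir, name)
    with open(gold_path, "w") as gold:
        gold.write(text)
    return gold_path
```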

So our module layout looks something like this:

  MODULE/
    Makefile

    tests/
      Makefile
      regrtest.py -- driver program
      test_*.py -- regression tests

      output/
        test_* -- gold standard for the tests

The Makefiles, for those interested, are created by a modified
version of automake/autoconf.  I contributed the changes to the
GNU maintainers some time ago, but they still haven't integrated
them.  I would like to use the framework for biopython, since it
automates most of the work of setting up Python-specific Makefiles.
But it would mean distributing our own version of automake. :(

Anyway...  There is some trickiness in getting the module name
and location right.  The regression code (in tests/) does an
"import MODULE" and expects to get the module located in "../".
Doing this the normal Python way means setting PYTHONPATH to
include "../.." so that Python finds the MODULE/ directory.

This doesn't work for two reasons:
  1) if this is a distributed package, the directory might be
     named "MODULE-2.93", which can't be imported as MODULE.
  2) there may be neighbors of MODULE which you don't want to
     test, and hence you don't want ../.. on the PYTHONPATH.
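To make the two problems concrete, here is a small self-contained demonstration in Python 3.  It builds a throwaway directory standing in for ../.., containing a hypothetical "MODULE-2.93" package and an unrelated sibling called "NEIGHBOR" (both names are made up), then puts that directory on sys.path the way PYTHONPATH would.

```python
import importlib
import os
import sys
import tempfile


def make_package(parent, dir_name):
    """Create parent/dir_name/__init__.py and return the directory path."""
    path = os.path.join(parent, dir_name)
    os.makedirs(path)
    with open(os.path.join(path, "__init__.py"), "w") as f:
        f.write("NAME = %r\n" % dir_name)
    return path


def demo_pythonpath_problems():
    """Show what putting ../.. on sys.path does and does not give you."""
    parent = tempfile.mkdtemp()             # stands in for ../..
    make_package(parent, "MODULE-2.93")     # an unpacked distribution
    make_package(parent, "NEIGHBOR")        # a sibling we don't want to test
    sys.path.insert(0, parent)              # the "normal Python way"
    importlib.invalidate_caches()
    try:
        try:
            importlib.import_module("MODULE")
            found_module = True
        except ImportError:
            found_module = False            # problem 1: directory name mismatch
        neighbor = importlib.import_module("NEIGHBOR")  # problem 2: it leaks in
        return found_module, neighbor.NAME
    finally:
        sys.path.remove(parent)
        sys.modules.pop("NEIGHBOR", None)
```

It returns (False, 'NEIGHBOR'): "import MODULE" fails because the directory name doesn't match, while the neighbor we never meant to test is importable.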

To fix this, our modified regression driver has the option
> -p <name>=<directory>: treat the directory as the source for
> a package of the given name

implemented as
>        if o == '-p' or o == "--package":
>            pos = string.find(a, "=")
>            if pos == -1:
>                print "package option `-p' is missing `='"
>                return 2
>            module_name = a[:pos]
>            dir_name = a[pos+1:]
>            imp.load_module(module_name, None, dir_name,
>                            ("", "r", imp.PKG_DIRECTORY) )
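The imp module used above has since been deprecated and was removed in Python 3.12.  A rough importlib equivalent of the -p option might look like the following; it assumes the package directory contains an __init__.py, and parse_package_option and load_package are illustrative names, not part of any existing tool.

```python
import importlib.util
import os
import sys


def parse_package_option(arg):
    """Split a "-p NAME=DIRECTORY" argument into (NAME, DIRECTORY)."""
    name, sep, directory = arg.partition("=")
    if not sep:
        raise ValueError("package option `-p' is missing `='")
    return name, directory


def load_package(name, directory):
    """Import the package rooted at `directory` under the module name `name`."""
    spec = importlib.util.spec_from_file_location(
        name,
        os.path.join(directory, "__init__.py"),
        # Search locations make this a package, so "import NAME.submodule"
        # also looks inside `directory`.
        submodule_search_locations=[directory],
    )
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before executing, as import does
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)
        raise
    return module
```

Registering the module in sys.modules before running its __init__.py mirrors what the normal import machinery does, and it is what lets a later "import MODULE.submodule" in a test resolve against the given directory.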

This also lets us run the regression tests against the installed
module using something like:

installcheck:
        $(PYTHON) br_regrtest.py --package MODULE=$(PYTHON_SITE_INSTALL)



Since I doubt this made much sense to most people, I'm planning
to put together a mock-up of what I think the initial code should
look like, and I'll include the regression tests and Makefile data
which go along with it.

Depends on how long I decide to be at the computer today...

						Andrew
						dalke@bioreason.com

P.S.
  I also have a code coverage utility for Python, which I've used
to make sure the regression tests cover everything.  I'll talk
about that one later.