[Biopython-dev] Rethinking Biopython's testing framework

Mon Nov 24 06:44:13 UTC 2008

Hi everybody,

Biopython's testing framework is built on top of Python's unit testing framework. Python's unit testing framework makes use of assertion statements to compare the result of a command to the expected result. Biopython instead uses test scripts that print output to stdout, together with an output file that contains the correct output. After running each test script, the framework compares the generated output with the correct output to see if the test was successful.
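To make the difference concrete, here is a minimal sketch of the two styles side by side. It is written for a current Python, and reverse_complement, EXPECTED_OUTPUT and the runner function are made-up names for illustration only; this is not Biopython's actual test runner, and the expected output would normally live in a separate file rather than in a string.

```python
import io
import unittest
from contextlib import redirect_stdout


def reverse_complement(seq):
    """Hypothetical function under test (not part of Biopython's API)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


# Style 1: assertion-based, as in Python's unit testing framework.
# The expected result sits right next to the command that produces it.
class TestReverseComplement(unittest.TestCase):
    def test_simple(self):
        self.assertEqual(reverse_complement("AACG"), "CGTT")


# Style 2: print-and-compare. The test script only prints; the expected
# result is stored elsewhere (assumed here to be a string, normally a file).
def print_style_test():
    print("test_example")
    print(reverse_complement("AACG"))


EXPECTED_OUTPUT = "test_example\nCGTT\n"


def run_print_style_test():
    """Capture stdout of the test script and compare it to the stored output."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print_style_test()
    return buffer.getvalue() == EXPECTED_OUTPUT
```

In the first style a failure points at a specific assertion; in the second, a failure only says that the printed output differs from the stored output.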

This approach can be useful for modules that deal with different file formats. For example, you can read in a file in one format, write it out in a different format, and compare it with the expected result.
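As a rough illustration of that use case, the sketch below converts a small tab-separated input to FASTA-style output and compares the result with the expected text. The converter tab_to_fasta and both data strings are invented for this example and do not correspond to any Biopython parser or writer; in a real test the input and expected output would be files.

```python
import io


def tab_to_fasta(handle_in, handle_out):
    """Hypothetical converter: 'name<TAB>sequence' lines to FASTA records."""
    for line in handle_in:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, sequence = line.split("\t")
        handle_out.write(f">{name}\n{sequence}\n")


# Stand-ins for the input file and the stored expected-output file.
TAB_INPUT = "seq1\tACGT\nseq2\tGGCC\n"
EXPECTED_FASTA = ">seq1\nACGT\n>seq2\nGGCC\n"


def conversion_matches_expected():
    """Convert the input and compare the result with the expected output."""
    converted = io.StringIO()
    tab_to_fasta(io.StringIO(TAB_INPUT), converted)
    return converted.getvalue().splitlines() == EXPECTED_FASTA.splitlines()
```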

However, more than half of Biopython's tests do not actually make use of this output comparison, and rely on Python's unit testing framework instead:

test_BioSQL
test_CAPS
test_Cluster
test_CodonTable
test_Compass
test_Crystal
test_DocSQL
test_EmbossPrimer
test_Entrez
test_Fasta
test_GACrossover
test_GAMutation
test_GAOrganism
test_GAQueens
test_GARepair
test_GASelection
test_GFF
test_GFF2
test_GraphicsChromosome
test_GraphicsDistribution
test_GraphicsGeneral
test_HMMCasino
test_HMMGeneral
test_HotRand
test_KDTree
test_KeyWList
test_LogisticRegression
test_Medline
test_NNExclusiveOr
test_NNGene
test_NNGeneral
test_Pathway
test_PopGen_FDist
test_PopGen_FDist_nodepend
test_PopGen_SimCoal
test_PopGen_SimCoal_nodepend
test_Registry
test_Restriction
test_SCOP_Astral
test_SCOP_Cla
test_SCOP_Des
test_SCOP_Dom
test_SCOP_Hie
test_SCOP_Raf
test_SCOP_Residues
test_SCOP_Scop
test_Wise
test_docstrings
test_kNN
test_lowess
test_psw

These tests have trivial output, for example test_Cluster:

test_Cluster
test_clusterdistance (test_Cluster.TestCluster) ... ok
test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok
test_kcluster (test_Cluster.TestCluster) ... ok
test_matrix_parse (test_Cluster.TestCluster) ... ok
test_median_mean (test_Cluster.TestCluster) ... ok
test_somcluster (test_Cluster.TestCluster) ... ok
test_treecluster (test_Cluster.TestCluster) ... ok

----------------------------------------------------------------------
Ran 7 tests in 0.015s

OK

I suspect that for many of the remaining tests Biopython's testing framework doesn't bring any real advantage, but is used anyway solely because it is currently the standard in Biopython.

Personally, I find Python's unit testing framework easier to understand than Biopython's testing framework. It doesn't need a separate output file, and it is easier to match each line of code with the correct behavior.

I would therefore like to suggest moving from Biopython's testing framework to Python's unit testing framework. This would also relieve us of the task of explaining Biopython's testing framework to contributors, and allow us to make better use of what Python already provides. Comparing output line-by-line, as Biopython's testing framework currently does, can still be used by test scripts that need this functionality.
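One way the line-by-line comparison could be kept inside a unittest-style test is sketched below. This is only an assumption about how it might look, not a worked-out proposal: legacy_test_script and EXPECTED_LINES are placeholder names, and the expected lines would in practice be read from the existing output file.

```python
import io
import unittest
from contextlib import redirect_stdout


def legacy_test_script():
    """Hypothetical stand-in for an existing print-style test script."""
    print("Testing parser")
    print("Found 2 records")


# Placeholder for the contents of the stored expected-output file.
EXPECTED_LINES = ["Testing parser", "Found 2 records"]


class TestLegacyOutput(unittest.TestCase):
    def test_output_matches_line_by_line(self):
        # Capture what the script prints instead of letting it reach stdout.
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            legacy_test_script()
        actual_lines = buffer.getvalue().splitlines()

        # Compare the line counts first, then each line in turn, so that
        # a failure reports which line of the output differs.
        self.assertEqual(len(actual_lines), len(EXPECTED_LINES))
        pairs = zip(actual_lines, EXPECTED_LINES)
        for number, (actual, expected) in enumerate(pairs, start=1):
            self.assertEqual(actual, expected,
                             f"output differs at line {number}")
```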

Comments, suggestions, anybody?

--Michiel.