[Biopython] New project for Google Summer of Code 2011

Fri Apr 8 12:51:25 UTC 2011

Hey Mikael,

Regarding the tests, there are two kinds of tests you should be concerned
about:

1. The first is obviously what you included: checking whether the code
performs well scientifically. This can be done by checking against previously
known results from the tools you mentioned. Seems great. But these tests
won't make it into the final distribution, as they would be too cumbersome.

2. The second kind of testing is unit testing: using some small examples to
check that the code A. runs and B. performs as it should in these
restrictive tests. Have a look at some hints Eric and Diana gave me last
year:

> Unit testing is a software engineering technique of writing code that
> tests small portions of the regular code. They are written in separate
> classes and usually test a single function with various input
> parameters. You can check out the Biopython repository and see how they
> look (probably in a directory called test, but I am not that familiar
> with the Biopython code base). http://en.wikipedia.org/wiki/Unit_testing
>
> It reminded me about software engineering technique called
> refactoring. You don't have to read it now, but this is very good
> source on it http://sourcemaking.com/refactoring
>

> Yep, Diana covered it, but here are a few links for future reference:
>
> http://docs.python.org/library/unittest.html
> http://docs.python.org/library/doctest.html
> http://github.com/biopython/biopython/blob/master/Tests/test_PDB.py
>
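
To make the A./B. distinction above concrete, here is a minimal sketch using
the standard unittest module. The centroid() function is a hypothetical
stand-in, not anything from your proposal or from Biopython; it is only there
so the tests have something to exercise.

```python
import unittest


def centroid(coords):
    """Return the geometric center of a list of (x, y, z) tuples.

    Hypothetical helper, used here only so there is something to test.
    """
    if not coords:
        raise ValueError("need at least one coordinate")
    n = float(len(coords))
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


class CentroidTests(unittest.TestCase):
    def test_runs_on_simple_input(self):
        # A. the code runs and returns a 3-tuple
        self.assertEqual(len(centroid([(0.0, 0.0, 0.0)])), 3)

    def test_known_value(self):
        # B. the code gives the expected value on a small, hand-checked case
        result = centroid([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)])
        self.assertEqual(result, (1.0, 2.0, 3.0))

    def test_empty_input_rejected(self):
        # Bad input should fail loudly rather than return nonsense
        with self.assertRaises(ValueError):
            centroid([])
```

Saved in a file, tests like these can be run with "python -m unittest"
followed by the module name.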

My advice is that you should state specifically that you will devote time to
1. testing the code for scientific correctness and 2. adding unit tests to
Biopython, so that the code is easy to include in the main release and to
distribute. There is no need to detail exactly what you are going to do in
each test (comparing to this or that tool). On the other hand, I believe
compiling a benchmark might be a bit too much for each small feature. Again,
my advice, and this is my personal opinion, is that you should keep a pool
of 4 or 5 proteins whose results you know beforehand and test against them
as you go. At each big "step", those testing periods, you should run your
newly developed functions on these proteins and make sure they come out ok,
as well as running any previous unit tests to see if the code you wrote
before is still performing top-notch. You could add to the first week a line
saying you'll develop a stable benchmark to test your functions throughout
development.
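
A rough sketch of what such a small benchmark pool could look like, assuming
the results are plain numbers that can be compared within a tolerance. All
names and values here (hydrophobic_fraction, BENCHMARK, the toy sequences)
are made up for illustration; the real pool would hold your 4 or 5 proteins
and the values you got from the reference tools.

```python
HYDROPHOBIC = set("AVLIMFW")  # illustrative set, not a recommended scale


def hydrophobic_fraction(sequence):
    """Stand-in for a function under development (hypothetical)."""
    hits = sum(1 for aa in sequence if aa in HYDROPHOBIC)
    return hits / float(len(sequence))


# (label, input, result known beforehand); all entries are toy data.
BENCHMARK = [
    ("toy_1", "AVLK", 0.75),
    ("toy_2", "GGKD", 0.0),
    ("toy_3", "MFWA", 1.0),
]


def run_benchmark(func, benchmark, tol=1e-6):
    """Return the entries whose result differs from the expected value."""
    failures = []
    for label, data, expected in benchmark:
        got = func(data)
        if abs(got - expected) > tol:
            failures.append((label, expected, got))
    return failures
```

Calling run_benchmark(hydrophobic_fraction, BENCHMARK) at the end of each
step gives back a list of mismatches, so an empty list means nothing has
drifted from the known results.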

Finally, regarding the concluding remarks, I think that one week is not
enough time to optimize, test, and distribute the code among people and
receive their comments :) Especially in August! I'd focus on packaging and
making sure the whole module plays well with Biopython, and then on some
optimization if there's a need for it. Testing should be minimal since
you've been doing it as you code. Also, you need time to package your code,
review commits, etc, to prepare submission for the final evaluations.
Lastly, while a plan is a plan, I'm sure if you get chosen and you start
coding you will find very interesting things to code that are not in that
plan. Leave some time allotted for these "random ideas" that will surely show
up.

Cheers,

João