[Biopython-dev] "Online" tests, was [Bug 1972]

Peter (BioPython Dev) biopython-dev at maubp.freeserve.co.uk
Fri Mar 31 20:41:18 UTC 2006


Bill Barnard wrote:
> I've made a first cut unit test, tentatively named
> test_Parsers_for_newest_formats, which retrieves and parses some small
> Prosite, Prodoc, SwissProt, and Medline records. I tried
> these types first, based on a quick search of the code tree to see where
> there was existing code that makes use of Bio.WWW.

Sounds good to me, but it's not a very snappy name. How about something
shorter, like test_OnlineFormats.py?

> Testing Blast in the same way doesn't seem sensible to me, and it looks
> as though any effort there should be in the XML Parser area, rather than
> in the thankless task of parsing HTML. (I suspect that's what you've
> already decided.)

It was decided fairly recently to prioritise XML output for Blast. The
plain text output is fairly stable, but my impression is that the HTML
output was (and still is) a moving target, and parsing it is a thankless job.

I think the Blast test should actually submit a short protein or nucleotide
sequence known to be in the online database, then do some basic sanity
checks, such as confirming that it returns at least N results and that the
best hit scores above a certain threshold.
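A minimal sketch of such a sanity check, assuming the search is run with XML output (so no HTML parsing is involved) and using only the standard library's ElementTree. It relies on the `Hit` and `Hsp_bit-score` elements of NCBI's BLAST XML format; the function name, the thresholds and the tiny made-up XML sample are hypothetical. In a real online test the XML would come from NCBI, for example via Biopython's `Bio.Blast.NCBIWWW.qblast`.

```python
import xml.etree.ElementTree as ET


def blast_xml_sanity_check(xml_text, min_hits, min_best_bits):
    """Hypothetical helper: basic sanity checks on NCBI BLAST XML output.

    Returns (number_of_hits, best_bit_score) and raises AssertionError if
    there are fewer than min_hits hits or the best bit score is too low.
    """
    root = ET.fromstring(xml_text)
    # Every <Hit> element is one database sequence matched by the query
    hits = root.findall(".//Hit")
    if len(hits) < min_hits:
        raise AssertionError("only %d hits, expected at least %d"
                             % (len(hits), min_hits))
    # Collect the bit score of every HSP of every hit and keep the maximum
    scores = [float(e.text) for e in root.iter("Hsp_bit-score")]
    if not scores:
        raise AssertionError("no HSP bit scores found")
    best = max(scores)
    if best < min_best_bits:
        raise AssertionError("best bit score %.1f is below %.1f"
                             % (best, min_best_bits))
    return len(hits), best


if __name__ == "__main__":
    # Tiny made-up document using BLAST XML element names; not real output
    sample = """<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_id>hit1</Hit_id><Hit_hsps><Hsp>
<Hsp_bit-score>250.3</Hsp_bit-score></Hsp></Hit_hsps></Hit>
<Hit><Hit_id>hit2</Hit_id><Hit_hsps><Hsp>
<Hsp_bit-score>40.1</Hsp_bit-score></Hsp></Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""
    print(blast_xml_sanity_check(sample, min_hits=2, min_best_bits=100))
```

Raising AssertionError explicitly (rather than using bare `assert`) keeps the checks active even if Python is run with `-O`, and a unittest runner reports it as an ordinary test failure.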

>>In some cases (e.g. GenBank, Fasta) once the sample file is downloaded 
>>there are multiple parsers to be checked (e.g. record and feature parsers).
> 
> I'll take a look at more parsers, as I figure out where they are. I will
> take the same approach of looking through the code tree for existing
> parsers using find/grep. It looks as though there are a fair number
> which may be obsolete. I would appreciate any guidance in figuring out
> which ones would be most useful to check.
> 
> (Is this exercise useful? I was just learning my way around the code
> using the on-line course at the Pasteur Institute, and found a minor bug
> which I fixed. Since any bug should really be covered by a test as well
> as being fixed, I wanted to now add the test. I like cleaning up
> problems as I find them, but I may not be doing anything that's of more
> than minor utility for Biopython...)

Like yourself, I'm only familiar with a fraction of the BioPython code.

I'll volunteer to add cases for GenBank, Fasta and GEO files.

>>We should probably produce a streamlined test output file WITHOUT 
>>details which are likely to change in later versions of the test file 
>>e.g. revisions to genbank files.
> 
> Since the test only verifies the record can be retrieved, parsed, and is
> the actual record requested it emits very little output. My last run
> emitted:
> 
> ...
> WARNING - Ignoring line: DT   20-DEC-2005, integrated into UniProtKB/Swiss-Prot.
> ...

Those WARNING lines are my doing; see bug 1946:

http://bugzilla.open-bio.org/show_bug.cgi?id=1946

> Is this what you mean by "streamlined test output"?

Pretty much.
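One way to keep the expected output free of details likely to change, such as revisions to GenBank files, is to check that the downloaded record is the one requested but print only the version-free accession. A minimal sketch, assuming accessions of the `ACCESSION.VERSION` form; the function name and the accession used below are made up for illustration.

```python
def stable_summary(requested, returned_ids):
    """Hypothetical helper: confirm a downloaded record is the one requested
    and return a short, version-free line for the expected-output file.

    requested is an accession, optionally with a ".N" version suffix;
    returned_ids are the identifiers found in the parsed record, whose
    version suffix changes whenever the database entry is revised.
    """
    def strip_version(acc):
        # "XX000001.2" -> "XX000001"; identifiers without a suffix pass through
        return acc.split(".", 1)[0]

    wanted = strip_version(requested)
    found = {strip_version(i) for i in returned_ids}
    if wanted not in found:
        raise AssertionError("asked for %s, got %s" % (wanted, sorted(found)))
    return "%s retrieved and parsed" % wanted


if __name__ == "__main__":
    # Made-up accession; the printed line is the same for any revision
    print(stable_summary("XX000001", ["XX000001.2"]))
```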

>>One question is should the test "cache" any downloaded files (say for a 
>>day) which would be helpful for anyone trying to debug a particular 
>>issue and re-running the online tests?  Or is this just making life too 
>>complicated.
> 
> This could be done, but I doubt I would do it unless it really seemed
> useful...

OK :)
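If caching did turn out to be useful, a minimal sketch might keep each download on disk for up to a day. The function and parameter names are hypothetical; `fetch` stands in for whatever actually downloads the record (for instance a wrapper around `urllib.request.urlopen`), so a fake can be swapped in when testing the cache itself.

```python
import hashlib
import os
import time

ONE_DAY = 24 * 60 * 60  # seconds


def cached_fetch(url, fetch, cache_dir, max_age=ONE_DAY):
    """Hypothetical helper: return the bytes for url, reusing a cached copy
    that is younger than max_age seconds.

    fetch is any callable taking a URL and returning bytes; it is only
    called when there is no fresh cached copy.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URL so any URL maps to a safe, fixed-length file name
    name = hashlib.sha1(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, name)
    # The file's modification time records when it was downloaded
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path, "rb") as handle:
            return handle.read()
    data = fetch(url)
    with open(path, "wb") as handle:
        handle.write(data)
    return data
```

Using the file's modification time as the timestamp avoids keeping a separate index, and deleting the cache directory forces fresh downloads the next time the online tests run.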

Peter
