[Biopython-dev] "Online" tests, was [Bug 1972]
Peter (BioPython Dev)
biopython-dev at maubp.freeserve.co.uk
Fri Mar 31 20:41:18 UTC 2006
Bill Barnard wrote:
> I've made a first cut unit test, tentatively named
> test_Parsers_for_newest_formats, which retrieves and parses some small
> records for Prosite, Prodoc, SwissProt, and Medline records. I tried
> these types first, based on a quick search of the code tree to see where
> there was existing code that makes use of Bio.WWW.
Sounds good to me. But not a very snappy name - how about something
shorter like test_OnlineFormats.py instead?
> Testing Blast in the same way doesn't seem sensible to me, and it looks
> as though any effort there should be in the XML Parser area, rather than
> in the thankless task of parsing HTML. (I suspect that's what you've
> already decided.)
It was decided fairly recently to prioritise XML output for Blast. The
plain text output is fairly stable, but my impression is that the HTML
was/is a moving target and a thankless job.
I think the Blast test should actually submit a short protein/nucleotide
sequence known to be in the online database. Maybe do some basic sanity
testing, like checking that it returns at least N results and that the
best hit has at least a certain score.
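
Something along these lines is what I have in mind for the sanity check.
This is only a rough sketch: the function name check_blast_hits and the
list of (title, score) tuples are made up for illustration, and it
assumes the hits have already been pulled out of whatever the parser
returns.

```python
def check_blast_hits(hits, min_hits, min_best_score):
    """Return a list of problems; an empty list means the result looks sane.

    hits is assumed to be a list of (title, score) tuples already
    extracted from a parsed Blast result (a hypothetical layout).
    """
    problems = []
    # The query is known to be in the database, so few hits is suspicious.
    if len(hits) < min_hits:
        problems.append("expected at least %d hits, got %d"
                        % (min_hits, len(hits)))
    # Only look at scores when there is at least one hit to inspect.
    if hits:
        best_score = max(score for title, score in hits)
        if best_score < min_best_score:
            problems.append("best score %s is below %s"
                            % (best_score, min_best_score))
    return problems
```

Reporting only pass/fail against thresholds keeps the expected output
stable even when the online database grows.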
>>In some cases (e.g. GenBank, Fasta) once the sample file is downloaded
>>there are multiple parsers to be checked (e.g. record and feature parsers).
>
> I'll take a look at more parsers, as I figure out where they are. I will
> take the same approach of looking through the code tree for existing
> parsers using find/grep. It looks as though there are a fair number
> which may be obsolete. I would appreciate any guidance in figuring out
> which ones would be most useful to check.
>
> (Is this exercise useful? I was just learning my way around the code
> using the on-line course at the Pasteur Institute, and found a minor bug
> which I fixed. Since any bug should really be covered by a test as well
> as being fixed, I wanted to now add the test. I like cleaning up
> problems as I find them, but I may not be doing anything that's of more
> than minor utility for Biopython...)
Like yourself, I'm only familiar with a fraction of the BioPython code.
I'll volunteer to add cases for GenBank, Fasta and GEO files.
>>We should probably produce a streamlined test output file WITHOUT
>>details which are likely to change in later versions of the test file
>>e.g. revisions to genbank files.
>
> Since the test only verifies the record can be retrieved, parsed, and is
> the actual record requested it emits very little output. My last run
> emitted:
>
> ...
> WARNING - Ignoring line: DT 20-DEC-2005, integrated into
> UniProtKB/Swiss-Prot.
> ...
Those WARNING lines are my doing, see bug 1946
http://bugzilla.open-bio.org/show_bug.cgi?id=1946
> Is this what you mean by "streamlined test output"?
Pretty much.
>>One question is should the test "cache" any downloaded files (say for a
>>day) which would be helpful for anyone trying to debug a particular
>>issue and re-running the online tests? Or is this just making life too
>>complicated.
>
> This could be done, but I doubt I would do it unless it really seemed
> useful...
OK :)
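
For the record, should it ever seem useful, the caching I was thinking
of could be quite small. A minimal sketch, where cached_fetch, the fetch
callable and the cache directory are all hypothetical names rather than
anything in Biopython:

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # cache lifetime in seconds

def cached_fetch(name, fetch, cache_dir, max_age=ONE_DAY):
    """Return the text for name, downloading only if the cache is stale.

    fetch is any callable taking name and returning the record as text;
    in a real test it would wrap whatever does the actual download.
    """
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
    path = os.path.join(cache_dir, name)
    # Reuse the cached copy if it exists and is younger than max_age.
    if os.path.isfile(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path) as handle:
            return handle.read()
    # Otherwise download afresh and overwrite the cached copy.
    text = fetch(name)
    with open(path, "w") as handle:
        handle.write(text)
    return text
```

Anyone debugging a failure could then re-run the test against exactly
the file that triggered it, at least until the next day.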
Peter