[Bioperl-l] Splitting Bioperl and Test related Suggestions

Chris Fields cjfields at uiuc.edu
Thu Jul 5 09:52:12 EDT 2007


On Jul 5, 2007, at 3:09 AM, Nathan S. Haigh wrote:

> ...
> I agree, although would the dev releases still need to pass all the  
> tests? I'm thinking of people installing via CPAN.

Remains to be decided.  All current tests (net and non-net) should  
pass.  Any bug fixes should come with added tests if possible, with  
in-progress work marked as TODOs.  Network tests are left to user  
discretion, so if they fail for any particular reason there is a way  
around them.
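
To illustrate the policy above (regression tests for bug fixes, TODO-style tests for unfinished work, opt-in network tests), here is a minimal sketch. It is written in Python's unittest purely as a language-neutral illustration, not as BioPerl code; the function reverse_complement and the RUN_NETWORK_TESTS switch are hypothetical names invented for the example.

```python
import os
import unittest

# Hypothetical opt-in switch for network tests (the variable name is an assumption).
RUN_NETWORK = os.environ.get("RUN_NETWORK_TESTS") == "1"


def reverse_complement(seq):
    """Toy function under test; illustrative only."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


class ExampleTests(unittest.TestCase):
    def test_regression_for_fixed_bug(self):
        # A bug fix ships with a test that pins the corrected behaviour.
        self.assertEqual(reverse_complement("AAGC"), "GCTT")

    @unittest.expectedFailure
    def test_in_progress_feature(self):
        # Unfinished work: lowercase input is not handled yet (KeyError),
        # so the test is recorded as an expected failure, like a TODO test.
        self.assertEqual(reverse_complement("aagc"), "gctt")

    @unittest.skipUnless(RUN_NETWORK, "network tests are opt-in")
    def test_remote_lookup(self):
        # Placeholder for a test that would contact a remote service.
        self.assertTrue(True)


def run_suite():
    """Run the example tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With this arrangement the suite still counts as passing while the unfinished feature fails, and the network test only runs when the user asks for it.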

> I also agree with what was said in a previous post about bringing  
> bioperl-run (and some others) back into the same repository as  
> bioperl-core (after a successful move over to svn) and having  
> Build.PL deal with creating the packages etc for CPAN. This would  
> hopefully help keep the run package (and others) up to speed with  
> the core package.

It depends on how we want to have everything split.  I don't think it's  
immediately pressing (there are more important priorities, i.e.  
bugs, svn) but I would say folding everything back into live and  
'splitting' them out using an automated Build process is a viable  
option.

> I also agree with previous posts about organising and/or having  
> some naming convention for test data files. I'd suggest an approach  
> whereby data files are organised into directory trees (1 - 3 deep)  
> with names that allude to the type of data in that subtree/file  
> rather than the tests that use it etc. For example:
>
> t/data
>    |__ formats
>    |           |__ seq
>    |           |        |__ legal_fasta
>    |           |        |              |__ extension.fas
>    |           |        |              |__ extension.fasta
>    |           |        |              |__ extension.foo
>    |           |        |              |__ extension.bar
>    |           |        |              |__ no_extension
>    |           |        |              |__ interleaved.fas
>    |           |        |              |__ non_interleaved.fas
>    |           |        |              |__ single_seq.fas
>    |           |        |              |__ multiple_seq.fas
>    |           |        |              |__ desc_line1.fas
>    |           |        |              |__ desc_line2.fas
>    |           |        |
>    |           |        |__ illegal_fasta
>    |           |        |              |__ illegal_chars.fas
>    |           |        |              |__ some_other_illegal_alternative.fas
>    |           |        |
>    |           |        |__ legal_genbank
>    |           |        |              |__ etc etc
>    |           |        |
>    |           |        |__ illegal_genbank
>    |           |                      |__ etc etc
>    |           |
>    |           |__ aln
>    |           |__ blast
>    |           |        |__ legal_blastx
>    |           |        |
>    |           |        |__ legal_blastp
>    |           |        |
>    |           |        |__ legal_tblastx
>    |           |        |
>    |           |        |__ legal_blastpsi
>    |           |        |
>    |           |        |__ legal_wublast
>    |           |__ foo
>    |           |__ bar
>    |           |__ misc
>    |
>    |__ etc
>
> This type of setup might lend itself to having a test script  
> simply try to parse all the files in a directory, to ensure that  
> parsing succeeds for legal file formats and fails for illegal ones.  
> Naming of the file paths would help test authors to identify a  
> suitable data file for their own tests before adding their own to  
> the t/data dir. It might also help to identify areas where example  
> test data is currently lacking.

...
This seems like more of a 'guess sequence' and format validation  
issue, something we've talked about before:

http://bugzilla.open-bio.org/show_bug.cgi?id=1508

The way I feel about it is sequence format validation and sequence  
parsing should be separate issues and therefore in separate classes  
(with parsing optionally preceded by validation), but that's  
something for another discussion.
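
As a rough sketch of how the two ideas above could fit together (a test that walks legal_*/illegal_* directories, with validation kept separate from parsing and optionally run first), consider the following. It is Python used as a language-neutral illustration, not BioPerl code; validate_fasta, parse_fasta, check_tree and the very simple "letters, '*' and '-' only" rule are all assumptions made for the example.

```python
import string
from pathlib import Path

# Toy rule (an assumption): sequence lines may contain letters, '*' and '-'.
ALLOWED = set(string.ascii_letters + "*-")


def validate_fasta(text):
    """Return a list of problems; an empty list means the text looks legal."""
    problems = []
    seen_header = False
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seen_header = True
        elif not seen_header:
            problems.append(f"line {number}: sequence data before first header")
        elif not set(line) <= ALLOWED:
            problems.append(f"line {number}: illegal characters")
    if not seen_header:
        problems.append("no header line found")
    return problems


def parse_fasta(text, validate=True):
    """Parse into (header, sequence) pairs, optionally validating first."""
    if validate:
        problems = validate_fasta(text)
        if problems:
            raise ValueError("; ".join(problems))
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_tree(seq_dir):
    """Return files whose validation result contradicts their directory name."""
    mismatches = []
    for path in sorted(Path(seq_dir).glob("*_fasta/*")):
        if not path.is_file():
            continue
        expected_legal = path.parent.name.startswith("legal_")
        is_legal = not validate_fasta(path.read_text())
        if is_legal != expected_legal:
            mismatches.append(path)
    return mismatches
```

Calling check_tree on a directory such as t/data/formats/seq would return an empty list when every file under legal_fasta validates and every file under illegal_fasta is rejected, so a single test can cover the whole subtree as new data files are added.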

> Thinking about this a little more, I think it would be a good idea  
> to include Test::Exception in t/lib. We should also be testing that  
> warnings and exceptions are generated when expected - e.g. illegal  
> characters in seq files etc etc. Without these sorts of tests we  
> are only getting half the story. This testing might account for a  
> large chunk of the poor test coverage, particularly when it comes  
> to branches in the code.
>
> Anyway, this type of reorganisation couldn't take place until the  
> svn repo is up and working.
>
> I'd appreciate any comments on the above!
> Nath
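
The point about testing that warnings and exceptions fire when expected can be sketched as below. This is Python's unittest standing in for what Test::Exception-style checks would do, not BioPerl code; clean_sequence is a hypothetical function invented for the example.

```python
import unittest
import warnings


def clean_sequence(seq):
    """Toy function (hypothetical): upper-case a sequence, dropping non-letters."""
    if not seq:
        raise ValueError("empty sequence")
    if any(not ch.isalpha() for ch in seq):
        warnings.warn("non-letter characters removed", UserWarning)
        seq = "".join(ch for ch in seq if ch.isalpha())
    return seq.upper()


class ErrorPathTests(unittest.TestCase):
    def test_empty_input_raises(self):
        # The exception branch is exercised, not just the happy path.
        with self.assertRaises(ValueError):
            clean_sequence("")

    def test_illegal_characters_warn(self):
        # The warning branch must fire and the cleaned value must be right.
        with self.assertWarns(UserWarning):
            self.assertEqual(clean_sequence("ac#gt"), "ACGT")

    def test_clean_input_is_silent(self):
        # Any warning is turned into an error, so legal input must stay quiet.
        with warnings.catch_warnings():
            warnings.simplefilter("error")
            self.assertEqual(clean_sequence("acgt"), "ACGT")
```

Tests like these exercise the error-handling branches directly, which is where branch coverage tends to be missing.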

chris


