[Bioperl-l] Splitting Bioperl and Test related Suggestions

Thu Jul 5 08:09:05 UTC 2007

Chris Fields wrote:
> I think what's partially responsible for slowing down releases is the  
> expectation that each dev release is supposed to have all bugs fixed,  
> work for every OS, etc.  In other words, act like a stable release.
>
> A developer release by nature is living on the edge, so why not have  
> regular dev releases?  We keep telling users to update to using  
> bioperl-live whenever something breaks, anyway.  We could decide to  
> split stuff off along the way into more 'stable' sections if there  
> were more demand for it, and have the more API-volatile code  
> (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the  
> 'dev' tag until we feel it's ready for prime time.
>
> chris
>
> On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote:
>
>   
-- snip --

I agree, although would the dev releases still need to pass all the 
tests? I'm thinking of people installing via CPAN.

I also agree with what was said in a previous post about bringing
bioperl-run (and some others) back into the same repository as
bioperl-core (after a successful move over to svn) and having Build.PL
deal with creating the packages etc for CPAN. This would hopefully help
keep the run package (and others) up to speed with the core package.

I also agree with previous posts about organising and/or having some
naming convention for test data files. I think an approach whereby data
files are organised into directory trees (1 - 3 deep), with names that
allude to the type of data in that subtree/file rather than to the tests
that use it, would work well. For example:

t/data
    |__ formats
    |           |__ seq
    |           |        |__ legal_fasta
    |           |        |              |__ extension.fas
    |           |        |              |__ extension.fasta
    |           |        |              |__ extension.foo
    |           |        |              |__ extension.bar
    |           |        |              |__ no_extension
    |           |        |              |__ interleaved.fas
    |           |        |              |__ non_interleaved.fas
    |           |        |              |__ single_seq.fas
    |           |        |              |__ multiple_seq.fas
    |           |        |              |__ desc_line1.fas
    |           |        |              |__ desc_line2.fas
    |           |        |
    |           |        |__ illegal_fasta
    |           |        |              |__ illegal_chars.fas
    |           |        |              |__ some_other_illegal_alternative.fas
    |           |        |
    |           |        |__ legal_genbank
    |           |        |              |__ etc etc
    |           |        |
    |           |        |__ illegal_genbank
    |           |                      |__ etc etc
    |           |
    |           |__ aln
    |           |__ blast
    |           |        |__ legal_blastx
    |           |        |
    |           |        |__ legal_blastp
    |           |        |
    |           |        |__ legal_tblastx
    |           |        |
    |           |        |__ legal_psiblast
    |           |        |
    |           |        |__ legal_wublast
    |           |__ foo
    |           |__ bar
    |           |__ misc
    |
    |__ etc

This type of setup might lend itself to having a test script simply try
to parse all the files in a directory, to ensure that parsing succeeds
for every legal file and fails for every illegal one. Descriptive file
paths would help test authors to identify a suitable existing data file
for their own tests before adding a new one to the t/data dir. It might
also help to identify areas where example test data is currently lacking.
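
A minimal sketch of such a directory-driven check, using only core Perl
modules. The names check_data_tree and parse_fasta_strictly are
hypothetical, and the toy FASTA check is only a stand-in: a real test
would pass in a coderef that calls the relevant BioPerl parser. The
legal_/illegal_ prefixes follow the example tree above.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename dirname);

# Hypothetical stand-in parser: dies on anything that is not plain FASTA.
# A real test would call the BioPerl parser under test here instead.
sub parse_fasta_strictly {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $seen_header = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;
        if ($line =~ /^>/) { $seen_header = 1; next; }
        die "Sequence data before first header in $path\n" unless $seen_header;
        die "Illegal characters in $path, line $.\n"
            unless $line =~ /^[A-Za-z*-]+$/;
    }
    close $fh;
    die "No FASTA header found in $path\n" unless $seen_header;
    return 1;
}

# Walk $root and compare each file with the expectation implied by the
# name of its parent directory: files under legal_* must parse, files
# under illegal_* must fail. Returns a sorted list of problems, which is
# empty when every file behaves as its directory name promises.
sub check_data_tree {
    my ($root, $parser) = @_;
    my @problems;
    find({
        no_chdir => 1,
        wanted   => sub {
            return unless -f $_;
            my $dir = basename(dirname($_));
            return unless $dir =~ /^(legal|illegal)_/;
            my $expect_ok = $1 eq 'legal' ? 1 : 0;
            my $ok = eval { $parser->($_); 1 } ? 1 : 0;
            push @problems,
                "$_: expected " . ($expect_ok ? 'success' : 'failure')
                if $ok != $expect_ok;
        },
    }, $root);
    @problems = sort @problems;
    return @problems;
}

# Example run against the proposed layout, if it exists.
my $root = 't/data/formats/seq';
if (-d $root) {
    print "$_\n" for check_data_tree($root, \&parse_fasta_strictly);
}
```

Passing the parser in as a coderef means the same walker could be reused
for the aln and blast subtrees with a different parser for each.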

Thinking about this a little more, I think it would be a good idea to
include Test::Exception in t/lib. We should also be testing that
warnings and exceptions are generated when expected - e.g. for illegal
characters in seq files. Without these sorts of tests we are only
getting half the story. The lack of such tests might account for a large
chunk of the poor test coverage, particularly when it comes to branches
in the code.
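
As a rough illustration of what such tests could look like, here is a
sketch that assumes Test::Exception is installed and uses its lives_ok
and throws_ok functions. The check_residues routine is hypothetical and
stands in for whatever BioPerl code is under test. The warning is
captured with a plain $SIG{__WARN__} handler so that nothing beyond
Test::More and Test::Exception is needed.

```perl
use strict;
use warnings;
use Test::More tests => 4;
use Test::Exception;

# Hypothetical validator standing in for the code under test: it dies on
# illegal characters and warns when the sequence contains gaps.
sub check_residues {
    my ($seq) = @_;
    die "Illegal characters in sequence: $seq\n" if $seq =~ /[^A-Za-z*-]/;
    warn "Sequence contains gap characters\n"     if $seq =~ /-/;
    return length $seq;
}

# The happy path: a legal sequence must not throw.
lives_ok { check_residues('ACGT') } 'legal sequence is accepted';

# The error branch: an illegal character must throw a matching message.
throws_ok { check_residues('AC1GT') } qr/Illegal characters/,
    'illegal characters throw an exception';

# The warning branch: collect warnings locally and inspect them.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    check_residues('AC-GT');
}
is scalar(@warnings), 1, 'exactly one warning is issued';
like $warnings[0], qr/gap characters/, 'warning mentions gaps';
```

Each of the three branches in the routine gets at least one test, which
is the kind of branch coverage the happy-path tests alone would miss.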

Anyway, this type of reorganisation couldn't take place until the svn 
repo is up and working.

I'd appreciate any comments on the above!
Nath