[Open-bio-l] a common repository for test datasets/use cases for all Bio* projects

Thu Dec 4 11:49:57 EST 2008

Giovanni Marco Dall'Olio wrote:
> So, the point is: what if we created a common repository for this
> kind of test data, to be shared with all the other Bio*
> projects?
> Wouldn't it be good if all the Bio* FASTA parsers were able to parse the
> same files and give the same results, demonstrating that all of them
> either work correctly or are wrong in the same way?
> 
> I am doing this because Tiago and I, on the Biopython mailing list, would
> like to develop a module to calculate Fst statistics over SNP data, and
> there is no point in collecting some good test datasets and not sharing them
> with similar projects in other programming languages.
> 
> The same goes for much of the documentation, like use cases: if we
> collect a good base of use cases related to bioinformatics, it would
> be easier to coordinate the efforts of all the Bio* projects and
> compare the different approaches used to solve the same issue by the
> different communities.
> 
> At the moment, I have created a simple git repository on github:
> - http://github.com/dalloliogm/bio-test-datasets-repository
> but it is still empty, and maybe GitHub is not the ideal host for
> such a project, since the free account has a 100 MB space limit.

The EMBOSS project on Open Bio has its own set of test cases for all
applications, and validation for source code documentation and
application documentation. Our tests run as Perl scripts, using scripts
and data that are distributed with EMBOSS.

We would be interested in joining a common effort.

regards,

Peter Rice