From dalloliogm at gmail.com  Thu Jan 15 06:21:49 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Thu, 15 Jan 2009 12:21:49 +0100
Subject: [Open-bio-l] a common repository for test datasets/use cases for all Bio* projects
In-Reply-To: <3DD9AC3A-56C9-4514-A7DB-DBA649AA2976@bioperl.org>
References: <5aa3b3570810280406i52c61a4cxecc39016a432876b@mail.gmail.com>
	<3DD9AC3A-56C9-4514-A7DB-DBA649AA2976@bioperl.org>
Message-ID: <5aa3b3570901150321q431ada7cm639cb351a51a563c@mail.gmail.com>

On Thu, Dec 4, 2008 at 6:06 PM, Jason Stajich wrote:
> I don't know if this is really the best email list for this -- although not
> sure what other common list should be used.
>
> We actually started a project like this many moons ago, but no one
> contributed examples...
>
> http://code.open-bio.org/cgi/viewcvs.cgi/biodata/

For the moment I am putting some use cases in this repository:
- http://github.com/dalloliogm/bio-test-datasets-repository/tree/master

What I am doing, basically, is just collecting messages from the
biopython mailing list (hope I am not doing anything illegal) and
problems encountered in our lab work, and putting them there.
If you give me access to biodata's cvs or wiki I can put them there
(even though I would prefer a git repository).

I don't have much time to do more than this now... but over time I
can improve many things.

Well.. let me just say this stupid thing now, or later it will be too late :)
I don't like the name 'biodata'... what about something like
'biotests' or 'biodatasets', or 'bio-test-datasets'?

>
> We can start a common SVN repository for this if you like or a github on OBF
> if that is more likely to garner contributions.
> In terms of documentation - you are certainly welcome to make a
> documentation repository but I would argue a wiki or wiki-like solution would be
> best for documentation.
> Whether a common wiki can be maintained among the projects (or merge the
> wikifarms someday) is something to contemplate too.
>
> -jason
>
> On Oct 28, 2008, at 4:06 AM, Giovanni Marco Dall'Olio wrote:
>
>> Hi!
>> My name is Giovanni, I come from biopython's mailing list.
>>
>> I would like to make you a proposal.
>> Every module/program written in bioinformatics needs to be tested
>> before it can be used to produce results that can be published.
>>
>> For example, let's say I want to write another fasta file parser, like
>> SeqIO.FastaIO in biopython: I would have to test the script
>> against some real fasta files, just to make sure that it doesn't parse
>> them in a wrong way, or that it loses data.
>> Or, let's say I want to write a script to calculate Fst statistics
>> over some population genetics data: I will have to compare the results
>> of my scripts against other programs, check if it gives me the right
>> result for a set for which I already know the Fst value, and maybe
>> devise some other kind of checks to be sure my script doesn't do weird
>> things, like losing input data on the way.
>>
>> So, the point is.. what if we create a common repository for all this
>> kind of testing data, to be shared with all the other Bio*
>> projects?
>> Wouldn't it be good if all the Bio* fasta parsers were able to parse the
>> same files and give the same results, demonstrating that all of them
>> work fine or are wrong at the same time?
>>
>> I am doing this because Tiago and I, on the biopython mailing list,
>> would like to develop a module to calculate Fst statistics over SNP
>> data, and there is no point in collecting some good test datasets and
>> not sharing them with other similar projects in other programming
>> languages.
>>
>> The same goes for much of the documentation, like use cases: if we
>> collect a good base of use cases related to bioinformatics, it would
>> be easier to coordinate the efforts of all the Bio* projects and
>> compare the different approaches used to solve the same issue by the
>> different communities.
>>
>> At the moment, I have created a simple git repository on github:
>> - http://github.com/dalloliogm/bio-test-datasets-repository
>> but it is still empty, and maybe github is not the ideal hosting for
>> such a project, since the free account has a 100MB space limit.
>>
>>
>> --
>> -----------------------------------------------------------
>>
>> My Blog on Bioinformatics (italian): http://bioinfoblog.it
>> _______________________________________________
>> Open-Bio-l mailing list
>> Open-Bio-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/open-bio-l
>
> Jason Stajich
> jason at bioperl.org
>

-- 
My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk  Thu Jan 15 08:53:35 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Jan 2009 13:53:35 +0000
Subject: [Open-bio-l] a common repository for test datasets/use cases for all Bio* projects
In-Reply-To: <5aa3b3570901150321q431ada7cm639cb351a51a563c@mail.gmail.com>
References: <5aa3b3570810280406i52c61a4cxecc39016a432876b@mail.gmail.com>
	<3DD9AC3A-56C9-4514-A7DB-DBA649AA2976@bioperl.org>
	<5aa3b3570901150321q431ada7cm639cb351a51a563c@mail.gmail.com>
Message-ID: <320fb6e00901150553u64be38edtee631c8198d3a7a1@mail.gmail.com>

Jason wrote:
>> I don't know if this is really the best email list for this -- although not
>> sure what other common list should be used.
>>
>> We actually started a project like this many moons ago, but no one
>> contributed examples...
>>
>> http://code.open-bio.org/cgi/viewcvs.cgi/biodata/

I presume this repository has been unused for some time, and thus
wasn't moved to SVN last year. If there is enough interest to warrant
restarting this "biodata" project, perhaps it should be moved over to
SVN (maybe when we do the Biopython CVS to SVN move)?

Peter

From hlapp at gmx.net  Thu Jan 15 10:31:00 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 15 Jan 2009 10:31:00 -0500
Subject: [Open-bio-l] a common repository for test datasets/use cases for all Bio* projects
In-Reply-To: <320fb6e00901150553u64be38edtee631c8198d3a7a1@mail.gmail.com>
References: <5aa3b3570810280406i52c61a4cxecc39016a432876b@mail.gmail.com>
	<3DD9AC3A-56C9-4514-A7DB-DBA649AA2976@bioperl.org>
	<5aa3b3570901150321q431ada7cm639cb351a51a563c@mail.gmail.com>
	<320fb6e00901150553u64be38edtee631c8198d3a7a1@mail.gmail.com>
Message-ID: <2B65E656-974C-472D-ABAF-DB0D592FEA0F@gmx.net>

On Jan 15, 2009, at 8:53 AM, Peter wrote:
> I presume this repository [...] should be moved over to SVN

I'd be much in favor of that.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================
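
Illustration of the proposal in this thread (one shared dataset, one set
of expected results, many parsers). This is only a minimal sketch under
stated assumptions: the names SHARED_FASTA, EXPECTED, parse_fasta and
check_parser are hypothetical, the tiny dataset is made up for the
example, and the parser is a stand-in written in plain Python, not
Biopython's SeqIO.FastaIO or any other Bio* implementation.

```python
import io

# Hypothetical shared test case: a small FASTA dataset plus the summary
# every parser is expected to report for it (invented for this example).
SHARED_FASTA = """\
>seq1 first example record
ACGTACGT
ACGT
>seq2 second example record
TTGGCCAA
"""

EXPECTED = {
    "count": 2,
    "ids": ["seq1", "seq2"],
    "lengths": [12, 8],
}


def parse_fasta(handle):
    """Minimal stand-in FASTA parser; yields (id, sequence) tuples."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split(None, 1)
            # The identifier is the first word after '>'.
            seq_id, chunks = (header[0] if header else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def check_parser(parser, fasta_text, expected):
    """Return True if the parser reproduces the expected summary."""
    records = list(parser(io.StringIO(fasta_text)))
    return (
        len(records) == expected["count"]
        and [rid for rid, _ in records] == expected["ids"]
        and [len(seq) for _, seq in records] == expected["lengths"]
    )


if __name__ == "__main__":
    print(check_parser(parse_fasta, SHARED_FASTA, EXPECTED))
```

Any other parser that accepts a file handle and yields (id, sequence)
pairs could be passed to check_parser in place of parse_fasta, which is
the point of keeping the dataset and the expected values separate from
the code under test.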