[BioRuby] gsoc suggestion: microframework for simple scientific web wrappers

Thu Feb 6 00:09:58 UTC 2014

Dear all,

a small thought about a potential GSoC project. 

Many bioinformatics tools consist of a binary that you run on the command line with one or a few input files and some parameters, and that generates some output files. Let's consider only software that generates potentially human-readable output.

Most of us on this mailing list have no problem running that kind of software on the command line. But for the majority of biologists that's still impossible: they need a point-and-click interface instead.

So if you're the person who needs to implement that point-and-click interface, how do you do it?
 1. create a wrapper for Galaxy [1]. This has become easy, but it puts the burden on your end user to have or set up a Galaxy installation (not trivial), and the Galaxy user experience is debatable.
 2. use Sinatra (we did this for our SequenceServer wrapper for BLAST). It worked but involved way too much manual labor.
 3. be old-skool (build your own from PHP etc.).

Clearly 1 isn't always appropriate and locks you into a weird framework, and 2 is still too much work. Padrino and Rails are overkill for the simplest apps. With Ruby providing such great web development frameworks, why isn't there an easier/faster way to generate a web wrapper around a piece of scientific software?

Perhaps I'm missing something. 

Alternatively, creating a "wrapping scientific software" framework could be a viable GSoC project. 

Build it upon Sinatra and create a rigid framework in which the basic locations of the files that the developer needs to edit are predetermined (similar to Rails). A single page/web form for the user to enter data; a single output/download page once the run has succeeded. No need to store any user data on the server. The framework should include the following features:
 * easy way to verify presence, executability and version of binary (or script) that is being wrapped
 * easy way to specify the number of input files, and potential constraints on them [this stuff should be specified once; appropriate HTML should be auto-generated (Bootstrap)].
    * most basic constraints: size and/or extension
    * more advanced constraints: user-extensible function that verifies the format
 * easy way to specify possible parameters and constraints on their types 
 * easy way to show/include local data (HMM models, sequence databases etc...)
 * easy way to make text-output look good
    * e.g. inserting specific headers or indexing at specific regexps (for table of contents)
    * e.g. CSV output should be shown as a table
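
As a rough illustration of the first two features in the list above (checking the wrapped binary, and constraining input files), here is a minimal Ruby sketch using only the standard library. All names in it (find_executable, tool_version, InputSpec, FASTA_INPUT) are hypothetical and not part of any existing library. The 10 MB limit, and the idea that the wrapped tool prints its version when called with --version, are placeholder assumptions.

```ruby
require "open3"

# Hypothetical helper: locate an executable in a PATH-style string.
# Returns the full path, or nil if nothing executable is found.
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Hypothetical helper: run "<binary> <flag>" and return the first
# version-like string (e.g. "2.2.28") found in its output, or nil.
# Assumes the tool reports its version this way; many do not.
def tool_version(binary, flag = "--version")
  output, _status = Open3.capture2e(binary, flag)
  output[/\d+(\.\d+)+/]
end

# Hypothetical declarative description of one input file:
# allowed extensions, maximum size, and an optional format check.
InputSpec = Struct.new(:extensions, :max_bytes, :format_check) do
  # Returns a list of human-readable problems; empty means the file is fine.
  def problems(path)
    errors = []
    unless extensions.include?(File.extname(path).downcase)
      errors << "extension must be one of #{extensions.join(', ')}"
    end
    errors << "file is larger than #{max_bytes} bytes" if File.size(path) > max_bytes
    if format_check && !format_check.call(path)
      errors << "file content does not look like the expected format"
    end
    errors
  end
end

# Example of a user-extensible format check: the first line of a
# FASTA file starts with ">".
fasta_check = lambda do |path|
  first_line = File.open(path) { |f| f.gets }
  !first_line.nil? && first_line.start_with?(">")
end

# Placeholder constraints: .fa/.fasta files of at most 10 MB.
FASTA_INPUT = InputSpec.new([".fa", ".fasta"], 10 * 1024 * 1024, fasta_check)
```

The point of keeping the constraints in one data structure is that the same declaration could drive both the server-side checks and the auto-generated upload form.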

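For the "CSV output should be shown as a table" item, something along these lines might be enough. Again this is only a sketch: csv_to_html_table is a hypothetical helper, the "table" CSS class assumes Bootstrap is loaded on the page, and the Sinatra route that would call it is left out. It relies on the csv and erb libraries that ship with Ruby (csv is a bundled gem in recent Ruby versions).

```ruby
require "csv"
require "erb"

# Hypothetical helper: convert CSV text into an HTML table.
# The first row is treated as the header; all cell values are HTML-escaped.
def csv_to_html_table(csv_text)
  header, *rows = CSV.parse(csv_text)
  return "" if header.nil? # empty input produces no table

  cell = ->(tag, value) { "<#{tag}>#{ERB::Util.html_escape(value.to_s)}</#{tag}>" }
  head = "<tr>#{header.map { |v| cell.call('th', v) }.join}</tr>"
  body = rows.map { |row| "<tr>#{row.map { |v| cell.call('td', v) }.join}</tr>" }.join("\n")

  "<table class=\"table\">\n<thead>#{head}</thead>\n<tbody>\n#{body}\n</tbody>\n</table>"
end
```

Escaping every cell matters here because the wrapped tool's output may echo user-supplied sequence names back into the page.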
I'm not the best qualified person to consider exact implementation details, but if someone wants to go ahead with it I'm happy to provide more general thoughts. 

Cheers,

Yannick

[1]: http://galaxyproject.org

-------------------------------------------------------
Yannick Wurm - http://yannick.poulet.org
Ants, Genomes & Evolution ⋅ y.wurm at qmul.ac.uk ⋅ skype:yannickwurm ⋅ +44 207 882 3049
5.03A Fogg ⋅ School of Biological & Chemical Sciences ⋅ Queen Mary, University of London ⋅ Mile End Road ⋅ E1 4NS London ⋅ UK