[BioRuby] gsoc suggestion: microframework for simple scientific web wrappers
Yannick Wurm
y.wurm at qmul.ac.uk
Thu Feb 6 00:09:58 UTC 2014
Dear all,
a small thought about a potential GSoC project.
Many bioinformatics tools consist of a binary that you run on the command line with one or a few input files and some parameters, and that generates some output files. Let's consider only software that generates potentially human-readable output.
Most of us on this mailing list have no problem running that kind of software on the command line. But for the majority of biologists that's still impossible: they need a point-and-click interface instead.
So if you're the person who needs to implement that point-and-click interface, how do you do it?
1. Create a wrapper for Galaxy [1]. This has become easy, but it puts the burden on your end user to have or set up a Galaxy installation (not trivial), and the Galaxy user experience is debatable.
2. Use Sinatra (we did this for our SequenceServer wrapper for BLAST). It worked but involved way too much manual labor.
3. Be old-skool (build your own from PHP etc.).
Clearly 1 isn't always appropriate and locks you into a weird framework, and 2 is still too much work. Padrino and Rails are overkill for the simplest apps. With Ruby providing such great web development frameworks, why isn't there an easier/faster way to generate a web wrapper around a piece of scientific software?
Perhaps I'm missing something.
Alternatively, creating a "wrapping scientific software" framework could be a viable GSoC project.
Build it upon Sinatra: create a rigid framework where the basic locations of the files that the developer needs to edit are predetermined (similarly to Rails). A single page/web form for the user to enter data; a single output/download page after the run has succeeded. No need to store any user data on the server. The framework should include the following features:
* easy way to verify presence, executability and version of the binary (or script) that is being wrapped
* easy way to specify the number of input files, and potential constraints on them [this stuff should be specified once; appropriate HTML should be auto-generated (Bootstrap)]
    * most basic constraints: size and/or extension
    * more advanced constraints: user-extensible function that verifies the format
* easy way to specify possible parameters and constraints on their types
* easy way to show/include local data (HMM models, sequence databases etc.)
* easy way to make text output look good
    * e.g. inserting specific headers or indexing at specific regexps (for a table of contents)
    * e.g. CSV output should be shown as a table
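
To make a few of these features concrete, here is a rough sketch in plain Ruby (standard library only) of three of them: finding the wrapped binary, applying the most basic input constraints, and showing CSV output as a table. The module name WrapperHelpers and all method names are made up for illustration and do not come from any existing library; a real framework would probably expose this through a declarative config rather than raw helper calls. Version checking is left out because it depends on how each tool reports its version.

```ruby
require 'csv'
require 'erb'

# Hypothetical helpers; every name here is an assumption for illustration.
module WrapperHelpers
  module_function

  # Return the full path of an executable called `name` found in `path`
  # (a PATH-style string), or nil if there is none.
  def find_executable(name, path = ENV.fetch('PATH', ''))
    path.split(File::PATH_SEPARATOR).each do |dir|
      candidate = File.join(dir, name)
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
    nil
  end

  # Most basic input constraints: allowed extensions and a maximum size.
  # Returns a list of error messages, which is empty when the input is fine.
  def input_errors(filename, size, extensions:, max_bytes:)
    errors = []
    ext = File.extname(filename).downcase
    errors << "unsupported extension #{ext.inspect}" unless extensions.include?(ext)
    errors << "file larger than #{max_bytes} bytes" if size > max_bytes
    errors
  end

  # Render CSV text as an HTML table, escaping the content of every cell.
  def csv_to_html_table(csv_text)
    rows = CSV.parse(csv_text).map do |row|
      cells = row.map { |cell| "<td>#{ERB::Util.html_escape(cell.to_s)}</td>" }.join
      "<tr>#{cells}</tr>"
    end
    "<table>#{rows.join}</table>"
  end
end
```

In a Sinatra app these would presumably be called from the route that receives the form: check the binary once at startup, reject the upload if input_errors returns anything, and pass the tool's CSV output through csv_to_html_table before rendering the results page.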
I'm not the best qualified person to consider exact implementation details, but if someone wants to go ahead with it I'm happy to provide more general thoughts.
Cheers,
Yannick
[1]: http://galaxyproject.org
-------------------------------------------------------
Yannick Wurm - http://yannick.poulet.org
Ants, Genomes & Evolution ⋅ y.wurm at qmul.ac.uk ⋅ skype:yannickwurm ⋅ +44 207 882 3049
5.03A Fogg ⋅ School of Biological & Chemical Sciences ⋅ Queen Mary, University of London ⋅ Mile End Road ⋅ E1 4NS London ⋅ UK