[Biopython-dev] biopython web interface

Mon May 2 13:52:15 UTC 2011

Hello Massimo... first of all... thanks for web2py, which is my tool
of choice for web apps :D

Here goes my 2 cents about all this:

1) If you're looking for a standard format, we should be talking about
sequence files (FASTA / GFF). This approach will be very
restrictive, but I guess it's a starting point.
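
To give an idea of how such files can be recognized, here is a minimal
sketch in plain Python (no Biopython needed). The function names
sniff_format and read_fasta are made up for this example, and the
heuristics are assumptions: a ">" header line means FASTA, a
"##gff-version" directive or nine tab-separated columns means GFF.
Real files will need more care than this.

```python
def sniff_format(text):
    """Guess 'fasta', 'gff3', 'gff' or 'unknown' from the first meaningful line."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            return "fasta"  # FASTA records start with a '>' header line
        if line.startswith("##gff-version"):
            parts = line.split()
            is_v3 = len(parts) > 1 and parts[1].startswith("3")
            return "gff3" if is_v3 else "gff"
        if line.startswith("#"):
            continue  # other comment or directive line
        # First data line: GFF rows have exactly nine tab-separated columns.
        return "gff" if len(raw.split("\t")) == 9 else "unknown"
    return "unknown"


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)
```

In practice Biopython's own parsers would do the actual reading; the
point is only that these formats are easy to tell apart from their
first lines.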

2) You should look at Galaxy. At some point I was hoping to integrate
a web2py programming module directly there (I don't know how yet, and
I'm in many things at once, so it's more of a dream than a project).
Galaxy has a few tutorials and videos that should point you in the
right direction.

3) Sadly, standard data representation has been an issue for some time
for the bioinformatics community. The REST / web services approach has
gained some momentum and some apps talk to each other in some way, but
we still don't have much of a standard way to represent all the data.
Ontologies are also a strong point (check http://www.obofoundry.org/ ),
with the Sequence Ontology (SO) being a great one IMHO, pointing to how
the data should be represented (it's recommended, even if not enforced,
to use SO terms when creating GFF3 files).
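
As a rough illustration of where SO fits into GFF3: each feature line
has nine tab-separated columns, and the third one ("type") is where the
SO term goes. The sketch below is plain Python; parse_gff3_line and
uses_known_so_term are hypothetical helpers, and EXAMPLE_SO_TERMS is
just a tiny hand-picked subset used for the demo, not the real
ontology.

```python
from urllib.parse import unquote

# The nine GFF3 columns, in order.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")

# Assumption: a tiny illustrative subset of Sequence Ontology term names.
EXAMPLE_SO_TERMS = {"gene", "mRNA", "exon", "CDS"}


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    feature = dict(zip(GFF3_COLUMNS, fields))
    feature["start"] = int(feature["start"])
    feature["end"] = int(feature["end"])
    # Column 9 holds 'key=value' pairs separated by ';', URL-escaped.
    attributes = {}
    for pair in feature["attributes"].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[unquote(key)] = unquote(value)
    feature["attributes"] = attributes
    return feature


def uses_known_so_term(feature, known_terms=EXAMPLE_SO_TERMS):
    """Check the 'type' column against a set of accepted SO term names."""
    return feature["type"] in known_terms
```

A real check would load the full ontology instead of a hard-coded set,
and would also handle comma-separated multi-valued attributes, which
this sketch ignores.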

4) So far, the one tool for "standard biological data storage" I've
found useful is the Chado DB schema, which BTW doesn't enforce or even
define how to handle a lot of situations, but is more of a framework
on which to base your own data representation. I guess that's not what
you're looking for, but it's surely an interesting approach, with a lot
of lessons learned there.
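
To make the "framework" idea a bit more concrete: in Chado every
feature points to a controlled-vocabulary term that says what it is,
so the typing lives in the ontology and not in the table layout. The
sketch below mimics that with sqlite3 from the standard library. It is
NOT the real Chado schema: the two tables are cut down to a handful of
columns, and build_demo_db, add_feature and features_of_type are names
invented for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DB with a heavily simplified, Chado-like layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL UNIQUE      -- e.g. an SO term name
        );
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            uniquename TEXT NOT NULL UNIQUE,
            type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
            residues   TEXT,
            seqlen     INTEGER
        );
    """)
    return conn


def add_feature(conn, uniquename, type_name, residues):
    """Store a feature, creating its type term on first use."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (type_name,))
    (type_id,) = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (type_name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO feature (uniquename, type_id, residues, seqlen) "
        "VALUES (?, ?, ?, ?)",
        (uniquename, type_id, residues, len(residues)),
    )


def features_of_type(conn, type_name):
    """Return (uniquename, seqlen) for every feature of the given type."""
    rows = conn.execute(
        "SELECT f.uniquename, f.seqlen FROM feature f "
        "JOIN cvterm c ON c.cvterm_id = f.type_id "
        "WHERE c.name = ? ORDER BY f.uniquename",
        (type_name,),
    )
    return rows.fetchall()
```

Adding a new kind of feature then means adding a term, not a table,
which is also why the schema leaves so many decisions up to you.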

I'm currently building a web interface for some of our projects, storing
genomic and proteomic data in a Chado DB ( http://gmod.org/wiki/Chado
) using web2py, but it's rough at best and in a pre-alpha (as in a
PoC) state. Some other folks here have been doing the same kind of
projects, hopefully someone with a better and less specific approach.
If it suits you, just contact me and I'll provide all the
direction and ideas my limited knowledge can generate. I'm a bit
scattered most of the time, so maybe not your ideal adviser, but I
have the will.

Greetings and thanks again for web2py

Bernardo Clavijo

PS: folks, please correct all my bad ideas so Massimo gets a real
view and not my mess

On Sun, May 1, 2011 at 1:51 AM, Massimo Di Pierro
<mdipierro at cs.depaul.edu> wrote:
> Hello Andrea
>
> I am looking at something a little different from what you are doing, but we should definitely collaborate.
> I am trying to identify tasks that are not domain specific that could benefit more than one scientific community.
>
> It seems to me all scientific communities have data, have programs (in Python or not is irrelevant to me) and have a workflow.
> They all need:
> 1) a tool to post the data online in a semi-automated fashion
> 2) a tool to share data easily (both via web interface and scripting via web service) with access control
> 3) a way to annotate the data as in a CMS
> 4) a mechanism to connect data with a workflow so that certain programs are executed automatically when new data is uploaded to the system. The programs may require user input, so it should be possible to somehow register a task (a program) by describing what input data it needs and what user input it needs, and the system should automatically generate an interface.
> 5) an interface to local clusters and grid resources to submit computing jobs to
>
> I do not have the resources or the expertise to build an interface specific to biopython, but I think we should collaborate because if what I am doing is general enough (and I am not sure it is unless we talk more about it) it could be used to create an interface to biopython with minimal programming.
>
> I understand your focus is on algorithms but I need to start on data. It is my experience it is very difficult to automate the workflow of algorithms if there is no standard exchange format for the data.
>
> The first things I would need to understand are:
> - does biopython handle some standard file formats? What do they contain? How can they be recognized? Can you send me a few examples?
> - is there a graph of which algorithms run on which file types?
> - what are the most common algorithms? Can you point me to the source?
>
> I like to think of the system as something that will represent the workflow as a graph. Each file type is a node. An algorithm is a link.
> If a node is an image or a csv file or an xml file or a movie or a vtk file, etc. the system will be able to represent it (show it).
> Links "define" the file type. As long as you have a standard, you will be able to register your algorithms and the system will know what to do.
>
> The whole graph is built automatically without programming by introspecting your folders and identifying your files. You will be able to annotate your folders using a markup language to augment the information.
>
> In my approach starting from the data is critical. My approach does not fly if you do not have standard file formats.
>
> Massimo
>
> P.S. Are you Italian?
>
> On Apr 30, 2011, at 12:03 PM, Andrea Pierleoni wrote:
>
>>
>>>
>>> Message: 3
>>> Date: Fri, 29 Apr 2011 08:34:34 -0500
>>> From: Massimo Di Pierro <mdipierro at cs.depaul.edu>
>>> Subject: [Biopython-dev] biopython web interface
>>> To: <biopython-dev at biopython.org>
>>> Message-ID: <57629245-F184-4143-8B18-80E69BC2C351 at cs.depaul.edu>
>>> Content-Type: text/plain; charset="us-ascii"
>>>
>>> Hello everybody,
>>>
>>> I am new to biopython and I have some silly questions.
>>>
>>> Does biopython have a web interface?
>>> If not, would you be interested in help developing one?
>>> What kind of features would you be interested in?
>>>
>>> Reason for my question: I am a physicist and a professor of CS. I am
>>> working with a few different groups to build a unified platform to bring
>>> scientific data online. The main idea is that of having a tool that
>>> requires no programming and scientists can use to introspect an existing
>>> directory and turn it into dynamical web pages. Those pages can then be
>>> edited and re-organized like a CMS. The system should be able to
>>> recognize basic file types, group, tag and categorize them. It should then
>>> be possible to register algorithms, run them on the server, create a
>>> workflow. The system will also have an interface for mobile.
>>>
>>> Here is a first prototype for physics data that interfaces with the
>>> National Energy Research Computing Center:
>>> http://tests.web2py.com/nersc
>>>
>>> Since we are doing this it would be great to have as many communities on
>>> board as possible so that we can write specs that are broad enough.
>>> We can do all the work or you can help us if you want.
>>>
>>> So, if you have a wish list please share it with me.
>>>
>>> Personally, I need to be educated on biopython since I do not fully
>>> understand what are the basic file types it handles, what are the most
>>> popular algorithms it provides, nor am I familiar with the typical usage
>>> workflow.
>>>
>>> Massimo
>>>
>>>
>>>
>>
>>
>> Hi Massimo,
>> BioPython itself is a Python library, but a web interface would make many
>> functions available to biological scientists with no programming expertise.
>> There are some parts of the library that cope well with a
>> web interface/server, in particular the BioSQL modules.
>> The BioSQL schema is a relational database model to store biological data.
>> I do have working code for using the BioPython BioSQL functions (and more)
>> with the web2py DAL, and I'm working on a complete web2py-based open-source
>> web server to store and manage biological sequences/entities.
>> If you (or anyone else) are interested and want to contribute, let me know.
>> There are many things in common between what I'm doing and what you want
>> to do, so maybe it's a good idea to work together.
>>
>> Andrea Pierleoni
>>
>>
>>
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>