[Bioperl-l] want to bring tools together to help small labs

Fernan Aguero fernan@iib.unsam.edu.ar
Fri, 2 Nov 2001 14:40:10 -0300


+----[ Osborne, Brian (Brian.Osborne@osip.com) wrote about "RE: [Bioperl-l] want to bring tools together to help small labs":
|
| Fernan,
| 
| Like you, I don't seem to see the capability to create or maintain local
| databases in bioperl yet, but I certainly could be missing something. I
| would also guess that adding this capability is allowed by the
| architecture. What you're proposing sounds like a marriage between bioperl
| and Ian Korf's MyGenBank (http://sapiens.wustl.edu/~ikorf/MyGenBank.html).
| 
| Brian O.

I've been looking into MyGenBank and, yes, this is part of what we
need. But the other aspect is storing in the database the output
of the analyses performed on the sequences (BLAST or whatever).
Say you run BLAST on several thousand sequences: when you
parse the BLAST reports you get an in-memory data structure that can
just as well be stored in the database. Of course you can also keep the
original BLAST report and store a path to it in the db, but having the
parsed data handy is one of the goals.
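
A minimal sketch of the idea above, written in Python with SQLite only to
keep it short and self-contained. It is not our actual system and not a
bioperl API: the table layout, the field names, the sample hits and the
report path are all made up for illustration. It assumes the parser hands
back one small record per hit.

```python
import sqlite3

# Hypothetical parsed hits, standing in for whatever in-memory structure
# a BLAST report parser returns. All values are invented.
parsed_hits = [
    {"query": "est0001", "subject": "subjA", "evalue": 1e-50, "score": 210.0},
    {"query": "est0001", "subject": "subjB", "evalue": 1e-05, "score": 48.5},
    {"query": "est0002", "subject": "subjC", "evalue": 2e-30, "score": 130.0},
]


def store_hits(conn, hits, report_path):
    """Store parsed hits plus the path of the original report file."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS blast_hit (
               query_id    TEXT NOT NULL,
               subject_id  TEXT NOT NULL,
               evalue      REAL,
               bit_score   REAL,
               report_path TEXT
           )"""
    )
    conn.executemany(
        "INSERT INTO blast_hit VALUES (?, ?, ?, ?, ?)",
        [
            (h["query"], h["subject"], h["evalue"], h["score"], report_path)
            for h in hits
        ],
    )
    conn.commit()


def best_hit(conn, query_id):
    """Return the subject with the lowest e-value for one query, or None."""
    row = conn.execute(
        "SELECT subject_id FROM blast_hit "
        "WHERE query_id = ? ORDER BY evalue ASC LIMIT 1",
        (query_id,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")          # throwaway database for the demo
store_hits(conn, parsed_hits, "/tmp/example_blast_report.out")  # hypothetical path
print(best_hit(conn, "est0001"))
```

With the parsed values in a table, questions like "best hit per EST" become
a single query instead of re-parsing thousands of reports, while the stored
path still leads back to the original file.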

As I said, we already have this working, but we were thinking about
changing it into something more generic and not so tied to our own
system.

So what I'm thinking about right now is to somehow map bioperl
structures onto db schemas (for genbank parsers, blast parsers, etc)
so as to use bioperl capabilities to populate a database.
Of course you would end up storing more information than you perhaps
need, but this would make it more direct and generic, in the sense
that perhaps (I'm just guessing) this could be extended to other
parsers.
I know that this will not be easy (I'm talking about me right now)
because I've always tried to avoid the innards of bioperl and its complex
OO structure. Perhaps it's time to dive into it a little ...
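
To make the mapping idea a bit more concrete, here is a rough sketch, again
in Python rather than Perl and not based on any existing bioperl or
bioperl-db code. It assumes a parser yields simple objects with typed
fields; the ParsedSequence class, its fields and the type table are
hypothetical. The table definition is derived from the object itself, so
the same two functions would serve any other parser's objects.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

# Assumed mapping from field types to SQL column types.
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


@dataclass
class ParsedSequence:
    """Hypothetical stand-in for an object produced by a sequence parser."""
    accession: str
    description: str
    length: int


def table_ddl(cls):
    """Build a CREATE TABLE statement from the fields of a dataclass."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    return f"CREATE TABLE IF NOT EXISTS {cls.__name__.lower()} ({cols})"


def save(conn, obj):
    """Create the table for obj's class if needed, then insert obj as a row."""
    cls = type(obj)
    conn.execute(table_ddl(cls))
    placeholders = ", ".join("?" for _ in fields(cls))
    conn.execute(
        f"INSERT INTO {cls.__name__.lower()} VALUES ({placeholders})",
        astuple(obj),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
save(conn, ParsedSequence("ACC0001", "made-up example entry", 1234))
row = conn.execute("SELECT accession, length FROM parsedsequence").fetchone()
print(table_ddl(ParsedSequence))
print(row)
```

Every field ends up as a column, which is exactly the "storing more than
you need" trade-off: nothing has to be hand-written per parser, at the cost
of a wider schema.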

Does this sound right? Is this already done (I haven't had time to check
bioperl-db yet)?

Fernan
 


|  -----Original Message-----
| From: 	Fernan Aguero [mailto:fernan@iib.unsam.edu.ar] 
| Sent:	Friday, November 02, 2001 8:48 AM
| To:	T.D. Houfek
| Cc:	bioperl-l@bioperl.org
| Subject:	Re: [Bioperl-l] want to bring tools together to help small
| labs
| 
| +----[ T.D. Houfek (tdhoufek@unity.ncsu.edu) wrote about "[Bioperl-l] want to
| bring tools together to help small labs":
| |
| | Hi all,
| | 
| | Recently Jason Stajich visited our lab and gave us a lot of good information
| | as well as encouragement to participate here.  But I'm new to this forum,
| | so please excuse me (yet still tell me) if I stray too far from its proper
| | subject matter.  Besides whatever my lab puts on our plate at any given moment,
| | we're chiefly interested in working on freely available open-source software
| | geared towards the needs of small-to-medium size laboratories doing
| | sequencing.  Smaller labs, with their correspondingly small computer hardware
| | and bioinformatics salary budgets, have an extremely daunting task on
| | their hands even if their ambitions for analysis are modest.  Ultimately
| | there is no cure for this problem, but we'd like to do something to ease
| | the pain... and I'd greatly appreciate any help anyone can give us.
| | 
| | Since small labs do more EST sequencing than large genomic assemblies, I'd
| | like to develop a distributable Linux/UNIX web application package that:
| | 	a) facilitates batching of various analyses for ESTs
| | 	b) allows specification of different processing "pipelines" for
| | 	   different sets of incoming data.
| | 	c) stores sequence data, quality data, meta-data, analysis
| | 	   results, etc in a relational database.
| | 	d) gives easy web browsing access to this data, allowing
| | 	   specification of different levels of access permissions for
| | 	   different data sets.
| | 	e) seriously eases data management burdens, including:
| | 		1) file organization
| | 		2) sequence data quality control
| | 		3) data backups
| | 		4) logging of analysis histories
| | 	f) installs easily
| | 	g) allows almost all ongoing administration to be done by
| | 	   researchers or technicians  (non-power-users) through CGI.
| | 	h) requires only one fairly decent ( <=$5,000 ) computer, but
| | 	   allows a number of ways to distribute the system over more
| | 	   machines (so that a lab can separate the workhorse and the
| | 	   web server, or grow a small compute farm).
| | 
| | There being no point to reinventing the wheel, I'd like to use BioPerl /
| | BioJava / etc wherever I can.  If anyone has any thoughts about how such
| | might (or might not) fit into such a scheme, or has helpful information
| | about what smaller labs they have known might want or need, I'd be most
| | grateful!
| | 
| | T.D. Houfek
| | 
| |
| +----]
| 
| Dear T.D. and bioperlers:
| 
| Have you had any positive responses and/or suggestions? Are there
| other people interested in this?
| 
| Some time ago we needed the same thing and looked into how ensembl was
| doing things since we also wanted to use a relational database backend
| to store info and then generate the web pages on the fly through CGI
| scripts. For us it was too complex and also had a bias toward higher
| eukaryotes, which we did not need (we work with bacteria and
| protozoa). 
| In the end we developed our own db schema, scripts and so on ... but
| as a first attempt at it I know it is far from being _the_right_thing_.
| First of all it is too customized to our own projects and way of
| working.
| 
| So we now would like to have something more generic, more modularized
| and ... simple. So if we agree on the goals (I would also like to add
| EST clustering to the list) perhaps we could join our efforts.
| 
| I haven't looked into bioperl lately, but I thought there was a
| bioperl-db or something like this ... I don't seem to find it right
| now. I had the idea (perhaps misguided) that it was a generic db
| schema and modules to store sequence info and annotation, am I right?
| 
| If this is of no interest to the list we can discuss it in private.
| 
| Regards,
| 
| Fernan
| 
| -- 
| 
| |  F e r n a n   A g u e r o  |  B i o i n f o r m a t i c s  |
| |   fernan@iib.unsam.edu.ar   |      genoma.unsam.edu.ar      |
| 
|
+----]

-- 

|  F e r n a n   A g u e r o  |  B i o i n f o r m a t i c s  |
|   fernan@iib.unsam.edu.ar   |      genoma.unsam.edu.ar      |