[Biopython-dev] pySRS (was Re: ANN: mindy-0.1)

Mon Mar 26 04:00:49 EST 2001

"Andrew Dalke" <dalke at acm.org> writes:

> > Be careful, you are heading from a "simple" indexing
> > scheme to a pySRS :-)

Oh, great - pySRS was intended as a joke, but it seems to me we are
embarking on a new project ... fine with me :-)

> 
> What would that entail?  I'm actually pretty serious.
> What would be needed to be competitive with SRS?  From
> what I know of it, it provides:
>   1. A parsing system for identifying useful regions
>        of many formats
>   2. Icarus, a language used to implement the actions
>        of the matches in the parser
>   3. A generic data model for storing identifiers, cross
>        references, keywords and free text.
>   4. A database for storing and searching those models
>   5. A web based interface to the database
>   6. A basic set of analysis tools augmenting that interface
> 
> Martel provides 1.  Python provides 2.  I think 3 is
> pretty easy esp. by building off the data structures
> biopython already uses for these databases.  Does SRS
> have their own database for 4 or do they use an existing
> one?  In either case, off-the-shelf databases provide
> similar or better functionality.  I've done 5 and 6
> before, although complete solutions (like what
> bionavigator does) are much, much harder.

What is the advantage of Icarus? (I have no idea.) The only part I know is
that SRS uses a LOT of index files ...

IMHO a biopython approach would include e.g. gdbm for a light version
and/or MySQL or PostgreSQL for a more full-featured version (by the way, I
have moved from PostgreSQL to MySQL for several reasons).
(gdbm is present on almost all Unixes and can be used [similar to cPickle]
for FAST storage and retrieval of simple key-value data.)
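
A minimal sketch of the "light version", assuming present-day Python 3: the
generic dbm module opens whichever key-value backend is installed (GNU dbm
where available), and pickle serializes the value. The function names, the
accession and the annotation fields are made up for illustration; nothing
here is an existing biopython API.

```python
import dbm
import os
import pickle
import tempfile


def store_record(db_path, accession, record):
    """Pickle `record` and store it under `accession`."""
    with dbm.open(db_path, "c") as db:  # "c": create the file if missing
        db[accession.encode("utf-8")] = pickle.dumps(record)


def fetch_record(db_path, accession):
    """Return the unpickled record, or None if the accession is unknown."""
    with dbm.open(db_path, "r") as db:
        key = accession.encode("utf-8")
        if key not in db:
            return None
        return pickle.loads(db[key])


if __name__ == "__main__":
    # Made-up accession and annotation, for illustration only.
    path = os.path.join(tempfile.mkdtemp(), "light_index")
    store_record(path, "X00001", {"organism": "Escherichia coli", "gene": "lacZ"})
    print(fetch_record(path, "X00001"))
```

Keys and values are plain bytes on disk, so anything richer than an exact
key lookup (keyword or free-text search) would still need extra index files
or the SQL variant.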

I think between 4 and 5-6 I would include a generic library/modules so that
5 and 6 could be easily extended to include web/tk/gtk/commandline/pipes
etc. Most of that (except the Icarus part) sounds familiar to me too.
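
One way such an in-between layer might look, as a rough sketch only: the
front-ends talk to a small store object and never to gdbm or MySQL directly.
RecordStore, format_for_commandline and render_html are hypothetical names,
and the backend is assumed to be any mapping from accession to a dict of
fields.

```python
import html


class RecordStore:
    """Hypothetical backend-neutral lookup layer."""

    def __init__(self, backend):
        # backend: any mapping from accession to a dict of fields
        # (a plain dict here; a dbm or SQL wrapper would fit the same slot).
        self._backend = backend

    def lookup(self, accession):
        """Return the record dict for `accession`, or None if absent."""
        return self._backend.get(accession)


def format_for_commandline(store, accession):
    """One-line text rendering, suitable for a terminal or a pipe."""
    record = store.lookup(accession)
    if record is None:
        return "%s: not found" % accession
    fields = ", ".join("%s=%s" % (k, record[k]) for k in sorted(record))
    return "%s: %s" % (accession, fields)


def render_html(store, accession):
    """HTML list rendering of the same record, for a web front-end."""
    record = store.lookup(accession)
    if record is None:
        return "<p>%s: not found</p>" % html.escape(accession)
    items = "".join(
        "<li>%s: %s</li>" % (html.escape(k), html.escape(str(record[k])))
        for k in sorted(record)
    )
    return "<ul>%s</ul>" % items
```

Both renderers share the single lookup() call, so a Tk or Gtk front-end
would only need to add another formatting function.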

So what would the perfect combination of tools look like?

1) Martel, for the parsing system for identifying useful regions of many formats
2) Python, a language used to implement the actions of the matches in the parser
3) Biopython/Biocorba for generic data models for storing identifiers,
   cross references, keywords and free text.
4) Gdbm, MySQL databases for storing and searching those models
4b) A generic library/module implementing methods for interfacing with pySRS
5) A web/Tk/Gtk based interface to the database
6) A basic set of analysis tools augmenting that interface
7) A basic set of methods optionally to be used in all other biopython
   modules (e.g. the FASTA parser's rec.nice_title() could look up the
   accession found in the rec-title field and replace it with organism +
   gene name, etc.)
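
To make point 7 concrete, here is a rough sketch of what a nice_title()
helper could do. It is written as a standalone function, not as the method
of any existing biopython class, and it assumes (hypothetically) that the
accession is the first whitespace-delimited token of the FASTA title and
that the index maps accessions to dicts with "organism" and "gene" keys.

```python
import re

# Assumption: the accession is the first token after an optional ">".
ACCESSION_RE = re.compile(r"^>?\s*(\S+)")


def nice_title(title, index):
    """Return "organism gene" for a known accession, else the title unchanged.

    `index` is any mapping from accession to a dict holding "organism"
    and "gene" (a plain dict, or a wrapper around a gdbm/MySQL lookup).
    """
    match = ACCESSION_RE.match(title)
    if match is None:
        return title
    entry = index.get(match.group(1))
    if entry is None:
        return title
    return "%s %s" % (entry["organism"], entry["gene"])


if __name__ == "__main__":
    # Made-up accession and annotation, for illustration only.
    demo_index = {"X00001": {"organism": "Escherichia coli", "gene": "lacZ"}}
    print(nice_title(">X00001 hypothetical entry", demo_index))
```

Falling back to the original title keeps the parser usable when no index is
installed, which matches the "optionally" in point 7.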

are-we-going-to-annoy-thure?'ly y'rs
-thomas

pySRS 
SnRS: Snake Retrieval System
PSI-RS: Python System for Indexing and Retrieval of Sequences
PseudoSRS: python system for embedded .... ääähh  .... 

-- 
Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
thomas at biopython.org           The Technical University of Denmark
CBS:  +45 45 252489            Building 208, DK-2800 Lyngby
Fax   +45 45 931585            http://www.cbs.dtu.dk/thomas

	De Chelonian Mobile ... The Turtle Moves ...