[Biopython-dev] pySRS (was Re: ANN: mindy-0.1)
Thomas Sicheritz-Ponten
thomas at cbs.dtu.dk
Mon Mar 26 04:00:49 EST 2001
"Andrew Dalke" <dalke at acm.org> writes:
> > Be careful, you are heading from a "simple" indexing
> > scheme to a pySRS :-)
Oh, great - pySRS was intended as a joke, but it seems to me we are
embarking on a new project ... fine with me :-)
>
> What would that entail? I'm actually pretty serious.
> What would be needed to be competitive with SRS? From
> what I know of it, it provides:
> 1. A parsing system for identifying useful regions
> of many formats
> 2. Icarus, a language used to implement the actions
> of the matches in the parser
> 3. A generic data model for storing identifiers, cross
> references, keywords and free text.
> 4. A database for storing and searching those models
> 5. A web based interface to the database
> 6. A basic set of analysis tools augmenting that interface
>
> Martel provides 1. Python provides 2. I think 3 is
> pretty easy esp. by building off the data structures
> biopython already uses for these databases. Does SRS
> have their own database for 4 or do they use an existing
> one? In either case, off-the-shelf databases provide
> similar or better functionality. I've done 5 and 6
> before, although complete solutions (like what
> bionavigator does) are much, much harder.
What is the advantage of Icarus? (I have no idea.) The only thing I know is
that SRS uses a LOT of index files ...
IMHO a biopython approach would include e.g. gdbm for a light version
and/or MySQL or PostgreSQL for a more full-featured version (by the way, I
have moved from PostgreSQL to MySQL for several reasons).
(gdbm is present on almost all Unixes and can be used [similar to cPickle]
for FAST storage and retrieval of simple key-value data.)
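To make the "light version" concrete, here is a minimal sketch of a key-value index built on Python's standard dbm module (which uses gdbm where it is available and falls back to another backend otherwise). The function names build_index and lookup, and the sample accession, are made up for illustration and are not part of any existing biopython module.

```python
import dbm


def build_index(path, records):
    """Write accession -> description pairs into a dbm file at `path`.

    `records` is a plain dict of str -> str (hypothetical sample data).
    dbm stores bytes, so keys and values are encoded as UTF-8.
    """
    with dbm.open(path, "c") as db:  # "c": create the file if needed
        for accession, description in records.items():
            db[accession.encode("utf-8")] = description.encode("utf-8")


def lookup(path, accession):
    """Return the stored description for `accession`, or None if absent."""
    with dbm.open(path, "r") as db:  # "r": read-only access
        value = db.get(accession.encode("utf-8"))
    return None if value is None else value.decode("utf-8")
```

Anything richer than a flat string per key (cross references, keywords) would need either a serialized value or the SQL route mentioned above.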
I think between 4 and 5-6 I would include a generic library/modules so that
5 and 6 could be easily extended to include web/tk/gtk/commandline/pipes
etc. Most of that (except the Icarus part) sounds familiar to me too.
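One way to picture such a generic layer is a small storage interface that every front end talks to, so a command-line tool and a web page can share the same lookup code. The sketch below is only an assumption about how it might look: Backend, MemoryBackend, render_text and render_html are hypothetical names, and the in-memory dict stands in for gdbm or MySQL.

```python
from abc import ABC, abstractmethod
from html import escape


class Backend(ABC):
    """Hypothetical storage interface shared by all front ends."""

    @abstractmethod
    def get(self, accession):
        """Return the description for `accession`, or None."""


class MemoryBackend(Backend):
    """Stand-in backend; a gdbm or SQL backend would expose the same get()."""

    def __init__(self, records):
        self._records = dict(records)

    def get(self, accession):
        return self._records.get(accession)


def render_text(backend, accession):
    """Command-line/pipe style output: tab-separated accession and entry."""
    entry = backend.get(accession)
    return "%s\t%s" % (accession, entry if entry is not None else "(not found)")


def render_html(backend, accession):
    """Web style output: the same lookup, escaped for HTML."""
    entry = backend.get(accession)
    body = escape(entry) if entry is not None else "<em>not found</em>"
    return "<p><b>%s</b>: %s</p>" % (escape(accession), body)
```

Swapping the backend then leaves both renderers untouched, which is the point of putting the library between 4 and 5-6.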
So what would the perfect combination of tools look like ?
1) Martel, for the parsing system for identifying useful regions of many formats
2) Python, a language used to implement the actions of the matches in the parser
3) Biopython/Biocorba for generic data models for storing identifiers,
cross references, keywords and free text.
4) Gdbm, MySQL databases for storing and searching those models
4b) A generic library/module providing methods for interfacing with pySRS
5) A web/Tk/Gtk based interface to the database
6) A basic set of analysis tools augmenting that interface
7) A basic set of methods that can optionally be used in all other biopython
modules (e.g. the FASTA parser's rec.nice_title() could look up the accession
found in the rec-title field and replace it with organism + gene name,
etc.)
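As a rough illustration of point 7, the following sketch shows what a nice_title helper might do. Everything here is assumed for the example: the accession pattern, the shape of the index (a dict mapping accession to organism and gene), and the function signature. It is not an existing biopython method.

```python
import re

# Assumed accession shape for this sketch: 1-2 capital letters + 5-6 digits.
ACCESSION_RE = re.compile(r"\b[A-Z]{1,2}\d{5,6}\b")


def nice_title(title, index):
    """Replace the first accession found in `title` with organism + gene.

    `index` is any mapping from accession to a dict with "organism" and
    "gene" keys (hypothetical layout). The title is returned unchanged
    when no accession is found or the accession is not in the index.
    """
    match = ACCESSION_RE.search(title)
    if match is None:
        return title
    entry = index.get(match.group(0))
    if entry is None:
        return title
    replacement = "%s %s" % (entry["organism"], entry["gene"])
    return title[:match.start()] + replacement + title[match.end():]
```

With a dbm or SQL index behind the mapping, the same function would work unchanged as long as the lookup returns the two fields.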
are-we-going-to-annoy-thure?'ly y'rs
-thomas
pySRS
SnRS: Snake Retrieval System
PSI-RS: Python System for Indexing and Retrieval of Sequences
PseudoSRS: python system for embedded .... ääähh ....
--
Sicheritz-Ponten Thomas, Ph.D CBS, Department of Biotechnology
thomas at biopython.org The Technical University of Denmark
CBS: +45 45 252489 Building 208, DK-2800 Lyngby
Fax +45 45 931585 http://www.cbs.dtu.dk/thomas
De Chelonian Mobile ... The Turtle Moves ...