[Biopython-dev] ANN: mindy-0.1
Thomas Sicheritz-Ponten
thomas at cbs.dtu.dk
Mon Mar 26 01:53:46 EST 2001
"Andrew Dalke" <dalke at acm.org> writes:
>
> >Just curious -- why'd you decide to use Berkeley DB?
>
> I considered the following choices:
> - Berkeley DB
> - MySQL
> - PostgreSQL
> - Oracle
>
> The last three require knowledge of SQL, of which I
> have very little, and I wanted to get things up very
> quickly. In addition, all I wanted to do was lookups,
> and BSDDB does that very well. Plus, I liked that
> BSDDB works in the local process rather than talking
> to a server.
Hmm, I don't think I understand what you are actually storing - how is the
indexing done? Are you preparsing all entries during the indexing step, or
are you storing the positions of the entries via seek/tell? (For a simple
position-indexing tool à la TIGR's yank, see getgene.py in biopython - that
would also answer the alias question.)
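(For concreteness, here is a minimal sketch of the seek/tell style of
indexing I mean: record each entry's byte offset in a dbm-style table,
then seek straight back to it on lookup. The FASTA parsing, function
names and index filename are illustrative assumptions on my part, not
mindy's actual layout.)

    # Sketch only: offsets in a dbm table, FASTA assumed for the example.
    import dbm

    def build_index(data_path, index_path):
        index = dbm.open(index_path, "c")
        with open(data_path, "rb") as handle:
            offset = handle.tell()
            line = handle.readline()
            while line:
                if line.startswith(b">"):          # a new entry starts here
                    entry_id = line[1:].split()[0]
                    index[entry_id] = str(offset)  # remember where it began
                offset = handle.tell()
                line = handle.readline()
        index.close()

    def fetch(data_path, index_path, entry_id):
        index = dbm.open(index_path, "r")
        offset = int(index[entry_id.encode()])
        index.close()
        chunks = []
        with open(data_path, "rb") as handle:
            handle.seek(offset)               # jump straight to the entry
            chunks.append(handle.readline())  # the '>' header line
            for line in handle:
                if line.startswith(b">"):     # stop at the next entry
                    break
                chunks.append(line)
        return b"".join(chunks)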
>
> I can envision interfaces to the other databases. Perhaps
> for the future.
>
> >> Would working with compressed files be useful?
Always!!! - Does anybody know how to seek/tell in a gzipped file?
> Easy enough I think to stick a bit of code on the beginning
> of the read to tell if the file is compressed or not. I
> think Python now includes some built-in modules for reading
> compressed files, else popen'ing through zcat or bzcat is
> pretty easy.
from gzip import open ???
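(To my own question: Python's gzip.GzipFile does support tell() and
seek(), but the offsets refer to the *uncompressed* stream, and a
backward seek is emulated by rewinding and re-decompressing from the
start - so an offset index into a gzipped file works, it is just slow.
A rough sketch of the magic-byte sniffing Andrew describes, with a
hypothetical filename:)

    # The gzip magic is \x1f\x8b; bzip2 files start with 'BZh'.
    import gzip

    def open_maybe_compressed(path):
        with open(path, "rb") as handle:
            magic = handle.read(3)
        if magic[:2] == b"\x1f\x8b":
            return gzip.open(path, "rb")  # offsets below are uncompressed
        # the bz2 module could be handled the same way
        return open(path, "rb")

    handle = open_maybe_compressed("example.dat")
    handle.readline()
    pos = handle.tell()  # position in the uncompressed stream
    handle.seek(pos)     # legal, but backward seeks re-read from the start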
> No, it wouldn't. But I think when you start getting into
> "real" databases (meaning ones with SQL) then people want
> the ability to set up their own schemas, so the queries
> they have go quickly. Should the database created be
> fully normalized (in which case queries can be very
> complex and require a lot of joins) or denormalized (which
> makes for easier queries but is easier to accidentally
> leave in an invalid state)?
Be careful, you are heading from a "simple" indexing scheme to a pySRS :-)
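(Andrew's normalization tradeoff, made concrete - purely illustrative,
using sqlite3 as a stand-in and invented table names:)

    import sqlite3
    conn = sqlite3.connect(":memory:")

    # Normalized: one fact per table, consistent by construction, but
    # even a simple "accession -> description" lookup needs a join.
    conn.executescript("""
        CREATE TABLE entry   (entry_id INTEGER PRIMARY KEY, accession TEXT);
        CREATE TABLE feature (entry_id INTEGER REFERENCES entry(entry_id),
                              key TEXT, value TEXT);
    """)
    rows = conn.execute("""
        SELECT e.accession, f.value
        FROM entry AS e JOIN feature AS f ON f.entry_id = e.entry_id
        WHERE f.key = 'description'
    """)

    # Denormalized: one wide table, trivial queries, but duplicated
    # accessions can silently drift out of sync.
    conn.executescript(
        "CREATE TABLE flat (accession TEXT, key TEXT, value TEXT);")
    rows = conn.execute(
        "SELECT accession, value FROM flat WHERE key = 'description'")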
>
> I don't think there is a solution, so the best is to
> wait until someone has a need for it. Then pay me to
> write the interfaces :) My need for now is indexed
> searches, so I used a database system which is designed
> for that task. There is no danger of anyone confusing the
> result with something usable for larger-scale queries.
>
> >Another addition which I think would be nice is storing the size of
> >the indexed files.
> >This would allow you to potentially skip
> >re-indexing when index is called on a file.
Uhuh ... I don't think so, especially not if just accession numbers or IDs
are changed (e.g. from a TrEMBL ID to a SWISS-PROT ID), which could result
in a slightly changed db with the same size. Better to use checksums or the
indexed accession numbers/IDs (the best solution, but it takes more time).
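(A rough sketch of the checksum alternative - the function names and
the stored-checksum bookkeeping are made up for illustration:)

    import hashlib

    def file_checksum(path, chunk_size=1 << 20):
        # Hash the raw contents, so an edited ID changes the digest
        # even when the file size stays exactly the same.
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def index_is_current(path, stored_checksum):
        return file_checksum(path) == stored_checksum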
> By the end of the week I hope to start working on it.
> OTOH, my laptop started acting flaky in the last few days :(
> Have I mentioned that me and hardware don't get along?
What laptop or hardware combination is causing you nightmares?
seek-and-indexingly-y'rs
-thomas
--
Sicheritz-Ponten Thomas, Ph.D CBS, Department of Biotechnology
thomas at biopython.org The Technical University of Denmark
CBS: +45 45 252489 Building 208, DK-2800 Lyngby
Fax +45 45 931585 http://www.cbs.dtu.dk/thomas
De Chelonian Mobile ... The Turtle Moves ...