[Biopython-dev] ANN: mindy-0.1

Andrew Dalke dalke at acm.org
Sun Mar 25 16:42:54 EST 2001


>Well, despite this friendly encouragement, I decided to take a look at 
>mindy anyways :-).

Yeah, Jeff pointed that out as well.  I pasted in the README,
which was meant to warn people not to make long-term plans
that assume the code will be usable without changes.  But
that's perhaps overkill for this posting, whose point is to
get people using the idea over the long term.

> Using this
>requires that you put mindy inside a mindy directory on your
>PYTHONPATH and make it importable by adding a __init__.py.

It was also meant to say that some effort would be needed
to make things work.  Yeah, I did all my work in a single
directory.
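That packaging step can also be scripted.  This is a minimal
sketch that assumes the mindy sources are a flat set of *.py
files in one directory; `make_package` is a hypothetical helper,
not part of the distribution.

```python
import shutil
from pathlib import Path


def make_package(src_dir, dest_root, name="mindy"):
    """Copy the *.py files from src_dir into dest_root/<name>/ and add an
    empty __init__.py, so `import <name>.<module>` works once dest_root is
    on PYTHONPATH (or sys.path)."""
    pkg_dir = Path(dest_root) / name
    pkg_dir.mkdir(parents=True, exist_ok=True)
    for py_file in Path(src_dir).glob("*.py"):
        shutil.copy2(py_file, pkg_dir / py_file.name)
    # The (possibly empty) __init__.py is what makes the directory a package.
    (pkg_dir / "__init__.py").touch()
    return pkg_dir
```

After running it against a (hypothetical) dest_root and putting
that directory on PYTHONPATH, `import mindy.<module>` should work.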

>I think the second one is an 
>old, now defunct, accession number for the same clone. The problem I
>get with just indexing with mindy using "accession" is that everything 
>will be indexed using the second accession number, and not the first
>like I would like.

Are you using the accession as the primary key or as an alias?
I assumed that every record has exactly one primary key, which
must be unique, but may have any number of aliases.

If you want something other than that, you would need to use
XSLT or a Python function, whose interfaces I sketched out
but did not implement.

>Is it possible to
>have multiple indexes pointing to the same record (ie. both AC006837
>and AE002093 point to this record)? Am I stuck using XSLT or
>something else for this case?

Yes.  Call them aliases.
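A toy in-memory sketch of that model, to make the distinction
concrete.  The class, the primary key `ENTRY_1`, and the filename
are hypothetical, and this is not mindy's actual API.  It also
assumes an alias may be shared by several records, which is why
alias lookups return a list.

```python
from typing import Dict, Iterable, List, Tuple

Location = Tuple[str, int]  # (filename, byte offset of the record)


class AliasIndex:
    """Toy index: every record has one unique primary key plus any number
    of aliases, and both kinds of name resolve to the record's location."""

    def __init__(self) -> None:
        self._by_primary: Dict[str, Location] = {}
        self._by_alias: Dict[str, List[str]] = {}  # alias -> primary keys

    def add(self, primary_key: str, location: Location,
            aliases: Iterable[str] = ()) -> None:
        if primary_key in self._by_primary:
            raise KeyError(f"duplicate primary key: {primary_key}")
        self._by_primary[primary_key] = location
        for alias in aliases:
            # Aliases need not be unique, so keep a list of owners.
            self._by_alias.setdefault(alias, []).append(primary_key)

    def lookup(self, name: str) -> List[Location]:
        """Locations of records whose primary key (checked first) or alias is name."""
        if name in self._by_primary:
            return [self._by_primary[name]]
        return [self._by_primary[key] for key in self._by_alias.get(name, [])]


if __name__ == "__main__":
    index = AliasIndex()
    # Hypothetical entry: both accessions are stored as aliases of one record.
    index.add("ENTRY_1", ("genbank.dat", 0), aliases=["AC006837", "AE002093"])
    print(index.lookup("AC006837"))  # [('genbank.dat', 0)]
    print(index.lookup("AE002093"))  # [('genbank.dat', 0)]
```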

>Just curious -- why'd you decide to use Berkeley DB?

I considered the following choices:
  - Berkeley DB
  - MySQL
  - PostgreSQL
  - Oracle

The last three require knowledge of SQL, of which I
have very little, and I wanted to get things up very
quickly.  In addition, all I wanted to do was lookups,
and BSDDB does that very well.  Plus, I liked that
BSDDB works in the local process rather than talking
to a server.
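For anyone who hasn't used a dbm-style store, here is a minimal
sketch of that kind of in-process lookup.  It is written for
current Python, whose standard library no longer ships the bsddb
bindings, so it uses the generic `dbm` module as a stand-in for
Berkeley DB.  The "filename:offset" value format and the function
names are hypothetical, not what mindy actually stores.

```python
import dbm
import os
import tempfile


def build_offset_index(path, entries):
    """Write key -> "filename:offset" pairs to an on-disk dbm file.

    entries is an iterable of (key, filename, offset) tuples."""
    with dbm.open(path, "n") as db:  # "n": always start a new, empty database
        for key, filename, offset in entries:
            db[key] = f"{filename}:{offset}"  # str is stored as UTF-8 bytes


def find(path, key):
    """Look up key directly in this process; there is no server to talk to."""
    with dbm.open(path, "r") as db:
        try:
            raw = db[key]  # values come back as bytes
        except KeyError:
            return None
    # rpartition: the offset never contains ":", but a filename might.
    filename, _, offset = raw.decode("utf-8").rpartition(":")
    return filename, int(offset)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "accessions")
        build_offset_index(path, [("AC006837", "genbank.dat", 0)])
        print(find(path, "AC006837"))  # ('genbank.dat', 0)
        print(find(path, "missing"))   # None
```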

I can envision interfaces to the other databases.  Perhaps
in the future.

>> Would working with compressed files be useful?

>Yes, this would be really useful, at least for me. I always end up 
>uncompressing and recompressing stuff before I work with them to keep
>myself from filling up my hard disk.

It should be easy enough to add a bit of code at the start
of the read that checks whether the file is compressed.  I
think Python now includes built-in modules for reading
compressed files; otherwise, piping through zcat or bzcat
with popen is pretty easy.
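A minimal sketch of that check, assuming gzip and bzip2 are the
formats worth detecting: it sniffs the magic bytes at the start of
the file and hands back a file object from Python's `gzip` or `bz2`
module.  `open_maybe_compressed` is a hypothetical name, not part
of mindy.

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member
BZIP2_MAGIC = b"BZh"      # "BZh", then the block-size digit


def open_maybe_compressed(filename):
    """Open filename for binary reading, decompressing gzip or bzip2 data
    on the fly when the leading magic bytes say the file is compressed."""
    with open(filename, "rb") as handle:
        magic = handle.read(3)
    if magic.startswith(GZIP_MAGIC):
        return gzip.open(filename, "rb")
    if magic.startswith(BZIP2_MAGIC):
        return bz2.open(filename, "rb")
    return open(filename, "rb")
```

One caveat: offsets recorded this way point into the uncompressed
stream, and the `gzip` and `bz2` file objects can only emulate
`seek()` by decompressing, so random access into a large compressed
file stays slow.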

>> Would like to be able to add new files to a database.
>> 
>> Would like to remove/update files in a database.
>
>Yeah, both would be really nice! It seems like there is some
>support for this (?) but I didn't play with it.

There is?  Huh, didn't know about that.  Yes, it would be
nice.  :)

>Hmm, would it be hard to support multiple backends? I don't really
>know anything about Berkeley DB and just installed it blindly to use
>this.

No, it wouldn't.  But I think when you start getting into
"real" databases (meaning ones with SQL), people want the
ability to set up their own schemas so that their queries
run quickly.  Should the database be fully normalized (in
which case queries can be very complex and require a lot of
joins) or denormalized (which makes queries easier but also
makes it easier to accidentally leave the data in an
invalid state)?
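To make the trade-off concrete, here is a small sketch using
Python's `sqlite3` module as a stand-in for a server database.
Both schemas are hypothetical: the normalized one stores a
record's location once and needs a join to resolve an alias, while
the flat one copies the location onto every alias row, so a
partial update leaves it inconsistent.

```python
import sqlite3


def normalized_lookup(conn, name):
    # One row per record, one row per alias; a join resolves the alias.
    return conn.execute(
        "SELECT r.filename, r.file_offset FROM records r "
        "JOIN aliases a ON a.record_id = r.id WHERE a.name = ?",
        (name,),
    ).fetchall()


def denormalized_lookup(conn, name):
    # One wide table: the location is copied onto every alias row, no join.
    return conn.execute(
        "SELECT filename, file_offset FROM flat WHERE alias = ?", (name,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                          filename TEXT, file_offset INTEGER);
    CREATE TABLE aliases (name TEXT, record_id INTEGER REFERENCES records(id));
    CREATE TABLE flat (alias TEXT, name TEXT, filename TEXT, file_offset INTEGER);

    INSERT INTO records VALUES (1, 'ENTRY_1', 'genbank.dat', 0);
    INSERT INTO aliases VALUES ('AC006837', 1), ('AE002093', 1);
    INSERT INTO flat VALUES ('AC006837', 'ENTRY_1', 'genbank.dat', 0),
                            ('AE002093', 'ENTRY_1', 'genbank.dat', 0);
""")

# Moving the record is one UPDATE in the normalized schema...
conn.execute("UPDATE records SET file_offset = 4096 WHERE id = 1")
# ...but every copy in the flat table must change; updating only one row
# leaves the two aliases disagreeing about where the record lives.
conn.execute("UPDATE flat SET file_offset = 4096 WHERE alias = 'AC006837'")

print(normalized_lookup(conn, "AE002093"))    # [('genbank.dat', 4096)]
print(denormalized_lookup(conn, "AE002093"))  # [('genbank.dat', 0)], stale
```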

I don't think there is a single right answer, so the best
approach is to wait until someone needs it.  Then pay me to
write the interfaces :)  My need for now is indexed
searches, so I used a database system designed for that
task.  That way no one can mistake the result for something
usable for larger-scale queries.

>Another addition which I think would be nice is storing the size of
>the indexed files.
>This would allow you to potentially skip an
>indexing when index is called on a file. 

Yeah, that would work.  Though there would need to be a way
to override that skipping.
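A minimal sketch of that check, with a `force` flag as the
override.  The function name, the `seen_sizes` dict, and the
`do_index` callback are hypothetical stand-ins for wherever mindy
would keep this bookkeeping.

```python
import os
from typing import Callable, Dict


def index_file(filename: str, seen_sizes: Dict[str, int],
               do_index: Callable[[str], None], force: bool = False) -> bool:
    """Index filename unless its size matches the size recorded last time.

    seen_sizes maps filename -> size in bytes at the last indexing.
    Returns True if the file was (re)indexed, False if it was skipped."""
    size = os.path.getsize(filename)
    if not force and seen_sizes.get(filename) == size:
        return False  # same size as last time: assume it is unchanged
    do_index(filename)
    seen_sizes[filename] = size
    return True
```

Size alone misses an edit that leaves the length unchanged; also
recording the modification time (`os.path.getmtime`) would catch
more cases, at the cost of re-indexing files that were merely
touched.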

>Nice to have you back! BTW, since you are back and I have your
>attention (hopefully :-), 

Oh, pardon.  You talking to me?  Sorry, I wasn't paying
attention.

>have you thought about adding Martel to the
>CVS tree?

Getting there.  Getting there.  My problem has always been
the difficulty of getting my linux box hooked up to the
world.  I finally gave in and bought some dedicated hardware
for it: http://www.egghead.com/category/inv/00042993/03297120.htm

By the end of the week I hope to start working on it.
OTOH, my laptop started acting flaky in the last few days :(
Have I mentioned that hardware and I don't get along?

                    Andrew




