[BioPython] Re: [Biopython-dev] Implementation advice

Jason Stajich jason@cgt.mc.duke.edu
Mon, 17 Jun 2002 17:44:26 -0400 (EDT)


The latest docs *should* be in
cvs -d:pserver:cvs.open-bio.org:/home/repository/obf-common co obda-specs

However, the anon CVS server is down temporarily until Chris D can hit
reset in the Cambridge, MA server room.  Hopefully that will be soon.

-j
On Tue, 18 Jun 2002, Iddo Friedberg wrote:

>
> Thanks for pointing this out to me, Jeff. Please try and find those
> missing docs... :))
>
> Iddo
>
> On Mon, 17 Jun 2002, Jeffrey Chang wrote:
>
> : At the Biohackathon in April(?), we talked about the need to provide
> : this kind of database capability, and the 4 projects (biopython,
> : bioperl, biojava, and bioruby) decided to standardize on 2
> : cross-platform approaches. For smaller databases, we invented our own
> : flat file format. For larger ones, we used Berkeley DB.  Andrew wrote
> : some excellent documentation for these, but I can't find it right now.
> :
> : Andrew has implemented both of these already in Bio.Mindy. Please take a
> : look there. The advantage of using one of these is that 1) the db
> : stuff is already written, and 2) the resulting file will be usable for
> : the other bio projects as well.
> :
> : Jeff
> :
> :
> : On Mon, Jun 17, 2002 at 05:24:26PM +0300, Iddo Friedberg wrote:
> : > Hi all,
> : >
> : > I am trying to expand the functionality of FSSP a bit. As part of that, I
> : > would like to provide the user with the ability to give a PDB id, and
> : > retrieve the name of the FSSP file(s) containing that PDB id.
> : >
> : > Without getting into too much detail, each FSSP file (out of some 2800)
> : > has anywhere between 3 and 300 PDB ids, some of them in more than one
> : > file.
> : >
> : > I was thinking of creating a dictionary which will look something like:
> : > { '1chyA': ['1xyzB','3fgy0'],
> : > '3dcp0': ['3syx'],
> : > '2abcC': ['3syx', '4rde'],
> : > .
> : > .
> : > .
> : > }
> : > # Meaning, that 1chyA is in the FSSP file represented by 1xyzB and in the
> : > # one represented by 3fgy0
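
A minimal sketch of how such a dictionary could be built, assuming the FSSP
files have already been parsed into a mapping from each file's representative
id to the PDB ids it contains. The names build_pdb_index and fssp_members are
hypothetical, the parsing step is not shown, and the code is written for
current Python 3 rather than the Python of this thread.

```python
from collections import defaultdict


def build_pdb_index(fssp_members):
    """Invert {representative: [pdb_id, ...]} into {pdb_id: [representative, ...]}."""
    index = defaultdict(list)
    # Sort the representatives so the resulting lists have a stable order.
    for representative in sorted(fssp_members):
        for pdb_id in fssp_members[representative]:
            if representative not in index[pdb_id]:
                index[pdb_id].append(representative)
    return dict(index)


# Hypothetical input reusing the ids from the example dictionary.
fssp_members = {
    "1xyzB": ["1chyA"],
    "3fgy0": ["1chyA"],
    "3syx": ["3dcp0", "2abcC"],
    "4rde": ["2abcC"],
}
pdb_index = build_pdb_index(fssp_members)
print(pdb_index["2abcC"])  # ['3syx', '4rde']
```
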
> : >
> : > The dictionary will be created once, updated as often as the user likes
> : > (not very frequently), and queried many times (very frequently). It
> : > seems a bit large (some 2800 keys, and rising) to read in every time you
> : > need to find out where 2abcC is located, so I thought of using the
> : > Python dbm interface.
> : >
> : > 'anydbm', so as to maximize platform independence.
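
If the values are only ever lists of ids, a plain dbm file may be enough
without any pickling: the ids contain no spaces, so a list can be stored as a
space-joined string. A rough sketch, assuming Python 3 (where anydbm was
renamed to dbm); write_index, lookup and the file name are made-up names for
illustration.

```python
import dbm
import os
import tempfile


def write_index(path, pdb_index):
    """Store {pdb_id: [representative, ...]} in a dbm file as space-joined strings."""
    with dbm.open(path, "c") as db:
        for pdb_id, representatives in pdb_index.items():
            # dbm stores bytes, so encode both key and value explicitly.
            db[pdb_id.encode("ascii")] = " ".join(representatives).encode("ascii")


def lookup(path, pdb_id):
    """Return the FSSP representatives for pdb_id, or an empty list if unknown."""
    with dbm.open(path, "r") as db:
        try:
            raw = db[pdb_id.encode("ascii")]
        except KeyError:
            return []
    return raw.decode("ascii").split()


with tempfile.TemporaryDirectory() as tmpdir:
    index_path = os.path.join(tmpdir, "fssp_index")
    write_index(index_path, {"1chyA": ["1xyzB", "3fgy0"], "2abcC": ["3syx", "4rde"]})
    found = lookup(index_path, "2abcC")
    missing = lookup(index_path, "9zzzZ")
```

Only the requested key is read from disk on each lookup, which is the point
of using dbm here instead of loading the whole dictionary.
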
> : >
> : > ***** Is this good so far? Or is there a better tool I can use? I don't
> : > want to use SQL here... seems a bit of an overkill.
> : >
> : > Because anydbm (like gdbm, dumbdbm...) accepts only strings for keys and
> : > values, and I'd like to use lists in the values (maybe also in the keys),
> : > I thought of creating a UserDict subclass which overrides __getitem__,
> : > __setitem__, etc., using cPickle.loads and cPickle.dumps for keys and
> : > values, thus transparently enabling the use of non-strings in a Python dbm
> : > interface. (Bit of code attached).
> : >
> : > **** This seems a very generic application. I'd be extremely surprised if
> : > nobody has done something like this before. But I couldn't really find
> : > anything. Comments?
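
One existing tool that seems to fit is the standard library's shelve module,
which keeps pickled values in a dbm file under string keys. A small sketch,
assuming Python 3 and a throwaway file name; it does not cover non-string
keys.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "fssp_index")

    # Create the shelf once; values are pickled transparently.
    with shelve.open(path, flag="c") as shelf:
        shelf["1chyA"] = ["1xyzB", "3fgy0"]
        shelf["2abcC"] = ["3syx", "4rde"]

    # Later queries can open it read-only and fetch single entries.
    with shelve.open(path, flag="r") as shelf:
        locations = shelf["2abcC"]
        known_ids = sorted(shelf.keys())
```

A list fetched from a shelf is a copy: appending to it does not update the
file unless the value is assigned back, or the shelf is opened with
writeback=True.
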
> : >
> : >
> : > Thanks,
> : >
> : > Iddo
> : >
> : >
> : > --
> : >
> : > Iddo Friedberg                                | Tel: +972-2-6757374
> : > Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
> : > The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
> : > POB 12272, Jerusalem 91120                    |
> : > Israel                                        |
> : > http://bioinfo.md.huji.ac.il/marg/people-home/iddo/
> : >
> :
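> : > import cPickle
> : > import UserDict
> : > import anydbm
> : > loads = cPickle.loads
> : > dumps = cPickle.dumps
> : > class dbmDict(UserDict.UserDict):
> : > 	# Dictionary-like wrapper around an anydbm file. Keys and values are
> : > 	# pickled, so non-string objects (e.g. lists) can be stored.
> : > 	def __init__(self,filename, flag='r'):
> : > 		self.data = anydbm.open(filename,flag)
> : > 	def __getitem__(self,key):
> : > 		return loads(self.data[dumps(key)])
> : > 	def __setitem__(self,key, value):
> : > 		self.data[dumps(key)] = dumps(value)
> : > 	# The inherited versions would use the unpickled key, so override them.
> : > 	def __delitem__(self,key):
> : > 		del self.data[dumps(key)]
> : > 	def has_key(self,key):
> : > 		return self.data.has_key(dumps(key))
> : > 	def values(self):
> : > 		# Unpickle every stored value.
> : > 		value_list = []
> : > 		for i in self.data.keys():
> : > 			value_list.append(loads(self.data[i]))
> : > 		return value_list
> : > 	def keys(self):
> : > 		# Unpickle every stored key.
> : > 		key_list = []
> : > 		for i in self.data.keys():
> : > 			key_list.append(loads(i))
> : > 		return key_list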
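
For anyone reading this with a current Python, here is a rough Python 3
rendering of the attached class. It is only a sketch under some assumptions:
cPickle, UserDict and anydbm are replaced by pickle,
collections.abc.MutableMapping and dbm, and the class name DbmDict, the close
method and the fixed pickle protocol are additions that were not in the
attachment.

```python
import dbm
import os
import pickle
import tempfile
from collections.abc import MutableMapping

# Fix the protocol so that keys pickle to the same bytes across Python versions.
PROTOCOL = 2


class DbmDict(MutableMapping):
    """Mapping backed by a dbm file, with pickled keys and values."""

    def __init__(self, filename, flag="r"):
        self._db = dbm.open(filename, flag)

    def __getitem__(self, key):
        return pickle.loads(self._db[pickle.dumps(key, PROTOCOL)])

    def __setitem__(self, key, value):
        self._db[pickle.dumps(key, PROTOCOL)] = pickle.dumps(value, PROTOCOL)

    def __delitem__(self, key):
        del self._db[pickle.dumps(key, PROTOCOL)]

    def __iter__(self):
        for raw_key in self._db.keys():
            yield pickle.loads(raw_key)

    def __len__(self):
        return len(self._db)

    def close(self):
        self._db.close()


with tempfile.TemporaryDirectory() as tmpdir:
    index = DbmDict(os.path.join(tmpdir, "fssp_index"), "c")
    index["1chyA"] = ["1xyzB", "3fgy0"]
    index["2abcC"] = ["3syx", "4rde"]
    hits = index["2abcC"]
    all_ids = sorted(index)
    has_unknown = "9zzzZ" in index
    index.close()
```

Lookups rely on equal keys pickling to identical bytes. That holds for plain
strings such as the ids here, but it is not guaranteed for every object type
(sets, for example), so string keys are the safer choice.
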
> :
> : _______________________________________________
> : Biopython-dev mailing list
> : Biopython-dev@biopython.org
> : http://biopython.org/mailman/listinfo/biopython-dev
> :
>
> _______________________________________________
> BioPython mailing list  -  BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu