[Biopython-dev] Implementation advice

Mon Jun 17 10:24:26 EDT 2002

Hi all,

I am trying to expand the functionality of FSSP a bit. As part of that, I
would like to provide the user with the ability to give a PDB id, and
retrieve the name of the FSSP file(s) containing that PDB id.

Without going into too much detail, each FSSP file (out of some 2800)
lists anywhere between 3 and 300 PDB ids, and some ids appear in more than
one file.

I was thinking of creating a dictionary which will look something like:
{ '1chyA': ['1xyzB','3fgy0'],
  '3dcp0': ['3syx'],
  '2abcC': ['3syx', '4rde'],
.
.
.
}
# Meaning that 1chyA is in the FSSP file represented by 1xyzB and in the
# one represented by 3fgy0
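
The mapping above is an inverted index: it goes from each PDB id to the
FSSP files that list it. A minimal sketch of building it, assuming each
FSSP file has already been parsed into a list of the PDB ids it contains.
The function name build_index and the sample input are made up for
illustration; the ids are the placeholder ones from the example above.

```python
def build_index(fssp_contents):
    """Invert {fssp_id: [pdb ids it lists]} into {pdb_id: [fssp ids]}."""
    index = {}
    # Sorted so that the order of each value list is reproducible.
    for fssp_id in sorted(fssp_contents):
        for pdb_id in fssp_contents[fssp_id]:
            files = index.setdefault(pdb_id, [])
            if fssp_id not in files:
                files.append(fssp_id)
    return index


# Hypothetical input: which PDB ids each FSSP file lists.
fssp_contents = {
    '1xyzB': ['1chyA'],
    '3fgy0': ['1chyA'],
    '3syx': ['3dcp0', '2abcC'],
    '4rde': ['2abcC'],
}

index = build_index(fssp_contents)
```

With this input, index['1chyA'] holds both '1xyzB' and '3fgy0', matching
the first entry of the example.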

The dictionary will be created once, updated as often as the user likes
(not very often), and queried very frequently. Reading the whole thing in
(some 2800 keys, and rising) every time you need to find out where 2abcC
is located seems wasteful, so I thought of using the Python dbm interface,
specifically 'anydbm', so as to maximize platform independence.
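
A rough sketch of the on-disk lookup this would give, under two
assumptions that are not part of the plan above: it is written for a
current Python, where the module is named dbm rather than anydbm, and it
sidesteps the strings-only restriction by joining each list of FSSP ids
with spaces (which only works because the ids contain no spaces). The
names write_index and lookup are hypothetical.

```python
import dbm
import os
import tempfile


def write_index(path, index):
    """Store {pdb_id: [fssp ids]} in a dbm file, one record per PDB id."""
    db = dbm.open(path, 'c')  # 'c': create the file if it does not exist
    try:
        for pdb_id, fssp_ids in index.items():
            db[pdb_id] = ' '.join(fssp_ids)
    finally:
        db.close()


def lookup(path, pdb_id):
    """Return the FSSP ids for one PDB id without loading the whole index."""
    db = dbm.open(path, 'r')
    try:
        try:
            raw = db[pdb_id]  # dbm hands values back as bytes
        except KeyError:
            return []
        return raw.decode().split()
    finally:
        db.close()


# Throwaway location for the demonstration.
path = os.path.join(tempfile.mkdtemp(), 'fssp_index')
write_index(path, {'1chyA': ['1xyzB', '3fgy0'], '2abcC': ['3syx', '4rde']})
hits = lookup(path, '2abcC')
```

Each query opens the file, reads one record and closes it again, so the
cost does not grow with the number of keys the way reading a whole
dictionary would.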

***** Is this good so far? Or is there a better tool I can use? I don't
want to use SQL here... it seems like overkill.

Because anydbm (like gdbm, dumbdbm...) accepts only strings for keys and
values, and I'd like to use lists in the values (maybe also in the keys),
I thought of creating a UserDict subclass which overrides __getitem__,
__setitem__, etc., using cPickle.dumps and cPickle.loads on keys and
values, thus transparently enabling the use of non-strings in a Python dbm
interface. (Bit of code attached.)
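
The round trip at the heart of this idea can be sketched in isolation.
This is an illustration only, assuming a current Python (dbm and pickle in
place of anydbm and cPickle); the helper names store and fetch and the
tuple key are made up for the example.

```python
import dbm
import os
import pickle
import tempfile

# Fixed protocol, so a given key is pickled the same way on every call.
PROTOCOL = 2


def store(db, key, value):
    """Pickle both key and value so the dbm file only ever sees bytes."""
    db[pickle.dumps(key, PROTOCOL)] = pickle.dumps(value, PROTOCOL)


def fetch(db, key):
    """Pickle the key the same way to find the record, then unpickle it."""
    return pickle.loads(db[pickle.dumps(key, PROTOCOL)])


path = os.path.join(tempfile.mkdtemp(), 'pickled_index')
db = dbm.open(path, 'c')
store(db, ('2abc', 'C'), ['3syx', '4rde'])  # tuple key, list value
result = fetch(db, ('2abc', 'C'))
db.close()
```

One caveat: a lookup only succeeds if equal keys pickle to identical
bytes. That holds for simple strings and small tuples of distinct strings
like the one here, but should be treated as an assumption for more
complicated keys.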

**** This seems like a very generic application. I'd be extremely
surprised if nobody had done something like this before, but I couldn't
really find anything. Comments?

Thanks,

Iddo

--

Iddo Friedberg                                  | Tel: +972-2-6757374
Dept. of Molecular Genetics and Biotechnology   | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg at cc.huji.ac.il
POB 12272, Jerusalem 91120                      |
Israel                                          |
http://bioinfo.md.huji.ac.il/marg/people-home/iddo/

-------------- next part --------------
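import cPickle
import UserDict
import anydbm

loads = cPickle.loads
dumps = cPickle.dumps

class dbmDict(UserDict.UserDict):
    """Dictionary-like wrapper around an anydbm file.

    Keys and values are pickled on the way in and unpickled on the way
    out, so any picklable object can be stored, not just strings.
    """
    def __init__(self, filename, flag='r'):
        # flag is passed straight to anydbm.open: 'r', 'w', 'c' or 'n'
        self.data = anydbm.open(filename, flag)

    def __getitem__(self, key):
        return loads(self.data[dumps(key)])

    def __setitem__(self, key, value):
        self.data[dumps(key)] = dumps(value)

    def __delitem__(self, key):
        del self.data[dumps(key)]

    def has_key(self, key):
        # The inherited has_key would look up the raw, unpickled key
        return self.data.has_key(dumps(key))

    def values(self):
        # Unpickle every stored value
        return [loads(self.data[k]) for k in self.data.keys()]

    def keys(self):
        # Unpickle every stored key
        return [loads(k) for k in self.data.keys()]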