[Biopython-dev] Implementation advice

Iddo Friedberg idoerg at cc.huji.ac.il
Wed Jun 19 05:09:31 EDT 2002


Following Jeff's advice, I started playing around with Mindy. I already
have a bug report of sorts, which I submitted to the new Bugzilla (good
move, Jeff).

Anyhow, just to make sure this gets through (Andrew?), here is the bug(?)
report:

I copied the source code of XPath.main() into my own file, changed the path
to point at my swissprot flat-file, and ran it. Here's the result:

>>> sp_reader.main()

/usr/local/lib/python2.2/site-packages/bsddb3/__init__.py:46: RuntimeWarning: Python C API version mismatch for module _db: This Python has API version 1011, module _db has version 1010.
  import _db
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "sp_reader.py", line 29, in main
    extract_info = [
  File "/home/idoerg/cvs_biopython/biopython/Bio/Mindy/XPath.py", line 58, in xpath_index
    creator = creator_factory(dbname, primary_namespace, data_names)
  File "/home/idoerg/cvs_biopython/biopython/Bio/Mindy/BerkeleyDB.py", line 17, in create
    primary_namespace = self.primary_namespace,
NameError: global name 'self' is not defined
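
One common way to get this kind of NameError is a function that has no "self" parameter but still refers to self in its body. The sketch below only illustrates that pattern: the class and function names are hypothetical and this is not the actual Bio/Mindy/BerkeleyDB.py source, so the real cause may differ.

```python
# Hypothetical illustration of the error pattern; NOT the actual Mindy code.
class Creator:
    def __init__(self, dbname, primary_namespace):
        self.dbname = dbname
        self.primary_namespace = primary_namespace


def create_broken(dbname, primary_namespace):
    # There is no "self" in this function's scope, so Python looks it up
    # as a global and raises NameError when the function is called.
    return Creator(dbname, primary_namespace=self.primary_namespace)


def create_fixed(dbname, primary_namespace):
    # Use the argument that was actually passed in.
    return Creator(dbname, primary_namespace=primary_namespace)


try:
    create_broken("sprot", "id")
except NameError as err:
    print("broken:", err)

print("fixed:", create_fixed("sprot", "id").primary_namespace)
```

Note that the error only shows up when the function is called, not when the module is imported, which would be consistent with it surfacing at the first attempt to build an index.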









On Mon, 17 Jun 2002, Jeffrey Chang wrote:

: At the Biohackathon in April(?), we talked about the need to provide
: this kind of database capability, and the 4 projects (biopython,
: bioperl, biojava, and bioruby) decided to standardize on 2
: cross-platform approaches.  For smaller databases, we invented our own
: flat file format.  For larger ones, we used Berkeley DB.  Andrew wrote
: some excellent documentation for these, but I can't find it right now.
:
: Andrew has implemented both these already in Bio.Mindy.  Please take a
: look there.  The advantage of using one of these is that 1) the db
: stuff is already written, and 2) the resulting file will be usable for
: the other bio projects as well.
:
: Jeff
:
:
: On Mon, Jun 17, 2002 at 05:24:26PM +0300, Iddo Friedberg wrote:
: > Hi all,
: >
: > I am trying to expand the functionality of FSSP a bit. As part of that, I
: > would like to provide the user with the ability to give a PDB id, and
: > retrieve the name of the FSSP file(s) containing that PDB id.
: >
: > Without getting into too much detail, each FSSP file (out of some 2800)
: > has anywhere between 3 and 300 PDB ids, some of which appear in more than
: > one file.
: >
: > I was thinking of creating a dictionary which will look something like:
: > { '1chyA': ['1xyzB','3fgy0'],
: > '3dcp0': ['3syx'],
: > '2abcC': ['3syx', '4rde'],
: > .
: > .
: > .
: > }
: > # Meaning, that 1chyA is in the FSSP file represented by 1xyzB and in the
: > # one represented by 3fgy0
: >
: > Dictionary creation will be a one-time thing, updates will be as frequent
: > as the user likes (not very frequent), and queries will be many (very
: > frequent). The dictionary seems a bit large (some 2800 keys, and rising)
: > to read in every time you need to find out where 2abcC is located, so I
: > thought of using the Python dbm interface, specifically 'anydbm', so as
: > to maximize platform independence.
: >
: > ***** Is this good so far? Or is there a better tool I can use? I don't
: > want to use SQL here... seems a bit of an overkill.
: >
: > Because anydbm (like gdbm, dumbdbm...) accepts only strings for keys and
: > values, and I'd like to use lists in the values (maybe also in the keys),
: > I thought of creating a UserDict subclass which overrides __getitem__,
: > __setitem__, etc., using cPickle.loads and cPickle.dumps for keys and
: > values, thus transparently enabling the use of non-strings in a Python dbm
: > interface. (Bit of code attached).
: >
: > **** This seems a very generic application. I'd be extremely surprised if
: > nobody has done something like this before, but I couldn't really find
: > anything. Comments?
: >
: >
: > Thanks,
: >
: > Iddo
: >
: >
: > --
: >
: > Iddo Friedberg                                | Tel: +972-2-6757374
: > Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
: > The Hebrew University - Hadassah Medical School | email: idoerg at cc.huji.ac.il
: > POB 12272, Jerusalem 91120                    |
: > Israel                                        |
: > http://bioinfo.md.huji.ac.il/marg/people-home/iddo/
: >
: >
: >
: >
: >
:
: > import cPickle
: > import UserDict
: > import anydbm
: >
: > loads = cPickle.loads
: > dumps = cPickle.dumps
: >
: > class dbmDict(UserDict.UserDict):
: >     """Dictionary backed by a dbm file; keys and values are pickled."""
: >     def __init__(self, filename, flag='r'):
: >         # self.data is the open dbm file, which holds only strings
: >         self.data = anydbm.open(filename, flag)
: >     def __getitem__(self, key):
: >         return loads(self.data[dumps(key)])
: >     def __setitem__(self, key, value):
: >         self.data[dumps(key)] = dumps(value)
: >     def values(self):
: >         # unpickle every stored value
: >         value_list = []
: >         for i in self.data.keys():
: >             value_list.append(loads(self.data[i]))
: >         return value_list
: >     def keys(self):
: >         # unpickle every stored key
: >         key_list = []
: >         for i in self.data.keys():
: >             key_list.append(loads(i))
: >         return key_list
:
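
The PDB-id-to-FSSP-file dictionary described in the quoted message can be thought of as an inverted index. A minimal sketch of building it in memory, assuming the membership of each FSSP file has already been parsed into a plain dict (the function name and the sample data are made up for illustration, reusing the ids from the example in the quoted message):

```python
def invert(fssp_members):
    """Map each PDB id to the sorted list of FSSP files that contain it.

    fssp_members is assumed to look like {fssp_id: [pdb_id, ...]}.
    """
    index = {}
    for fssp_id, pdb_ids in fssp_members.items():
        for pdb_id in pdb_ids:
            index.setdefault(pdb_id, []).append(fssp_id)
    for files in index.values():
        files.sort()
    return index


# Made-up sample data, for illustration only.
fssp_members = {
    "1xyzB": ["1chyA"],
    "3fgy0": ["1chyA"],
    "3syx": ["3dcp0", "2abcC"],
    "4rde": ["2abcC"],
}

index = invert(fssp_members)
print(index["2abcC"])  # ['3syx', '4rde']
```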

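On the dbmDict idea in the quoted message: the standard library's shelve module already stores pickled values in a dbm file, although its keys must still be strings. A minimal sketch in current Python, where the function names and the index file location are assumptions for illustration only:

```python
import os
import shelve
import tempfile


def save_index(path, index):
    """Write a {pdb_id: [fssp_id, ...]} dict to a shelf (the one-time build)."""
    with shelve.open(path, flag="n") as db:  # "n": always create a new, empty shelf
        for pdb_id, fssp_ids in index.items():
            db[pdb_id] = list(fssp_ids)  # values are pickled transparently


def lookup(path, pdb_id):
    """Return the FSSP files for one PDB id without loading the whole index."""
    with shelve.open(path, flag="r") as db:
        return db.get(pdb_id, [])


# Throwaway location for the example; a real index would live somewhere stable.
path = os.path.join(tempfile.mkdtemp(), "fssp_index")
save_index(path, {"1chyA": ["1xyzB", "3fgy0"], "2abcC": ["3syx", "4rde"]})
print(lookup(path, "2abcC"))  # ['3syx', '4rde']
```

This covers the list-valued case; non-string keys would still need something like the pickling wrapper in the quoted code.
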
--

Iddo Friedberg                                  | Tel: +972-2-6757374
Dept. of Molecular Genetics and Biotechnology   | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg at cc.huji.ac.il
POB 12272, Jerusalem 91120                      |
Israel                                          |
http://bioinfo.md.huji.ac.il/marg/people-home/iddo/



