[Biopython-dev] Accessing built-in data files

Tiago Antão tiagoantao at gmail.com
Mon Nov 19 15:00:38 UTC 2007


Hi,

This mail documents my effort to understand what needs to
be done in order to include data files with Biopython. It serves two
purposes:
1. For review by others
2. For future reference, to anyone that needs to package data with the
distribution.

The problem can be split into two parts:

1. Including the data with the distribution
2. Accessing the data that was packaged

The second is more complicated than the first.


1. Including the data with the distribution
Biopython uses distutils, the standard Python packaging mechanism. To
include data files, setup.py needs to be changed.
I recommend reading chapter 2 of the distutils manual
http://docs.python.org/dist/dist.html , especially "Installing Package
Data" and "Installing Additional Files". I am using the "Package Data"
approach.
In this scenario, the data files are placed alongside the code in
the code directory (or, better, in a data subdirectory below it).

To make this work, setup.py needs to be changed. The change seems to be
trivial: go to DATA_FILES and include your files. For SimCoal I did
this:
DATA_FILES=[
    #.... More stuff will be here for other packages
    "Bio/PopGen/SimCoal/data/*par" #my data files are *par
    ]
That's it! Of course, as usual, the package itself must also have been
added to setup.py (but that is true of all packages).
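
For reference, the generic distutils "Package Data" mechanism can be
sketched as below. This is only an illustration under assumptions: the
SETUP_KWARGS name is hypothetical, and how Biopython's own setup.py feeds
DATA_FILES into setup() is not shown in this mail. In the package_data
keyword, patterns are relative to the package directory.

```python
# Hypothetical sketch of the setup() keywords involved (not Biopython's
# actual setup.py). package_data maps a package name to glob patterns
# that are relative to that package's source directory.
SETUP_KWARGS = {
    "packages": ["Bio.PopGen.SimCoal"],
    "package_data": {"Bio.PopGen.SimCoal": ["data/*par"]},
}

# In a real setup.py these would be passed on to setup(), e.g.:
#   from distutils.core import setup
#   setup(name="biopython", **SETUP_KWARGS)
```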


2. Accessing the data that was packaged
This one is trickier. The first problem is that the data files might
be installed in either the platform-dependent install directory or in
the platform-independent one. With Biopython this is actually a
non-problem, as data seems to go to the platform-independent
directory, which I suppose is the correct thing to do.
Now, the typical strategy is to use sys.prefix to get the
directory where the platform-independent files are installed (using
sys.exec_prefix, which gives the platform-dependent path, would be
really bad) and to append the package-specific part. The problem is
that, in some cases (e.g. a typical development cycle, or when one
doesn't have admin install privileges on the computer), the
installation is done to a directory other than the system-wide
default. My workaround? Go through all the directories on sys.path
(which includes the PYTHONPATH entries) and search for a data file that
I know is there (because we know its path relative to the root of the
installation, as in DATA_FILES above). When that file is found, the
directory where everything lives has been found. Not perfect, but it
works for now.
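
To illustrate the "typical strategy" and why it falls short, the default
locations can be inspected as sketched below. This is an assumption-laden
illustration: the sysconfig module postdates this mail (older Pythons
offered distutils.sysconfig.get_python_lib instead), and the sub-path is
the one from the SimCoal example.

```python
import os
import sys
import sysconfig

# Interpreter-level prefixes: platform-independent vs platform-dependent.
print("sys.prefix      :", sys.prefix)
print("sys.exec_prefix :", sys.exec_prefix)

# Default install directories for pure-Python and compiled modules.
purelib = sysconfig.get_path("purelib")
platlib = sysconfig.get_path("platlib")

# Where the SimCoal data would live under a default, system-wide install.
default_data_dir = os.path.join(purelib, "Bio", "PopGen", "SimCoal", "data")
print("expected data dir:", default_data_dir)
```

This only yields the default location. It says nothing about installs
made to another directory, which is the case the sys.path search is
meant to cover.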
Here is possible code to put in, e.g., the __init__.py of a package
(based on my SimCoal example):
---CODE STARTS---
import os
import sys

builtin_template_path = None  # stays None if the file is never found
for entry in sys.path:
    # Search for a single known file to determine the correct directory
    try_path = os.path.join(entry, 'Bio', 'PopGen', 'SimCoal', 'data',
                            'island.par')
    try:
        f = open(try_path)
    except IOError:
        continue  # Expected: not under this entry, keep searching
    f.close()
    builtin_template_path = try_path
    break
---CODE ENDS---

One could raise an error if not found.
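
A minimal sketch of that variant, wrapped in a function so that the
search fails loudly. The name find_data_dir and its parameters are
hypothetical, not part of Biopython. Note that, unlike the snippet
above, it returns the data directory rather than the path of the marker
file.

```python
import os
import sys


def find_data_dir(relative_parts, marker_file, search_path=None):
    """Return the first directory on search_path that holds marker_file.

    relative_parts is the path of the data directory relative to a
    sys.path entry, e.g. ('Bio', 'PopGen', 'SimCoal', 'data').
    """
    if search_path is None:
        search_path = sys.path
    for entry in search_path:
        candidate = os.path.join(entry, *relative_parts)
        if os.path.isfile(os.path.join(candidate, marker_file)):
            return candidate
    # Nothing matched: report it instead of silently continuing.
    raise IOError("%s not found under any search path entry"
                  % os.path.join(*(list(relative_parts) + [marker_file])))


# Example call (raises IOError if the data files are not installed):
#   find_data_dir(('Bio', 'PopGen', 'SimCoal', 'data'), 'island.par')
```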

Comments?

Regards,
Tiago

-- 
http://www.tiago.org/ps


