[Biopython-dev] Oligopython page

Tue Apr 6 18:00:24 EDT 2004

Hey Harry and Michiel;

[Harry announces OligoPython]
> >For the time being, anyone interested can get the same pages from
> >oligopython.dyndns.org.

Thanks for this -- very nice to see people working on microarray
code. We don't have anything like this in Biopython, so it is very
welcome.

Michiel:
> Thanks for writing oligopython! I had a look at your package to see if I 
> could write a setup.py for it, and I noticed that the file parser makes use 
> of C++ rather than C. If I'm not mistaken, the only C++ code currently in 
> Biopython is Bio.KDTree, which is not installed by default because of 
> problems building it on some platforms. Is there some Biopython policy on 
> C++ code?

If I remember correctly, the reason KDTree is not installed by
default isn't that it's C++, but rather that it uses the C++
standard library (stdc++), which caused build problems on some
systems without the development libraries.

I think either C++ or C is fine -- basically our only requirement is
that it builds on multiple platforms. Sadly, sometimes figuring that
out requires including it and seeing how many people complain :-).

A simple setup.py that will work for this code is:

from distutils.core import setup
from distutils.extension import Extension

setup(
  name = "OligoPython",
  version = "0.1",
  # the pure Python package directory
  packages = ["Affymetrix"],
  # the C++ cel file parser, built as Affymetrix._cel
  ext_modules = [Extension("Affymetrix._cel",
                           ["Affymetrix/celmodule.cc"])]
)

This assumes that you put the code into a module directory called
Affymetrix. The other necessary change is to the includes: they
should not be given relative to the python directory, so celmodule.cc
just needs to do:

#include "Python.h"
#include "Numeric/arrayobject.h"

After this it seems to build and install fine.

I'd be happy to include this in Biopython if you are willing, but do 
have a few suggestions for the code:

1. I'd prefer it to be named something like Bio.Affymetrix rather
than something more generic like Bio.Oligo -- since that would
reflect its purpose and use a little better. I'm not sure exactly
what your development goals are with this, but if the main goal now
is parsing cel files and manipulating them this makes some sense.
Michiel may also have some input here (which is probably more useful
than mine).

2. The current Cel class integrates both parsing and storing the
resulting data in the same class. To be more consistent with
Biopython, I think it would be nice to separate out the work into
two classes something like:

CelParser -- has the parse function and returns a CelRecord object
CelRecord -- contains the parsed data (all of the _pixels, _stdev,
_npix, _nrows, _ncols attributes) and the functions which return
them.
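
As a rough sketch of the split I have in mind: the file layout below
is a made-up toy format (a "nrows ncols" line followed by
"x y mean stdev npix" lines), the plain nested lists stand in for
Numeric arrays, and the accessor names are only placeholders. The
real parsing would presumably still go through the C++ extension.

```python
import io


class CelRecord:
    """Holds parsed cel data; knows nothing about the file format."""

    def __init__(self, nrows, ncols, pixels, stdev, npix):
        self._nrows = nrows
        self._ncols = ncols
        self._pixels = pixels
        self._stdev = stdev
        self._npix = npix

    # Hypothetical accessors; the names are placeholders.
    def nrows(self):
        return self._nrows

    def ncols(self):
        return self._ncols

    def pixels(self):
        return self._pixels

    def stdev(self):
        return self._stdev

    def npix(self):
        return self._npix


class CelParser:
    """Knows the (toy) file format; returns a CelRecord."""

    def parse(self, handle):
        # First line of the toy format: "nrows ncols"
        header = handle.readline().split()
        nrows, ncols = int(header[0]), int(header[1])

        # Row-major grids, indexed as grid[y][x]
        pixels = [[0.0] * ncols for _ in range(nrows)]
        stdev = [[0.0] * ncols for _ in range(nrows)]
        npix = [[0] * ncols for _ in range(nrows)]

        # Remaining lines: "x y mean stdev npix"
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            x, y = int(fields[0]), int(fields[1])
            pixels[y][x] = float(fields[2])
            stdev[y][x] = float(fields[3])
            npix[y][x] = int(fields[4])

        return CelRecord(nrows, ncols, pixels, stdev, npix)


# Usage with an in-memory file in the toy format
handle = io.StringIO(
    "2 2\n"
    "0 0 100.0 5.0 9\n"
    "1 0 110.0 6.0 9\n"
    "0 1 120.0 7.0 9\n"
    "1 1 130.0 8.0 9\n"
)
record = CelParser().parse(handle)
```

The point is only that code which uses the data depends on CelRecord,
so the parser can change (or be swapped for the C++ one) without
touching it.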

Other than this, things look good -- let me know how you want to
proceed on this, and maybe coordinate with Michiel if he has plans
for dealing with microarray data and integrating this with the
Cluster code and would like to be involved.

Thanks again for the work!
Brad