[Biopython-dev] Re: Contribute to BioPython

Iddo idoerg at burnham.org
Fri Nov 5 12:50:42 EST 2004


I'm cc-ing your email to the lists; I think it's important that other 
Biopythoneers comment on this.

James Stroud wrote:

>Hi Iddo,
>Thanks for your tips. I looked at Jeff Chang's cpairwise2module.c and its 
>associated pure python implementation pairwise2.py. They look like nothing 
>too special was done for the python module, except that the code is pretty 
>clean. The C-code was barren of doc-style comments, but I think that is 
>because he also had the python implementation to document the API. I'm mostly 
>looking for how I should mark up my C code (namely the API code for python) 
>to produce automated documentation. I'm not yet ready to make a pure python 
>implementation, but I am happy to contribute my C-based python module to 
>biopython once it's a little more complete. I'm thinking of using doxygen to 
>mark the C, but first I wanted to get an idea of what would be best for the C 
>code to communicate with python.

Probably the best way would be to learn how to extend Python with C. 
It's a pretty standard procedure, although there are several ways of 
going about it. I use SWIG, which has the advantage of writing the 
Python extension code and the C wrapper for you. I never quite got the 
hang of how to do it with distutils (see JC's code for that), but I 
confess I am lazy in that department. However, distutils has the 
advantage that code is easier to install from source, as the users do 
not need to install SWIG on their machine, on top of everything else.

Bottom line: the learning curve with SWIG is gentler, but distutils is 
probably the way to go because it is "pure Python": users need no extra 
tool such as SWIG to build the module from source.



>Incidentally, the module I'm making is for scanning sequence information and 
>looking for islands of things, such as high or low complexity, CpG, etc. The 
>user will be able to pass arrays of values (as python lists) to the C code 
>and islands can be found from the data given certain cutoff and window 
>parameters and such. If the user takes one step back, he or she can just tell 
>the module to look for certain types of islands. So far, I have the code for 
>sequence complexity written as well as the data structures and functions that 
>handle islands themselves (separate from any particular type of island). Soon 
>I will do Q/N islands (for protein), and in the future I will add to that 
>(probably CpG, at least). Because the island-handling code is separate from 
>any type of island, it will allow (once the API is written for this part) the 
>user to write python to generate values for properties they are interested in 
>and pass it to the island API to build the islands. Currently I have a 
>nascent API to several of the higher-level C functions.

Wow! Biopython is entering epigenomics. This sounds like a very hefty 
contribution. I can't wait to see it.
(CpG islands are methylation sites. What are Q/N islands? Some sort of 
glutamine repeat variation?)
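
To make the cutoff-and-window idea from the quoted paragraph concrete, here 
is a minimal pure-Python sketch. It is only an illustration, not James's C 
implementation or its API: the function name find_islands and the parameters 
cutoff and window are hypothetical, and it assumes an island is a maximal run 
of consecutive windows whose mean value is at or above the cutoff.

```python
def find_islands(values, cutoff, window):
    """Return (start, end) index pairs, end exclusive, of islands.

    Hypothetical sketch: an island is the stretch of positions covered
    by a maximal run of consecutive windows whose mean is >= cutoff.
    """
    n = len(values)
    if window < 1 or window > n:
        return []

    # Mean of every window, computed with a running sum.
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, n):
        total += values[i] - values[i - window]
        means.append(total / window)

    islands = []
    start = None  # index of the first passing window in the current run
    for i, mean in enumerate(means):
        if mean >= cutoff:
            if start is None:
                start = i
        elif start is not None:
            # The last passing window was i - 1; it covers up to i - 1 + window.
            islands.append((start, i - 1 + window))
            start = None
    if start is not None:
        islands.append((start, n))
    return islands


# Per-position scores, for example 1 where some property holds and 0 elsewhere.
scores = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
islands = find_islands(scores, cutoff=0.75, window=4)
print(islands)
```

Because the island search only sees a list of numbers, the same function 
works for any property the user can score per position, which is the 
separation between island handling and island type described above. Note 
that in this sketch two runs separated by a single failing window can yield 
islands that overlap by a few positions.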

>Anyway, I'm looking forward to any thoughts on how I might best document this 
>for biopython.

Look at the Biopython tutorial/cookbook. Then write your own chapter! This 
would be for the users of your modules. LaTeX source is best, but if 
you don't do LaTeX, just send over a text file.

As for the code documentation: the more the merrier. Try putting 
cookbook-like documentation in the docstrings (those are the bits that 
start & end with """), as they are automatically placed in the __doc__ 
attribute of the classes/modules you write. This means they are easy to 
harvest by third party programs which write up automatic documentation, 
such as HappyDoc. Then sprinkle comments liberally through the code; 
you can never overdocument! (Well, theoretically you can, but I've 
never seen it.)
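
As a small illustration of the docstring point, the sketch below shows a 
cookbook-style docstring and how a tool can read it back through __doc__. 
The function fraction_gc is a hypothetical example, not part of Biopython; 
inspect.getdoc is the standard-library helper that returns the docstring 
with its indentation cleaned up.

```python
import inspect


def fraction_gc(seq):
    """Return the fraction of G and C bases in seq.

    Cookbook-style example:
        fraction_gc("GGCCAT")   # 4 of the 6 bases are G or C
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# The raw docstring is stored on the object itself.
raw_doc = fraction_gc.__doc__

# Documentation tools usually want the cleaned-up text; the first line
# serves as a one-line summary.
summary = inspect.getdoc(fraction_gc).splitlines()[0]
print(summary)
```

The same attribute exists on modules and classes, so a documentation 
harvester only has to walk the objects and collect their __doc__ strings.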

Thanks again for your efforts.



Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037 USA
T: (858) 646 3100 x3516
F: (858) 713 9930
