[Biopython-dev] Questions about code contributions
Jeffrey Chang
jchang at jeffchang.com
Fri Feb 21 06:01:55 EST 2003
On Friday, February 21, 2003, at 12:12 AM, Rob Knight wrote:
[introduction cut]
Hi Rob,
> I am currently deciding whether we should use the Biopython code base.
> As noted at the root of the API documentation, the existing code needs
> to be cleaned up extensively. Also, there seems to have been very little
> activity towards handling phylogeny, mass spec data, and RNA structure,
> which are three areas that are critical to us. On the other hand, there
> are many useful modules in the Biopython code, such as the GA and
> Graphics modules and the parsing framework, that we would like to use
> and extend.
Great! Biopython is a volunteer-driven project. Nobody is directly
getting paid to work on it, so the most mature parts are the ones that
are important for someone's day job. You have identified some areas
that are missing and waiting for someone to pick them up.
> However, a lot of the code examples in the Biopython Tutorial/Cookbook
> seem not to work or are very brittle due to bugs in the underlying code
> (these may have been fixed in the cvs version -- I haven't had time to
> check).
Yes, we know that the code in the documentation is quite often (like
right now) out of date. There are code snippets in there that have
been broken by redesigns. As the project matures, though, that will
happen less often, and less work will be needed to maintain the docs.
However, I think it is unfair to attribute the brittleness in the
documentation to bugs in the code. Most of the code in the codebase
is surprisingly robust, given the complexity of the problems. Out of
the bug reports that we get, nearly all are attributable to either 1)
changes in the underlying data formats that break our parsers, or 2) API
changes in the code base that break documentation and scripts. There is nothing we
can do to fix the first problem, and it will only be solved once people
start moving to structured machine-readable representations such as
XML. The second problem is our fault, but we simply don't have the
resources to maintain the documentation. However, since this is an
open source project, we will accept patches that fix the docs!
We are aware that changing the API breaks people's code and the
documentation, and makes the toolkit harder to learn. However, we
consider API changes carefully and weigh their benefits seriously
before making them. Fortunately, because the toolkit is still
relatively young, we have been able to make changes without causing too
much unhappiness. Biopython is maturing, though, and the APIs are now
starting to stabilize.
> If we do use Biopython, we would definitely be interested in returning
> all our contributions to the community. However, it would only be worth
> the time it would take for us to do this (as opposed to starting fresh)
> if it's possible for us to reorganize the code significantly.
>
> So, the main questions I have are:
>
> 1. What is the process, if any, for suggesting and/or making large-scale
> changes that are not compatible with existing code (e.g. changing the
> module structure, changing inheritance patterns, introducing and using
> new top-level abstract data types)?
Generally, the process is to propose the change on the biopython-dev
list for discussion, and then to form a consensus if anyone objects.
Some changes are more disruptive than others. For example, adding new
top-level data types is usually not a problem.
The most important step is actually contributing patches. There is a
surplus of good ideas, and a deficit of resources to implement them.
Having a great idea is a good thing, but it most likely won't get
integrated into the toolkit unless you also contribute the patch.
> How much support would there be for
> doing a significant reorganization for, say, a 2.0 release in 2004?
> From Jeff Chang's message to the list earlier today, I get the
> impression that this is already in progress, at least for the
> parsers.
Parts of the core will not change any more, while other stuff is
currently undergoing significant reorganization.
> 2. To what extent is the current code base compatible with Jython? Is
> there any general interest in using Biopython with Jython? (This is
> important to us, since we have a Java framework for distributing
> tasks across a cluster, and we may also want to integrate with
> the Mesquite phylogeny package later on).
I know of several groups using Biopython with Jython. However, we
haven't been officially supporting that kind of use, so we have no
documentation on which parts will or won't work with it. Although we
officially do not have a Jython compatibility requirement, much of the
package does work with it. One caveat is that some modules in
Biopython use Python 2.2 constructs, which I believe Jython does not
support yet.
One major part of Biopython that is incompatible with Jython is the
parsing framework. A lot of it depends on mxTextTools, which is a C
library. According to Andrew, though, there used to be a Python
implementation of mxTextTools, but he does not know whether it is still
around or supported.
> 3. How many developers are currently actively working on Biopython? (In
> other words, how much can we expect it will benefit us to participate
> in Biopython rather than just writing things on our own?)
I have no idea how many developers are working on Biopython. It would
be nice if people would let us know how they are using it, but that
rarely happens. For example, the Biopython tutorial has been
translated into other languages without us knowing. We are pleased
when stuff like that happens, but we don't always hear about it. There
are other ways to quantify usage, such as looking at web hits.
Currently, we get about 1000 downloads of each release of the toolkit.
I don't know how that translates into actual use, though.
It is hard to say how much participating in Biopython will benefit you.
If you want someone to write your code, I'm fairly confident that's
not going to happen. However, people may be willing to help debug your
code, if it is of enough general interest to the community. For
example, I'm confident that we now have the most robust BLAST parser in
the world (although it might break on the current version of BLAST :).
There are also intangible benefits from participating in the community,
such as getting feedback, generating goodwill for your project, taking
advantage of our infrastructure (distribution, web page, bug database,
etc.), and taking advantage of the goodwill we've generated in the
community.
> I'll definitely appreciate any thoughts you have: I'd certainly like to
> contribute to the Biopython project, but need to check that it will
> make sense for us to integrate our efforts.
I understand your concerns about not having control over the toolkit
and perhaps not having things organized in ways that are most useful
for your project. I'm not sure I have an easy answer, other than that
we are extremely open to new ideas and new structures. However, we have
been working on this for quite a long time and have a good idea of what
will and won't work.
Thanks for the email. I do hope you continue to post your plans to
this list.
Jeff