[Biopython-dev] Questions about code contributions

Fri Feb 21 06:01:55 EST 2003

On Friday, February 21, 2003, at 12:12 AM, Rob Knight wrote:

[introduction cut]

Hi Rob,

> I am currently deciding whether we should use the Biopython code base.
> As noted at the root of the API documentation, the existing code needs
> to be cleaned up extensively. Also, there seems to have been very little
> activity towards handling phylogeny, mass spec data, and RNA structure,
> which are three areas that are critical to us. On the other hand, there
> are many useful modules in the Biopython code, such as the GA and
> Graphics modules and the parsing framework, that we would like to use
> and extend.

Great!  Biopython is a volunteer-driven project.  Nobody is directly 
getting paid to work on it, so the most mature parts are the ones that 
are important for someone's day job.  You have identified some areas 
that are missing and waiting for someone to pick up.

> However, a lot of the code examples in the Biopython Tutorial/Cookbook
> seem not to work or are very brittle due to bugs in the underlying code
> (these may have been fixed in the cvs version -- I haven't had time to
> check).

Yes, we know that the code in the documentation is quite often (like 
right now) out of date.  There are code snippets in there that have 
been broken due to redesign.  As the project matures, though, that will 
start to happen less, and less work will have to be done to maintain 
the docs.

However, I think it is unfair to attribute the brittleness in the 
documentation to bugs in the code.   Most of the code in the codebase 
is surprisingly robust, given the complexity of the problems.  Nearly 
all of the bug reports that we get are attributable to either 1) 
underlying format changes that break our stuff, or 2) API changes in the 
code base that break documentation and scripts.  There is nothing we 
can do to fix the first problem, and it will only be solved once people 
start moving to structured machine-readable representations such as 
XML.  The second problem is our fault, but we simply don't have the 
resources to maintain the documentation.  However, since this is an 
open source project, we will accept patches that fix the docs!
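
To make the first problem concrete, here is a minimal sketch (not 
Biopython code) of a parser that checks the layout it expects and fails 
loudly when the format drifts, rather than silently misparsing.  The 
"REPORT 1.0" header, the "key: value" layout, and the names FormatError 
and parse_record are all made up for illustration.

```python
class FormatError(ValueError):
    """Raised when input does not match the layout the parser expects."""


def parse_record(text, expected_header="REPORT 1.0"):
    """Parse 'key: value' lines that follow a known header line.

    The header string and record layout are hypothetical; a real parser
    would check whatever markers its own format provides.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != expected_header:
        found = lines[0].strip() if lines else "<empty input>"
        raise FormatError(
            "expected header %r but found %r" % (expected_header, found)
        )
    record = {}
    # Line numbers start at 2 because line 1 is the header.
    for number, line in enumerate(lines[1:], start=2):
        if ":" not in line:
            raise FormatError(
                "line %d has no 'key: value' pair: %r" % (number, line)
            )
        key, value = line.split(":", 1)
        record[key.strip()] = value.strip()
    return record
```

A check like this does not prevent breakage when a format changes, but 
it turns a confusing downstream failure into an error that names the 
mismatch.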

We are aware that changing the API breaks people's code, and the 
documentation, and makes the toolkit harder to learn.  However, we do 
think about the API changes quite carefully, and seriously think about 
the benefits before we do them.  Fortunately, because the toolkit is 
still relatively young, we have been able to make changes without causing too 
much unhappiness.  However, Biopython is maturing, and the APIs are now 
starting to stabilize.

> If we do use Biopython, we would definitely be interested in returning
> all our contributions to the community. However, it would only be worth
> the time it would take for us to do this (as opposed to starting fresh)
> if it's possible for us to reorganize the code significantly.
>
> So, the main questions I have are:
>
> 1. What is the process, if any, for suggesting and/or making
>    large-scale changes that are not compatible with existing code
>    (e.g. changing the module structure, changing inheritance patterns,
>    introducing and using new top-level abstract data types)?

Generally, the process is to propose the change on the biopython-dev 
list for discussion and then to form a consensus if anyone objects.  
Some changes are more disruptive than others.  For example, adding new 
top-level data types is usually not a problem.

The most important thing is actually contributing patches.  There is a 
surplus of good ideas, and a deficit of resources to implement them.  
Having a great idea is a good thing, but it most likely won't get 
integrated in the toolkit unless you also contribute the patch.

> How much support would there be for
>    doing a significant reorganization for, say, a 2.0 release in 2004?
>    From Jeff Chang's message to the list earlier today, I get the
>    impression that this is already in progress, at least for the
>    parsers.

Parts of the core will not change any more, while other stuff is 
currently undergoing significant reorganization.

> 2. To what extent is the current code base compatible with Jython? Is
>    there any general interest in using Biopython with Jython? (This is
>    important to us, since we have a Java framework for distributing
>    tasks across a cluster, and we may also want to integrate with
>    the Mesquite phylogeny package later on).

I know of several groups using Biopython with Jython.  However, we 
haven't been officially supporting that kind of use, so have no 
documentation on which parts will or won't work with it.  Although we 
officially do not have a Jython compatibility requirement, much of the 
package does work with it.  One caveat is that some modules in 
Biopython use Python 2.2 constructs, which aren't supported in Jython 
yet (I think).

One major part of Biopython that is incompatible with Jython is the 
parsing framework.  A lot of it depends on mxTextTools, which is a 
C library.  According to Andrew, though, there used to be a Python 
implementation of mxTextTools, but he does not know whether it is still 
around or supported.
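
As a rough illustration of these caveats (a sketch, not something 
Biopython ships), a script can check at run time whether it is running 
under Jython and whether a C extension can be imported at all.  The 
helper names running_on_jython and can_import are made up for this 
example, and mx.TextTools is the module path I am assuming for 
mxTextTools.

```python
import importlib
import sys


def running_on_jython():
    """Return True under Jython, where sys.platform starts with 'java'."""
    return sys.platform.startswith("java")


def can_import(module_name):
    """Return True if module_name imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("Jython:", running_on_jython())
    # Assumed module path for mxTextTools; adjust if it is installed elsewhere.
    print("mxTextTools importable:", can_import("mx.TextTools"))
```

A check like this only says whether the dependency is present; it does 
not say which Biopython modules would work without it.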

> 3. How many developers are currently actively working on Biopython? (In
>    other words, how much can we expect it will benefit us to
>    participate in Biopython rather than just writing things on our own?)

I have no idea how many developers are working on Biopython.  It would 
be nice if people would let us know how they are using it, but that 
rarely happens.  For example, the Biopython tutorial has been 
translated into other languages without us knowing.  We are pleased 
when stuff like that happens, but we don't always hear about it.  There 
are other ways to quantify usage, such as looking at web hits.  Currently, 
we get about 1000 downloads of each release of the toolkit.  I don't know 
how that translates into actual use, though.

It is hard to say how much participating in Biopython will benefit you.  
If you want someone to write your code, I'm fairly confident that's 
not going to happen.  However, people may be willing to help debug your 
code, if it is of enough general interest to the community.  For 
example, I'm confident that we now have the most robust BLAST parser in 
the world (although it might break on the current version of BLAST :).

There are also intangible benefits from participating in the community, 
such as getting feedback, generating goodwill for your project, taking 
advantage of our infrastructure (distribution, web page, bug database, 
etc), and taking advantage of the goodwill we've generated in the 
community.

> I'll definitely appreciate any thoughts you have: I'd certainly like to
> contribute to the Biopython project, but need to check that it will
> make sense for us to integrate our efforts.

I understand your concerns about not having control over the toolkit 
and perhaps not having things organized in ways that are most useful 
for your project.  I'm not sure I have an easy answer, other than that 
we are extremely open to new ideas and new structures.  However, we have 
been working on this for quite a long time and have a good idea of what 
will and won't work.

Thanks for the email.  I do hope you continue to post your plans to 
this list.

Jeff