[Biopython-dev] Questions about code contributions

Tue Feb 25 14:47:23 EST 2003

On Friday, February 21, 2003, at 10:06 AM, Rob Knight wrote:

> [Jeff]
>> Parts of the core will not change any more, while other stuff is
>> currently undergoing significant reorganization.
>
> Which parts are now fixed, and which parts are in flux? I have read the
> last couple of months' posts on the mailing list, but it would be great
> to get the current status in one place.

The core right now consists of the database access (Bio.db, Bio.config, 
Bio.dbdefs), parsing frameworks, and sequence objects.  Those are 
nearing completion.  There is still work going on in the code, but I 
don't expect that there will be any more major structural changes.

Code that could use the core framework but doesn't yet (e.g. Bio.PubMed, 
Bio.GenBank, etc.) will be rewritten to work with it.  Also, code that 
accesses NCBI databases will be rewritten to work with EUtils.

There are also many individual contributions (e.g. Bio.PDB) whose 
authors know their status better than I do.

>> I know of several groups using Biopython with Jython.
>
> Are any of them active on this list? I'd definitely be interested in
> hearing any experiences people have had with this.

The list is a good place to start.  If you don't mind me passing your 
email on, I can see if one of my colleagues with some experience with 
this can help.

> Is there anything major besides the parsing framework that depends on C
> libraries? How difficult would it be to translate into Java the parts of
> mxTextTools that Martel requires?

I suspect that it might be a lot of work.  However, the source code is 
available, which would help.  The best approach would be to talk to 
Marc-Andre Lemburg, the author of mxTextTools.  He may know of people 
who have pure Python or Java implementations.

> Page 32 of the Tutorial describes how to set up an NCBIDictionary with the
> default settings (for nucleotide sequences). We were trying to get some
> protein sequences. When we passed in peptide accession numbers, the error
> message indicated that the most likely problem was that the accession
> numbers were not in the database. However, they were present when we
> looked them up manually through NCBI's web site.

[discussion of bug in Bio.GenBank.NCBIDictionary, Tutorial]

Yep, you're right.  It is broken.  This code will be reworked to use 
EUtils, which is more robust and has features such as better error 
checking and diagnosis.

> Our experiences trying to follow and modify the recipes in the Tutorial
> suggest that this kind of thing is fairly common. We will file bug reports
> if time permits, but it does take significant effort to write them up and
> verify that the patches work (especially given the state of the tests).

Yes.

> Also, we were surprised to find all this code lurking in __init__.py in
> the first place. Is there a specific motivation for this design decision?

Yes, this was discussed on the mailing list (one year ago?).  For many 
modules, this reduces the level of nesting, changing:
     from Bio.GenBank import GenBank
to:
     from Bio import GenBank

I believe that the consensus was that the second is cleaner and less 
confusing.  One disadvantage of putting a bunch of code in __init__.py 
is that many people don't look for code there.  However, that doesn't 
usually cause problems for people more than once!
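To make the trade-off concrete, here is a minimal, self-contained sketch of the layout being described.  It builds a throwaway package on disk; "toypkg", "GenBankLike", and parse() are hypothetical names for illustration only, not Biopython code.  Because the subpackage's code sits directly in its __init__.py, one level of import is enough:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package tree on disk.  "toypkg" and "GenBankLike" are
# hypothetical names used only to illustrate the layout.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "toypkg", "GenBankLike")
os.makedirs(pkg_dir)

# Empty marker for the top-level package.
with open(os.path.join(root, "toypkg", "__init__.py"), "w") as f:
    f.write("")

# The subpackage's code lives directly in its __init__.py rather than in
# a nested module such as GenBankLike/GenBankLike.py ...
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("def parse(text):\n    return text.split()\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# ... so the short form works.  With a nested module, this would instead
# have to be: from toypkg.GenBankLike import GenBankLike
from toypkg import GenBankLike

print(GenBankLike.parse("LOCUS AB000001"))  # ['LOCUS', 'AB000001']
```

The flip side mentioned above is visible here too: someone browsing the directory sees only an __init__.py and may not think to open it.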

> I think that the proposal of breaking up the Tutorial into sections might
> help a lot with this sort of thing. It might also help to make the
> specific examples in the Tutorial into unit tests that can be conveniently
> run when the code is updated so that it's easy to see what breaks...

That's a great idea!  Hey Brad, any thoughts?

> Anyway, I will keep you posted as our specific plans mature.

Great, thanks!

Jeff