[Biopython-dev] Bio.WWW.NCBI proposal

Michiel de Hoon mjldehoon at yahoo.com
Fri Feb 8 16:06:11 UTC 2008

Hi everybody,

Currently, there are two ways in Biopython to access NCBI's Entrez databases: Bio.WWW.NCBI and Bio.EUtils. Bio.PubMed builds on Bio.WWW.NCBI, and Bio.GenBank builds on Bio.EUtils. Clearly, having two modules for the same thing is not optimal.

From looking at these two modules, I think that Bio.WWW.NCBI is better suited to be Biopython's module for interacting with NCBI. It is much smaller and very straightforward, and therefore much easier to maintain, and it has some documentation (though not quite enough). Bio.EUtils is quite large and difficult to maintain, since none of the currently active developers are familiar with it.

Bio.WWW.NCBI has two problems, though: it is not quite up to date (some functions are missing, and others are for databases that were deprecated a while ago), and it is the only remaining module inside Bio.WWW.

Concretely, I'd like to propose the following:
1) Move Bio.WWW.NCBI to Bio.Entrez (actually, copy and deprecate Bio.WWW.NCBI).
2) Make it Biopython's general module for interacting with NCBI Entrez by adding any missing functions from the list at http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html (this will be very straightforward; EInfo, ESummary, EGQuery, and ESpell are currently missing) and removing any obsolete functions.
3) Update the tutorial accordingly.
4) Use Bio.Entrez in Bio.GenBank.NCBIDictionary to fix bug #2393.
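To illustrate why adding the missing utilities in step 2 should be straightforward, here is a minimal sketch in present-day Python 3 (not the Python 2 that Biopython used at the time). The helper `build_eutils_url` and the wrappers `einfo`, `esummary`, `egquery`, and `espell` are hypothetical, not the actual API of Bio.WWW.NCBI or of the proposed Bio.Entrez, and the HTTPS base URL is an assumption. The sketch only builds the query URLs, so it runs offline; a real module would also open each URL and hand back the response.

```python
from urllib.parse import urlencode

# Assumption: the public HTTPS endpoint of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# E-utility CGI scripts listed on NCBI's E-utilities help page.
KNOWN_UTILITIES = {
    "einfo", "esearch", "epost", "esummary",
    "efetch", "elink", "egquery", "espell",
}


def build_eutils_url(utility, **params):
    """Hypothetical helper: build the query URL for one E-utility.

    Parameters whose value is None are dropped, so optional arguments
    (such as ``db`` for EInfo) can simply be omitted.
    """
    if utility not in KNOWN_UTILITIES:
        raise ValueError("unknown E-utility: %r" % utility)
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = EUTILS_BASE + utility + ".fcgi"
    return url + "?" + query if query else url


# Hypothetical wrappers for the four utilities listed as missing.
def einfo(db=None):
    # Without db, EInfo lists all Entrez databases.
    return build_eutils_url("einfo", db=db)


def esummary(db, uids):
    # Accept a single UID string or a sequence of UIDs (comma-joined).
    if not isinstance(uids, str):
        uids = ",".join(uids)
    return build_eutils_url("esummary", db=db, id=uids)


def egquery(term):
    # EGQuery reports hit counts for one term across all Entrez databases.
    return build_eutils_url("egquery", term=term)


def espell(db, term):
    # ESpell returns spelling suggestions for a query term in one database.
    return build_eutils_url("espell", db=db, term=term)


if __name__ == "__main__":
    # Building URLs needs no network access; a real module would pass each
    # URL to urllib.request.urlopen() and return the response handle.
    print(einfo())
    print(esummary("nucleotide", ["1", "2"]))
    print(egquery("stem cells"))
```

Keeping the URL construction in one place means that each additional utility is mostly a matter of naming the parameters it accepts.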

At that point, I think we will have an error-free Biopython again (alas, only in the sense that no errors or warnings appear when running the test suite), so we'd be ready for a new release.

I don't want to deprecate Bio.EUtils right now, since it also contains some functionality other than database access (e.g. parsing the database output from NCBI; we can discuss those issues after the next release).

Any comments or objections?
