[Biopython-dev] Bio.WWW; database access

Michiel De Hoon mdehoon at c2b2.columbia.edu
Tue Nov 13 01:52:54 EST 2007


Hi everybody,

Recently Eric Gibert wrote to us about Bio.GenBank.NCBIDictionary failing
with Biopython 1.44. We already have a fix for this error in CVS, but when I
was looking into this bug in more detail I started wondering about the way
database access is organized in Biopython.

Currently, code to access NCBI Entrez exists in three places:
1) Bio.WWW.NCBI
2) Bio.GenBank (in NCBIDictionary)
3) Bio.EUtils

Bio.WWW contains three more submodules for database access:
1) Bio.WWW.ExPASy, to access Swiss-Prot, Prosite, and Prodoc
2) Bio.WWW.InterPro, to access InterPro
3) Bio.WWW.SCOP, to access the SCOP database

The parsers for the files from these databases are in different locations:
1a) Bio.SwissProt
1b) Bio.Prosite
1c) Bio.Prosite.Prodoc
2) Bio.InterPro
3) Bio.SCOP

To me, it seems odd that the code to access a database and the code to parse
the files downloaded from it are in different locations. For example, when I
was working on Bio.GenBank, it did not occur to me that code to access NCBI
might already exist in Bio.WWW.

Now, Bio.WWW.SCOP is a very small module (64 lines total), and
Bio.WWW.InterPro seems to be out of date. Since Bio.WWW.NCBI contains
functionality that also exists elsewhere in Biopython, keeping a separate
Bio.WWW module does not seem to be the best way to organize the code. I'd
prefer to have the code for accessing each database together with the code
for parsing its files.
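
As a rough sketch of what I mean by keeping access and parsing together,
something like the following could live in a single module per database.
Everything here is hypothetical: the names parse, get_record and fake_opener,
and the toy "KEY   value" file format, are made up for illustration and are
not existing Biopython code. The network call is replaced by a stand-in so
that the sketch runs offline.

```python
# Hypothetical sketch: one module holding both the access code and the
# parser for a single database. None of these names exist in Biopython.
from io import StringIO


def parse(handle):
    """Parse a toy 'KEY   value' flat file into a dict of lists."""
    record = {}
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("//"):  # end-of-record marker in the toy format
            break
        if not line.strip():
            continue
        key, _, value = line.partition("   ")
        record.setdefault(key.strip(), []).append(value.strip())
    return record


def get_record(identifier, opener):
    """Fetch a record through `opener` and parse it in one step.

    `opener` is any callable that returns a file-like handle for an
    identifier; a real module would presumably wrap a URL request here.
    """
    handle = opener(identifier)
    try:
        return parse(handle)
    finally:
        handle.close()


def fake_opener(identifier):
    """Stand-in for a network call, so the sketch runs without a server."""
    return StringIO("ID   %s\nDE   example entry\n//\n" % identifier)


record = get_record("P12345", fake_opener)
```

With this layout, somebody working on the parser would see the access code
right next to it, and vice versa.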

Any opinions?

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032



