[Biopython-dev] Which NCBI / Entrez module?

Peter biopython-dev at maubp.freeserve.co.uk
Mon Aug 13 22:59:42 UTC 2007


I've just been updating the Tutorial to expand the SeqIO documentation 
into a full chapter, and one of the things it now covers is parsing a 
handle to an online database.

For the SwissProt example I was guided by the existing tutorial code and 
used Bio.WWW.ExPASy.get_sprot_raw() which works fine (but interestingly 
only fetches one record).

I then added an example fetching GenBank records from the NCBI, based on 
the existing tutorial code which uses Bio.GenBank to do some searches 
and retrieve records by their GI number.  I decided to use 
Bio.GenBank.download_many() with Bio.SeqIO.parse() in the new example - 
and this works nicely.

Now, looking over the code, the "online" parts of Bio.GenBank are using 
Bio.EUtils, a complex bit of code dated 2003 by Andrew Dalke.  There is 
another (older and much smaller) module Bio.WWW.NCBI dated 1999-2000 by 
Jeffrey Chang, which also offers an EUtils interface. Bio.WWW.NCBI does 
appear in the tutorial, in the "Connecting with biological databases" 
section.

Bio.WWW.NCBI seems to just build Entrez URLs, and returns raw data as 
provided by the NCBI.  Bio.EUtils says it also does this, and offers a 
higher level interface supporting history tracking and parsing of query 
results (in XML).
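
To make "just build Entrez URLs" concrete, here is a rough sketch of what 
that amounts to, using only the standard library. The helper name 
build_efetch_url is hypothetical (it is not taken from Bio.WWW.NCBI or 
Bio.EUtils), the two IDs are arbitrary example identifiers, and nothing is 
actually fetched; it only assembles the query string for the NCBI efetch 
service.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, ids, rettype="gb", retmode="text"):
    """Hypothetical helper: return an efetch URL for one or more record IDs."""
    params = {
        "db": db,                              # Entrez database name
        "id": ",".join(str(i) for i in ids),   # comma-separated identifiers
        "rettype": rettype,                    # record format, e.g. GenBank
        "retmode": retmode,                    # plain text rather than XML
    }
    # urlencode percent-escapes the values (the comma becomes %2C).
    return EFETCH_BASE + "?" + urlencode(params)


# Example identifiers only; no network request is made here.
url = build_efetch_url("nucleotide", [6273291, 6273290])
print(url)
```

Opening such a URL gives a handle on the raw text returned by the NCBI, 
which is the low level behaviour described above; history tracking and 
XML parsing would sit on top of this.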

Is anyone here very familiar with either of these modules? Should we 
deprecate Bio.WWW.NCBI in favour of Bio.EUtils - or perhaps just update 
its documentation to recommend using Bio.EUtils instead?

Peter



