[Biopython-dev] NCBI Abuse activity with Biopython

Andrew Dalke dalke at dalkescientific.com
Wed Jun 25 21:52:07 EDT 2008


On Jun 26, 2008, at 2:01 AM, Michiel de Hoon wrote:
> Bio.Entrez does use the 3 seconds sleep rule, and the eight
> E-Utilities functions all make use of the EUtils web address, though
> it is possible to pass a different web address as one of the
> arguments. The "query" function, which is not part of the
> E-Utilities, does use the standard NCBI web address.
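
For readers unfamiliar with the "3 seconds sleep rule" mentioned in the quote, here is a minimal sketch of what such a throttle could look like. It is not the Bio.Entrez implementation; the class name MinIntervalThrottle and its injectable clock/sleep arguments are hypothetical, chosen so the logic can be exercised without actually waiting. It is written for Python 3.

```python
import time


class MinIntervalThrottle:
    """Make sure at least `interval` seconds pass between successive calls.

    Hypothetical helper for illustration only; not part of Biopython.
    """

    def __init__(self, interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock      # returns the current time in seconds
        self._sleep = sleep      # blocks for the given number of seconds
        self._last = None        # time of the previous request, None at start

    def wait(self):
        """Call this immediately before each request to the server."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous request: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would create one throttle per process and call throttle.wait() before every request, so that no code path can skip the delay.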

What is the proper EUtils web address?

Entrez/__init__.py uses
   cgi='http://www.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi'
while the documentation at
   http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
says "Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov",
which I think means the address should be
   http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi
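
One way to keep the address question in a single place is to build every request URL from one base constant. The following is a rough Python 3 sketch under that assumption. The base value is the address guessed at above, not one confirmed by NCBI, and the function name eutils_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the base address suggested in this message. Check it against
# the NCBI documentation before relying on it.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(program, **params):
    """Return the URL for an E-utilities program such as 'epost.fcgi'.

    Parameters are sorted by name so the resulting URL is deterministic.
    """
    url = EUTILS_BASE + program
    query = urlencode(sorted(params.items()))
    return url + "?" + query if query else url


print(eutils_url("epost.fcgi"))
print(eutils_url("esearch.fcgi", db="pubmed", term="Michiel de Hoon[AU]"))
```

Because the base lives in one constant, no caller can accidentally point a request at a different host.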

> To avoid such problems in the future, I'd like to propose the  
> following:
> 1) Deprecate Bio.EUtils. Its functionality is covered by  
> Bio.Entrez, which (from release 1.46) will have a parser.

I looked over Bio.Entrez and it handles only a subset of what  
Bio.EUtils does.  For example, it doesn't have any support to help  
track WebEnv as it changes over each request, nor support for  
alternate format types.
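
To illustrate what "tracking WebEnv" involves, here is a small Python 3 sketch. It is not how Bio.EUtils does it; the HistoryState class is hypothetical, and the XML snippets are made-up stand-ins for server replies. The only things taken from the E-utilities interface are the WebEnv and QueryKey response elements and the WebEnv and query_key request parameters.

```python
import xml.etree.ElementTree as ET


class HistoryState:
    """Remember the most recent WebEnv and query key seen in server replies.

    Hypothetical helper for illustration only; not part of Biopython.
    """

    def __init__(self):
        self.webenv = None
        self.query_key = None

    def update(self, xml_text):
        """Pick up WebEnv/QueryKey from a reply, keeping old values if absent."""
        root = ET.fromstring(xml_text)
        webenv = root.findtext(".//WebEnv")
        query_key = root.findtext(".//QueryKey")
        if webenv:
            self.webenv = webenv
        if query_key:
            self.query_key = query_key

    def params(self, **extra):
        """Return request parameters with the current history values added."""
        merged = dict(extra)
        if self.webenv:
            merged["WebEnv"] = self.webenv
        if self.query_key:
            merged["query_key"] = self.query_key
        return merged


# Made-up reply; real replies contain more fields and an opaque WebEnv string.
reply = ("<eSearchResult><Count>5</Count><QueryKey>1</QueryKey>"
         "<WebEnv>EXAMPLE_ENV_1</WebEnv></eSearchResult>")
state = HistoryState()
state.update(reply)
print(state.params(db="pubmed"))
```

Each later reply is passed through update() again, so a changed WebEnv is carried into the next request automatically.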

I would deprecate Bio.EUtils for another reason - there's no maintainer.

> 2) Remove the 'query' function from Bio.Entrez. Anyway accessing  
> NCBI's web site from Python to get HTML back doesn't make a lot of  
> sense.

Okay, now I'm quite confused.  This is functionality that Bio.EUtils  
supports.


 >>> from Bio.EUtils import HistoryClient
 >>> client = HistoryClient.HistoryClient()
 >>> result = client.search("Michiel de Hoon[AU]")
 >>> print result.efetch("text", "docsum").read()

1:  de Hoon M, Hayashizaki Y.
  Deep cap analysis gene expression (CAGE): genome-wide identification of
promoters, quantification of their expression, and network inference.
Biotechniques. 2008 Apr;44(5):627-8, 630, 632. Review.
PMID: 18474037 [PubMed - indexed for MEDLINE]

2:  Sierro N, Makita Y, de Hoon M, Nakai K.
  DBTBS: a database of transcriptional regulation in Bacillus subtilis containing
upstream intergenic conservation information.
Nucleic Acids Res. 2008 Jan;36(Database issue):D93-6. Epub 2007 Oct 25.
PMID: 17962296 [PubMed - indexed for MEDLINE]

3:  Makita Y, de Hoon MJ, Danchin A.
  Hon-yaku: a biology-driven Bayesian methodology for identifying translation
initiation sites in prokaryotes.
BMC Bioinformatics. 2007 Feb 8;8:47.
PMID: 17286872 [PubMed - indexed for MEDLINE]

4:  de Hoon MJ, Makita Y, Nakai K, Miyano S.
  Prediction of transcriptional terminators in Bacillus subtilis and related
species.
PLoS Comput Biol. 2005 Aug;1(3):e25. Epub 2005 Aug 12.
PMID: 16110342 [PubMed - indexed for MEDLINE]

5:  de Hoon MJ, Imoto S, Kobayashi K, Ogasawara N, Miyano S.
  Inferring gene regulatory networks from time-ordered gene expression data of
Bacillus subtilis using differential equations.
Pac Symp Biocomput. 2003;:17-28.
PMID: 12603014 [PubMed - indexed for MEDLINE]


(By default, efetch returns the same records in XML format.)


 >>> print result.efetch().read(500)
<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2008//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_080101.dtd">
<PubmedArticleSet>
<PubmedArticle>
     <MedlineCitation Owner="NLM" Status="MEDLINE">
         <PMID>18474037</PMID>
         <DateCreated>
             <Year>2008</Year>
             <Month>05</Month>
             <Day>13</Day>
         </DateCreated>
         <DateCompleted>
             <Year>2008</Year>
             <Month>06</Mont


> 3) Remove the argument for a user-specified web address to make  
> sure that always the E-Utilities address is used.

Yes.

				Andrew
				dalke at dalkescientific.com



