[Biopython-dev] NCBI Abuse activity with Biopython
Andrew Dalke
dalke at dalkescientific.com
Thu Jun 26 01:52:07 UTC 2008
On Jun 26, 2008, at 2:01 AM, Michiel de Hoon wrote:
> Bio.Entrez does use the 3 seconds sleep rule, and the eight
> E-Utilities functions all make use of the EUtils web address, though
> it is possible to pass a different web address as one of the
> arguments. The "query" function, which is not part of the
> E-Utilities, does use the standard NCBI web address.
What is the proper EUtils web address?
Entrez/__init__.py uses
cgi='http://www.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi'
while the documentation at
http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
claims "Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov",
which I think should be
"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"
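To make the difference between the two addresses concrete, here is a minimal sketch of building an E-utilities URL from the documented host. It assumes the /entrez/eutils/ path from Entrez/__init__.py carries over unchanged to the eutils host, which I have not confirmed. The helper name eutils_url is hypothetical and is not part of Bio.Entrez or Bio.EUtils. No request is sent.

```python
from urllib.parse import urlencode

# Host named in NCBI's eutils_help.html documentation.
EUTILS_HOST = "http://eutils.ncbi.nlm.nih.gov"
# Path copied from the address in Entrez/__init__.py.
# Assumption: the same path is valid on the eutils host.
EUTILS_PATH = "/entrez/eutils/"


def eutils_url(tool, **params):
    """Hypothetical helper: build the URL for one E-utility (e.g. "epost")."""
    url = EUTILS_HOST + EUTILS_PATH + tool + ".fcgi"
    if params:
        # Sort the parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


print(eutils_url("epost"))
print(eutils_url("esearch", db="pubmed", term="de Hoon[AU]"))
```

Keeping the host in a single constant means a change of address touches one line, whichever address turns out to be correct.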
> To avoid such problems in the future, I'd like to propose the
> following:
> 1) Deprecate Bio.EUtils. Its functionality is covered by
> Bio.Entrez, which (from release 1.46) will have a parser.
I looked over Bio.Entrez and it handles only a subset of what
Bio.EUtils does. For example, it doesn't have any support to help
track WebEnv as it changes over each request, nor support for
alternate format types.
I would deprecate Bio.EUtils for another reason - there's no maintainer.
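To show what I mean by tracking WebEnv, here is a rough sketch of the idea. It is not the Bio.EUtils implementation. The class name HistoryState and its methods are hypothetical. It assumes each XML response may carry WebEnv and QueryKey elements, as ESearch results do when the history server is used, and that the latest values should be sent with the next request.

```python
import xml.etree.ElementTree as ET


class HistoryState:
    """Hypothetical helper: remember the latest WebEnv/QueryKey seen."""

    def __init__(self):
        self.webenv = None
        self.query_key = None

    def update(self, xml_text):
        """Pull WebEnv and QueryKey out of a response, if present."""
        root = ET.fromstring(xml_text)
        webenv = root.findtext(".//WebEnv")
        query_key = root.findtext(".//QueryKey")
        # Only overwrite when the response actually supplied a value.
        if webenv is not None:
            self.webenv = webenv
        if query_key is not None:
            self.query_key = query_key

    def params(self):
        """Parameters to add to the next request."""
        out = {}
        if self.webenv is not None:
            out["WebEnv"] = self.webenv
        if self.query_key is not None:
            out["query_key"] = self.query_key
        return out


# Made-up response text for illustration, not real NCBI output.
sample = ("<eSearchResult><Count>5</Count><QueryKey>1</QueryKey>"
          "<WebEnv>EXAMPLE_WEBENV</WebEnv></eSearchResult>")
state = HistoryState()
state.update(sample)
print(state.params())
```

The caller has to run update() on every response, because the WebEnv value can change from one request to the next.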
> 2) Remove the 'query' function from Bio.Entrez. Anyway accessing
> NCBI's web site from Python to get HTML back doesn't make a lot of
> sense.
Okay, now I'm quite confused. This is functionality that Bio.EUtils
supports.
>>> from Bio.EUtils import HistoryClient
>>> client = HistoryClient.HistoryClient()
>>> result = client.search("Michiel de Hoon[AU]")
>>> print result.efetch("text", "docsum").read()
1: de Hoon M, Hayashizaki Y.
Deep cap analysis gene expression (CAGE): genome-wide identification of
promoters, quantification of their expression, and network inference.
Biotechniques. 2008 Apr;44(5):627-8, 630, 632. Review.
PMID: 18474037 [PubMed - indexed for MEDLINE]
2: Sierro N, Makita Y, de Hoon M, Nakai K.
DBTBS: a database of transcriptional regulation in Bacillus subtilis containing
upstream intergenic conservation information.
Nucleic Acids Res. 2008 Jan;36(Database issue):D93-6. Epub 2007 Oct 25.
PMID: 17962296 [PubMed - indexed for MEDLINE]
3: Makita Y, de Hoon MJ, Danchin A.
Hon-yaku: a biology-driven Bayesian methodology for identifying translation
initiation sites in prokaryotes.
BMC Bioinformatics. 2007 Feb 8;8:47.
PMID: 17286872 [PubMed - indexed for MEDLINE]
4: de Hoon MJ, Makita Y, Nakai K, Miyano S.
Prediction of transcriptional terminators in Bacillus subtilis and related
species.
PLoS Comput Biol. 2005 Aug;1(3):e25. Epub 2005 Aug 12.
PMID: 16110342 [PubMed - indexed for MEDLINE]
5: de Hoon MJ, Imoto S, Kobayashi K, Ogasawara N, Miyano S.
Inferring gene regulatory networks from time-ordered gene expression data of
Bacillus subtilis using differential equations.
Pac Symp Biocomput. 2003;:17-28.
PMID: 12603014 [PubMed - indexed for MEDLINE]
(By default, efetch returns the same records in XML format.)
>>> print result.efetch().read(500)
<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2008//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_080101.dtd">
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID>18474037</PMID>
<DateCreated>
<Year>2008</Year>
<Month>05</Month>
<Day>13</Day>
</DateCreated>
<DateCompleted>
<Year>2008</Year>
<Month>06</Mont
> 3) Remove the argument for a user-specified web address to make
> sure that always the E-Utilities address is used.
Yes.
Andrew
dalke at dalkescientific.com