[Biopython-dev] NCBI Abuse activity with Biopython

Peter Cock p.j.a.cock at googlemail.com
Thu Jun 26 10:47:05 UTC 2008


On Thu, Jun 26, 2008 at 2:52 AM, Andrew Dalke <dalke at dalkescientific.com> wrote:
> On Jun 26, 2008, at 2:01 AM, Michiel de Hoon wrote:
>>
>> Bio.Entrez does use the 3 seconds sleep rule, and the eight E-Utilities
>> functions all make use of the EUtils web address, though it is possible to
>> pass a different web address as one of the arguments. The "query" function,
>> which is not part of the E-Utilities, does use the standard NCBI web
>> address.
>
> What is the proper EUtils web address?
>
> Entrez/__init__.py uses
>  cgi='http://www.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi'
> while the documentation at
>  http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
> claims "Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov",
> which I think should be
> "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"

Yes, for ePost that is correct:
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/epost_help.html

[On a related note, following Andrew's suggestion, I have updated CVS
to use the new base URL in Bio/EUtils/ThinClient.py]
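
To make the address concrete, here is a minimal sketch of building an
E-utilities URL from the base address quoted above. The names EUTILS_BASE
and eutils_url are made up for this example and are not part of Bio.Entrez
or Bio.EUtils, and it uses the present-day urllib.parse module rather than
whatever the Biopython code itself calls.

```python
from urllib.parse import urlencode

# Base address as given in the NCBI documentation quoted in this thread.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool, **params):
    """Return the URL for one E-utility (hypothetical helper).

    tool is the script name without its extension, e.g. "epost".
    params are sent as the query string, sorted for a stable result.
    """
    url = EUTILS_BASE + tool + ".fcgi"
    query = urlencode(sorted(params.items()))
    return url + "?" + query if query else url


print(eutils_url("epost"))
print(eutils_url("epost", db="pubmed", id="11237011"))
```

The first call prints the ePost address Andrew suggested; the second only
shows how the query string would be appended.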

>> To avoid such problems in the future, I'd like to propose the following:
>> 1) Deprecate Bio.EUtils. Its functionality is covered by Bio.Entrez, which
>> (from release 1.46) will have a parser.
>
> I looked over Bio.Entrez and it handles only a subset of what Bio.EUtils
> does.  For example, it doesn't have any support to help track WebEnv as it
> changes over each request, nor support for alternate format types.

No, Bio.Entrez does not support the WebEnv / history interface.  It
can request data in other formats, but it will only parse the XML
output.
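
As a rough illustration of what "tracking WebEnv" involves, the sketch
below remembers the latest WebEnv and QueryKey from each XML reply and
hands them back as parameters for the next request. HistorySession is a
made-up name, not an existing Bio.Entrez or Bio.EUtils class, and the
sample reply is invented for the example rather than real NCBI output.

```python
import xml.etree.ElementTree as ET


class HistorySession:
    """Remember the most recent WebEnv and QueryKey (hypothetical helper)."""

    def __init__(self):
        self.webenv = None
        self.query_key = None

    def update(self, xml_text):
        """Pick up WebEnv/QueryKey from a reply, keeping old values if absent."""
        root = ET.fromstring(xml_text)
        self.webenv = root.findtext(".//WebEnv") or self.webenv
        self.query_key = root.findtext(".//QueryKey") or self.query_key

    def params(self):
        """Return the history parameters to add to the next request."""
        found = {"WebEnv": self.webenv, "query_key": self.query_key}
        return {name: value for name, value in found.items() if value}


# Invented reply for illustration only; a real WebEnv string is much longer.
SAMPLE_REPLY = """<eSearchResult>
  <Count>2</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
</eSearchResult>"""

session = HistorySession()
session.update(SAMPLE_REPLY)
print(session.params())
```

The caller would pass every XML reply through update() and merge params()
into the next request, so a changed WebEnv is picked up automatically.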

> I would deprecate Bio.EUtils for another reason - there's no maintainer.

This is a strong reason - although we are still using Bio.EUtils in
Bio.GenBank (and probably in other places too).

>> 2) Remove the 'query' function from Bio.Entrez. Anyway accessing NCBI's
>> web site from Python to get HTML back doesn't make a lot of sense.
>
> Okay, now I'm quite confused.  This is functionality that Bio.EUtils
> supports.

I think Michiel meant that getting a handle containing raw HTML isn't
very sensible, and this is what the Bio.Entrez.query() function does.
If it can only return HTML, then I agree, it's not very useful and
could be removed.

>> 3) Remove the argument for a user-specified web address to make sure that
>> always the E-Utilities address is used.
>
> Yes.
>

Unlike BLAST, where you may have a local webserver, is there any reason
to use a URL other than the NCBI's one?
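
If the address argument goes, the one thing every call still has to share
is the 3 second rule mentioned at the top of the thread. A minimal sketch
of that delay follows; Throttle is a made-up name and this is not how
Bio.Entrez implements it. The clock and sleep arguments are only there so
the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum gap between successive requests (hypothetical)."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# One shared instance, so every E-utilities call obeys the same delay.
ncbi_throttle = Throttle()
```

Each function would call ncbi_throttle.wait() just before opening the URL;
the first call returns at once and later ones sleep only as long as needed.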

Peter


