[Biopython-dev] NCBI Abuse activity with Biopython

Michiel de Hoon mjldehoon at yahoo.com
Thu Jun 26 09:41:24 EDT 2008


> The Bio.GenBank.search_for() still seems somewhat
> useful, but without a default limit on the number
> of returned IDs, this could easily be abused.
> Again, we could deprecate this and direct people
> to Bio.Entrez.esearch() instead.
As always, I am in favor of deprecating functions whose purpose is dubious.

# Using Bio.GenBank
>>> from Bio import GenBank
>>> gi_list = GenBank.search_for("Opuntia AND rpl16")
>>> gi_list
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']

# Same thing, using Bio.Entrez
>>> from Bio import Entrez
>>> handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16")
>>> record = Entrez.read(handle)
>>> record["IdList"]
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']


--- On Thu, 6/26/08, Peter <biopython at maubp.freeserve.co.uk> wrote:
From: Peter <biopython at maubp.freeserve.co.uk>
Subject: Re: [Biopython-dev] NCBI Abuse activity with Biopython
To: mjldehoon at yahoo.com
Cc: "Biopython Developers Mailing List" <biopython-dev at biopython.org>
Date: Thursday, June 26, 2008, 8:53 AM

> As far as I can tell, Bio.GenBank is currently the only module in which
> Bio.EUtils is used, not counting modules that themselves have been
> deprecated. It shouldn't be too complicated to modify Bio.GenBank to
> use Bio.Entrez instead.

Looking back at CVS, it used to use Bio.WWW.NCBI once upon a time
(which is now Bio.Entrez), and had explicit rate limiting.  Then four
years ago Brad moved the Bio.GenBank.download_many() and search_for()
functions over to using Bio.EUtils (CVS revision 1.51 of
Bio/GenBank/__init__.py).

Brad also appears to have changed the functionality of
Bio.GenBank.download_many() from a callback mechanism to returning a
handle.  We could still return a handle, but it would require fetching
all the records (perhaps in batches), and concatenating them.  I think
it would make more sense to deprecate the Bio.GenBank.download_many()
function, and direct people to Bio.Entrez.efetch() instead.
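
For illustration only, here is a rough sketch of what "fetch in batches and concatenate" could look like on top of Bio.Entrez.efetch(). The names chunks(), efetch_genbank_text() and this download_many() are hypothetical stand-ins, not the existing Bio.GenBank code; the batch size of 50 and the rettype="gb" / retmode="text" arguments are assumptions about what one would pass to EFetch for GenBank flat files.

```python
from io import StringIO


def chunks(ids, batch_size):
    """Yield successive slices of ids, each at most batch_size long."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def efetch_genbank_text(id_batch):
    """Fetch one batch of records as text (needs Biopython and network access)."""
    # Imported lazily so the batching helpers can be used without Biopython.
    from Bio import Entrez
    handle = Entrez.efetch(db="nucleotide", id=",".join(id_batch),
                           rettype="gb", retmode="text")  # assumed arguments
    try:
        return handle.read()
    finally:
        handle.close()


def download_many(ids, batch_size=50, fetch_batch=efetch_genbank_text):
    """Fetch ids batch by batch and return one handle over the joined text."""
    parts = [fetch_batch(batch) for batch in chunks(list(ids), batch_size)]
    return StringIO("".join(parts))
```

Passing the fetch function in as an argument is just a convenience so the batching logic can be tried out with a stub and without touching the NCBI servers.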

The Bio.GenBank.search_for() still seems somewhat useful, but without
a default limit on the number of returned IDs, this could easily be
abused.  Again, we could deprecate this and direct people to
Bio.Entrez.esearch() instead.
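
If the function were kept, a default cap might look roughly like the sketch below. This is a hypothetical wrapper, not the current Bio.GenBank.search_for(): the helper entrez_esearch_ids(), the max_ids parameter and its default of 20 are assumptions made for the example (retmax is the ESearch parameter that limits how many IDs come back).

```python
def entrez_esearch_ids(term, db, retmax):
    """Run one ESearch query and return its ID list (needs Biopython and network)."""
    # Imported lazily so the wrapper below can be tested with a stub.
    from Bio import Entrez
    handle = Entrez.esearch(db=db, term=term, retmax=retmax)
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return list(record["IdList"])


def search_for(term, db="nucleotide", max_ids=20, run_search=entrez_esearch_ids):
    """Return at most max_ids IDs matching term."""
    if max_ids < 1:
        raise ValueError("max_ids must be at least 1")
    ids = run_search(term, db, max_ids)
    # Slice as well, in case the search backend ignores the requested cap.
    return ids[:max_ids]
```

Anyone who really wants more IDs then has to ask for them explicitly, e.g. search_for("Opuntia AND rpl16", max_ids=500).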

Peter


      

