[Biopython-dev] NCBI Abuse activity with Biopython
Michiel de Hoon
mjldehoon at yahoo.com
Thu Jun 26 09:41:24 EDT 2008
> The Bio.GenBank.search_for() still seems somewhat
> useful, but without a default limit on the number
> of returned IDs, this could easily be abused.
> Again, we could deprecate this and direct people
> to Bio.Entrez.esearch() instead.
As always, I am in favor of deprecating functions whose purpose is dubious.
For example:
# Using Bio.GenBank
>>> from Bio import GenBank
>>> gi_list = GenBank.search_for("Opuntia AND rpl16")
>>> gi_list
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']
# Same thing, using Bio.Entrez
>>> from Bio import Entrez
>>> handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16")
>>> record = Entrez.read(handle)
>>> record["IdList"]
['57240072', '57240071', '6273287', '6273291', '6273290', '6273289', '6273286', '6273285', '6273284']
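The worry quoted at the top is that search_for() has no default cap on the number of IDs it returns. Below is a minimal sketch of a capped replacement. It assumes a hypothetical helper name `search_ids` and an arbitrary default cap of 20, neither of which is part of Biopython. `retmax` is the standard ESearch parameter, which Bio.Entrez.esearch() passes through to NCBI. The `esearch` and `read` callables can be injected so that the logic can be exercised offline.

```python
def search_ids(term, db="nucleotide", retmax=20, esearch=None, read=None):
    """Return at most `retmax` IDs matching `term` (hypothetical helper).

    `esearch` and `read` default to Bio.Entrez.esearch and Bio.Entrez.read;
    stand-ins can be passed instead for offline testing.
    """
    if retmax <= 0:
        raise ValueError("retmax must be a positive integer")
    if esearch is None or read is None:
        # Imported lazily so the sketch can be tested without Biopython.
        from Bio import Entrez
        esearch = esearch or Entrez.esearch
        read = read or Entrez.read
    # Always send an explicit cap instead of relying on a server-side default.
    handle = esearch(db=db, term=term, retmax=retmax)
    try:
        record = read(handle)
    finally:
        handle.close()
    # Slice defensively in case more IDs come back than were requested.
    return list(record["IdList"])[:retmax]
```

With Biopython installed, `search_ids("Opuntia AND rpl16")` would send the same query as the Entrez session above, but it would never return more than 20 IDs. The parsed record's "Count" field still reports the total number of hits. Recent Biopython releases also expect `Entrez.email` to be set before NCBI is queried.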
--- On Thu, 6/26/08, Peter <biopython at maubp.freeserve.co.uk> wrote:
From: Peter <biopython at maubp.freeserve.co.uk>
Subject: Re: [Biopython-dev] NCBI Abuse activity with Biopython
To: mjldehoon at yahoo.com
Cc: "Biopython Developers Mailing List" <biopython-dev at biopython.org>
Date: Thursday, June 26, 2008, 8:53 AM
> As far as I can tell, Bio.GenBank is currently the only module in which
> Bio.EUtils is used, not counting modules that themselves have been
> deprecated. It shouldn't be too complicated to modify Bio.GenBank to use
> Bio.Entrez instead.
Looking back at CVS, it used to use Bio.WWW.NCBI once upon a time
(which is now Bio.Entrez), and had explicit rate limiting. Then four
years ago Brad moved the Bio.GenBank.download_many() and search_for()
functions over to using Bio.EUtils (CVS revision 1.51 of
Bio/GenBank/__init__.py).
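The explicit rate limiting mentioned above amounts to enforcing a minimum gap between successive requests. Here is a minimal sketch, assuming a hypothetical `RateLimiter` class whose clock and sleep functions can be injected for testing. The 3-second default is an assumption modelled on older NCBI E-utilities guidance, so check NCBI's current usage policy before relying on any particular value.

```python
import time


class RateLimiter:
    """Enforce a minimum delay between successive calls (hypothetical helper).

    The 3-second default is an assumption, not a confirmed NCBI requirement.
    """

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request; None before the first

    def wait(self):
        """Block until at least `min_interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would create one limiter and call `limiter.wait()` immediately before each request to NCBI. The first call returns at once, and later calls sleep only for whatever part of the interval has not already elapsed.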
Brad also appears to have changed the functionality of
Bio.GenBank.download_many() from a callback mechanism to returning a
handle. We could still return a handle, but it would require fetching
all the records (perhaps in batches), and concatenating them. I think
it would make more sense to deprecate the Bio.GenBank.download_many()
function, and direct people to Bio.Entrez.efetch() instead.
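Keeping download_many()'s handle-returning behaviour would mean fetching the records in batches and concatenating them, as described above. Below is a minimal sketch of that idea. It assumes a hypothetical `fetch_in_batches` helper, an arbitrary batch size of 100, and a caller-supplied `fetch_batch` callable that returns a text (str) handle for a list of IDs.

```python
import io


def fetch_in_batches(ids, fetch_batch, batch_size=100):
    """Fetch records for `ids` in batches and return one concatenated handle.

    `fetch_batch` is any callable that takes a list of IDs and returns a
    file-like text handle. Hypothetical helper; batch_size is arbitrary.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(ids)
    chunks = []
    for start in range(0, len(ids), batch_size):
        handle = fetch_batch(ids[start:start + batch_size])
        try:
            chunks.append(handle.read())  # assumes str, not bytes
        finally:
            handle.close()
    # Expose the combined text through a single in-memory handle.
    return io.StringIO("".join(chunks))
```

With Biopython, `fetch_batch` might be `lambda batch: Entrez.efetch(db="nucleotide", id=",".join(batch), rettype="gb", retmode="text")`, ideally with a delay between batches. This usage is untested here. The concatenation buffers every record in memory, which is one reason direct use of Bio.Entrez.efetch() may be the simpler option.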
The Bio.GenBank.search_for() still seems somewhat useful, but without
a default limit on the number of returned IDs, this could easily be
abused. Again, we could deprecate this and direct people to
Bio.Entrez.esearch() instead.
Peter