[Biopython-dev] [Biopython-announce] is this supposed to be really slow?
Andrew Dalke
dalke at dalkescientific.com
Sat May 26 10:10:21 UTC 2007
(Moved from the -announce list to the -dev list.)
Bryan Smith, replying to Titus Brown, wrote:
> i did see this constraint for only one request per 3 seconds, but did
> not realize each time i went through my loop that this was a separate
> request.
> is there anything to do about this constraint?
In your "search_for" call add delay=0.
    def search_for(search, reldate=None, mindate=None, maxdate=None,
                   batchsize=100, delay=2, callback_fn=None,
                   start_id=0, max_ids=None):
        """search_for(search[, reldate][, mindate][, maxdate]
        [, batchsize][, delay][, callback_fn][, start_id][, max_ids]) -> ids

        Search PubMed and return a list of the PMID's that match the
        criteria.  search is the search string used to search the
        database.  reldate is the number of dates prior to the current
        date to restrict the search.  mindate and maxdate are the dates to
        restrict the search, e.g. 2002/01/01.  batchsize specifies the
        number of ids to return at one time.  By default, it is set to
        10000, the maximum.  delay is the number of seconds to wait
        between queries (default 2).  callback_fn is an optional callback
        function that will be called as passed a PMID as results are
        retrieved.  start_id specifies the index of the first id to
        retrieve and max_ids specifies the maximum number of id's to
        retrieve.
In your Dictionary creation, also add delay=0:

    class Dictionary:
        def __init__(self, delay=5.0, parser=None):
            """Dictionary(delay=5.0, parser=None)

            Create a new Dictionary to access PubMed.  parser is an optional
            parser (e.g. Medline.RecordParser) object to change the results
            into another form.  If set to None, then the raw contents of the
            file will be returned.  delay is the number of seconds to wait
            between each query.
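To see why the loop feels slow: if every lookup is its own request and the delay is enforced between consecutive requests, the waiting alone adds up. Here is a rough sketch of that arithmetic. Note that min_wait_seconds is a made-up helper, not part of Biopython, and it ignores network time entirely, so treat the result as a lower bound.

```python
def min_wait_seconds(n_requests, delay):
    """Lower bound on total sleep time when consecutive requests must be
    at least `delay` seconds apart (hypothetical helper, not Biopython)."""
    if n_requests < 0:
        raise ValueError("n_requests must be non-negative")
    # The first request does not wait; each later one waits up to `delay`.
    return max(n_requests - 1, 0) * delay


# 200 lookups with the Dictionary default of delay=5.0
print(min_wait_seconds(200, 5.0))  # 995.0
```

With delay=0 the same loop spends no time sleeping, which is the whole effect of the change suggested above.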
>> I personally tend to just use the NCBI retrieval URLs directly, but
>> that's kind of ugly.
NCBI also watches those requests, and if you do too many
you might get a warning or be blocked off, or so rumor has it.
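If you do go straight to the URLs, you may want to space the requests out yourself. Below is a minimal sketch of one way to do that, assuming a fixed minimum gap between requests. Throttle and its parameters are names I made up for illustration; this is not how Biopython does it internally, and the 3.0 is just the figure quoted earlier in this thread.

```python
import time


class Throttle:
    """Enforce a minimum gap between calls (hypothetical helper)."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first

    def wait(self):
        """Sleep just long enough that calls are at least `delay` seconds apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(3.0)
throttle.wait()  # the first call returns immediately
```

Call throttle.wait() right before each request inside the loop. The first call does not sleep, and later calls sleep only for whatever part of the gap has not already elapsed.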
BTW, in your original code you can simplify
> for idx in range( len( termIds ) ):
> pubDates[idx] = string.atoi( medlineDict[ termIds[ idx ]
> ].publication_date[ 0:4 ] )
> idx = idx + 1
to
  for idx, termId in enumerate(termIds):
      pubDates[idx] = int(medlineDict[termId].publication_date[:4])
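Here is a self-contained version of that loop you can run without touching NCBI. The Record type and the two entries are made-up stand-ins for parsed Medline records and a live PubMed Dictionary; only the publication_date attribute is taken from your code.

```python
from collections import namedtuple

# Stand-in for a parsed Medline record; only the field used here.
Record = namedtuple("Record", ["publication_date"])

# Made-up data in place of a live PubMed Dictionary.
medlineDict = {
    "111": Record("2001 Mar 15"),
    "222": Record("1998 Nov"),
}
termIds = ["111", "222"]

pubDates = [0] * len(termIds)
for idx, termId in enumerate(termIds):
    # The first four characters of the date string are the year.
    pubDates[idx] = int(medlineDict[termId].publication_date[:4])

# The same result as a list comprehension, with no index bookkeeping.
pubDates2 = [int(medlineDict[termId].publication_date[:4]) for termId in termIds]
```

The list comprehension is an option if pubDates does not need to be preallocated elsewhere in your code.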
Andrew
dalke at dalkescientific.com