[Biopython-dev] [Biopython-announce] is this supposed to be really slow?

Andrew Dalke dalke at dalkescientific.com
Sat May 26 10:10:21 UTC 2007


(Move this from the -announce to the -dev list)


Bryan Smith, replying to Titus Brown, wrote:
> I did see this constraint for only one request per 3 seconds, but did
> not realize each time I went through my loop that this was a separate
> request.

> Is there anything to do about this constraint?

In your "search_for" call, pass delay=0.

def search_for(search, reldate=None, mindate=None, maxdate=None,
               batchsize=100, delay=2, callback_fn=None,
               start_id=0, max_ids=None):
    """search_for(search[, reldate][, mindate][, maxdate]
    [, batchsize][, delay][, callback_fn][, start_id][, max_ids]) -> ids

    Search PubMed and return a list of the PMIDs that match the
    criteria.  search is the search string used to search the
    database.  reldate is the number of days prior to the current
    date to restrict the search.  mindate and maxdate are the dates to
    restrict the search, e.g. 2002/01/01.  batchsize specifies the
    number of ids to return at one time.  By default, it is set to
    10000, the maximum.  delay is the number of seconds to wait
    between queries (default 2).  callback_fn is an optional callback
    function that will be called and passed a PMID as results are
    retrieved.  start_id specifies the index of the first id to
    retrieve and max_ids specifies the maximum number of ids to
    retrieve.
    """
    # (rest of the function body omitted)
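The docstring describes a batched loop: ids come back batchsize at a time, starting at start_id, stopping after max_ids, with delay seconds between queries. Below is a minimal sketch of that kind of loop, not Biopython's actual implementation. batched_search and fetch_batch are hypothetical names; fetch_batch stands in for one network query so the example runs offline.

```python
import time
from typing import Callable, List, Optional


def batched_search(
    fetch_batch: Callable[[int, int], List[str]],
    batchsize: int = 100,
    delay: float = 2.0,
    callback_fn: Optional[Callable[[str], None]] = None,
    start_id: int = 0,
    max_ids: Optional[int] = None,
) -> List[str]:
    """Collect ids batch by batch, pausing `delay` seconds between requests.

    fetch_batch(start, count) is a hypothetical stand-in for one query to
    the server: it returns at most `count` ids beginning at index `start`.
    """
    if batchsize < 1:
        raise ValueError("batchsize must be at least 1")
    ids: List[str] = []
    start = start_id
    first_request = True
    while max_ids is None or len(ids) < max_ids:
        count = batchsize
        if max_ids is not None:
            # Never ask for more ids than are still wanted.
            count = min(count, max_ids - len(ids))
        if not first_request:
            # Each batch is a separate request, so pause before every
            # request except the first one.
            time.sleep(delay)
        first_request = False
        batch = fetch_batch(start, count)
        for pmid in batch:
            ids.append(pmid)
            if callback_fn is not None:
                callback_fn(pmid)
        if len(batch) < count:
            # A short (or empty) batch means the results are exhausted.
            break
        start += len(batch)
    return ids
```

In this sketch, passing delay=0 only removes the time.sleep pauses. It does not change how many requests are made, so any server-side limit on request rate still applies.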



When you create the Dictionary, also pass delay=0.


class Dictionary:
    def __init__(self, delay=5.0, parser=None):
        """Dictionary(delay=5.0, parser=None)
        Create a new Dictionary to access PubMed.  parser is an optional
        parser (e.g. Medline.RecordParser) object to change the results
        into another form.  If set to None, then the raw contents of the
        file will be returned.  delay is the number of seconds to wait
        between each query.
        """
        # (rest of the class omitted)
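This is why the loop was slow: if every dictionary lookup is its own query, a loop over N ids makes N requests and waits between each pair. The sketch below models that behaviour. ThrottledLookup and fetch are hypothetical names, not the Biopython class, and fetch stands in for the real network query.

```python
import time
from typing import Callable, Optional


class ThrottledLookup:
    """Dict-style access where every d[key] triggers one fetch.

    fetch(key) is a hypothetical stand-in for a single network query;
    delay is the minimum number of seconds kept between consecutive fetches.
    """

    def __init__(self, fetch: Callable[[str], str], delay: float = 5.0):
        self.fetch = fetch
        self.delay = delay
        self.requests = 0  # how many fetches have been made so far
        self._last: Optional[float] = None  # time.monotonic() of the last fetch

    def __getitem__(self, key: str) -> str:
        if self._last is not None:
            # Sleep only for the part of the delay that has not yet elapsed.
            remaining = self.delay - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
        self.requests += 1
        return self.fetch(key)


# Each pass through the loop is a separate fetch, even for a repeated key.
lookup = ThrottledLookup(lambda pmid: "record for " + pmid, delay=0)
records = [lookup[pmid] for pmid in ["a", "b", "a"]]
```

Assuming each lookup is one query, the default delay=5.0 means that looking up 100 PMIDs in a loop spends at least 99 * 5 = 495 seconds just waiting.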



>> I personally tend to just use the NCBI retrieval URLs directly, but
>> that's kind of ugly.

NCBI also watches those requests, and if you make too many
you might get a warning or be blocked, or so rumor has it.


BTW, in your original code you can simplify

> for idx in range( len( termIds ) ):
>     pubDates[idx] = string.atoi( medlineDict[ termIds[ idx ] ].publication_date[ 0:4 ] )
>     idx = idx + 1

to

for idx, termId in enumerate(termIds):
    pubDates[idx] = int(medlineDict[termId].publication_date[:4])
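The sketch below checks the simplified loop offline. Record and the sample dictionary are hypothetical stand-ins for the Medline records. The original code only shows a publication_date attribute whose first four characters are the year. pubDates is assumed to be a pre-sized list, as the original loop implies. If the list does not need to exist beforehand, a list comprehension builds it in one step.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed Medline record; only the field used
# in the loop is modelled. Dates are assumed to start with a 4-digit year.
Record = namedtuple("Record", ["publication_date"])

medlineDict = {
    "id1": Record("2006 Nov 3"),
    "id2": Record("2007 Jan"),
}
termIds = ["id1", "id2"]

# Enumerate version: fills a pre-sized list in place.
pubDates = [0] * len(termIds)
for idx, termId in enumerate(termIds):
    pubDates[idx] = int(medlineDict[termId].publication_date[:4])

# List-comprehension version: builds the same list without pre-sizing it.
pubDates2 = [int(medlineDict[termId].publication_date[:4]) for termId in termIds]
```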


				Andrew
				dalke at dalkescientific.com




