[Biopython-dev] NCBIWWW.qblast: Question about expected run time and time outs

Mon Jun 22 09:03:16 UTC 2015

Hi Lev,

My usual advice for any large-scale BLAST search is to
download the NCBI database and run standalone BLAST+
locally, rather than rely on the NCBI web service, which
can be busy, especially during US working hours.
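
As a rough illustration of the local approach, here is a minimal sketch
that builds and runs a blastp command from Python. It assumes the BLAST+
"blastp" binary is on your PATH and that a protein database has already
been downloaded; the helper names, file names and database path are
hypothetical placeholders, not part of Biopython or BLAST+.

```python
import subprocess


def build_blastp_command(query_fasta, db_path, out_xml, threads=4):
    """Return the argument list for a local blastp run with XML output.

    All paths are placeholders supplied by the caller (assumptions).
    """
    return [
        "blastp",
        "-query", query_fasta,         # FASTA file with the query protein(s)
        "-db", db_path,                # path/prefix of the local database
        "-out", out_xml,               # where blastp writes its results
        "-outfmt", "5",                # 5 = BLAST XML
        "-num_threads", str(threads),  # CPU threads to use
    ]


def run_blastp(query_fasta, db_path, out_xml, threads=4):
    """Run blastp and return the output path; raises if blastp fails."""
    cmd = build_blastp_command(query_fasta, db_path, out_xml, threads)
    subprocess.run(cmd, check=True)
    return out_xml
```

The XML written this way should be readable with the same Bio.Blast.NCBIXML
parser you would use on qblast results, so the rest of a script built around
the web service may need little change.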

Do you have access to a local Linux cluster or similar? It is
very likely there are people in your department/university
already doing this - often the SysAdmin will keep a single
shared copy of the databases up to date for everyone to
use.

(You would likely need to do some post-filtering to remove
any Ciliata hits since the Entrez query option is only available
when running BLAST at the NCBI.)
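
One possible way to do that post-filtering is sketched below. It assumes
the local search was run with tabular output that includes subject
taxonomy IDs, e.g. -outfmt "6 std staxids", and that the database carries
taxonomy information. The set of taxonomy IDs to exclude (here, everything
under Ciliata) is not something BLAST+ gives you directly in this sketch;
it is assumed to have been collected separately from the NCBI taxonomy.
The function name is hypothetical.

```python
def drop_excluded_hits(lines, excluded_taxids):
    """Yield tabular BLAST lines whose subject taxids are not excluded.

    Assumes 13 tab-separated columns: the 12 standard ones followed by
    staxids, which may hold several IDs separated by ';'.
    """
    excluded = {str(taxid) for taxid in excluded_taxids}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hit_taxids = set(fields[12].split(";"))
        if hit_taxids & excluded:
            continue  # drop the hit if ANY of its taxids is excluded
        yield line
```

Dropping a hit when any one of its taxids is excluded is a deliberately
strict choice; a merged nr entry shared with other organisms would be
removed too, so you may prefer to keep hits that have at least one
non-excluded taxid.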

Peter

On Sun, Jun 21, 2015 at 7:19 PM, Lev Tsypin <ltsypin at uchicago.edu> wrote:
> Hello everyone,
>
> I have been writing a tool that makes use of Biopython for automatic BLAST
> searches--your libraries have made my life so much easier! I really
> appreciate your work. I've recently begun to run into some trouble, though,
> and I am not quite sure how to explain it, or respond to it, so I wanted to
> ask for advice:
>
> The issue is that, of late, when I call the NCBIWWW.qblast function, it
> takes forever--literally never finishing. Before, there were sometimes cases
> that it would get stuck for a long time (up to an hour or so), but it would
> then manage to fight through whatever obstacle and go on. In such cases, I
> also found that if I were to artificially restart the request, the function
> would rouse itself and go much better. Here's an example of a function call:
>
> blastp_result = NCBIWWW.qblast('blastp', 'nr',
> 'MSLSREENIYMGKISEQTERFEDMLEYMKKVVQTGQELSVEERNLLSVAYKNTVGSRRSAWRSISAIQQKEESKGSKHLDLLTNYKKKIETELNLYCEDILRLLNDYLIKNATNAEAQVFFLKMKGDYYRYIAEYAQGDDHKKAADGALDSYNKASEIANSELSTTHPIRLGLALNFSVFHYEVLNDPSKACTLAKTAFDEAIGDIERIQEDQYKDATTIMQLIRDNLTLWTSEFQDDAEEQQE',
> entrez_query = 'NOT Ciliata').read()
>
> [In the protein sequence above I have multiple lines so that it fits in the
> email, but when I normally run the function I don't have any newline
> characters or anything, of course]
>
> My questions are the following: Why does the function sometimes get stuck
> for so long, and what should I do now that it never seems to work anymore?
> Do you have any suggestions for introducing a 'time out' so that if, for
> example, the request takes longer than 10 minutes, it would automatically
> retry? I know there is an optional parameter in the urllib2 library for a
> time out, but, looking at the source code for NCBIWWW.qblast(), it wasn't
> obvious to me whether and how it would work to use it.
>
> Thank you very much for any advice.
>
> Best regards,
> Lev
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython-dev
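
On the time-out question in the quoted message: NCBIWWW.qblast polls the
NCBI server until results are ready, so a per-socket time-out alone may
not bound the total wait. A generic workaround, sketched below, is to run
the call in a background thread, give up waiting after a limit, and try
again. This is only an illustration under stated assumptions, not how
Biopython itself handles it; the helper names are hypothetical, and an
abandoned thread cannot be killed, it simply keeps running in the
background.

```python
import threading
import time


class AttemptTimedOut(Exception):
    """Raised when a call did not finish within the time limit."""


def run_with_timeout(func, timeout_s):
    """Run func() in a daemon thread; return its result or raise on time-out."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = func()
        except Exception as exc:  # hand errors back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The thread is abandoned here; it may still finish later.
        raise AttemptTimedOut("no result after %s seconds" % timeout_s)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def retry_with_timeout(func, timeout_s=600, attempts=3, pause_s=60):
    """Call func() up to `attempts` times, each limited to timeout_s seconds."""
    for attempt in range(1, attempts + 1):
        try:
            return run_with_timeout(func, timeout_s)
        except AttemptTimedOut:
            if attempt == attempts:
                raise  # out of attempts, let the caller decide
            time.sleep(pause_s)  # wait a little before asking again


# Possible use with Biopython (needs network access, so left commented out):
# from Bio.Blast import NCBIWWW
# xml_text = retry_with_timeout(
#     lambda: NCBIWWW.qblast("blastp", "nr", sequence,
#                            entrez_query="NOT Ciliata").read())
```

Keep the pause between attempts generous, since each retry submits a new
search to an already busy shared service.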