[Biopython] query upper limit for NCBIWWW.qblast?

Peter Cock p.j.a.cock at googlemail.com
Thu Apr 11 09:43:44 UTC 2013


On Thu, Apr 11, 2013 at 10:20 AM, Matthias Schade
<matthiasschade.de at googlemail.com> wrote:
> Hello everyone,
>
> is there an upper limit to how many sequences I can query via NCBIWWW.qblast
> at once?

There are sometimes limits on the URL length, especially if going via
firewalls and proxies, so that may be one factor.

At the NCBI end, I'm not sure what limits they impose on this:
http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html
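
If URL length is the limiting factor, one workaround is to split the query
into smaller batches and submit each batch separately. A minimal sketch
(the helper and the batch size of 100 are my own assumptions, not part of
Biopython):

```python
def fasta_batches(fasta_string, batch_size=100):
    """Yield FASTA strings containing at most batch_size records each.

    Splitting on '>' is crude but works for plain FASTA input; each
    yielded chunk can be passed to NCBIWWW.qblast on its own.
    """
    records = [">" + r for r in fasta_string.split(">") if r.strip()]
    for i in range(0, len(records), batch_size):
        yield "".join(records[i:i + batch_size])
```

Each batch then gets its own qblast call and its own output file, which
also makes it cheap to resubmit just the batches that come back truncated.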

> Sending up to 150 sequences, each 24 bp long, in a single string,
> everything works fine. But now I have tried the same for a string
> containing about 900 sequences. At best, it takes the NCBI server
> about 5 min to send an answer. I save the answer and later open and
> parse the file with other functions in my code. However, even though I
> have queried the same 900 sequences, the resulting output file varies
> in length (10 MB < x < 20 MB) and always at least misses the correct
> termination tag "</BlastOutput>", or misses even more (this does not
> happen when querying 150 sequences or fewer).
>
> I would guess that once the server has started sending its answer,
> there might only be a limited time NCBIWWW.qblast waits for follow-up
> packets, and thus, depending on the current server load, the
> NCBIWWW.qblast function simply stops waiting for incoming data after
> some time, causing my BLAST output files to vary in length. Could
> anyone correct or verify this far-fetched hypothesis?
>
> My core-lines are:
>
> orgn = 'Mus musculus'  # or any other organism
> result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100,
>                         entrez_query=orgn + "[orgn]")
> save_file = open('myblast_result.xml', 'w')
> save_file.write(result.read())
> save_file.close()
>
> Best regards,
> Matthias
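
One way to guard against the truncation described above, before handing
the file to a parser, is to check that the saved XML actually ends with
its closing tag, and resubmit the query when it does not. A quick sketch
(this helper is my own, not part of NCBIWWW):

```python
def blast_xml_complete(path):
    """Return True if the saved BLAST XML ends with its closing tag.

    A truncated download will lack the final </BlastOutput> tag, so
    parsing it would fail partway through; we only need to look at the
    tail of the file.
    """
    with open(path) as handle:
        text = handle.read()
    return "</BlastOutput>" in text[-200:]
```

If this returns False, resubmit that query (or batch) rather than trying
to parse a partial file.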

I think you've reached the scale where it would be better to run blastn
locally - ideally on a cluster if you have access to one. You can
download the whole NT database from here - most departments
running BLAST on their own Linux servers will have a central copy
which is kept automatically up to date:
ftp://ftp.ncbi.nlm.nih.gov/blast/db/

If you don't have those kinds of resources, then you can even
run BLAST on your own Windows machine - although I'm not
sure how much RAM would be recommended for the NT
database, which is pretty big.
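
For the local route, building the blastn call in Python keeps it easy to
script. A minimal sketch (the file names, E-value, and thread count here
are placeholders - you need the BLAST+ binaries and a local copy of the
nt database installed):

```python
def blastn_command(query_fasta, out_xml, db="nt", evalue=100, threads=4):
    """Return the argv list for a local blastn run producing XML output.

    The XML (-outfmt 5) output parses with Bio.Blast.NCBIXML just like
    the output from the web service.
    """
    return ["blastn",
            "-query", query_fasta,
            "-db", db,
            "-evalue", str(evalue),
            "-outfmt", "5",          # 5 = BLAST XML
            "-num_threads", str(threads),
            "-out", out_xml]
```

Pass the list to subprocess.call (or print it and run it by hand); on a
cluster you would instead hand each batch's command to the job scheduler.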

Regards,

Peter



More information about the Biopython mailing list