[Biopython] query upper limit for NCBIWWW.qblast?

Justin Gibbons jgibbons1 at mail.usf.edu
Thu Apr 11 18:10:32 UTC 2013


NCBI Standalone Blast gives you the option of querying the website so that
you don't have to maintain a local database.
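For anyone finding this in the archives: the remote option Justin is referring to is the `-remote` flag of the standalone `blastn` binary, which sends the search to NCBI's servers rather than a local database. A minimal sketch of building such a call from Python, assuming BLAST+ is installed and using hypothetical placeholder file names (`queries.fasta`, `results.xml`):

```python
import subprocess

# Build a standalone blastn command that queries NCBI's servers via
# -remote, so no local copy of the nt database is needed.
# "queries.fasta" and "results.xml" are placeholder names.
cmd = [
    "blastn",
    "-query", "queries.fasta",
    "-db", "nt",
    "-remote",       # run the search at NCBI instead of locally
    "-outfmt", "5",  # XML output, the same format qblast returns
    "-out", "results.xml",
]
print(" ".join(cmd))

# To actually run it (requires BLAST+ on the PATH):
# subprocess.run(cmd, check=True)
```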

Justin Gibbons

P.S. Yes Peter, I did respond to the wrong email. Hopefully, I got it
correct this time.


On Thu, Apr 11, 2013 at 5:43 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Apr 11, 2013 at 10:20 AM, Matthias Schade
> <matthiasschade.de at googlemail.com> wrote:
> > Hello everyone,
> >
> > is there an upper limit to how many sequences I can query via
> > NCBIWWW.qblast at once?
>
> There are sometimes limits on the URL length, especially if going via
> firewalls and proxies, so that may be one factor.
>
> At the NCBI end, I'm not sure what limits they impose on this:
> http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html
>
> > Sending up to 150 sequences, each of 24mer length, in a single string,
> > everything works fine. But now I have tried the same for a string
> > containing about 900 sequences. On a good day, it takes the NCBI
> > server about 5 min to send an answer. I save the answer and later open
> > and parse the file with other functions in my code. However, even
> > though I have queried the same 900 sequences, the resulting output
> > file varies in length (10 MB < x < 20 MB) and is always missing at
> > least the closing termination tag "</BlastOutput>", or sometimes more
> > (this does not happen when querying 150 sequences or less).
> >
> > I would guess that once the server has started sending its answer,
> > NCBIWWW.qblast might only wait a limited time for follow-up packets,
> > and thus, depending on the current server load, the function simply
> > stops waiting for incoming data after some time, which would explain
> > why my BLAST output files vary in length. Could anyone correct or
> > verify this far-fetched hypothesis?
> >
> > My core lines are:
> >
> > from Bio.Blast import NCBIWWW
> >
> > orgn = 'Mus musculus'  # or anything else
> > result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100,
> >                         entrez_query=orgn + "[orgn]")
> > save_file = open('myblast_result.xml', "w")
> > save_file.write(result.read())
> > save_file.close()
> >
> > Best regards,
> > Matthias
>
> I think you've reached the scale where it would be better to run blastn
> locally - ideally on a cluster if you have access to one. You can
> download the whole NT database from here - most departments
> running BLAST with their own Linux servers will have a central copy
> which is kept automatically up to date:
> ftp://ftp.ncbi.nlm.nih.gov/blast/db/
>
> If you don't have those kinds of resources, then you can even
> run BLAST on your own Windows machine - although I'm not
> sure how much RAM would be recommended for the NT
> database which is pretty big.
>
> Regards,
>
> Peter
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
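A common workaround for the truncation Matthias describes is to split a large query into smaller batches and check each downloaded file for the closing tag before parsing it. A rough sketch along those lines, where `all_records` is hypothetical illustration data and the qblast call itself is commented out so the sketch stands alone:

```python
# Sketch: batch a large set of short sequences into chunks of 150 (a size
# reported to work reliably in this thread) and verify each downloaded
# XML report ends with the closing </BlastOutput> tag before trusting it.

def make_batches(records, batch_size=150):
    """Split (id, sequence) pairs into FASTA-formatted query strings."""
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        yield "\n".join(">%s\n%s" % (name, seq) for name, seq in chunk)

def looks_complete(xml_text):
    """Cheap sanity check: a complete BLAST XML report ends with this tag."""
    return xml_text.rstrip().endswith("</BlastOutput>")

# Hypothetical input data for illustration: 400 24-mers.
all_records = [("seq%d" % i, "ACGT" * 6) for i in range(400)]

for i, fasta_chunk in enumerate(make_batches(all_records)):
    # result = NCBIWWW.qblast("blastn", "nt", fasta_chunk, expect=100,
    #                         entrez_query="Mus musculus[orgn]")
    # xml_text = result.read()
    # if not looks_complete(xml_text):
    #     ...retry this batch before moving on...
    pass
```

If a batch comes back incomplete, retrying just that batch is much cheaper than resubmitting all 900 sequences.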


