[Biopython] BLAST against mouse genome only

Peter biopython at maubp.freeserve.co.uk
Fri Jun 19 17:36:51 UTC 2009


On Fri, Jun 19, 2009 at 6:29 PM, Peter Saffrey<pzs at dcs.gla.ac.uk> wrote:
> Peter wrote:
>>
> >> Got it, thanks. I've just tried it at work about six times in a row with a
>> few variations to the options, and they all worked (taking a few minutes
>> for each search). Are you limiting the expectation threshold, or the number
>> of alignments/descriptions to return? With the default settings the page
>> returned is a BIG file which may explain a network problem... but a 404
>> error (page not found) is odd.
>
> This code still gives me the 404:
>
> from Bio.Blast import NCBIWWW
>
> seq = "GTG...CAGAT"
>
> result_handle = NCBIWWW.qblast("blastn", "gpipe/10090/ref_contig", seq)
> with open("ncbitest.xml", "w") as fh:
>        fh.write(result_handle.read())
>
> I hadn't realised quite how large that file is (150MB). I should probably
> filter it for the purposes of my code...

I confess I didn't measure it - I just noticed it was big. And yes, it
would make sense to put as many filters on the search as possible
to reduce the output size.
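
For example, something along these lines should cut the output right
down (just a sketch - the expect / hitlist_size / alignments /
descriptions values here are ones I've made up, so tune them to
whatever suits your analysis):

from Bio.Blast import NCBIWWW

seq = "GTG...CAGAT"

# Restrict the search so the returned XML stays manageable:
# expect caps the e-value, while hitlist_size, alignments and
# descriptions limit how many hits are actually reported.
result_handle = NCBIWWW.qblast("blastn", "gpipe/10090/ref_contig", seq,
                               expect=2,
                               hitlist_size=10,
                               alignments=3,
                               descriptions=3)

with open("ncbitest.xml", "w") as fh:
    fh.write(result_handle.read())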

>> OK, I have checked in the fix for the "\n\n" issue - I'm satisfied that
>> it is sensible even if I haven't verified it first hand.
>>
>
> Just to let you know, the patch is a little verbose - it reports each time
> it has to wait, which fills up the screen on some of my examples.

Don't worry - I left out the diagnostic print statements ;)

>> The Biopython qblast function is calling
>> http://blast.ncbi.nlm.nih.gov/Blast.cgi
>> internally, but that web interface doesn't allow us to pick these
>> non-standard databases, so a fair test (Biopython vs website)
>> on the same URL isn't possible. That's a shame.
>
> This page has a URL for the search I want:
>
> http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=10090&db=ref_contig&pgm=mbn&EXPECT=2&DESCRIPTIONS=3&ALIGNMENTS=3
>
> It selects mouse with the taxid and the database as ref_contig to give me
> the reference sequence only. However if I do this:
>
> result_handle = NCBIWWW.qblast("blastn", "ref_contig", seq,
> entrez_query="txid10090[orgn]")
>
> I get the "Results == '\n\n': continuing..." message for several pages. It
> hasn't terminated after about 10 minutes.

Setting the expectation limits etc. in Biopython will help, but if you
still consistently find that your BLAST jobs are too big to run
over the internet (or through your network/ISP), you'll probably have
to install standalone BLAST instead. I'm not sure whether these
databases are available pre-built, though...
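
If you do go the standalone route, something like this is roughly what
I'd try (only a sketch - it assumes the BLAST+ blastn binary is on your
path and that you've already built a local database, here called
"mouse_ref_contig", with makeblastdb from the mouse reference FASTA
files; "my_query.fasta" is a placeholder for your query file):

import subprocess

# Run blastn locally and ask for XML output (-outfmt 5), which the
# Bio.Blast.NCBIXML parser can read just like the web results.
subprocess.check_call(["blastn",
                       "-query", "my_query.fasta",
                       "-db", "mouse_ref_contig",
                       "-evalue", "2",
                       "-outfmt", "5",
                       "-out", "local_blast.xml"])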

Peter