[Biopython] BLAST against mouse genome only

Peter Saffrey pzs at dcs.gla.ac.uk
Fri Jun 19 13:29:23 EDT 2009


Peter wrote:
> Got it, thanks. I've just tried it at work about six times in a row with a few
> variations to the options, and they all worked (taking a few minutes for
> each search). Are you limiting the expectation threshold, or the number
> of alignments/descriptions to return? With the default settings the page
> returned is a BIG file which may explain a network problem... but a 404
> error (page not found) is odd.
>
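A minimal sketch of the limits Peter mentions, assuming a current Biopython release where `NCBIWWW.qblast` accepts the `expect`, `hitlist_size`, `descriptions` and `alignments` keyword arguments. The helper names `limited_qblast_kwargs` and `fetch_limited` are hypothetical, and the default cutoff values are arbitrary choices for illustration, not NCBI recommendations. Only `fetch_limited` touches the network.

```python
def limited_qblast_kwargs(expect=0.01, max_hits=10):
    """Collect qblast keyword arguments that keep the result page small.

    The default values are illustrative assumptions, not NCBI advice.
    """
    if expect <= 0:
        raise ValueError("expect must be positive")
    if max_hits < 1:
        raise ValueError("max_hits must be at least 1")
    return {
        "expect": expect,          # E-value threshold for reported hits
        "hitlist_size": max_hits,  # how many database hits to keep
        "descriptions": max_hits,  # one-line descriptions in the report
        "alignments": max_hits,    # full alignments in the report
    }


def fetch_limited(sequence, database="gpipe/10090/ref_contig", **limits):
    """Run a size-limited blastn search (needs Biopython and network access)."""
    # Imported here so the helper above is usable without Biopython installed.
    from Bio.Blast import NCBIWWW

    return NCBIWWW.qblast("blastn", database, sequence,
                          **limited_qblast_kwargs(**limits))
```

For example, `fetch_limited(seq, expect=2, max_hits=3)` would ask for at most three hits with E-value 2 or better, which should make the returned page far smaller than the default.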

This code still gives me the 404:

from Bio.Blast import NCBIWWW

# Query sequence, split into adjacent string literals that Python joins
seq = (
    "GTGACCTCAGGCCAGAGTGGAGTATGAGCGGAAAGGATGAATCCTGTGGCTTCTGCCCTACCCCACGGCCAAGGCTGTGCTACTGATGTGATGACCCACCATCCTGAGCAGTTCAAACCTGCAGGTGTCAGCTGTAAGCTGCAAAAGTGAGCTCTGTCTCCAAATGACCCCTAGTTGTGAGCTGTTGGTGTAACAGTTACAGGCCATCAGAGGCAGTAGCCTAGGGAAGACCTTGGCCACACGACCCCATTCTCAAATCTGGGTCTCCCCCTTGGCGGTGCTGTCAGCGCACAGACCCATGCGCACCTCCCCCAGATCCTTTACCCTGACAATATGTATTATATTTTAATGTATATGTGAAGATATTGAAAATAATTTGTTTTTCCTGGTTTTTGTTCTGTTTTTGTTTGCTGTTAGCATCTATGTGCTGGAATCAAGGAAAGACTTTGTGAGGATAGTATAAATTCTCCTGCAAGGTTGGATTTGTTATCATGTAAATATCCCAACGCAGGCTGCCTTGTGGTTTGGCCGCCTTGTGCTATGTTGATAAGATTGATTTACTGCTTCAGATCACTTTACTTTATCCAATTTTTACTGAACTTTTTATGTAAAAAATAAAATCAATTAAAGAACTTGGAATGTGTGCTCCCTCAAAATTAATCAGGTTTGTTTGTGTTGATGTGAAAGGATGTAGTGGTCCTGGTGTGTGGAGGCTGAGATTAACCTTTCTACTGCAGTTCATTATAAGCTTGGTTCTTGAGCCTGAGCTTACTTGAGCTTACAGTTTAGTCATTCCAGACCAGAGGATGTCTGTCCTGAGACCTCATTGCCACTGGCTTGTTTTAATTTGGCTAGTGGGTCAATCAAGAGAAAATGTCTTCACTCTTGGCTGGAGAATGTCACTGGACCATTTTGCCTTCAGACTTCACTTCTCCCACCCCACAGGAGTGTTCCTTCAGTGTGTGGGGCCTAGCTTCTCACTTTACTTT"
    "ACACTGGGCCTAGACAAGAGAGAAAGCAGCAAAGGACAGGACAGCTGTGGCAGGGGTGCAGGCACCGGCATATGGTAAGAGTGTCTGTGTATTCTAGATGCAGGCTCGGCAGTGGCCTCTTTGGTGATGAGTTTTCAACAGAGAGAAGTTCATGCTAGATTGGGGCCCATGTGTTTCTCAGGAATGTGATTCTGCTTTGCAAAACGAGGCTTGGTTGAGGCCTGACAGAAATTAGAGCGCCTTTTGCCTGTATATTAAGCATTTCAGAGATTGGGGTATGTCCTTACAACTCTTAGAGAAATTGGCACTGTGGGTAAGACTTAAGACCAAGCAAGCTGGGCTGGAGAGATGGCTCAGCGGTTAAGAGCACTGACTGTTCTTCCAGAGGTCCTGAGTTCAGTTCCCAGCAACCACATGGTGGCTCACAACCATCTGTAATGGGATCTGATGCCCTCTCCTGGTGTGTGTCTGAAGACAGCTACAGTGTACATGTACATAAAAAAGTAAAATAAAAGACCAAGCAAACTTCAGTCACTCATTTACAATTCTATATTAGAGGGCAGAGATTCTTTATGGTCATGCATGCTGTGTAGCAAATTTTCCATCACTACCTCTGGGGGCTTGGCTACAAGTGTGTAGATGATCAAGCACCTTAAATAAAACGGCATAGTTCATACCTGTAGTCTACCCGCATGGATCCTGGCTATCTCTGGATTACTTCCAGCCTAATACCATGCCAGTGCCATACAAGGCTAGTTGATCAGCAATACATGAATGTGGACCCTAGACACTATGGACTAATAATCTAGCCTTCTTCACTTTGTAACTTAAATGCACGTTGTTGTAGTAAGTGGACCATAATTCACTCGACCCTTGACAATTTCTAGTTGTGTCTGGTACAGTGAGTTTTCGTGTTTTTCCAAGGGAATGTCAGAGTGGTGACATAGGCGTCAAGTTTTAGAAGAGATTTTGAGACGTTTTACTTTTCTT"
    "GTTCCCCGCCACAAATGTTTTTTACCTTCCCTCCATATGCCTTCCTGTTGGCATGACCTAAGTAGGGACAGTGTGTGCCAGTCTGTTCATGGAAAATGTTATGCTCACCTGCTGACGCAGTCCTTGGTGGCCCAGCAGCTGACTGCTCAAGTGGAGTGTGGGCTTCCCAGTGGGCTGATCTGAGACTTTGCTGGTTTTTTTCTCTTCATCTATGCCTCATACAAAGTAGCGAGCGACTCCTATGAGCATCTCAGTGCAGTGAGGGAGCAGGGTCTACTTGGCCTCCACTTCACCATGATCTTACCTCAGGTCTTCTCAGTGAGTCTGGATGAACTAAAGCCCTTTCATCCATTGCACTGGTCCTTCCTAGAAGGCAGAGCGGGACCCAGCTACCTGCGCCCCCTTGAGGATGGGTGTGTGTGTGGAAGTACAGGTGGCTTGGCTCGACGCCCTGTCATGAACAGCCTGTTTGCCCACTTGTGTTCAAATCACATGCACAGCTGTGGAAGCCTGGGTGGAATTCCTCAGCCTGGGTGGCAGTCTGCTCTTTTTATTTTTTTGTAGCTCTGGAGATTGAACCTAGGACCTTGCGTGTGCTAGACAAGTGCCCTGCCAGTATGCCCAGCCAGAATCCCAGTGTTGGTTTTTTTTTTGTTGTTTTTTTTTTTGGATTTTTCGAGACAGGGTTTCTCTGTGTAGCCCTGGCTGTCCTGGAACTCACTCTGTAGACCAGGCTGACATAGAACTCAGAAATCCTCCTGCCTCTGCCTCCCAAGTGCTGGGATTAAAGGCACGCGCCACCACTGCCCGGCTGAGTCCCAGTGTTGAAACGTCATCTTTTTCTTGTCTAAAGATGACCTAACGTCTTCAACAACTAGCTCACCACAACTACCTTGCATCTTCCCTGTCACAGCACAAGTCACGCAAAGGGTCCTTGGGTGCACCATGGGAACCTTAGGGGTAGAGGACTTACTACATATGCCTCCACTA"
    "AGCAAAGACTGGAGTTCAGGAGGAGACATGACTTGTTAATGTCATCCAAACACTGAAGGGCAGGAGGGTGAGCTCCAGCCTGGCCCTCCACAGCCCATGTACAGAAGCGCCCCCACCTCCTTCCCAAGTCCTTGTCTGGGTCTCTTTCACAGCTACCCAACTGTCTTACAGGTCCAAGGAGCCAAGTAGGTTAGAACAAAACTCCAAAGGTGCCTTTAATATGTGATTCTTAAAAAGAAATAGAAAAAATAACAAGCACATAAAGGGGCAGAACGAGAATCTGTGGGCAAAGCCATGCCCACTCTCTTACCCACCCCCCCATGTCCCTCGCTTCTATCTTGGAGAGGATGGAGAAGGAACATGAAGTGGCCGGATCTTTTGTGTTCTGCTGCCACAACAGCAAGCTGAAGCCAGAGAAGTACTAGGAAGCCCATGAAAGACATGAGGCCAGGGCAGGCAGCCCTGGGAGGCGGCACTCACACCACCGAGGAGCTCTCAGCTGGCGAGCTCAAAACCTGGACCACATCTTCTCGGCCTATGGCAGCCAGAGCATCCTCCAGCACTCTGAGGGTAGCTCTATTGCCTTCTTGGGCAGCCCAGTTCCTTAGCAGGGTATAGGCTGGCATTTGGTCACAGGCCATGGTTTCCACAGCCTCAGCTTGGTAGCCGAGGTGGCCTGCCAGCTCCTGCCAGCCCTTGGCTGGCTCACCCATCATCAGGAGCCGCTGGACTTCCTCCTGCTGCTGCTGTGGAATATGCAGGTAAAGCTGGCAGCCAAGGTCCGGGTGTGGTCCTGGGTGGGGGTGGGGGTGGGGGTGAAGAGAGAAGTGTTAGTGGGTAGGGGAGGCACTAGTTAAATACAAAGGACTACAGACAGACTAGGAAACGTGCTCACCCTGGCTGGGAATACAGGGCTCCAGACTAGGAGGAGAGTCCACGAAGACGTTGCTGTCACCACGCCTCTGGTCCCTGTCAGGGTCCCCTAGCTCT"
    "ACAGTCCGAGCTTTAGCCAACTGTTGCCTTTGCTTATGTGAGCGCCAGCTGTAGGGGGCAGAGGGCACTAAGACAAGGAACTGCCTCAGAGTCCAGGCATGGAGGGGATGCCACAGGACAGGACCCAGACCACCTACCATTTGAAGGCCACATAGGCCAGCAGACCAAGGATCACTGTAGCTAGGAGAGCACAGTAGACAGGAATGATGTTGCTCGAGGCCCCTGGAGGCTCAGGGGGAAATAGGGAGGAGGTATTGGGGGCTAGGGCTCCCCCAGCTCCTACCCACATTCCTTCCCTGTCCCCGTCCTGCCCCTGCAGGGCTGTATCTGTGAAGACAGACAGTGGTCAAGATAGGGAGCCACGGCAGCCTCACCTAGGTAGACCATCTTGGCAGAACTTTCCAAATACATTAGAGTTTACCATGTGTTAAAGGACTACATGGCTGGCCCTGGAGCGGCAAGAATGGCTCAGCAGACAGACACATGCCACCAAGTATGACAACCTGAGTTTGATCCCCCAGGATCCCTGAACCACACGTGCCTGCTAAGTGGTTACTGGGGCTTGTACCGCCTCAGGACAGGAGTTCTGTTCCCAACACCACATTGGATGGATCACAGCCACCTTTAACTCAATCTCCTGGGGGGGTTGATGCCCTCTTACACCCTCTGTGGGCACTCTCACACACAGACGTGCACATAGGTACACGTAAATGGCCCTTTGCTTGCCATTGTGGACAGTCACTCCCAGATGTGCCTTGTCATCTTCCGGAAGCATCCAAACGTCTTGCTATTGCATCTTCTCCTAAACGCACAGCAGGATCTCCTCTGGAAGCCTTCCTTGACCTTTCTCTTTTCACCCTGGTTGCACCACCTCTGCTTACCACAGCACAAGATTGGTCGGCTTCCAGGGTAGACTGTGAGAGCACACTGATGTGTGCTGGTGTCATGGGATCATAAAAATAAAACTTAACTGGAAGTAAATACTGGTGC"
    "TCTCTCACTTCTGCCTTAAGTCCAATGACTGACTAGTCCTTGTACCCAGTGTAGACAGGGTTCAAGGGTCAGGGACTAAAGAGCCCGTGAATGGACTTGTACACGACCCACTCTACCTCCCAATCTGCCCGCTACCTAGGATCTGGGACAAGGAATGCCTACCTGAATAGACCACACCTTTGCTGACGTTATAAAGCATGGTGCTGCCAGGCGCTGCCTGCTGGGCTGTTTTTTGGTGAAGGGGAGTTGTGTAGAGAACTGAGTTGACGGAGCTTGGAGCCTGCTTCTCTTGAATCTCAGAAGGAGGAACTCTGCAGAAATATGAAGAAATCCTCAGATCTGCTGTCAGGAGTCCCTGGGGGCAGCGTGTGTGTCCATCCTGATTCACACCAGGGCTGGAACAGTTTCTCCTGTGTCAAATAGGTGGAGGTAACAACTGGCTCCTGCTAGCCAAAGCTGGTGGCATGGGGTACAGCTCAGGATAGAAATCACCCGATCTCAAAGACCTTCCCAAGTACCTGGGCTGAGTCTGGGGATGTTAAAAGGAGGGTAAAGATACATGGACTGTGACCCTATCTCAGGCTGAAACATCCCTAGGAACTGGTGATCATACACATCTGCCAAAACCCAGCATCCCAAGGTCCCCAAGGCAAGCCCGCTTACACTGCCTAATGCTCCCTCTGTCCCCTCCAGGGCCAGTGGCCAGCCCTTTCCTGGGCAGGGACTAAGTGAACTGATACCTTAATGGTTCAGCAAACTCATAGACCATACAGATCTCAGAGGGCCTGAAATGGAAGAACAGGGCCAACTCAGACCAAGGACCCACCTGGACCCTAGAGTACTAAATCTGCTGAGACCACTCCCTGCAGTCTACGGAGGAGGAGAGGCTCCAGATCTGAAGGAGGAGAGAGATCTGGCTTGGATAAGTAAGAGATTAAAAGAGGCCCTTCAAGTCCCCAACGGCTACTGCTGCATAGCCAATGGCTCTAA"
    "CACGTGGTGTGTCTATAGTAGGGCTATCAGCAGTTCGGTGCTGTAGTCAGGTAAACCTCTGATGTGGGTGGCCCCTTTATGGACTTTGTATCTTTGTGTCGCCACATTGGGAGTTGGGGGCTTCTGGGATCCTTGGTGGGTGGGCCTGTAAACCAGTAGCAGCAGACAGGCCTTGGAATCTGCCCCACCCATCCTGAGCAGCCGGAGTGGAACTTCTTCAGGGCTCCCCACCCATCTCCAAGCCCAAAATGGGAAGAAACAATTCAACAGCCCCTGCAGGCCCATACACACCCCAACACAGACGCTGGTACCCACAGGATCAGAGCACTAAAGGCGGGAGACAGAGAAAGTTCTGGCCCCTTCCACCTAGCAAGAGCCCGGCTAGTCATTCCCTCCTAACCTTCTGCGGCCACCCCTCCGAGGGTGCCAGGATCCACTCAGCTAGGAAGACGACGGGAGTCCCTGGAGAGGGCAGGTTCCGGTCTGCCCAAGAGTGAGCCAAGGCAAGGGGCGGGCCAGTGGGGGGGTGGTGTTGAAGAGGGGAGCAGGACAATGAAGAGGCGGGGCCGAGCTCGAGGGCGCGGTCCCGCCCCCGCCCCACGCCGGAGCACGCAGAAGCACTCGGAGTTCACAGAAGCCGACACCAGCGTGCCTGGCAGAGCAGGCCACTGGCATGCAAATGCCATGCAATGGACCGCGAGAGCTGAGAACCAGGAGTCAGGAAACGTCTGGCAAAGCCAGAGGCGCCTCCGCTGGCTACACCGAGGCCAGCCTGGCCAGGAAGAAGCATGCCAGGCCAGACAGGGTAACAGAGGCTAAGACTGGGGGCCACAGGAGGCCAAGGACGGCGGCACATGTGTACTCAAGAAACCGAAAGATTACAAAACTAGGCCACGTTTATTGCTGAGAATGGGCAGCGATAGTCACCTTTGAGGATTAAGGCCACAGGTGGTCTTTGTGCTTTCACTGGGACGTGGGATTTGAAAGTAG"
    "GGATTCCCTCCCACCCCAGAT"
)

# Search the mouse (taxid 10090) reference contigs and save the XML report
result_handle = NCBIWWW.qblast("blastn", "gpipe/10090/ref_contig", seq)
with open("ncbitest.xml", "w") as fh:
    fh.write(result_handle.read())
result_handle.close()


I hadn't realised quite how large that file is (150MB). I should 
probably filter it for the purposes of my code...
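Since the report is around 150MB, one option is to filter it while streaming instead of loading it all at once. Biopython's `Bio.Blast.NCBIXML.parse` is the usual parser for this output; the sketch below instead uses only the standard library's `xml.etree.ElementTree.iterparse`, walking the `Hit`, `Hit_def` and `Hsp_evalue` elements of NCBI's BLAST XML format. The function name `iter_good_hits` and the 1e-20 cutoff are hypothetical choices.

```python
import xml.etree.ElementTree as ET


def iter_good_hits(xml_source, max_evalue=1e-20):
    """Yield (hit_def, best_evalue) for hits whose best HSP beats max_evalue.

    xml_source may be a filename or a binary file object holding BLAST XML
    (the format qblast returns by default). iterparse reads it incrementally,
    so the whole report never has to be held in memory as one tree.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Hit":
            continue
        evalues = [float(e.text) for e in elem.iter("Hsp_evalue")]
        if evalues and min(evalues) <= max_evalue:
            yield elem.findtext("Hit_def"), min(evalues)
        elem.clear()  # free this hit's children before moving on
```

Used on the saved file, `list(iter_good_hits("ncbitest.xml"))` would give only the strong hits; lowering the cutoff at search time (the `expect` option) avoids downloading the weak ones in the first place.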

> OK, I have checked in the fix for the "\n\n" issue - I'm satisfied that it is
> sensible even if I haven't verified it first hand.
>

Just to let you know, the patch is a little verbose - it reports each 
time it has to wait, which fills up the screen on some of my examples.
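A quieter variant of the wait loop could look like the sketch below. It treats a results page that is only whitespace (such as the '\n\n' page discussed above) as not ready, logs each wait at DEBUG level so nothing is printed unless logging is configured to show it, and gives up after an overall timeout. This is not Biopython's code: `poll_until_ready`, the `fetch` callable and the delay/timeout defaults are assumptions for illustration.

```python
import logging
import time

log = logging.getLogger("blast_poll")


def poll_until_ready(fetch, delay=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch() until it returns a page that is not just whitespace.

    fetch is any zero-argument callable returning the page text; it stands
    in for the HTTP request and is not part of Biopython's API.
    Raises TimeoutError if no results arrive within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        page = fetch()
        if page.strip():
            return page
        # DEBUG level keeps this silent unless the caller enables it.
        log.debug("Empty results page (attempt %d), waiting %.1fs", attempts, delay)
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"no results after {attempts} attempts")
        sleep(delay)
```

Passing `sleep` in as a parameter is only a convenience so the loop can be exercised without real waiting; to see the progress messages, a caller can run `logging.basicConfig(level=logging.DEBUG)`.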

> 
> The Biopython qblast function is calling http://blast.ncbi.nlm.nih.gov/Blast.cgi
> internally, but that web interface doesn't allow us to pick these non-standard
> databases, so a fair test (Biopython vs website) on the same URL isn't
> possible. That's a shame.
> 
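For comparing against the web interface, it may help to see roughly what a request to Blast.cgi looks like. The sketch below only builds strings and sends nothing. The parameter names (`CMD=Put` to submit and receive an RID, `CMD=Get` with `RID` and `FORMAT_TYPE` to fetch results, plus `PROGRAM`, `DATABASE`, `QUERY` and optional `EXPECT`, `HITLIST_SIZE`, `ENTREZ_QUERY`) follow NCBI's BLAST URL API; the exact set Biopython sends may differ, and the helper names are hypothetical. The HTTPS address is an assumption based on NCBI serving its sites over HTTPS.

```python
from urllib.parse import urlencode

# NCBI now serves this endpoint over HTTPS.
BLAST_CGI = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_request_body(program, database, query, **extra):
    """Form body for submitting a search (CMD=Put); NCBI replies with an RID."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    params.update(extra)  # e.g. EXPECT, HITLIST_SIZE, ENTREZ_QUERY
    return urlencode(params)


def get_request_url(rid, format_type="XML"):
    """URL for fetching the results of a submitted search (CMD=Get)."""
    query = urlencode({"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type})
    return BLAST_CGI + "?" + query
```

Printing `put_request_body("blastn", "gpipe/10090/ref_contig", seq)` next to the equivalent request for `"ref_contig"` with `ENTREZ_QUERY="txid10090[orgn]"` makes the difference between the two database choices explicit.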

This page has a URL for the search I want:

http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=10090&db=ref_contig&pgm=mbn&EXPECT=2&DESCRIPTIONS=3&ALIGNMENTS=3

It selects mouse via the taxid parameter and sets the database to ref_contig, 
which gives me the reference sequence only. However, if I do this:

result_handle = NCBIWWW.qblast("blastn", "ref_contig", seq,
                               entrez_query="txid10090[orgn]")

I get the "Results == '\n\n': continuing..." message for several pages. 
It hasn't terminated after about 10 minutes.
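To line the two routes up, the parameters in the BlastGen URL can be read out and turned into qblast keyword arguments. This sketch assumes that `taxid` and `db` correspond to the `gpipe/<taxid>/<db>` style database path used in the first script; `pgm=mbn` (presumably MegaBLAST) is deliberately not translated, and the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

BLASTGEN_URL = ("http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi"
                "?taxid=10090&db=ref_contig&pgm=mbn&EXPECT=2&DESCRIPTIONS=3&ALIGNMENTS=3")


def blastgen_to_qblast_kwargs(url):
    """Translate a BlastGen.cgi query string into qblast keyword arguments.

    Assumption: taxid + db map onto a "gpipe/<taxid>/<db>" database path.
    The pgm parameter is not translated.
    """
    params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
    return {
        "database": "gpipe/{}/{}".format(params["taxid"], params["db"]),
        "expect": float(params["EXPECT"]),
        "descriptions": int(params["DESCRIPTIONS"]),
        "alignments": int(params["ALIGNMENTS"]),
    }
```

The result could then be passed as `NCBIWWW.qblast("blastn", sequence=seq, **blastgen_to_qblast_kwargs(BLASTGEN_URL))`, which keeps the small EXPECT/DESCRIPTIONS/ALIGNMENTS limits from the web page rather than relying on `entrez_query`.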

Peter

