[Biopython] BLAST against mouse genome only
Peter Saffrey
pzs at dcs.gla.ac.uk
Fri Jun 19 13:29:23 EDT 2009
Peter wrote:
> Got it, thanks. I've just tried it at work about six times in a row with a few
> variations to the options, and they all worked (taking a few minutes for
> each search). Are you limiting the expectation threshold, or the number
> of alignments/descriptions to return? With the default settings the page
> returned is a BIG file which may explain a network problem... but a 404
> error (page not found) is odd.
>
This code still gives me the 404:
from Bio.Blast import NCBIWWW
seq =
"GTGACCTCAGGCCAGAGTGGAGTATGAGCGGAAAGGATGAATCCTGTGGCTTCTGCCCTACCCCACGGCCAAGGCTGTGCTACTGATGTGATGACCCACCATCCTGAGCAGTTCAAACCTGCAGGTGTCAGCTGTAAGCTGCAAAAGTGAGCTCTGTCTCCAAATGACCCCTAGTTGTGAGCTGTTGGTGTAACAGTTACAGGCCATCAGAGGCAGTAGCCTAGGGAAGACCTTGGCCACACGACCCCATTCTCAAATCTGGGTCTCCCCCTTGGCGGTGCTGTCAGCGCACAGACCCATGCGCACCTCCCCCAGATCCTTTACCCTGACAATATGTATTATATTTTAATGTATATGTGAAGATATTGAAAATAATTTGTTTTTCCTGGTTTTTGTTCTGTTTTTGTTTGCTGTTAGCATCTATGTGCTGGAATCAAGGAAAGACTTTGTGAGGATAGTATAAATTCTCCTGCAAGGTTGGATTTGTTATCATGTAAATATCCCAACGCAGGCTGCCTTGTGGTTTGGCCGCCTTGTGCTATGTTGATAAGATTGATTTACTGCTTCAGATCACTTTACTTTATCCAATTTTTACTGAACTTTTTATGTAAAAAATAAAATCAATTAAAGAACTTGGAATGTGTGCTCCCTCAAAATTAATCAGGTTTGTTTGTGTTGATGTGAAAGGATGTAGTGGTCCTGGTGTGTGGAGGCTGAGATTAACCTTTCTACTGCAGTTCATTATAAGCTTGGTTCTTGAGCCTGAGCTTACTTGAGCTTACAGTTTAGTCATTCCAGACCAGAGGATGTCTGTCCTGAGACCTCATTGCCACTGGCTTGTTTTAATTTGGCTAGTGGGTCAATCAAGAGAAAATGTCTTCACTCTTGGCTGGAGAATGTCACTGGACCATTTTGCCTTCAGACTTCACTTCTCCCACCCCACAGGAGTGTTCCTTCAGTGTGTGGGGCCTAGCTTCTCACTTTACTTT
ACACTGGGCCTAGACAAGAGAGAAAGCAGCAAAGGACAGGACAGCTGTGGCAGGGGTGCAGGCACCGGCATATGGTAAGAGTGTCTGTGTATTCTAGATGCAGGCTCGGCAGTGGCCTCTTTGGTGATGAGTTTTCAACAGAGAGAAGTTCATGCTAGATTGGGGCCCATGTGTTTCTCAGGAATGTGATTCTGCTTTGCAAAACGAGGCTTGGTTGAGGCCTGACAGAAATTAGAGCGCCTTTTGCCTGTATATTAAGCATTTCAGAGATTGGGGTATGTCCTTACAACTCTTAGAGAAATTGGCACTGTGGGTAAGACTTAAGACCAAGCAAGCTGGGCTGGAGAGATGGCTCAGCGGTTAAGAGCACTGACTGTTCTTCCAGAGGTCCTGAGTTCAGTTCCCAGCAACCACATGGTGGCTCACAACCATCTGTAATGGGATCTGATGCCCTCTCCTGGTGTGTGTCTGAAGACAGCTACAGTGTACATGTACATAAAAAAGTAAAATAAAAGACCAAGCAAACTTCAGTCACTCATTTACAATTCTATATTAGAGGGCAGAGATTCTTTATGGTCATGCATGCTGTGTAGCAAATTTTCCATCACTACCTCTGGGGGCTTGGCTACAAGTGTGTAGATGATCAAGCACCTTAAATAAAACGGCATAGTTCATACCTGTAGTCTACCCGCATGGATCCTGGCTATCTCTGGATTACTTCCAGCCTAATACCATGCCAGTGCCATACAAGGCTAGTTGATCAGCAATACATGAATGTGGACCCTAGACACTATGGACTAATAATCTAGCCTTCTTCACTTTGTAACTTAAATGCACGTTGTTGTAGTAAGTGGACCATAATTCACTCGACCCTTGACAATTTCTAGTTGTGTCTGGTACAGTGAGTTTTCGTGTTTTTCCAAGGGAATGTCAGAGTGGTGACATAGGCGTCAAGTTTTAGAAGAGATTTTGAGACGTTTTACTTTTCTT
GTTCCCCGCCACAAATGTTTTTTACCTTCCCTCCATATGCCTTCCTGTTGGCATGACCTAAGTAGGGACAGTGTGTGCCAGTCTGTTCATGGAAAATGTTATGCTCACCTGCTGACGCAGTCCTTGGTGGCCCAGCAGCTGACTGCTCAAGTGGAGTGTGGGCTTCCCAGTGGGCTGATCTGAGACTTTGCTGGTTTTTTTCTCTTCATCTATGCCTCATACAAAGTAGCGAGCGACTCCTATGAGCATCTCAGTGCAGTGAGGGAGCAGGGTCTACTTGGCCTCCACTTCACCATGATCTTACCTCAGGTCTTCTCAGTGAGTCTGGATGAACTAAAGCCCTTTCATCCATTGCACTGGTCCTTCCTAGAAGGCAGAGCGGGACCCAGCTACCTGCGCCCCCTTGAGGATGGGTGTGTGTGTGGAAGTACAGGTGGCTTGGCTCGACGCCCTGTCATGAACAGCCTGTTTGCCCACTTGTGTTCAAATCACATGCACAGCTGTGGAAGCCTGGGTGGAATTCCTCAGCCTGGGTGGCAGTCTGCTCTTTTTATTTTTTTGTAGCTCTGGAGATTGAACCTAGGACCTTGCGTGTGCTAGACAAGTGCCCTGCCAGTATGCCCAGCCAGAATCCCAGTGTTGGTTTTTTTTTTGTTGTTTTTTTTTTTGGATTTTTCGAGACAGGGTTTCTCTGTGTAGCCCTGGCTGTCCTGGAACTCACTCTGTAGACCAGGCTGACATAGAACTCAGAAATCCTCCTGCCTCTGCCTCCCAAGTGCTGGGATTAAAGGCACGCGCCACCACTGCCCGGCTGAGTCCCAGTGTTGAAACGTCATCTTTTTCTTGTCTAAAGATGACCTAACGTCTTCAACAACTAGCTCACCACAACTACCTTGCATCTTCCCTGTCACAGCACAAGTCACGCAAAGGGTCCTTGGGTGCACCATGGGAACCTTAGGGGTAGAGGACTTACTACATATGCCTCCACTA
AGCAAAGACTGGAGTTCAGGAGGAGACATGACTTGTTAATGTCATCCAAACACTGAAGGGCAGGAGGGTGAGCTCCAGCCTGGCCCTCCACAGCCCATGTACAGAAGCGCCCCCACCTCCTTCCCAAGTCCTTGTCTGGGTCTCTTTCACAGCTACCCAACTGTCTTACAGGTCCAAGGAGCCAAGTAGGTTAGAACAAAACTCCAAAGGTGCCTTTAATATGTGATTCTTAAAAAGAAATAGAAAAAATAACAAGCACATAAAGGGGCAGAACGAGAATCTGTGGGCAAAGCCATGCCCACTCTCTTACCCACCCCCCCATGTCCCTCGCTTCTATCTTGGAGAGGATGGAGAAGGAACATGAAGTGGCCGGATCTTTTGTGTTCTGCTGCCACAACAGCAAGCTGAAGCCAGAGAAGTACTAGGAAGCCCATGAAAGACATGAGGCCAGGGCAGGCAGCCCTGGGAGGCGGCACTCACACCACCGAGGAGCTCTCAGCTGGCGAGCTCAAAACCTGGACCACATCTTCTCGGCCTATGGCAGCCAGAGCATCCTCCAGCACTCTGAGGGTAGCTCTATTGCCTTCTTGGGCAGCCCAGTTCCTTAGCAGGGTATAGGCTGGCATTTGGTCACAGGCCATGGTTTCCACAGCCTCAGCTTGGTAGCCGAGGTGGCCTGCCAGCTCCTGCCAGCCCTTGGCTGGCTCACCCATCATCAGGAGCCGCTGGACTTCCTCCTGCTGCTGCTGTGGAATATGCAGGTAAAGCTGGCAGCCAAGGTCCGGGTGTGGTCCTGGGTGGGGGTGGGGGTGGGGGTGAAGAGAGAAGTGTTAGTGGGTAGGGGAGGCACTAGTTAAATACAAAGGACTACAGACAGACTAGGAAACGTGCTCACCCTGGCTGGGAATACAGGGCTCCAGACTAGGAGGAGAGTCCACGAAGACGTTGCTGTCACCACGCCTCTGGTCCCTGTCAGGGTCCCCTAGCTCT
ACAGTCCGAGCTTTAGCCAACTGTTGCCTTTGCTTATGTGAGCGCCAGCTGTAGGGGGCAGAGGGCACTAAGACAAGGAACTGCCTCAGAGTCCAGGCATGGAGGGGATGCCACAGGACAGGACCCAGACCACCTACCATTTGAAGGCCACATAGGCCAGCAGACCAAGGATCACTGTAGCTAGGAGAGCACAGTAGACAGGAATGATGTTGCTCGAGGCCCCTGGAGGCTCAGGGGGAAATAGGGAGGAGGTATTGGGGGCTAGGGCTCCCCCAGCTCCTACCCACATTCCTTCCCTGTCCCCGTCCTGCCCCTGCAGGGCTGTATCTGTGAAGACAGACAGTGGTCAAGATAGGGAGCCACGGCAGCCTCACCTAGGTAGACCATCTTGGCAGAACTTTCCAAATACATTAGAGTTTACCATGTGTTAAAGGACTACATGGCTGGCCCTGGAGCGGCAAGAATGGCTCAGCAGACAGACACATGCCACCAAGTATGACAACCTGAGTTTGATCCCCCAGGATCCCTGAACCACACGTGCCTGCTAAGTGGTTACTGGGGCTTGTACCGCCTCAGGACAGGAGTTCTGTTCCCAACACCACATTGGATGGATCACAGCCACCTTTAACTCAATCTCCTGGGGGGGTTGATGCCCTCTTACACCCTCTGTGGGCACTCTCACACACAGACGTGCACATAGGTACACGTAAATGGCCCTTTGCTTGCCATTGTGGACAGTCACTCCCAGATGTGCCTTGTCATCTTCCGGAAGCATCCAAACGTCTTGCTATTGCATCTTCTCCTAAACGCACAGCAGGATCTCCTCTGGAAGCCTTCCTTGACCTTTCTCTTTTCACCCTGGTTGCACCACCTCTGCTTACCACAGCACAAGATTGGTCGGCTTCCAGGGTAGACTGTGAGAGCACACTGATGTGTGCTGGTGTCATGGGATCATAAAAATAAAACTTAACTGGAAGTAAATACTGGTGC
TCTCTCACTTCTGCCTTAAGTCCAATGACTGACTAGTCCTTGTACCCAGTGTAGACAGGGTTCAAGGGTCAGGGACTAAAGAGCCCGTGAATGGACTTGTACACGACCCACTCTACCTCCCAATCTGCCCGCTACCTAGGATCTGGGACAAGGAATGCCTACCTGAATAGACCACACCTTTGCTGACGTTATAAAGCATGGTGCTGCCAGGCGCTGCCTGCTGGGCTGTTTTTTGGTGAAGGGGAGTTGTGTAGAGAACTGAGTTGACGGAGCTTGGAGCCTGCTTCTCTTGAATCTCAGAAGGAGGAACTCTGCAGAAATATGAAGAAATCCTCAGATCTGCTGTCAGGAGTCCCTGGGGGCAGCGTGTGTGTCCATCCTGATTCACACCAGGGCTGGAACAGTTTCTCCTGTGTCAAATAGGTGGAGGTAACAACTGGCTCCTGCTAGCCAAAGCTGGTGGCATGGGGTACAGCTCAGGATAGAAATCACCCGATCTCAAAGACCTTCCCAAGTACCTGGGCTGAGTCTGGGGATGTTAAAAGGAGGGTAAAGATACATGGACTGTGACCCTATCTCAGGCTGAAACATCCCTAGGAACTGGTGATCATACACATCTGCCAAAACCCAGCATCCCAAGGTCCCCAAGGCAAGCCCGCTTACACTGCCTAATGCTCCCTCTGTCCCCTCCAGGGCCAGTGGCCAGCCCTTTCCTGGGCAGGGACTAAGTGAACTGATACCTTAATGGTTCAGCAAACTCATAGACCATACAGATCTCAGAGGGCCTGAAATGGAAGAACAGGGCCAACTCAGACCAAGGACCCACCTGGACCCTAGAGTACTAAATCTGCTGAGACCACTCCCTGCAGTCTACGGAGGAGGAGAGGCTCCAGATCTGAAGGAGGAGAGAGATCTGGCTTGGATAAGTAAGAGATTAAAAGAGGCCCTTCAAGTCCCCAACGGCTACTGCTGCATAGCCAATGGCTCTAA
CACGTGGTGTGTCTATAGTAGGGCTATCAGCAGTTCGGTGCTGTAGTCAGGTAAACCTCTGATGTGGGTGGCCCCTTTATGGACTTTGTATCTTTGTGTCGCCACATTGGGAGTTGGGGGCTTCTGGGATCCTTGGTGGGTGGGCCTGTAAACCAGTAGCAGCAGACAGGCCTTGGAATCTGCCCCACCCATCCTGAGCAGCCGGAGTGGAACTTCTTCAGGGCTCCCCACCCATCTCCAAGCCCAAAATGGGAAGAAACAATTCAACAGCCCCTGCAGGCCCATACACACCCCAACACAGACGCTGGTACCCACAGGATCAGAGCACTAAAGGCGGGAGACAGAGAAAGTTCTGGCCCCTTCCACCTAGCAAGAGCCCGGCTAGTCATTCCCTCCTAACCTTCTGCGGCCACCCCTCCGAGGGTGCCAGGATCCACTCAGCTAGGAAGACGACGGGAGTCCCTGGAGAGGGCAGGTTCCGGTCTGCCCAAGAGTGAGCCAAGGCAAGGGGCGGGCCAGTGGGGGGGTGGTGTTGAAGAGGGGAGCAGGACAATGAAGAGGCGGGGCCGAGCTCGAGGGCGCGGTCCCGCCCCCGCCCCACGCCGGAGCACGCAGAAGCACTCGGAGTTCACAGAAGCCGACACCAGCGTGCCTGGCAGAGCAGGCCACTGGCATGCAAATGCCATGCAATGGACCGCGAGAGCTGAGAACCAGGAGTCAGGAAACGTCTGGCAAAGCCAGAGGCGCCTCCGCTGGCTACACCGAGGCCAGCCTGGCCAGGAAGAAGCATGCCAGGCCAGACAGGGTAACAGAGGCTAAGACTGGGGGCCACAGGAGGCCAAGGACGGCGGCACATGTGTACTCAAGAAACCGAAAGATTACAAAACTAGGCCACGTTTATTGCTGAGAATGGGCAGCGATAGTCACCTTTGAGGATTAAGGCCACAGGTGGTCTTTGTGCTTTCACTGGGACGTGGGATTTGAAAGTAG
GGATTCCCTCCCACCCCAGAT"
result_handle = NCBIWWW.qblast("blastn", "gpipe/10090/ref_contig", seq)
with open("ncbitest.xml", "w") as fh:
    fh.write(result_handle.read())
I hadn't realised quite how large that file is (150MB). I should
probably filter it for the purposes of my code...
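A rough sketch of what that filtering might look like, using the expect,
descriptions, alignments and hitlist_size keyword arguments of qblast. The
values below are illustrative assumptions, not tuned settings, and the helper
name limited_qblast is made up for this example. I have not checked whether
this avoids the 404.

```python
# Illustrative limits only; the values are assumptions, not recommendations.
LIMITS = {
    "expect": 2.0,        # expectation value threshold
    "descriptions": 3,    # number of one-line descriptions to return
    "alignments": 3,      # number of alignments to return
    "hitlist_size": 3,    # number of hits to keep
}


def limited_qblast(seq, database="gpipe/10090/ref_contig"):
    """Run blastn with capped output (hypothetical helper).

    Needs Biopython and network access, so the import is done lazily
    and nothing is sent to NCBI until the function is called.
    """
    from Bio.Blast import NCBIWWW
    return NCBIWWW.qblast("blastn", database, seq, **LIMITS)
```

Calling limited_qblast(seq) and writing the handle to a file as above should
then give a much smaller XML file, assuming the server honours the limits.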
> OK, I have checked in the fix for the "\n\n" issue - I'm satisfied that it is
> sensible even if I haven't verified it first hand.
>
Just to let you know, the patch is a little verbose - it reports each
time it has to wait, which fills up the screen on some of my examples.
>
> The Biopython qblast function is calling http://blast.ncbi.nlm.nih.gov/Blast.cgi
> internally, but that web interface doesn't allow us to pick these non-standard
> databases, so a fair test (Biopython vs website) on the same URL isn't
> possible. That's a shame.
>
This page has a URL for the search I want:
http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=10090&db=ref_contig&pgm=mbn&EXPECT=2&DESCRIPTIONS=3&ALIGNMENTS=3
It selects mouse with the taxid and the database as ref_contig to give
me the reference sequence only. However if I do this:
result_handle = NCBIWWW.qblast("blastn", "ref_contig", seq,
                               entrez_query="txid10090[orgn]")
I get the "Results == '\n\n': continuing..." message for several pages.
It hasn't terminated after about 10 minutes.
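For reference, the options that the BlastGen URL above carries can be pulled
apart with the standard library. This is only a sketch that lists the query
parameters; whether each one maps onto a qblast argument is an assumption I
have not verified.

```python
from urllib.parse import urlsplit, parse_qs

# The BlastGen URL quoted above, split across two lines for readability.
URL = ("http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi"
       "?taxid=10090&db=ref_contig&pgm=mbn&EXPECT=2&DESCRIPTIONS=3&ALIGNMENTS=3")

# parse_qs returns a list of values per key; each key appears once here,
# so keep only the first value.
params = {key: values[0] for key, values in parse_qs(urlsplit(URL).query).items()}

for name, value in params.items():
    print(name, "=", value)
```

The taxid and db values are the ones I tried to reproduce with the
entrez_query and database arguments in the call above.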
Peter