[BioPython] [Fwd: Re: qblast + documentation]

Peter biopython at maubp.freeserve.co.uk
Fri Aug 24 09:35:16 UTC 2007


I managed to send this to Michael Robeson only, and not the list.

In the meantime, Michiel de Hoon pointed out that support for lots of 
parameters (including NUCL_REWARD and NUCL_PENALTY) was added after the 
release of Biopython 1.43, so you would have to update the file 
Bio/Blast/NCBIWWW.py.

If you don't want to bother with CVS, the easy way to do this is to back 
up the original and then replace it with the download from here:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/Blast/NCBIWWW.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python

Looking at the history, that shouldn't impact anything else.
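
One way to check whether an installed copy already accepts the newer
options is to inspect the function signature. This is only a sketch:
missing_keywords is a hypothetical helper (not part of Biopython), and
old_qblast is a made-up stand-in so the example runs without Biopython
installed. It assumes a current Python 3.

```python
import inspect


def missing_keywords(func, names):
    """Return the entries of `names` that `func` does not accept.

    Hypothetical helper for illustration; not part of Biopython.
    """
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is missing.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [name for name in names if name not in params]


# Made-up stand-in for an older qblast without the scoring options.
# This is NOT the real Biopython 1.43 signature.
def old_qblast(program, database, sequence, format_type="XML"):
    raise NotImplementedError


print(missing_keywords(old_qblast, ["nucl_reward", "nucl_penalty"]))
```

With Biopython installed, passing Bio.Blast.NCBIWWW.qblast instead of the
stand-in should give an empty list once the updated file is in place.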

Peter

-------- Original Message --------
Subject: Re: [BioPython] qblast + documentation
Date: Thu, 23 Aug 2007 22:45:12 +0100
From: Peter <biopython at maubp.freeserve.co.uk>
Reply-To: biopython at lists.open-bio.org
To: Michael S. Robeson <Michael.Robeson at colorado.edu>
References: <A58FA3CF-1D84-466E-AD55-EE1EB7D0DC2D at colorado.edu>

Michael S. Robeson wrote:
> I meant to add to the other e-mail:  Since we know it has to do with  
> the blast setting differences on the NCBI web site versus the  
> biopython back-end. How can I get around that? I've tried a few  
> things, but have not been successful.

I'd just like to point out that this isn't Biopython's fault - it's the
NCBI who seem to be using different match/mismatch scores on their GUI
web server and their QBLAST web API.

The simple way to get the two queries to agree is, whenever you run a
manual query on the website, to click on "Algorithm parameters" and
under "Scoring Parameters" change "Match/Mismatch Scores" from "2, -3"
to "1, -3" (grin).

Or, if you want to change the match/mismatch scores in the qblast call,
what Biopython is doing is providing a Python interface to this URL scheme:

http://www.ncbi.nlm.nih.gov/BLAST/Doc/node5.html

If you are familiar with the BLAST terminology this is probably very
obvious, but you need to set the NUCL_REWARD and NUCL_PENALTY options in
the URL.
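
As a rough illustration of where those two options sit in a request, the
sketch below builds the query string for a submission. The endpoint
address, the build_put_query helper and the short example sequence are
assumptions of mine; only the NUCL_REWARD and NUCL_PENALTY names come
from the discussion above. Nothing is actually sent to the NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint; the message only links to the parameter documentation.
BASE_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_query(sequence, reward, penalty):
    """Build the query string for a QBLAST "Put" (submit) request.

    Hypothetical helper for illustration only.
    """
    params = [
        ("CMD", "Put"),
        ("PROGRAM", "blastn"),
        ("DATABASE", "nr"),
        ("QUERY", sequence),
        ("NUCL_REWARD", reward),    # score for a matching base
        ("NUCL_PENALTY", penalty),  # score for a mismatching base
    ]
    return urlencode(params)


query = build_put_query("TGTGATGGATATCTGCAGAATTCGCC", 2, -3)
print(BASE_URL + "?" + query)
```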

In Biopython, that means using the optional nucl_reward and nucl_penalty
arguments to the Bio.Blast.NCBIWWW.qblast function:

from Bio.Blast.NCBIWWW import qblast
seq_string = "TGTGATGGATATCTGCAGAATTCGCCCTTTAAACTTCAGGGTGACCAAAA" \
             + "AATCAAAATAAATGTTGAAATAATACTGGATCTCCACCACCACTAACTTC" \
             + "AAAAAATGTTGTATTAAAATTTCTATCAGTTAATAACATTGTTATAGCAC" \
             + "CCCCTAATACTGGTAATGATAATAATAATAATCATGCTGTTATAAATACA" \
             + "GCTCAAACAAATAAAGGTAACTTAAACATACTCATACCAGGTGTTCGCAT" \
             + "ATTAATAACAGTAACAATAAAATTTATTGAACCTAATATTGATGATATAC" \
             + "CAGCTAAATGTAAACTAAATATTGCACATTCTATTGAACCTCCTGAATGT" \
             + "GAAAATATACCAGATAATGGTGGATAAACAGTTCAACCTGTACCTGCCCC" \
             + "CATCTCGACTACAGATGATCAAATTAATAAAAAAAATGATGGTACTAATA" \
             + "ATCAAAAACTTATATTATTTAATCTTGGGAATGCCATATCAGGAGCTCCT" \
             + "ATCATTAAAGGTAAAAATCAATTACCAAAACCACCCATTAATGCAGGCAT" \
             + "AACCATAAAAAATATCATTATTAAAGCATGTGCTGTTATTAACACATTAT" \
             + "ATGCTTGATGATTGTAATTTAATATTACTGCACCAGCATCTGATAATTCT" \
             + "ATACGTATTAATATAGATCAAAATGTTCCTATTAAACCTGCTAAAAATGC" \
             + "AAATATTAAATATAATGTTCCAATATCTTTATGATTTGTTGACCAAGGGC" \
             + "GAATTCCAGCACACTGGCGGCCGTTACTAG"

# Pick one output file name and the matching qblast format_type:
#filename, format = "test.html", "HTML"
#filename, format = "test.txt",  "Text"
filename, format = "test.xml",  "XML"

# Ask explicitly for the website's match/mismatch scores (2, -3)
result_handle = qblast('blastn', 'nr', seq_string, format_type=format,
                        nucl_reward=2, nucl_penalty=-3)

# Save the raw result to disk
output_handle = open(filename, "w")
output_handle.write(result_handle.read())
output_handle.close()

print("Done")


And this does seem to agree with the results from doing the query by
hand on their website with the default settings.
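
To double check which scores a given search actually used, the XML
output can be inspected. The sketch below assumes the BLAST XML records
them in Parameters_sc-match and Parameters_sc-mismatch elements; the
helper name and the trimmed, hand-written sample are mine, so treat it
as a starting point rather than a tested recipe.

```python
import xml.etree.ElementTree as ET


def scoring_from_blast_xml(xml_text):
    """Return (match, mismatch) scores recorded in BLAST XML output.

    Hypothetical helper; the element names are an assumption about the
    NCBI BLAST XML format.
    """
    root = ET.fromstring(xml_text)
    match = root.findtext(".//Parameters_sc-match")
    mismatch = root.findtext(".//Parameters_sc-mismatch")
    if match is None or mismatch is None:
        raise ValueError("no nucleotide scoring parameters found")
    return int(match), int(mismatch)


# Trimmed, hand-written sample; real output contains much more.
SAMPLE = """<BlastOutput>
  <BlastOutput_param>
    <Parameters>
      <Parameters_sc-match>2</Parameters_sc-match>
      <Parameters_sc-mismatch>-3</Parameters_sc-mismatch>
    </Parameters>
  </BlastOutput_param>
</BlastOutput>"""

print(scoring_from_blast_xml(SAMPLE))
```

For a saved result you would read the text of test.xml and pass that in
instead of SAMPLE.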

Peter




