[BioPython] help with NCBIWWW.qblast

Peter biopython at maubp.freeserve.co.uk
Sat Jan 3 12:59:34 UTC 2009


On 1/3/09, Jessica Grant wrote:
>
>  I have never seen that ("rU") before.  I will give it a try.  Thanks!
>
>  Jessica
>

I meant to type "universal newlines mode", but anyway it's been
present since at least Python 2.3 and can be very helpful in this
situation - see:
http://docs.python.org/library/functions.html
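
As a rough illustration of what universal newlines mode does, the
sketch below writes a scratch file containing Unix ("\n"), Windows
("\r\n") and classic Mac OS ("\r") line endings, then reads it back
in text mode. One assumption to flag: it is written for Python 3,
where ordinary text mode (the default newline=None) already performs
this translation. The "rU" flag itself is deprecated there and was
removed in Python 3.11, so on Python 2.3 to 2.7 you would use
open(filename, "rU") instead. The helper name read_lines_universal
is hypothetical.

```python
import os
import tempfile


def read_lines_universal(path):
    # newline=None (the default) translates "\r\n" and "\r" to "\n" while
    # reading; this is Python 3's equivalent of the old "rU" mode.
    with open(path, "r", newline=None) as handle:
        return handle.readlines()


# Raw bytes with mixed line endings: Unix, Windows and classic Mac OS.
raw = b">seq1 example\nACGT\r\nTTGA\rCCAT\n"
fd, path = tempfile.mkstemp(suffix=".fasta")  # hypothetical scratch file
with os.fdopen(fd, "wb") as out:
    out.write(raw)

lines = read_lines_universal(path)
os.remove(path)
print(lines)  # ['>seq1 example\n', 'ACGT\n', 'TTGA\n', 'CCAT\n']
```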

I was hoping you could tell me the exact string you used as your
tblastn query, as it would be useful to be able to reproduce this
kind of error.

I hope you don't mind me sharing some comments on your original
snippet of code, which looked like this:

result_handle = NCBIWWW.qblast("tblastn", "nr",  fas.seq.data)

I'm assuming that the variable fas is a SeqRecord, thus fas.seq is its
Seq object.  Using a Seq object's data property to get the sequence as
a plain string is discouraged (it hasn't been in the tutorial for some
time), as the Seq object now behaves much more like a string itself.
You could have just used fas.seq here.

This brings me to my next point. The NCBI qblast interface will accept
three kinds of query: (1) a record identifier such as a GI number,
(2) a plain sequence, or (3) a FASTA format string.
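
To make the three forms concrete, here is a sketch with made-up
example queries and a hypothetical helper, guess_query_kind, that
tells them apart with a simple heuristic: a leading ">" means FASTA,
a string of residue letters means a plain sequence, and anything else
(e.g. digits, as in a GI number) is treated as an identifier. This
heuristic is only an illustration, not how the NCBI server actually
decides. Any of the three strings would go in the third argument of
NCBIWWW.qblast.

```python
def guess_query_kind(query):
    """Hypothetical heuristic: classify a qblast query string as
    'fasta', 'sequence' or 'identifier'. Not NCBI's actual logic."""
    text = query.strip()
    if text.startswith(">"):
        return "fasta"
    # Drop whitespace, since sequences are often wrapped over several lines.
    letters = "".join(text.split())
    # A plain sequence is only residue letters, plus '*' (stop) or '-' (gap).
    if letters and all(ch.isalpha() or ch in "*-" for ch in letters):
        return "sequence"
    return "identifier"


# Made-up example queries, one of each kind (values are illustrative only).
examples = {
    "identifier": "8332116",  # looks like a GI number
    "sequence": "MKTAYIAKQRQISFVKSHFSRQ",
    "fasta": ">my_protein example query\nMKTAYIAKQRQISFVKSHFSRQ",
}
for expected, query in examples.items():
    print(expected, "->", guess_query_kind(query))
```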

Supplying just the sequence (as in your code) means that BLAST will
assign an identifier to your sequence automatically. You might prefer
to use the SeqRecord object's format method to make a FASTA string,
which will include the existing identifier (and that identifier should
then be present in the BLAST results):

result_handle = NCBIWWW.qblast("tblastn", "nr",  fas.format("fasta"))
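
For anyone without Biopython 1.48 to hand, the plain-Python sketch
below shows roughly the kind of string fas.format("fasta") produces: a
">" header line holding the record's identifier and description,
followed by the sequence wrapped into fixed-width lines. The helper
to_fasta and its 60-column width are assumptions for illustration,
not Biopython's own code, and the real method's output may differ in
small details.

```python
def to_fasta(identifier, description, sequence, width=60):
    """Hypothetical stand-in for SeqRecord.format("fasta")."""
    header = ">" + identifier
    if description:
        header += " " + description
    # Wrap the sequence into lines of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


# Made-up protein record (88 residues), suitable as a tblastn query.
fasta_query = to_fasta("my_protein", "example query", "MKTAYIAKQRQISFVKSHFSRQ" * 4)
print(fasta_query)
```

The resulting string would then be passed to NCBIWWW.qblast in place
of the bare sequence; because the identifier sits in the header line,
it should be carried through to the BLAST results.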

This information is in the current version of the Biopython Tutorial,
but I thought it worth bringing up here too. The format method used
here was added to the SeqRecord (and Alignment) objects in Biopython 1.48.

Peter
