[Biopython] Blast using Biopython

Tanya Golubchik golubchi at stats.ox.ac.uk
Tue Oct 15 14:40:29 UTC 2013


Hi guys,

This is strictly speaking more about blast than biopython, but I was 
wondering if anyone has any tips on doing the following: searching for a 
hit in a nucleotide database using tblastn, but reporting the actual DNA 
sequence of the subject, rather than the translated protein sequence. Is 
there by any chance a way of extracting this from the XML output?

What I'm finding is that blastn sometimes misses the edges, where 
substitutions close to the ends of my hit result in a truncated hit (rather 
than a complete hit with a mismatch or two). The full hit is reported 
correctly by tblastn, but of course this returns the protein translation 
rather than the original nucleotide sequence. It's probably a long shot, 
but just wondering if anyone has ideas -- the brute force approach would 
be to get the start and stop positions from tblastn and then extract and 
re-align this fragment to my query, but that seems redundant given that 
blast has already done this for me...
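
For reference, the extraction half of that brute-force approach could look roughly like the sketch below. It assumes the tblastn XML reports subject coordinates as 1-based nucleotide positions in Hsp_hit-from and Hsp_hit-to, with the strand given by the sign of Hsp_hit-frame, and that the subject sequence is already in memory as a plain string. The function names are made up for illustration. Biopython's NCBIXML parser should expose the same values as hsp.sbjct_start, hsp.sbjct_end and hsp.frame, but only the standard library is used here.

```python
import xml.etree.ElementTree as ET

# Complement table for plain DNA (IUPAC ambiguity codes other than N are not handled).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def hsp_coordinates(xml_text):
    """Yield (hit_from, hit_to, hit_frame) for every HSP in BLAST XML text."""
    root = ET.fromstring(xml_text)
    for hsp in root.iter("Hsp"):
        yield (
            int(hsp.findtext("Hsp_hit-from")),
            int(hsp.findtext("Hsp_hit-to")),
            int(hsp.findtext("Hsp_hit-frame")),
        )


def subject_dna(subject_seq, hit_from, hit_to, hit_frame):
    """Slice the nucleotides behind a tblastn HSP out of the subject sequence.

    Assumes 1-based inclusive coordinates. The two positions are sorted so
    the slice works whichever order they are reported in, and the fragment
    is reverse-complemented when the frame is negative.
    """
    start, end = sorted((hit_from, hit_to))
    fragment = subject_seq[start - 1:end]
    return reverse_complement(fragment) if hit_frame < 0 else fragment
```

Usage would be something like looping over hsp_coordinates(xml_text) and calling subject_dna(subject_seq, *coords) for each HSP. This only recovers the DNA fragment; it does not produce a nucleotide-level alignment against the query.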

Thanks
Tanya


