[Biopython] Blast using Biopython

Martin Mokrejs mmokrejs at fold.natur.cuni.cz
Tue Oct 15 22:33:32 UTC 2013


Hi Tanya,
  I suppose you are using the newer ncbi-tools++ suite. Try the legacy blastn from the ncbi-tools
suite. The version numbering is the same ... I have had better experience with "blastall -p blastn"
from the old suite. You can also try to find a switch that forces the really old blastn algorithm
buried in blastall (nowadays blastall uses the new algorithm from the new ncbi-tools++ suite).
However, in my experience "blastall -p blastn" gives different results from blastn, although BOTH
should in theory be using the new algorithm. If you can force the real predecessor of the algorithm
in blastall, you have a third method to test.

  From blastall you get only limited results in the CSV-formatted output, and you cannot change the
output columns. For me, the important results can only be parsed from the XML or plain-text output
of blastall.

  You can increase the reward for a match ("-r 2") to get past some gaps at the edges, but whether
that helps depends on your queries and on whether it gives you falsely widened alignments
elsewhere. You have to test that.
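
  A minimal sketch of how such a run could be scripted from Python, assuming the legacy blastall
binary is on your PATH. The function name and the file and database names are only placeholders
for illustration. "-m 7" asks blastall for XML output and "-r" sets the match reward.

```python
import subprocess


def build_blastall_cmd(query, db, out, reward=2):
    """Build a legacy 'blastall -p blastn' command line (hypothetical helper).

    query, db and out are placeholder paths supplied by the caller.
    """
    return [
        "blastall",
        "-p", "blastn",      # program to run inside blastall
        "-i", query,         # query FASTA file
        "-d", db,            # nucleotide database name
        "-o", out,           # output file
        "-m", "7",           # XML output
        "-r", str(reward),   # reward for a nucleotide match
    ]


def run_blastall(query, db, out, reward=2):
    """Run the command; raises CalledProcessError if blastall fails."""
    subprocess.run(build_blastall_cmd(query, db, out, reward), check=True)
```

  Running it once with reward=1 and once with reward=2 and comparing the hit edges is one way to
check whether the alignments get falsely widened.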

Good luck,
Martin


Tanya Golubchik wrote:
> Hi guys,
> 
> This is strictly speaking more about blast than biopython, but I was wondering if anyone has any tips on doing the following: searching for a hit in a nucleotide database using tblastn, but reporting the actual DNA sequence of the subject, rather than the translated protein sequence. Is there by any chance a way of extracting this from the XML output?
> 
> What I'm finding is that blastn sometimes misses the edges, where substitutions close to the ends of my hit result in a truncated hit (rather than a complete hit with a mismatch or two). The full hit is reported correctly by tblastn, but of course this returns the protein translation rather than the original nucleotide sequence. It's probably a long shot, but just wondering if anyone has ideas -- the brute force approach would be to get the start and stop positions from tblastn and then extract and re-align this fragment to my query, but that seems redundant given that blast has already done this for me...
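
The extraction step of the brute-force approach described above can be sketched in a few lines of
plain Python. This assumes the tblastn subject coordinates are 1-based, inclusive nucleotide
positions on the forward strand of the subject (in Biopython's NCBIXML records they are available
as hsp.sbjct_start and hsp.sbjct_end), and that a negative subject frame means the hit lies on the
reverse strand. Check both assumptions against your own output. The helper names are hypothetical,
not part of Biopython.

```python
# Complement table for plain DNA; ambiguity codes other than N are not handled.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def subject_dna_for_hsp(subject_seq, hit_from, hit_to, frame):
    """Slice the subject DNA that a tblastn HSP covers (hypothetical helper).

    hit_from and hit_to are assumed to be 1-based, inclusive coordinates on
    the forward strand; frame is the subject frame (+1..+3 or -1..-3).
    """
    # Accept the coordinates in either order.
    start, end = sorted((hit_from, hit_to))
    fragment = subject_seq[start - 1:end]
    # Assumption: a negative frame means the coding strand is the reverse one.
    if frame < 0:
        fragment = reverse_complement(fragment)
    return fragment
```

The returned fragment is the nucleotide sequence behind the translated hit and can then be
re-aligned to the query if needed.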


