[Biopython] Blast using Biopython
Tanya Golubchik
golubchi at stats.ox.ac.uk
Thu Oct 17 14:07:50 UTC 2013
Hi Martin,
Using task=blastn seems to solve the problem! Thanks so much, I didn't
realise that the default (megablast) behaviour is different even when
the word size and other parameters are changed. Blastn seems to find the
edges much more precisely than megablast. I haven't thoroughly tested it
yet to make sure it doesn't break anything, but so far so good!
Thanks
Tanya
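
For anyone finding this thread later, here is a minimal sketch of the kind of call that worked, assuming the BLAST+ "blastn" executable is on the PATH. The helper name build_blastn_command and the file and database names are made up for illustration; the flags (-task, -query, -db, -outfmt, -out, -word_size, -reward) are standard BLAST+ options. The function only assembles the argument list, it does not run anything.

```python
def build_blastn_command(query, db, out, task="blastn", word_size=None, reward=None):
    """Assemble a BLAST+ blastn command line as an argument list.

    Hypothetical helper for illustration. Without -task, the blastn
    executable defaults to megablast, so the task is always set explicitly.
    """
    cmd = [
        "blastn",
        "-task", task,      # "blastn" rather than the default "megablast"
        "-query", query,
        "-db", db,
        "-outfmt", "5",     # 5 = XML output, parseable with Bio.Blast.NCBIXML
        "-out", out,
    ]
    # Optional tuning parameters are only added when explicitly requested.
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    if reward is not None:
        cmd += ["-reward", str(reward)]
    return cmd


# Placeholder file and database names.
cmd = build_blastn_command("query.fasta", "my_nt_db", "hits.xml", word_size=11)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True). As far as I know, Biopython's NcbiblastnCommandline wrapper accepts the same option as a task="blastn" keyword.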
On 15/10/13 23:33, Martin Mokrejs wrote:
> Hi Tanya,
> I suppose you use the newer ncbi-tools++ suite. Try the legacy blastn from the ncbi-tools suite.
> The version numbering is the same ... I have had better experience with "blastall -p blastn" from the old
> suite. You can also try to find a switch that forces the really old blastn algorithm buried in
> blastall (nowadays blastall uses the new algorithm from the new ncbi-tools++ suite).
> However, experience shows that "blastall -p blastn" gives different results from blastn,
> although in theory BOTH should be using the new algorithm. If you can force the real
> predecessor of the algorithm in blastall, you have a third method to test.
>
> From blastall you get only limited results in the CSV-formatted output, and you cannot change the
> output columns. The results that matter to me can only be parsed from the XML or plain-text output of blastall.
>
> You can increase the reward for a match ("-r 2") to extend across some gaps at the edges, but it depends
> on what queries you have and whether that gives you falsely widened alignments elsewhere. You have
> to test that.
>
> Good luck,
> Martin
>
>
> Tanya Golubchik wrote:
>> Hi guys,
>>
>> This is strictly speaking more about blast than biopython, but I was wondering if anyone has any tips on doing the following: searching for a hit in a nucleotide database using tblastn, but reporting the actual DNA sequence of the subject, rather than the translated protein sequence. Is there by any chance a way of extracting this from the XML output?
>>
>> What I'm finding is that blastn sometimes misses the edges, where substitutions close to the ends of my hit result in a truncated hit (rather than a complete hit with a mismatch or two). The full hit is reported correctly by tblastn, but of course this returns the protein translation rather than the original nucleotide sequence. It's probably a long shot, but just wondering if anyone has ideas -- the brute force approach would be to get the start and stop positions from tblastn and then extract and re-align this fragment to my query, but that seems redundant given that blast has already done this for me...
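
The brute-force approach described above (take the subject start and stop from tblastn, then cut the DNA out of the subject) can be sketched as follows. This is only an illustration under stated assumptions: the function name subject_dna_for_hsp is made up, the coordinates are taken to be 1-based and inclusive on the forward strand of the subject (as in the Hsp_hit-from / Hsp_hit-to fields of the BLAST XML), and a negative frame is taken to mean the hit lies on the reverse strand. It handles plain A/C/G/T/N only, and it is worth checking the coordinate convention against your own BLAST output.

```python
# Complement table for unambiguous bases plus N (an assumption: no IUPAC codes).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def subject_dna_for_hsp(subject_seq, hit_from, hit_to, frame):
    """Return the subject nucleotides covered by a tblastn HSP.

    Hypothetical helper: hit_from/hit_to are assumed 1-based, inclusive,
    forward-strand coordinates; frame < 0 is assumed to mean reverse strand.
    """
    # Accept the coordinates in either order, since conventions differ
    # between output formats.
    lo, hi = sorted((hit_from, hit_to))
    fragment = subject_seq[lo - 1:hi]
    if frame < 0:
        # Reverse-complement so the fragment reads in the hit's orientation.
        fragment = fragment.translate(_COMPLEMENT)[::-1]
    return fragment


subject = "AAATGGCCTTT"
print(subject_dna_for_hsp(subject, 3, 8, 1))   # forward-frame hit
print(subject_dna_for_hsp(subject, 3, 8, -1))  # reverse-frame hit
```

With Biopython's NCBIXML parser the three values would come from hsp.sbjct_start, hsp.sbjct_end and the subject element of hsp.frame, and the subject sequence itself still has to be fetched separately, for example from the FASTA file the database was built from.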