[Biopython] Getting 'raw_score' of a Blast hit

Alexey Morozov alexeymorozov1991 at gmail.com
Thu Jan 11 14:53:40 UTC 2018


I'm not sure what a hit score is, or whether the concept even applies to modern BLAST.

According to this chapter (
http://etutorials.org/Misc/blast/Part+III+Practice/Chapter+7.+A+BLAST+Statistics+Tutorial/7.1+Basic+BLAST+Statistics/), BLAST
applies a lot of statistics to HSP scores/e-values, taking into account database
size, sequence composition and God knows what else. That book is pretty old
(2003), so maybe it describes BLAST1?
Judging by the BLAST Wikipedia page (
https://en.wikipedia.org/wiki/BLAST#Algorithm), BLAST2 merges whatever it
can into a single HSP and reports the score of that alignment as the HSP score.
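For reference, the textbook Karlin-Altschul relations tie an HSP's raw score S to its bit score S' = (lambda*S - ln K) / ln 2 and its e-value E = m*n*2^(-S'), where m and n are the (effective) query and database lengths. Here's a rough Python sketch of just that arithmetic. The LAMBDA and K values are assumptions (roughly the gapped BLOSUM62 11/1 parameters BLAST prints in its report), and real BLAST additionally applies edge-effect and composition corrections that this ignores.

```python
import math

# Assumed, illustrative parameters: roughly the gapped BLOSUM62 (11/1) values
# shown in a BLAST report. Real searches use matrix- and gap-specific values.
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, in bits."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score: float, query_len: int, db_len: int,
                 lam: float = LAMBDA, k: float = K) -> float:
    """E = m * n * 2**(-S'), which equals K * m * n * exp(-lambda * S).

    query_len and db_len stand in for BLAST's *effective* lengths; this
    sketch skips the length and composition adjustments BLAST makes.
    """
    return query_len * db_len * 2.0 ** (-bit_score(raw_score, lam, k))


if __name__ == "__main__":
    # Hypothetical HSP: raw score 100, 300-residue query, 1e8-residue database
    print(f"bits = {bit_score(100):.1f}")
    print(f"E    = {expect_value(100, 300, 10**8):.2g}")
```

Since E scales linearly with m*n, the same raw score gets a different e-value against a bigger database, which is one reason raw scores from different searches aren't directly comparable.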

And if HSPs cannot be merged, e.g. with repetitive sequences, the idea of a
hit score doesn't make much sense anyway. I've just checked, and with
incompatible HSPs web BLAST does not report any hit-wide statistics. The
hits are simply sorted by their best HSP, regardless of whether the second one is
about as good as the best or merely noticeable. You can see this with this
sequence (
https://gist.github.com/SynedraAcus/690870ff00bf00dd832a635fe0652f81)
searched against protein nr. And the merging does tolerate pretty sizeable indels, on the order
of hundreds of nucleotides at least.
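A toy Python sketch of that ordering rule. The classes and numbers are hypothetical stand-ins (not Biopython objects, not real search results): hit_A has two comparable HSPs, hit_B a single slightly better one, and ranking on the best HSP alone puts hit_B first anyway.

```python
from dataclasses import dataclass, field


@dataclass
class HSP:
    """Hypothetical stand-in for one high-scoring pair."""
    raw_score: int
    evalue: float


@dataclass
class Hit:
    """Hypothetical stand-in for a database hit holding one or more HSPs."""
    hit_id: str
    hsps: list = field(default_factory=list)


def best_hsp(hit: Hit) -> HSP:
    # The lowest e-value is treated as the best HSP.
    return min(hit.hsps, key=lambda hsp: hsp.evalue)


def rank_hits(hits: list) -> list:
    # Rank by the best HSP only; any further HSPs don't affect the order.
    return sorted(hits, key=lambda hit: best_hsp(hit).evalue)


hits = [
    Hit("hit_A", [HSP(250, 1e-60), HSP(240, 1e-55)]),  # two comparable HSPs
    Hit("hit_B", [HSP(260, 1e-65)]),                   # one slightly better HSP
]
print([hit.hit_id for hit in rank_hits(hits)])
```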

So in most cases with BLAST2 you can just take the best HSP (most likely
the only one) and consider it done. If you expect alignments that can't be
merged (large repeats/rearrangements or large indels/tracts of
nonhomologous sequence), you probably need some specific statistics to
cover that.
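A rough sketch of "take the best HSP", which also flags hits that carry more than one HSP so you know where the best one may not tell the whole story. It's written against the attribute names that, as far as I can tell from the Bio.SearchIO docs, the blast-xml parser uses (hit.id, hit.hsps, hsp.bitscore_raw for Hsp_score, hsp.evalue); treat that mapping as an assumption and check it against your Biopython version. The demo uses SimpleNamespace stand-ins with made-up numbers so it runs without Biopython or a results file.

```python
from types import SimpleNamespace


def summarize_hits(hits):
    """Yield (hit id, best raw score, best e-value, number of HSPs) per hit.

    Assumes Bio.SearchIO-style attributes: hit.id, hit.hsps,
    hsp.bitscore_raw (raw score) and hsp.evalue.
    """
    for hit in hits:
        best = min(hit.hsps, key=lambda hsp: hsp.evalue)
        yield hit.id, best.bitscore_raw, best.evalue, len(hit.hsps)


# Hypothetical demo data standing in for parsed BLAST hits.
demo = [
    SimpleNamespace(id="hit_A", hsps=[
        SimpleNamespace(bitscore_raw=250, evalue=1e-60),
        SimpleNamespace(bitscore_raw=90, evalue=1e-12),
    ]),
    SimpleNamespace(id="hit_B", hsps=[
        SimpleNamespace(bitscore_raw=120, evalue=1e-20),
    ]),
]

for hit_id, raw, ev, n_hsps in summarize_hits(demo):
    # Several HSPs per hit means BLAST kept them as separate alignments.
    note = "  <- multiple HSPs" if n_hsps > 1 else ""
    print(f"{hit_id}\traw={raw}\tE={ev:g}{note}")

# With real data (assumes a BLAST XML file called "results.xml"):
# from Bio import SearchIO
# for qresult in SearchIO.parse("results.xml", "blast-xml"):
#     for row in summarize_hits(qresult):
#         print(row)
```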

Also, out of pure curiosity: what do you use raw scores for? Some kind of
distance calculations? I don't think I've ever seen them used.



2018-01-11 20:15 GMT+08:00 Adam Sjøgren <asjo at koldfront.dk>:

>   Hi,
>
>
> Peter writes:
>
> > Each of the HSPs will have its own raw score - is that not what you want?
>
> To be honest: I wasn't sure; I am porting pieces of old code, trying to
> match it as much as possible, more or less with my head under my arm.
>
> I see now that we use hsp.score in other similar places, so I think I
> can use hsp.score and be done with it.
>
> (It is a different value, though, right?)
>
> > Also you might find Bio.SearchIO more future proof over using Bio.Blast
> > (the latter offers a more generic interface covering HMMER etc as well).
>
> Thanks for the hint, I'll have to look into why my colleague chose
> Bio.Blast.
>
>
>   Best regards,
>
>     Adam
>
> --
>  "That's one of the remarkable things about life. It's        Adam Sjøgren
>   never so bad that it can't get worse."                 asjo at koldfront.dk
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython
>



-- 
Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.