[BioPython] blast

Peter Cock p.j.a.cock at googlemail.com
Wed Jul 23 14:49:39 UTC 2008


On Wed, Jul 23, 2008 at 3:30 PM, Stefanie Lück <lueck at ipk-gatersleben.de> wrote:
>>Peter wrote:
>>
>> Using hsp.score gives the raw score, which you are then scaling by the
>> length and 100.  I'm not sure offhand what you've calculated, but if
>> you want the percentage identity, I think it's just the number of
>> identically matched letters divided by the alignment length:
>>
>> percentage_identity = (100.0 * hsp.identities) / hsp.align_length
>
> I tried it and it doesn't work in my case. It gives me the wrong percentage.
>
> I need to know to what % my primer matches the query part:
>
>
> query          --------taggcctcgcgcgcc-------
>              ||||||||||||||||||||||||||||||
> primer         tagcgctataggcctcgcgcgccatatagc
>
> Here 50 %.

Your query match region has an ungapped length of 15, and a gapped length of 30.
Your subject match region has a length of 30.
Your query and subject have 15 identical matches.
The alignment length is 30, therefore 100*15/30 = 50%.

My suggested formula correctly gives the percentage identity
relative to the alignment length:
percentage_identity = (100.0 * hsp.identities) / hsp.align_length

It seems what you want to calculate is 100*15/15 = 100%, i.e. the
percentage identity relative to the ungapped query length?  This
would probably work but isn't very elegant:

percentage_identity_ref_matched_query = (100.0 * hsp.identities) / \
    (hsp.align_length - hsp.query.count("-"))

Or even relative to the full original query?  If record is the blast
record object that the hsp came from,

percentage_identity_ref_full_query = (100.0 * hsp.identities) / \
    record.query_letters

(For a short query sequence like a probe, these may be the same)
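
As a rough sketch, the three percentages above can be put side by side
using the numbers from your example.  The function name
identity_percentages and the SimpleNamespace object are only stand-ins
I made up for illustration; in real code hsp would come from a parsed
blast record and query_letters from record.query_letters.

```python
from types import SimpleNamespace


def identity_percentages(hsp, query_letters):
    """Return the three percentage identities discussed in this thread."""
    # Query letters actually present in the alignment (gaps removed).
    ungapped_query_length = hsp.align_length - hsp.query.count("-")
    return {
        # Relative to the full (gapped) alignment length.
        "alignment": 100.0 * hsp.identities / hsp.align_length,
        # Relative to the part of the query that was aligned.
        "matched_query": 100.0 * hsp.identities / ungapped_query_length,
        # Relative to the full original query length.
        "full_query": 100.0 * hsp.identities / query_letters,
    }


# Hypothetical stand-in for an HSP, filled in with the values from the
# example alignment above (not the output of a real BLAST run).
example_hsp = SimpleNamespace(
    identities=15,
    align_length=30,
    query="--------taggcctcgcgcgcc-------",
)

result = identity_percentages(example_hsp, query_letters=15)
for name, value in result.items():
    print(name, value)
```

With these example values the first figure is 50% and the other two
are both 100%, since the query here is exactly as long as its matched
region.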

Peter



