[BioPython] blast
Peter Cock
p.j.a.cock at googlemail.com
Wed Jul 23 14:49:39 UTC 2008
On Wed, Jul 23, 2008 at 3:30 PM, Stefanie Lück <lueck at ipk-gatersleben.de> wrote:
>>Peter wrote:
>>
>> Using hsp.score gives the raw score, which you are then scaling by the
>> length and 100. I'm not sure offhand what you've calculated, but if
>> you want the percentage identity, I think it's just the number of
>> identically matched letters divided by the alignment length:
>>
>> percentage_identity = (100.0 * hsp.identities) / hsp.align_length
>
> I tried and it doesn't work in my case. It gives me the wrong percentage.
>
> I need to know to what percentage my primer matches the query part:
>
>
> query  --------taggcctcgcgcgcc-------
>        ||||||||||||||||||||||||||||||
> primer tagcgctataggcctcgcgcgccatatagc
>
> Here 50 %.

Your query match region has an ungapped length of 15 and a gapped length of 30.
Your subject match region has a length of 30.
Your query and subject have 15 identical matches.
The alignment length is 30, therefore 100*15/30 = 50%.

Using my suggested formula correctly gives the percentage identities
relative to the alignment length:

percentage_identity = (100.0 * hsp.identities) / hsp.align_length
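
As a rough check of the arithmetic, here is a small sketch that recomputes those numbers directly from the two aligned strings in the example. It does not use Biopython at all; the variable names are only illustrative, and it assumes "-" is the gap character:

```python
# Aligned strings copied from the example in this thread ("-" marks a gap).
query = "--------taggcctcgcgcgcc-------"
primer = "tagcgctataggcctcgcgcgccatatagc"

# Gapped alignment length, and the query length once gaps are removed.
align_length = len(query)
ungapped_query_length = align_length - query.count("-")

# Count columns where both sequences carry the same letter (gaps never count).
identities = sum(1 for q, p in zip(query, primer) if q == p and q != "-")

# Percentage identity relative to the alignment length.
percentage_identity = (100.0 * identities) / align_length
print(align_length, ungapped_query_length, identities, percentage_identity)
```

For a real BLAST result you would read hsp.identities and hsp.align_length from the parsed HSP rather than counting by hand.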

It seems what you want to calculate is 100*15/15 = 100%, i.e. the
percentage identities relative to the ungapped query length? This
would probably work but isn't very elegant:

percentage_identity_ref_matched_query = (100.0 * hsp.identities) / \
    (hsp.align_length - hsp.query.count("-"))

Or even relative to the full original query? If record is the blast
record object that the hsp came from:

percentage_identity_ref_full_query = (100.0 * hsp.identities) / record.query_letters

(For a short query sequence like a probe, these may be the same.)
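
To compare the three variants side by side, here is a minimal sketch. The function names are made up for illustration, and the hsp and record objects are plain stand-ins built with SimpleNamespace that carry only the attributes used in this thread (identities, align_length, query, query_letters). The query_letters value of 15 is an assumption about the example; with a parsed BLAST record you would pass the real objects instead:

```python
from types import SimpleNamespace


def percent_identity(hsp):
    """Identities relative to the (gapped) alignment length."""
    return (100.0 * hsp.identities) / hsp.align_length


def percent_identity_matched_query(hsp):
    """Identities relative to the ungapped length of the matched query."""
    return (100.0 * hsp.identities) / (hsp.align_length - hsp.query.count("-"))


def percent_identity_full_query(hsp, record):
    """Identities relative to the full length of the original query."""
    return (100.0 * hsp.identities) / record.query_letters


# Stand-in objects mimicking the example alignment (not real BLAST output).
hsp = SimpleNamespace(
    identities=15,
    align_length=30,
    query="--------taggcctcgcgcgcc-------",
)
record = SimpleNamespace(query_letters=15)  # assumed full query length

print(percent_identity(hsp))
print(percent_identity_matched_query(hsp))
print(percent_identity_full_query(hsp, record))
```

Which denominator is right depends on the question being asked: the alignment length penalises gaps, while the two query-based versions only ask how much of the query was matched identically.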
Peter