[BioPython] blast
Julien Cigar
jcigar at ulb.ac.be
Wed Jul 23 18:02:48 UTC 2008
Just FYI, and as another point of view (?): I asked one of the users
which method is best, and he said that "one should not depend on the
length of the sequence to know how different/similar the used fragment
is", so it's really percentage identity that is needed.
Julien
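
To make the distinction concrete, here is a minimal sketch of the two
ways of turning an identity count into a percentage. The numbers are
hypothetical (not taken from the thread), and percent_identity is a
helper made up for this illustration, not part of Biopython. Dividing
by the HSP's alignment length gives a value that does not depend on
how long the query is, while dividing by the full query length
(record.query_letters in the quoted code) does.

```python
def percent_identity(identities, length):
    """Return identities as a percentage of length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * identities / length

# Hypothetical numbers for a single HSP (illustration only).
identities = 45        # identical positions reported for the HSP
alignment_length = 50  # columns in the HSP alignment, gaps included
query_length = 200     # full query length (record.query_letters)

# Identity over the aligned region only: independent of query length.
over_alignment = percent_identity(identities, alignment_length)

# Identity over the whole query: drops as the unaligned part grows.
over_query = percent_identity(identities, query_length)

print("%0.1f%% of the aligned region is identical" % over_alignment)
print("%0.1f%% of the full query is identical" % over_query)
```

Which denominator is right depends on the question being asked: the
first describes how similar the matched fragment is, the second how
much of the whole query is covered by identical positions.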
On Wed, 2008-07-23 at 16:36 +0100, Peter Cock wrote:
> > Sorry, both give me wrong percentage if I try on my database.
> >
> > Look here, compare alignment and percentage:
> >
> > http://picasaweb.google.de/luecks/Python02/photo#5226228430697476754
> >
> > e.g Hit 13 should give 60 %
> >
> > What you recommend to use instead of hsp.score?
>
> I am assuming you want to parse some BLAST output in order to populate
> this database. How about something based on this:
>
> from Bio.Blast import NCBIXML
>
> for record in NCBIXML.parse(open("test.xml")):
>     # query_id is a string, so format it with %s rather than %i
>     print("Query %s length %i" % (record.query_id, record.query_letters))
>     for alignment in record.alignments:
>         for hsp in alignment.hsps:
>             # identities as a percentage of the full query length
>             percentage_identities_versus_full_query = (
>                 100.0 * hsp.identities) / record.query_letters
>             print(" vs %s gives %0.1f%% identities"
>                   % (alignment.hit_id, percentage_identities_versus_full_query))
>
> This uses the fact that the original query length is recorded in the
> record object as the "query_letters" property (this name was a
> historical choice based on the plain text blast output).
>
> For the example you gave, I would expect hsp.identities ==
> 12 (and hsp.alignment_length == 12) while record.query_letters == 20
> which will give you the desired output of 60%.
>
> Peter