[BioPython] blast

Wed Jul 23 18:02:48 UTC 2008

Just FYI, and as another point of view: I asked one of the users which
method is best, and he said that "one should not depend on the length
of the sequence to know how different/similar the used fragment is", so
it's really the percentage identity that is needed.

Julien
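
To make the two readings of "percentage identity" concrete, here is a minimal sketch. The helper name percent_identity and the numbers are made up for illustration; they are not taken from the database discussed in this thread. It only shows how the same identity count gives different percentages depending on whether it is divided by the aligned length or by the full query length.

```python
def percent_identity(identities, denominator):
    """Return identities as a percentage of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * identities / denominator


# Hypothetical HSP: 45 identical positions in a 50-column alignment,
# produced from a query that is 150 letters long.
identities, align_length, query_length = 45, 50, 150

over_alignment = percent_identity(identities, align_length)
over_query = percent_identity(identities, query_length)

print("over aligned region: %0.1f%%" % over_alignment)
print("over full query:     %0.1f%%" % over_query)
```

The first figure does not change with the length of the query, while the second drops as the query gets longer than the matched fragment.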

On Wed, 2008-07-23 at 16:36 +0100, Peter Cock wrote:
> > Sorry, both give me the wrong percentage when I try them on my database.
> >
> > Look here, compare alignment and percentage:
> >
> > http://picasaweb.google.de/luecks/Python02/photo#5226228430697476754
> >
> > e.g. Hit 13 should give 60%
> >
> > What do you recommend using instead of hsp.score?
> 
> I am assuming you want to parse some BLAST output in order to populate
> this database.  How about something based on this:
> 
> from Bio.Blast import NCBIXML
> for record in NCBIXML.parse(open("test.xml")) :
>     # query_id is a string, so it is formatted with %s rather than %i
>     print("Query %s length %i" % (record.query_id, record.query_letters))
>     for alignment in record.alignments :
>         for hsp in alignment.hsps :
>             # identities as a percentage of the full query length
>             percentage_identities_versus_full_query = \
>                 (100.0 * hsp.identities) / record.query_letters
>             print(" vs %s gives %0.1f%% identities"
>               % (alignment.hit_id, percentage_identities_versus_full_query))
> 
> This uses the fact that the original query length is recorded in the
> record object as the "query_letters" property (this name was a
> historical choice based on the plain text BLAST output).
> 
> For the example you gave, I would expect hsp.identities ==
> 12 (and hsp.align_length == 12) while record.query_letters == 60
> which will give you the desired output of 60%.
> 
> Peter
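
For anyone who wants the numbers rather than printed lines, the loop above can be sketched as a generator. This is only an illustration under assumptions: identities_versus_query is a hypothetical helper name, and the SimpleNamespace objects are stand-ins carrying just the attributes used in the loop (query_letters, alignments, hit_id, hsps, identities), so the sketch runs without Biopython or a BLAST XML file. With real data you would pass in the records from NCBIXML.parse instead.

```python
from types import SimpleNamespace


def identities_versus_query(records):
    """Yield (hit_id, percent) for every HSP, relative to the full query length."""
    for record in records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                percent = 100.0 * hsp.identities / record.query_letters
                yield alignment.hit_id, percent


# Stand-in objects (hypothetical values) mimicking the attributes used above.
fake_hsp = SimpleNamespace(identities=30)
fake_alignment = SimpleNamespace(hit_id="hit_1", hsps=[fake_hsp])
fake_record = SimpleNamespace(query_letters=40, alignments=[fake_alignment])

results = list(identities_versus_query([fake_record]))
for hit_id, percent in results:
    print(" vs %s gives %0.1f%% identities" % (hit_id, percent))
```

Returning the values instead of printing them makes it easier to store them in a database afterwards.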