[Biopython] slow pairwise2 alignment

Sun Jun 7 12:30:45 UTC 2009

On Sat, Jun 6, 2009 at 2:16 PM, Ogan ABAAN <oda at georgetown.edu> wrote:
> Thanks Peter for the reply.
>
> So as I understand pairwise2 should be running in C code without me doing
> anything.
>
> As far as my code goes, it is actually quite simple.
>
>>from Bio import pairwise2 as pw2
>>primerlist=[22mer1,22mer2]
>>filename=sys.argv[1]
>>input= open(filename,'r')
>>count= 0
>>for line in input:
> ....line= line.strip().split() #line[8] contains the 30mer target seq
> ........for primer in primerlist:
> ............try:
> ................alignment= pw2.align.globalmx(line[8],primer,2,-1,score_only=1)
> ................if alignment>=len(primer)*2-len(primer)/5: #40 or better out of 44
> ....................count+= 1
> ............except IndexError: pass
>>input.close()
>>output= open(filename+'output.txt','w')
>>output.writeline(str(count))
>>output.close()
>
> Do you think there is room for improvement? Sorry for typos if any.
>
> Thanks

Hi Ogan,

You forgot to CC the mailing list on your reply ;)

There is something funny about your indentation - but I assume that
was just a problem formatting it for the email.

One simple thing: you are wasting a lot of time recalculating this
threshold on every pass through the inner loop:
len(primer)*2-len(primer)/5

By the way - do you mean to be doing integer division? If the
alignment score is an integer this may not matter.

You could calculate these thresholds once and store them in a list,
then do something like this:
for (primer, threshold) in zip(primerlist, thresholdlist) : ...
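
A rough sketch of what I mean, not tested against your data. The names
match_threshold, count_hits and toy_score are made up for illustration,
and toy_score is only an ungapped stand-in so the example runs without
Biopython. In your script you would pass in something that wraps
pw2.align.globalmx(target, primer, 2, -1, score_only=1) instead. I have
assumed you do want integer division and written it as // to make that
explicit.

```python
def match_threshold(primer, match_score=2):
    # Same cutoff as len(primer)*2 - len(primer)/5 in the original loop,
    # with the integer division spelled out (assumption: that was intended).
    return len(primer) * match_score - len(primer) // 5


def toy_score(target, primer, match=2, mismatch=-1):
    # Hypothetical stand-in scorer: compares position by position, no gaps.
    # It is NOT equivalent to pairwise2's global alignment.
    return sum(match if a == b else mismatch for a, b in zip(target, primer))


def count_hits(targets, primers, score_func):
    # Work out each primer's threshold once, before the main loop.
    thresholds = [match_threshold(p) for p in primers]
    count = 0
    for target in targets:
        for primer, threshold in zip(primers, thresholds):
            if score_func(target, primer) >= threshold:
                count += 1
    return count


primer = "ACGTACGTACGTACGTACGTAC"   # made-up 22mer
targets = [
    primer,                  # perfect match
    "T" + primer[1:],        # one mismatch
    "TT" + primer[2:],       # two mismatches
]
hits = count_hits(targets, [primer], toy_score)
```

With only two primers the saving from this is small compared to the
cost of the alignments themselves, so do not expect a dramatic speed up
from it alone.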

Of course, it would be sensible to do some profiling - but I don't see
anything else just from reading it.
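
For the profiling, something along these lines with the standard
library's cProfile and pstats modules should show where the time goes.
This is only a sketch: profile_call and work are names I made up, and
work is a placeholder for whatever function holds your main loop.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    # Run func under the profiler and return its result together with
    # a text report of the most expensive calls.
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


def work():
    # Placeholder workload; replace with the real alignment loop.
    return sum(i * i for i in range(1000))


result, report = profile_call(work)
print(report)
```

If nearly all of the cumulative time is inside the pairwise2 calls, then
tidying up the Python around them will not help much.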

Peter