[Biopython] slow pairwise2 alignment

Ogan ABAAN oda at georgetown.edu
Sun Jun 7 09:16:03 EDT 2009


Thank you Peter, again

I thought the reply would go back to the group as well, so I learned one more
thing.

As for the formatting, I typed it in myself, so it may not be proper.

You are correct about the integer division; the alignment score is an
integer. Since all the primers are of equal length for now, I can just use a
fixed threshold. I calculated it that way so that the code would stay flexible
with variable-length primers.
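
A minimal sketch of computing the thresholds once per primer, outside the
loop over the input file, and pairing them with zip as Peter suggested. The
helper name score_threshold and the two primer sequences are made up for
illustration; the formula is the one from the quoted code below, written with
// so the integer division is explicit.

```python
def score_threshold(primer, match_score=2, bases_per_point=5):
    """Perfect-match score minus one point for every `bases_per_point` bases."""
    # // is floor division, so the result is an integer on Python 2 and 3.
    return len(primer) * match_score - len(primer) // bases_per_point


# Placeholder 22-mers, not the real primers.
primerlist = ["ACGTACGTACGTACGTACGTAC", "TTGACCGTAAGCTTGACCGTAA"]

# Computed once, before any alignment is run.
thresholdlist = [score_threshold(primer) for primer in primerlist]

for primer, threshold in zip(primerlist, thresholdlist):
    # For a 22-mer: 22 * 2 - 22 // 5 = 44 - 4 = 40
    print(len(primer), threshold)
```

With equal-length primers every threshold comes out the same, so a single
constant would also do; the list only pays off once primer lengths vary.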

Thank you very much for all the helpful tips.



On Sun, Jun 7, 2009 at 8:30 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Sat, Jun 6, 2009 at 2:16 PM, Ogan ABAAN<oda at georgetown.edu> wrote:
> > Thanks Peter for the reply.
> >
> > So as I understand pairwise2 should be running in C code without me doing
> > anything.
> >
> > As for my code, it is actually quite simple.
> >
> >>import sys
> >>from Bio import pairwise2 as pw2
> >>primerlist=[22mer1,22mer2]
> >>filename=sys.argv[1]
> >>input= open(filename,'r')
> >>count= 0
> >>for line in input:
> > ....line= line.strip().split() #line[8] contains the 30mer target seq
> > ........for primer in primerlist:
> > ............try:
> > ................alignment= pw2.align.globalmx(line[8],primer,2,-1,score_only=1)
> > ................if alignment>=len(primer)*2-len(primer)/5: #40 or better out of 44
> > ....................count+= 1
> > ............except IndexError: pass
> >>input.close()
> >>output= open(filename+'output.txt','w')
> >>output.write(str(count))
> >>output.close()
> >
> > Do you think there is room for improvement? Sorry for any typos.
> >
> > Thanks
>
> Hi Ogan,
>
> You forgot to CC the mailing list on your reply ;)
>
> There is something funny about your indentation - but I assume that
> was just a problem formatting it for the email.
>
> One simple thing: you are wasting a lot of time recalculating
> this: len(primer)*2-len(primer)/5
>
> By the way - do you mean to be doing integer division? If the
> alignment score is an integer this may not matter.
>
> You could calculate these thresholds once and store them in a list,
> then do something like this:
> for (primer, threshold) in zip(primerlist, thresholdlist) : ...
>
> Of course, it would be sensible to do some profiling - but I don't see
> anything else just from reading it.
>
> Peter
>
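
A rough sketch of the profiling step mentioned above, using the standard
library's cProfile and pstats modules. The names count_hits and toy_score are
hypothetical, and toy_score is only a gap-free stand-in so the example runs
without Biopython; in the real script the scorer would be the pairwise2 call.

```python
import cProfile
import io
import pstats


def toy_score(target, primer):
    # Stand-in scorer, NOT pairwise2: +2 per matching position, -1 per mismatch.
    return sum(2 if a == b else -1 for a, b in zip(target, primer))


def count_hits(targets, primers, thresholds, score):
    """Count target/primer pairs whose score reaches the primer's threshold."""
    count = 0
    for target in targets:
        for primer, threshold in zip(primers, thresholds):
            if score(target, primer) >= threshold:
                count += 1
    return count


# Made-up data: one exact match and one sequence with two mismatches.
primers = ["ACGTACGTAC"]
thresholds = [len(p) * 2 - len(p) // 5 for p in primers]
targets = ["ACGTACGTAC", "ACGTACGTTT"] * 1000

profiler = cProfile.Profile()
profiler.enable()
hits = count_hits(targets, primers, thresholds, toy_score)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
print(report)
```

The function with the largest cumulative time is the place to start; if that
turns out to be the alignment call itself, restructuring the loop around it
will not gain much.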

