[Biopython] Bio.pairwise2

Michiel de Hoon mjldehoon at yahoo.com
Fri Jun 28 15:23:06 UTC 2013


Hi all,

I have started using Bio.pairwise2. The code now returns the alignment that I expect to find, but it is very slow, even when using the C implementation.

This is an example of how Bio.pairwise2 is used:
    >>> from Bio import pairwise2
    >>> alignments = pairwise2.align.globalms("ACCGT", "ACG", 2, -1, -.5, -.1)
Here, pairwise2.align is an object. Calling pairwise2.align.globalms creates a new object that parses the "globalms" name as well as the arguments to find out how to run the alignment.
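
To make the cost concrete, here is a toy sketch of this kind of name-based dispatch. It is not the actual pairwise2 code: the AlignDispatcher class, the parse_count counter, and the two small code tables are hypothetical and cover only a subset of the real letter codes. It only illustrates that every attribute lookup re-parses the function name before anything else can happen.

```python
# Toy illustration only; names and structure are assumptions, not Bio.pairwise2 internals.
MATCH_CODES = {"x": "identity scoring", "m": "match/mismatch scores"}
GAP_CODES = {"x": "no gap penalties", "s": "same gap penalties for both sequences"}


class AlignDispatcher:
    """Hypothetical stand-in for an object like pairwise2.align."""

    def __init__(self):
        self.parse_count = 0  # how many times a function name has been parsed

    def __getattr__(self, name):
        # Called only for attributes that do not exist, e.g. "globalms".
        for mode in ("global", "local"):
            if name.startswith(mode) and len(name) == len(mode) + 2:
                match_code, gap_code = name[len(mode):]
                break
        else:
            raise AttributeError(name)
        if match_code not in MATCH_CODES or gap_code not in GAP_CODES:
            raise AttributeError(name)
        self.parse_count += 1

        def describe(*args):
            # A real implementation would interpret args and run the alignment here.
            return (mode, MATCH_CODES[match_code], GAP_CODES[gap_code], args)

        return describe


align = AlignDispatcher()
for _ in range(3):
    result = align.globalms("ACCGT", "ACG", 2, -1, -0.5, -0.1)

# The name "globalms" was parsed once per call, not once in total.
print(align.parse_count)
```

In this sketch the parsing work grows with the number of alignments, which is the overhead a preconfigured aligner object would avoid.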

While this works, it seems rather unpythonic to me. Perhaps more importantly, because the name and argument parsing is repeated for each alignment, it is on the order of 100x slower than it could be.

I would prefer something along these lines:
    >>> aligner = NeedlemanWunsch(...arguments...)
    # sets all the specifics and parameters to use for the alignment
    >>> alignments = aligner.run(sequenceA, sequenceB)
    # this can now be fast, as the algorithm initialization has already been done.
    # one may also want to add a convenience function like this:
    >>> def align(sequenceA, sequenceB, ...arguments...):
    ...     aligner = NeedlemanWunsch(...arguments...)
    ...     return aligner.run(sequenceA, sequenceB)
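
As a minimal sketch of what such an interface could look like, the following pure-Python class does the configuration once in the constructor and keeps run() free of any parsing. The parameter names (match, mismatch, gap) are assumptions made for illustration, and the scoring is simplified to a single linear gap penalty rather than the separate open/extend penalties used in the globalms example. It returns one optimal alignment, not all of them.

```python
class NeedlemanWunsch:
    """Global aligner that is configured once and reused (illustrative sketch)."""

    def __init__(self, match=1.0, mismatch=-1.0, gap=-1.0):
        # All parameter handling happens here, once.
        self.match = match
        self.mismatch = mismatch
        self.gap = gap  # linear gap penalty (simplifying assumption)

    def run(self, seq_a, seq_b):
        rows, cols = len(seq_a) + 1, len(seq_b) + 1
        score = [[0.0] * cols for _ in range(rows)]
        for i in range(1, rows):
            score[i][0] = i * self.gap
        for j in range(1, cols):
            score[0][j] = j * self.gap

        # Fill the dynamic programming matrix.
        for i in range(1, rows):
            for j in range(1, cols):
                pair = self.match if seq_a[i - 1] == seq_b[j - 1] else self.mismatch
                score[i][j] = max(
                    score[i - 1][j - 1] + pair,   # align the two characters
                    score[i - 1][j] + self.gap,   # gap in seq_b
                    score[i][j - 1] + self.gap,   # gap in seq_a
                )

        # Trace back one optimal path from the bottom-right corner.
        aligned_a, aligned_b = [], []
        i, j = len(seq_a), len(seq_b)
        while i > 0 or j > 0:
            if i > 0 and j > 0:
                pair = self.match if seq_a[i - 1] == seq_b[j - 1] else self.mismatch
                if score[i][j] == score[i - 1][j - 1] + pair:
                    aligned_a.append(seq_a[i - 1])
                    aligned_b.append(seq_b[j - 1])
                    i, j = i - 1, j - 1
                    continue
            if i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + self.gap):
                aligned_a.append(seq_a[i - 1])
                aligned_b.append("-")
                i -= 1
            else:
                aligned_a.append("-")
                aligned_b.append(seq_b[j - 1])
                j -= 1
        return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[-1][-1]


def align(seq_a, seq_b, **parameters):
    """Convenience wrapper for one-off alignments."""
    return NeedlemanWunsch(**parameters).run(seq_a, seq_b)


# Configure once, then reuse the same aligner for many pairs.
aligner = NeedlemanWunsch(match=2, mismatch=-1, gap=-1)
results = [aligner.run("ACCGT", other) for other in ("ACG", "ACCGT", "TTT")]
for aligned_a, aligned_b, total in results:
    print(aligned_a, aligned_b, total)
```

The point of the design is only the split between construction and run(); a real implementation would presumably keep the inner loops in C and support the same scoring options that pairwise2 offers today.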

Finally, I think such an alignment code would fit in better as a submodule of Bio.Align.

Comments, suggestions, anybody?

Best,
-Michiel.



