[Biopython-dev] forward_complement, reverse_complement
Michiel Jan Laurens de Hoon
mdehoon at ims.u-tokyo.ac.jp
Sun Jun 27 23:38:14 EDT 2004
I timed the complement, reverse, and reverse_complement functions in
Bio.SeqUtils, Bio.GFF.easy, and Bio.Seq. The reverse_complement and
forward_complement functions in Bio.GFF.easy turned out to be faster than
their counterparts in Bio.SeqUtils. However, using the built-in map function
is faster still:
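    def reverse_complement(self):
        # Requires "import array" at module level.
        from Bio.Data.IUPACData import ambiguous_dna_complement
        # Complement every base; map() returns a list here.
        self.data = map(lambda c: ambiguous_dna_complement[c], self.data)
        # Reverse the list in place, then store it as a character array again.
        self.data.reverse()
        self.data = array.array('c', self.data)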
Here, I implemented reverse_complement as a member function of MutableSeq. I
think that is the best place for this function, since MutableSeq already has a
member function "reverse". SeqUtils mainly contains functions that analyze
sequences without modifying them.
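For readers who want to try the idea outside of MutableSeq, here is a minimal stand-alone sketch in Python 3. It is not the Biopython code: COMPLEMENT is a hand-written table standing in for ambiguous_dna_complement, and reverse_complement_in_place is a hypothetical helper that works on a plain list of bases.

```python
# Hypothetical stand-alone sketch; not part of Biopython.
# Hand-written stand-in for Bio.Data.IUPACData.ambiguous_dna_complement.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S",
    "Y": "R", "K": "M", "V": "B", "H": "D",
    "D": "H", "B": "V", "X": "X", "N": "N",
}


def reverse_complement_in_place(data):
    """Reverse-complement a mutable list of bases in place."""
    # Complement each base with map(), writing back into the same list.
    data[:] = map(COMPLEMENT.__getitem__, data)
    # Then reverse the list in place.
    data.reverse()


seq = list("AACGTN")
reverse_complement_in_place(seq)
print("".join(seq))
```

Modifying the list in place mirrors the choice of making this a method of the mutable sequence class rather than a function that returns a new sequence.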
The timing results are below. Note that the functions in Bio.SeqUtils can handle
both strings and Seq objects, with the Seq objects being slower, while
Bio.GFF.easy and Bio.Seq handle Seq objects only.
Can I go ahead and update CVS to add complement and reverse_complement to
Bio.Seq? I'll clean up Bio.GFF.easy and Bio.SeqUtils accordingly.
--Michiel.
Timings (in seconds)
====================

                       Bio.GFF.easy  Bio.SeqUtils  Bio.SeqUtils           Using map
                 reverse_complement  antiparallel  antiparallel  reverse_complement
Sequence length          Seq object    Seq object        string          Seq object
          1 000               0.002         0.004         0.002               0.002
         10 000               0.017         0.045         0.023               0.012
        100 000               0.166         0.444         0.225               0.117
      1 000 000               1.651         4.347         2.234               1.135
     10 000 000              18.187        45.137        24.179              11.697
    100 000 000             192.243       457.680       242.258             116.170

                       Bio.GFF.easy  Bio.SeqUtils  Bio.SeqUtils           Using map
                 forward_complement    complement    complement          complement
Sequence length          Seq object    Seq object        string          Seq object
          1 000               0.002         0.005         0.002               0.001
         10 000               0.016         0.042         0.020               0.012
        100 000               0.165         0.435         0.192               0.119
      1 000 000               1.638         4.283         1.912               1.166
     10 000 000              17.993        45.085        20.937              11.572
    100 000 000             193.528       443.024       209.573             116.916

                 Bio.SeqUtils  Bio.SeqUtils       Bio.Seq
                      reverse       reverse       reverse
Sequence length    Seq object        string    Seq object
          1 000         0.003         0.001         0.000
         10 000         0.023         0.003         0.001
        100 000         0.226         0.022         0.010
      1 000 000         2.232         0.227         0.107
     10 000 000        22.592         2.319         1.057
    100 000 000       225.447        23.094        10.559
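
Timings of this kind could be collected with the standard timeit module. The sketch below is only an illustration of such a harness, not the script that produced the numbers above. complement_with_map and time_call are hypothetical names, and the four-letter table is a simplification of the full IUPAC table.

```python
import timeit

# Simplified four-letter table; an assumption for illustration only.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement_with_map(seq):
    """Complement a plain string using map() and a dictionary lookup."""
    return "".join(map(COMPLEMENT.__getitem__, seq))


def time_call(func, seq, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` single calls."""
    return min(timeit.repeat(lambda: func(seq), number=1, repeat=repeat))


for length in (1_000, 10_000, 100_000):
    seq = "ACGT" * (length // 4)
    print(f"{length:>11,d}  {time_call(complement_with_map, seq):.3f}")
```

Taking the minimum over a few repeats reduces noise from other processes; the absolute numbers will of course differ from machine to machine.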
Michael Hoffman wrote:
>>Bio/GFF/easy.py contains the functions forward_complement and
>>reverse_complement, which return the forward and reverse complement of a
>>sequence object. I had been looking for such functions in Biopython for a while,
>>but I assumed that they were not available as I didn't find them in Bio/Seq.py.
>>I'd like to propose to move those two functions there. Note that Bio.SeqUtils
>>contains similar functions that work on strings but not on sequence objects. Any
>>thoughts?
>
>
> I wrote those when Bio.GFF was not part of Biopython and they are
> really only there to support Bio.GFF.
>
> It would probably be better to change the Bio.SeqUtils functions to
> work on sequence objects. I imagine the Bio.SeqUtils functions are
> much faster since much of the work gets passed to the native function
> str.translate().
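
For comparison, the str.translate() approach mentioned above can be sketched as follows in Python 3. This is an illustration on plain strings, not the Bio.SeqUtils source; COMPLEMENT_TABLE and both function names are assumptions, and only the unambiguous bases are covered.

```python
# Translation table for the four unambiguous bases, upper and lower case.
# (Illustrative only; ambiguity codes are left unchanged by translate().)
COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Complement a plain string with a single str.translate() call."""
    return seq.translate(COMPLEMENT_TABLE)


def reverse_complement(seq):
    """Complement the string, then reverse it with a slice."""
    return complement(seq)[::-1]


print(reverse_complement("AAACGT"))
```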
--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon