[BioPython] Rethinking Seq objects

Michiel Jan Laurens de Hoon mdehoon at ims.u-tokyo.ac.jp
Sat May 7 01:36:01 EDT 2005


Gavin Crooks wrote:
> On May 5, 2005, at 00:30, Michiel Jan Laurens de Hoon wrote:
>>> If, in the alternative, Seq was a simple immutable object then it 
>>> could be implemented as a light weight subclass of str, with an 
>>> alphabet attribute that is also a subclass of str. You'd edit it like 
>>> you would edit any string in python;  split it into a list, do 
>>> whatever manipulations are necessary, and then join the list back 
>>> together into a new Seq.
>>
>> There may be performance issues with this approach, if a Seq object is 
>> mutated often. So let's wait and see if any of our users actually want 
>> to mutate a sequence object, and if so, if the performance is critical.
> 
> Performance would be no worse than for string manipulation in standard 
> python. The Way of The Python is not to use MutableString's (Which are 
> in the standard library, but not really canonical) but to split string 
> into lists or arrays, do whatever manipulations are necessary and then 
> join the string back together. Is there any reason why Seq's can't be 
> mutated analogously?
> 
Well, I was going to say that Seq objects can be very large, certainly much
larger than strings in typical Python usage, and that this would be a
performance issue. But when I tried modifying a long string by splitting and
rejoining, it didn't seem to be bad at all. So maybe this is the way to go.
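
For concreteness, here is a rough sketch of the approach under discussion: an
immutable Seq as a str subclass, edited by splitting into a list and joining
back. The class layout, the mutate() helper, and the use of a plain string for
the alphabet are assumptions made for illustration only; this is not the
existing Bio.Seq API.

```python
class Seq(str):
    """Hypothetical immutable sequence: a str subclass carrying an alphabet."""

    def __new__(cls, data, alphabet="ACGT"):
        obj = super().__new__(cls, data)
        obj.alphabet = alphabet  # assumed: alphabet kept as a plain string
        return obj


def mutate(seq, position, letter):
    """Return a new Seq with one letter replaced, using the list/join idiom."""
    letters = list(seq)          # split into a mutable list of characters
    letters[position] = letter   # edit the list in place
    return Seq("".join(letters), seq.alphabet)  # join back into a new Seq


original = Seq("ACGT" * 250000)   # one million letters
edited = mutate(original, 2, "T")
```

The original object is left untouched, so every edit costs one pass over the
sequence. Batching several changes on the list before joining keeps that cost
to a single pass.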

--Michiel.


-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

