[Biopython-dev] Rethinking Seq objects

Frédéric Sohm Frederic.Sohm at iaf.cnrs-gif.fr
Wed May 11 07:20:50 EDT 2005


Hi Michiel and everyone,

I have no problem with removing the mutable Seq object. The proposed solution
should be OK.

Last time I posted this I forgot the subject line, so it got lost on the
mailing list instead of landing in the right thread. Here is the proposal again:

Just a thought, so please don't flame me for it.
Since you will be making a new Seq object, would it be worth making it behave
more like a typical object?

But first a disclaimer: I realise the proposed change could mean breaking a
lot of code, so it might be a very bad idea in the end.

When I first used Biopython, I was surprised by how the Seq object behaves
with the built-in str() and repr() functions (I should have read the manual
first, but hey...):

OK, here is the current Seq behaviour:
>>> from Bio.Seq import Seq
>>> a = 'a'*80
>>> a
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> s = Seq(a)
>>> s
Seq('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', Alphabet())
>>> str(s)
"Seq('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ...', Alphabet())"
>>> repr(s)
"Seq('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', Alphabet())"
>>> s.tostring()
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'

Now here is what I was expecting at the time, given the respective meanings
of str and repr:

>>> a = 'a'*80
>>> a
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa'
>>> s = Seq(a)
>>> s
Seq('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaa', Alphabet())
>>> str(s)
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa'
>>> repr(s)
"Seq('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaa', Alphabet())"


So what I would propose is to:
     - change str(seq) to return the actual sequence, as seq.tostring() does
       right now, and leave repr(seq) as it is,
     - make seq.tostring() return str(seq) for backward compatibility (it would
       eventually be removed),
     - add a new method, Seq.short() for example, which would behave like the
       current str(seq).
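
To make the three points concrete, here is a minimal sketch of how such an
object could behave. It is not Biopython code: the class name ProposedSeq is
hypothetical, the alphabet is stored as a plain string to keep the example
self-contained, and the 60-character cutoff in short() is an assumption taken
from the truncated str() output shown earlier.

```python
class ProposedSeq:
    """Hypothetical stand-in illustrating the proposal; not the real Bio.Seq.Seq."""

    def __init__(self, data, alphabet="Alphabet()"):
        self.data = data
        # Assumption: a plain string stands in for the real Alphabet object.
        self.alphabet = alphabet

    def __repr__(self):
        # Left as it is today: the full, unambiguous representation.
        return "Seq(%r, %s)" % (self.data, self.alphabet)

    def __str__(self):
        # Proposed change: return the bare sequence, as tostring() does today.
        return self.data

    def tostring(self):
        # Kept for backward compatibility; simply delegates to str().
        return str(self)

    def short(self):
        # Takes over what str() does today: a truncated, labelled display.
        # Assumption: sequences longer than 60 characters are cut and marked " ...".
        if len(self.data) > 60:
            shown = self.data[:60] + " ..."
        else:
            shown = self.data
        return "Seq(%r, %s)" % (shown, self.alphabet)
```

With this sketch, str(ProposedSeq('a'*80)) gives the 80-character string
itself, while short() gives the abbreviated form that str() returns today.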

I don't have any idea how much code this would break. Its feasibility will
also depend on how the new Seq is released (I mean, do you plan to have the
current Seq and the new one co-existing for a while, or to replace the old Seq
directly?).
If the latter is the way we go, this change is certainly not desirable;
otherwise it might be something to consider.
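
If the two do co-exist for a while, one possible way to phase out tostring()
would be to have it emit a warning before it is removed. The following is only
a sketch under that assumption: TransitionalSeq is a hypothetical name, and the
warning uses Python's standard warnings module rather than anything that exists
in Biopython.

```python
import warnings


class TransitionalSeq:
    """Hypothetical sketch of a Seq during a transition period; not Biopython code."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        # New behaviour: str() returns the bare sequence.
        return self.data

    def tostring(self):
        # Old entry point still works, but tells callers to move to str().
        warnings.warn(
            "tostring() is deprecated; use str(seq) instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this method
        )
        return str(self)
```

Old code calling tostring() would keep working and get the same string back,
while giving people time to switch to str(seq).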

Personally I have mixed feelings about it, but I think it is worth discussing
the matter now.

This change would make Seq objects behave more like a Python programmer
would expect. On the other hand, Biopython has been built on the current
model, and it might be a bad idea to change it after so much time.


Since the only real problem with this is the change to str(), it all boils
down to how frequently people use the current str() of Seq in their code.
I do not have the impression it is very frequent, but ...

What do you think?

Fred

-- 
Frédéric Sohm
Equipe INRA U1126 "Morphogenèse du système nerveux des Chordés"
UPR 2197 DEPSN, CNRS
Institut de Neurosciences A. Fessard
1 Avenue de la Terrasse
91 198 GIF-SUR-YVETTE
FRANCE
Phone: +33 (0) 1 69 82 34 12
Fax:+33 (0) 1 69 82 34 47


