[BioPython] Making the Seq object act more like a string
Michiel de Hoon
mdehoon at c2b2.columbia.edu
Mon Sep 10 09:56:25 UTC 2007
Let's have the Seq/MutableSeq/SeqRecord discussion after the upcoming
release, which is only five days away. There's not enough time to
discuss these issues in detail, let alone to test them.
--Michiel.
Peter wrote:
> We seem to be talking at cross purposes.
>
> Michiel de Hoon wrote:
>> Peter wrote:
>>> I would like to make the following "small" change now, ready for
>>> the next release of Biopython:
>>>
>>> (1) Make __str__ give the full sequence as a string for Seq and
>>> MutableSeq objects, allowing intuitive use of str(myseq) which used
>>> to give a truncated representation including the alphabet.
>>
>> Note that the __str__ is used to create the output of "print myseq",
>> where myseq is a Seq object. So if __str__ returns the full sequence
>> string, then "print myseq" will print the full sequence. This is not
>> necessarily what you want.
>
> Getting the full string from both "print my_seq" and str(my_seq) is what
> I would expect from a Seq object that acted like a string.
>
>> In essence, the str() function and the .tostring() method have
>> different functions. So I think we should not drop .tostring() in
>> favor of str().
>
> At the moment str() and .tostring() do serve different purposes. Currently,
> with a Seq object called my_seq:
> * full sequence as string - my_seq.tostring()
> * representation with full sequence with alphabet - repr(my_seq)
> * truncated sequence as string - not built in
> * representation with truncated sequence with alphabet - str(my_seq)
>
> What I would like:
> * full sequence as string - str(my_seq) and retain my_seq.tostring() for
> backwards compatibility.
> * representation with full sequence with alphabet - repr(my_seq)
> * truncated sequence as string - not built in
> * representation with truncated sequence with alphabet - consider adding
> a new method, e.g. my_seq.short()
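The proposed split of responsibilities can be sketched as a small class. This is a minimal, hypothetical illustration, not Biopython's actual Seq class: the name `SketchSeq`, storing the alphabet as a plain string, and the `max_len` truncation width of `short()` are all assumptions made for the example. It is written for Python 3, whereas the examples in this thread use Python 2 print statements.

```python
class SketchSeq:
    """Hypothetical stand-in for a Seq object (NOT Biopython's class).

    The alphabet is stored as a plain string purely for illustration.
    """

    def __init__(self, data, alphabet="generic"):
        self._data = str(data)
        self.alphabet = alphabet

    def __str__(self):
        # Proposed behaviour: str(my_seq) and print(my_seq) give the full sequence.
        return self._data

    def __repr__(self):
        # Full sequence plus alphabet, as repr(my_seq) already provides.
        return f"SketchSeq({self._data!r}, {self.alphabet!r})"

    def __len__(self):
        return len(self._data)

    def tostring(self):
        # Kept for backwards compatibility; returns the same as str(my_seq).
        return str(self)

    def short(self, max_len=20):
        # Hypothetical truncated representation including the alphabet;
        # max_len is an assumed width, not a documented Biopython value.
        if len(self._data) > max_len:
            shown = self._data[: max_len - 3] + "..."
        else:
            shown = self._data
        return f"SketchSeq({shown!r}, {self.alphabet!r})"


if __name__ == "__main__":
    my_seq = SketchSeq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", "DNA")
    print(my_seq)                # full sequence via __str__
    print(my_seq.short())        # truncated form with alphabet
    print("My sequence is %s, length %i" % (my_seq, len(my_seq)))
```

Because `%s` formatting calls `str()` on non-string arguments, defining `__str__` this way is enough for both `print` and string interpolation to show the full sequence, while `tostring()` keeps old code working.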
>
>> Moreover, this problem will go away if and when a Seq object
>> subclasses from a string object. Then, we won't need a Seq-to-string
>> function at all.
>
> What do you mean by the "problem will go away"? This would be much
> easier to discuss in person :(
>
> If/when we make Seq a subclass of string, there would still be __str__
> and __repr__ methods, and I would expect str(my_seq) and also "print
> my_seq" to give the full sequence. For backwards compatibility I would
> keep the existing .tostring() method as well.
>
> I would find it very strange to have the Seq object subclass string but
> have str(my_seq) not give me the full sequence. Isn't making str(my_seq)
> return the full sequence as a string essential for things like this?
>
> print my_seq
> print "My sequence is %s, length %i" % (my_seq, len(my_seq))
>
> Rather than as currently required:
>
> print my_seq.tostring()
> print "My sequence is %s, length %i" % (my_seq.tostring(), len(my_seq))
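If a sequence type did subclass the built-in string, `str()`, `print` and `%s` formatting would all yield the full sequence without any extra code. The sketch below is a hypothetical illustration of that idea, not Biopython's design: the class name `StrSeq` and the `alphabet` attribute are assumptions, and it uses Python 3 syntax.

```python
class StrSeq(str):
    """Hypothetical sequence type that subclasses str (NOT Biopython's Seq)."""

    def __new__(cls, data, alphabet="generic"):
        # str is immutable, so the value must be set in __new__, not __init__.
        obj = super().__new__(cls, data)
        obj.alphabet = alphabet  # illustrative attribute only
        return obj

    def __repr__(self):
        # str(self) uses the inherited str.__str__, giving a plain str copy.
        return f"StrSeq({str(self)!r}, {self.alphabet!r})"

    def tostring(self):
        # Backwards-compatible alias for str(my_seq).
        return str(self)


if __name__ == "__main__":
    my_seq = StrSeq("ACGT", "DNA")
    print(my_seq)
    print("My sequence is %s, length %i" % (my_seq, len(my_seq)))
```

One design cost worth noting: inherited operations such as slicing (`my_seq[1:3]`) or `my_seq.lower()` return plain `str` objects, so the alphabet is lost unless those methods are overridden as well.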
>
>
> Peter
>