[BioPython] Making the Seq object act more like a string

Michiel de Hoon mdehoon at c2b2.columbia.edu
Mon Sep 10 09:56:25 UTC 2007


Let's have the Seq/MutableSeq/SeqRecord discussion after the upcoming 
release, which is only five days away. There's not enough time to 
discuss these issues in detail, let alone to test them.

--Michiel.


Peter wrote:
> We seem to be talking at cross purposes.
> 
> Michiel de Hoon wrote:
>> Peter wrote:
>>> I would like to make the following "small" change now, ready for
>>> the next release of Biopython:
>>>
>>> (1) Make __str__ give the full sequence as a string for Seq and 
>>> MutableSeq objects, allowing intuitive use of str(myseq) which used
>>> to give a truncated representation including the alphabet.
>>
>> Note that __str__ is used to create the output of "print myseq",
>>  where myseq is a Seq object. So if __str__ returns the full sequence
>>  string, then "print myseq" will print the full sequence. This is not
>>  necessarily what you want.
> 
> Getting the full string from both "print my_seq" and str(my_seq) is what
> I would expect from a Seq object that acted like a string.
> 
>> In essence, the str() function and the .tostring() method have
>> different functions. So I think we should not drop .tostring() in
>> favor of str().
> 
> At the moment str() and .tostring() do serve different purposes.  Currently with a 
> Seq object called my_seq:
> * full sequence as string - my_seq.tostring()
> * representation with full sequence with alphabet - repr(my_seq)
> * truncated sequence as string - not built in
> * representation with truncated sequence with alphabet - str(my_seq)
> 
> What I would like:
> * full sequence as string - str(my_seq) and retain my_seq.tostring() for 
> backwards compatibility.
> * representation with full sequence with alphabet - repr(my_seq)
> * truncated sequence as string - not built in
> * representation with truncated sequence with alphabet - consider adding 
> a new method e.g. my_seq.short()
> 
>> Moreover, this problem will go away if and when a Seq object
>> subclasses from a string object. Then, we won't need a Seq-to-string
>> function at all.
> 
> What do you mean by the "problem will go away"?  This would be much
> easier to discuss in person :(
> 
> If/when we make Seq a subclass of string, there would still be __str__
> and __repr__ methods, and I would expect str(my_seq) and also "print
> my_seq" to give the full sequence.  For backwards compatibility I would
> keep the existing .tostring() method as well.
> 
> I would find it very strange to have the Seq object subclass string, but 
> have str(my_seq) not give me the full sequence.  Isn't making 
> str(my_seq) return the full sequence as a string essential for things 
> like this?
> 
> print my_seq
> print "My sequence is %s, length %i" % (my_seq, len(my_seq))
> 
> Rather than as currently required:
> 
> print my_seq.tostring()
> print "My sequence is %s, length %i" % (my_seq.tostring(), len(my_seq))
> 
> 
> Peter
> 
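
The following is a minimal sketch, in Python, of the method layout Peter proposes: __str__ returns the full sequence, __repr__ shows the full sequence together with its alphabet, .tostring() is kept for backwards compatibility, and a truncated representation comes from a separate method. The class is a simplified stand-in, not Biopython's actual Seq. The Alphabet placeholder, the short() name (taken from Peter's "e.g.") and its width parameter are assumptions made for illustration only.

```python
class Alphabet:
    """Hypothetical placeholder for a Biopython alphabet object."""

    def __repr__(self):
        return "Alphabet()"


class Seq:
    """Sketch of the proposed behaviour; not Biopython's real Seq class."""

    def __init__(self, data, alphabet=None):
        self._data = data
        self.alphabet = alphabet if alphabet is not None else Alphabet()

    def __str__(self):
        # Proposed: the full sequence as a plain string, so that
        # print and "%s" formatting show the whole sequence.
        return self._data

    def __repr__(self):
        # Representation with the full sequence and the alphabet.
        return "Seq(%r, %r)" % (self._data, self.alphabet)

    def __len__(self):
        return len(self._data)

    def tostring(self):
        # Retained for backwards compatibility; same result as str(self).
        return str(self)

    def short(self, width=60):
        # Hypothetical new method: truncated representation with alphabet.
        # Sequences longer than `width` are cut and end with "...".
        if len(self._data) > width:
            shown = self._data[:width - 3] + "..."
        else:
            shown = self._data
        return "Seq(%r, %r)" % (shown, self.alphabet)


my_seq = Seq("GATCGATGGG")
print(my_seq)
print("My sequence is %s, length %i" % (my_seq, len(my_seq)))
print(my_seq.short(width=8))
```

With this layout both print(my_seq) and "%s" % my_seq give the full sequence, which is what the print examples in the message rely on, while existing code calling .tostring() keeps working.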

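Michiel's point that the problem "will go away" if Seq subclasses string can be illustrated with a similar sketch. Assuming a hypothetical StrSeq class derived from the built-in str (one possible design under discussion, not an existing Biopython class), str() and %s formatting already return the full sequence without any extra method, while __repr__ can still be overridden to show the alphabet.

```python
class StrSeq(str):
    """Hypothetical Seq that subclasses str (illustration only)."""

    def __new__(cls, data, alphabet=None):
        # str is immutable, so the value is set in __new__, not __init__.
        obj = str.__new__(cls, data)
        obj.alphabet = alphabet
        return obj

    def __repr__(self):
        # Show the sequence via str's own repr, plus the alphabet.
        return "StrSeq(%s, %r)" % (str.__repr__(self), self.alphabet)


my_seq = StrSeq("GATC", alphabet="DNA")
print(my_seq)                      # full sequence, no .tostring() needed
print("My sequence is %s, length %i" % (my_seq, len(my_seq)))
print(repr(my_seq))
```

One design caveat: inherited string operations such as upper() or slicing return plain str objects, so the alphabet would be dropped unless those methods are overridden as well.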


