[BioPython] More string methods for the Seq object

Sat Sep 27 01:55:18 UTC 2008

On Fri, Sep 26, 2008 at 4:57 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>> I suspect you have misunderstood my intention.  My Seq object .strip()
>>> method would NOT remove the given characters from the interior of the
>>> sequence - only from the ends.
>>>
>>> However, there is certainly a case for wanting an .ungap() method for
>>> the Seq class (or a more general method to remove all of a particular
>>> character), but I hadn't intended to raise this issue yet.
>>>
>>> Peter
>>
>> Yes, sorry about that. I misunderstood because I confused myself with the
>> first part that uses the split.
>>
>> Bruce
>
> Fair enough - maybe I shouldn't have tackled both methods in one
> email... but I'm glad we cleared that up.
>
> Anyway - do you think adding the split and strip methods to the Seq object
> is worthwhile?
>
> Peter
>
Yes - in fact it is probably essential, now that many users are likely to
need (and want) to parse genome sequences.

I really would like to see many of the sequence methods 'work' in the
same manner as Python string methods. The string methods that I use a lot
for sequences are:
strip
split
join
find

(I don't use the 'l' and 'r' versions very much.)
So your proposal would address the first two.

I do something like your ungap() idea with strings using join:
>>> ''.join(sequence.split('-'))
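
To make the difference from strip concrete, here is a minimal sketch using
plain Python strings in place of Seq objects. The helper name ungap() and the
example sequence are made up for illustration; nothing here is an existing
Seq method.

```python
# Hypothetical helper: plain str stands in for a Seq object here.
def ungap(sequence, gap="-"):
    """Remove every occurrence of gap, including those in the interior."""
    return "".join(sequence.split(gap))

aligned = "--ATG-CGT--AA-"      # made-up gapped sequence

ungapped = ungap(aligned)       # interior and end gaps all removed
stripped = aligned.strip("-")   # only leading/trailing gaps removed

print(ungapped)   # ATGCGTAA
print(stripped)   # ATG-CGT--AA
```

For a single gap character, aligned.replace('-', '') gives the same result as
the split/join idiom.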

Python 2.5 introduced 'partition(sep): Split the string at the first
occurrence of sep, and return a 3-tuple containing the part before the
separator, the separator itself, and the part after the separator'.
While I don't use it (because I usually split multiple times), it has
advantages if you are looking for the first occurrence of a pattern:
>>> a='GTATGCGTAATG'
>>> a.partition('ATG')
('GT', 'ATG', 'CGTAATG')
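
For comparison, a small sketch (again on a plain string, not a Seq object)
of how partition, split and find treat the same sequence. The variable names
and the 'TTT' pattern are only for illustration.

```python
a = "GTATGCGTAATG"

# partition stops at the first match and keeps the separator.
first = a.partition("ATG")      # ('GT', 'ATG', 'CGTAATG')

# split cuts at every match and drops the separator.
pieces = a.split("ATG")         # ['GT', 'CGTA', '']

# split with maxsplit=1 also stops at the first match.
once = a.split("ATG", 1)        # ['GT', 'CGTAATG']

# find gives the index of the first match, or -1 if there is none.
position = a.find("ATG")        # 2

# With no match, partition returns the whole string and two empty strings.
missing = a.partition("TTT")    # ('GTATGCGTAATG', '', '')
```

The no-match case is the practical difference: partition always returns a
3-tuple, so it can be unpacked without first checking the length of the result.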

Regards
Bruce