[BioPython] More string methods for the Seq object

Fri Sep 26 16:42:19 UTC 2008

>> Bug 2351 comment 15 - adding a split method
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c15
>> Here I have suggested the separator be non-optional (for strings this
>> defaults to white space)
>
>  please apologize my ignorance but what is this useful for?

Suppose you had translated a nucleotide sequence into, for example,
"SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ" (as a Seq object with a protein
alphabet).  You might want to split the sequence at the terminators to
get the open reading frames (and then filter them on length).  Right
now the Seq object doesn't have a split method, so you would have to
switch to using Python strings (and then go back to a Biopython Seq
object later if need be).
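
As a rough sketch of the plain-string workaround described above (not the
proposed Seq method itself), something like the following would do.  The
helper name split_orfs and the min_length threshold are made up for
illustration; only the built-in str.split is used, and wrapping the
pieces back into Seq objects is left out.

```python
# Plain-string workaround: split a translated sequence at the terminator
# character and keep only the fragments that are long enough.
# split_orfs and min_length are hypothetical names used for illustration.

def split_orfs(seq_str, terminator="*", min_length=0):
    """Return the fragments between terminators with len >= min_length."""
    return [frag for frag in seq_str.split(terminator)
            if len(frag) >= min_length]

protein = "SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ"

all_fragments = split_orfs(protein)               # four fragments
long_fragments = split_orfs(protein, min_length=5)  # drops the short "AK"

print(all_fragments)
print(long_fragments)
```

Note the separator is passed explicitly here, which matches the suggestion
in comment 15 that it be non-optional for sequences.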

>> Bug 2596 - adding strip, rstrip and lstrip
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2596
>> Here I have suggested these default to stripping gap characters (for
>> strings these default to stripping white space)
>
>  Again, what is this useful for? Aren't there checks for quality
> of the sequence when one tries to instantiate the object?

I'm not sure what you mean by quality of the sequence here (are you
talking about sequencing quality scores?)

Suppose you have some sequences which you have aligned in ClustalW,
and most have leading or trailing gap characters.  For example, given
"---SAD-KCNKADND---" (as a Seq object with a gapped protein alphabet)
you might want to strip off the leading and trailing gaps to have just
"SAD-KCNKADND" (as a Seq object with the same alphabet).  Right now
the Seq object doesn't have a strip method, so you would have to
switch to a string and back again.
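
Again as a sketch using plain Python strings only (so this shows the
current workaround, not the proposed Seq methods), assuming "-" is the
gap character:

```python
# Plain-string workaround: strip leading/trailing gap characters from an
# aligned sequence.  The gap character "-" is assumed here.
gap = "-"
aligned = "---SAD-KCNKADND---"

both = aligned.strip(gap)    # leading and trailing gaps removed
left = aligned.lstrip(gap)   # leading gaps only
right = aligned.rstrip(gap)  # trailing gaps only

print(both)
print(left)
print(right)
```

The internal gap is left alone in all three cases; only the ends are
trimmed.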

I could write these up as examples in Python if it would help.

Peter