[BioPython] The count method of a Seq (or MutableSeq) object
Peter
biopython at maubp.freeserve.co.uk
Thu Mar 5 16:34:37 UTC 2009
On Thu, Mar 5, 2009 at 4:28 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> This is a little deja vu as I feel this type of thing has come up
> before. While I can not speak for anyone else, if I sound different to
> that, then I was obviously convinced by those arguments as that
> sounds better than I forgot :-)
>
> More seriously, ignoring the reading frame or the genetic code when
> counting is rather bad form!
Why? In many situations they are irrelevant. Consider counting
restriction enzyme digest sites, for example, plus of course counting
in any protein sequence.
> I can not think of a relevant case involving a protein sequence -
> although counting pairs of cysteines in insulin-like sequences could
> be a situation of importance (related to disulphide bonds).
>
> An example for nucleic sequences, counting 'TTT' in the made-up
> sequence 'TTTTTTTGG' can be two in frames 1 and 2 but only one in
> frame 3.
Answers of 2 (using a non-overlapping search like the Python string
method) and 5 (using an overlapping search) are both valid expected
outcomes for "TTT" in "TTTTTTTGG".
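
To make the two numbers concrete, here is a minimal sketch using plain
Python strings rather than Seq objects. The helper name
count_overlapping is made up for illustration and is not part of any
library:

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlaps.

    Hypothetical helper for illustration only.
    """
    if not sub:
        raise ValueError("search string must not be empty")
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Advance by one letter only, so overlapping matches are found.
        start = text.find(sub, start + 1)
    return count


seq = "TTTTTTTGG"
print(seq.count("TTT"))               # non-overlapping, like str.count: 2
print(count_overlapping(seq, "TTT"))  # overlapping: 5
```

The built-in str.count resumes searching after the end of each match,
which is why it reports 2 here rather than 5.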
Here you seem to want to count codons - which is by its nature a
frame-dependent task.
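
For comparison, a frame-aware codon count could be sketched as below.
Again this is plain Python on a string, the function name count_codon
is hypothetical, and frames are assumed to be numbered 1 to 3 on the
forward strand only:

```python
def count_codon(text, codon, frame):
    """Count a codon in text when read in the given frame (1, 2 or 3).

    Hypothetical helper for illustration; forward strand only.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    if len(codon) != 3:
        raise ValueError("codon must be three letters long")
    offset = frame - 1
    # Step through complete triplets only; a trailing partial codon is ignored.
    return sum(
        1
        for i in range(offset, len(text) - 2, 3)
        if text[i:i + 3] == codon
    )


seq = "TTTTTTTGG"
for frame in (1, 2, 3):
    print(frame, count_codon(seq, "TTT", frame))
# frame 1: 2, frame 2: 2, frame 3: 1
```

This reproduces the figures quoted above (two in frames 1 and 2, one in
frame 3), and shows that it is a different question from a simple
substring count.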
Peter