[BioPython] The count method of a Seq (or MutableSeq) object

Peter biopython at maubp.freeserve.co.uk
Thu Mar 5 16:34:37 UTC 2009


On Thu, Mar 5, 2009 at 4:28 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> This is a little deja vu, as I feel this type of thing has come up
> before. While I cannot speak for anyone else, if I sound different to
> that, then I was obviously convinced by those arguments, as that
> sounds better than "I forgot" :-)
>
> More seriously, ignoring the reading frame or the genetic code when
> counting is rather bad form!

Why? In many situations they are irrelevant. Consider counting
restriction enzyme digest sites, for example, or counting anything
in a protein sequence.
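As a minimal sketch of frame-independent counting, assuming plain Python
strings rather than Biopython Seq objects (the helper name count_site and
the example sequences are made up for illustration), counting an EcoRI
site (GAATTC) finds matches wherever they start, in any frame:

```python
def count_site(seq: str, site: str) -> int:
    """Count non-overlapping occurrences of a site, ignoring reading frame."""
    return seq.upper().count(site.upper())

# Made-up DNA sequence containing two EcoRI sites (GAATTC).
dna = "ATGGAATTCCGTTAGAATTCAA"
print(count_site(dna, "GAATTC"))  # 2

# The two sites start at indices 3 and 14, i.e. in different frames,
# yet both are counted: frame simply does not enter into it.
print(dna.find("GAATTC"), dna.rfind("GAATTC"))  # 3 14

# Likewise for a protein: counting cysteine residues has no frame at all.
protein = "MKCCLAC"
print(protein.count("C"))  # 3
```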

> I cannot think of a relevant case involving a protein sequence,
> although counting pairs of cysteines in insulin-like sequences could
> be a situation of importance (related to disulphide bonds).
>
> An example for nucleic acid sequences: counting 'TTT' in the made-up
> sequence 'TTTTTTTGG' gives two in frames 1 and 2 but only one in
> frame 3.

Either 2 (using a non-overlapping search, like the Python string
method) or 5 (using an overlapping search) is a valid expected
answer for counting "TTT" in "TTTTTTTGG".
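To make the two readings concrete, here is a small sketch on plain Python
strings. str.count is the standard non-overlapping method; count_overlapping
is a hypothetical helper written for this illustration, which restarts the
search one character after each hit so that matches may overlap:

```python
def count_overlapping(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing matches to overlap."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Resume one position later, so overlapping matches are found.
        start = text.find(sub, start + 1)
    return count

seq = "TTTTTTTGG"
print(seq.count("TTT"))               # 2: matches at 0 and 3, no overlap
print(count_overlapping(seq, "TTT"))  # 5: matches at 0, 1, 2, 3 and 4
```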

Here you seem to want to count codons, which is by its nature a
frame-dependent task.
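Codon counting, by contrast, needs an explicit frame. A minimal sketch on
plain Python strings (count_codon is a hypothetical helper, with frames
numbered 1 to 3 as in the quoted message) steps through the sequence three
bases at a time from the frame's offset and ignores any trailing partial
codon:

```python
def count_codon(seq: str, codon: str, frame: int) -> int:
    """Count in-frame occurrences of codon; frame is 1, 2 or 3."""
    offset = frame - 1
    # range stops early enough that seq[i:i + 3] is always a full codon.
    return sum(
        1
        for i in range(offset, len(seq) - 2, 3)
        if seq[i:i + 3] == codon
    )

seq = "TTTTTTTGG"
for frame in (1, 2, 3):
    print(frame, count_codon(seq, "TTT", frame))
# 1 2
# 2 2
# 3 1
```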

Peter
