[Biopython-dev] Storing reading frame / strand of nucleic acid sequences used for creating protein sequences

Tue Jul 2 22:32:32 UTC 2013

Hi everyone,

I'm wondering if we have a standard way of storing the reading frames
of DNA/RNA sequences used for creating protein sequences?

In some cases, keeping track of the original reading frame is
desirable, e.g. in TBLASTX alignments, where users want to know the
reading frames of both the query and the hit sequences.

I realize that it is possible to store strand information in
SeqFeature objects. However, I am afraid that storing the strand /
reading frame information in SeqFeatures of Seq objects with
ProteinAlphabets may be misleading, as the strand information belongs
to the DNA / RNA sequence that was used as the protein template, not
to the protein itself.

On a related note, I'm also wondering whether the Seq object's
translate method should take an argument specifying which reading
frame to use when translating the sequence.

Choosing the frame can be trivially done with Python's indexing and
slicing, by trimming the sequence before translating it. However,
indexing and/or slicing does not let us keep track of the original
DNA/RNA reading frame (at least not the way it is implemented now).
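
To make the idea concrete, here is a rough sketch in plain Python. It
is not Biopython code and not a proposed API: the names
translate_frame and FramedTranslation are made up for illustration,
the codon table is the standard genetic code only, and mapping
unrecognised codons to "X" is my own assumption. It picks the frame by
slicing, as described above, but returns the frame together with the
protein so that the information is not lost.

```python
from typing import NamedTuple

# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class FramedTranslation(NamedTuple):
    """Hypothetical container: a protein plus the frame it came from."""
    protein: str
    frame: int  # +1, +2, +3 = forward strand; -1, -2, -3 = reverse strand


def translate_frame(dna: str, frame: int) -> FramedTranslation:
    """Translate unambiguous DNA in one of the six reading frames."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    template = dna.upper()
    if frame < 0:
        # Negative frames are read from the reverse complement.
        template = template.translate(COMPLEMENT)[::-1]
    offset = abs(frame) - 1
    # Drop trailing bases that do not form a complete codon.
    usable = len(template) - offset
    end = offset + usable - usable % 3
    codons = (template[i:i + 3] for i in range(offset, end, 3))
    # Assumption: codons outside the table (e.g. containing N) become "X".
    protein = "".join(CODON_TABLE.get(codon, "X") for codon in codons)
    return FramedTranslation(protein, frame)


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGA"
    for frame in (1, 2, 3, -1, -2, -3):
        print(translate_frame(dna, frame))
```

With a Seq object, the forward-frame part of this is just slicing
before calling translate(), e.g. seq[1:].translate() for frame +2; the
point of the sketch is only that the frame travels with the result.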

Let me know what you think :).

Best regards,
Bow