[Biopython-dev] Storing reading frame / strand of nucleic acid sequences used for creating protein sequences

Wed Jul 3 13:35:08 UTC 2013

Hi Bow,

I'm quite interested in setting up a standard way to deal with reading
frames, as this is something I would like to implement in the Codon Alignment
project this summer.

Getting reading frame information from TBLASTX results seems like a good
idea, and I may implement a function to deal with it.
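As a sketch of what such a function might need to do: assuming the usual BLAST convention, where frames +1/+2/+3 translate the forward strand starting at offset 0/1/2 and frames -1/-2/-3 do the same on the reverse complement, an amino-acid range reported in a given frame can be mapped back to nucleotide coordinates. `frame_to_nt_range` is a hypothetical helper, not an existing Biopython function, and it uses 0-based, half-open coordinates.

```python
def frame_to_nt_range(frame, aa_start, aa_end, nt_length):
    """Map a 0-based, half-open amino-acid range in a translated frame back
    to a 0-based, half-open range on the forward nucleotide strand.

    Hypothetical helper. Assumes BLAST-style frames: +1/+2/+3 translate the
    forward strand from offset 0/1/2; -1/-2/-3 translate the reverse
    complement from offset 0/1/2. Returns (start, end, strand).
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1..+3 or -1..-3")
    offset = abs(frame) - 1
    start = offset + 3 * aa_start
    end = offset + 3 * aa_end
    if end > nt_length:
        raise ValueError("amino-acid range runs past the end of the sequence")
    if frame > 0:
        return start, end, 1
    # Negative frames count along the reverse complement, so flip the
    # coordinates back onto the forward strand.
    return nt_length - end, nt_length - start, -1
```

For example, `frame_to_nt_range(-1, 0, 2, 12)` gives `(6, 12, -1)`: the first two residues of frame -1 come from the last six bases of a 12-nt sequence, read on the minus strand.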

As to how to store this information, I favor the CodonSeq class I've written
<https://github.com/zruan/biopython/blob/master/Bio/CodonAlign/__init__.py#L62>,
since frameshifts occur at the DNA/RNA level. I have a CodonAlphabet to check
the codon sequence in CodonSeq. Perhaps enhancing the Alphabet and CodonSeq
would let them take frameshifts into account better.

If you are looking for a standard way to represent frameshifts at the protein
level, you may want to read the methods section of the pal2nal paper. For
example, M2P indicates that there is a 1-nt deletion between methionine and
proline. But this nonetheless violates the ProteinAlphabet we've defined in
Biopython.
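Until the alphabet question is settled, one workaround is to keep such markers out of the protein string itself. The hypothetical helper below, which assumes single-digit markers written inline as in `M2P`, only separates the digits from the residues and records where they occurred; it does not interpret what each digit means.

```python
def split_frameshift_markers(protein):
    """Separate inline digit frameshift markers (e.g. 'M2P') from residues.

    Hypothetical helper; assumes each marker is a single digit. Returns the
    marker-free protein string and a list of (position, marker) pairs, where
    position is the index in the clean string of the residue that follows
    the marker.
    """
    clean = []
    markers = []
    for char in protein:
        if char.isdigit():
            # Record the marker against the next residue's index.
            markers.append((len(clean), int(char)))
        else:
            clean.append(char)
    return "".join(clean), markers
```

For `"M2P"` this returns `("MP", [(1, 2)])`, so the residues can be stored with a normal protein alphabet and the markers kept alongside them.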

I have just arrived in Shanghai and am about to start writing code for this
week. I am interested in hearing your suggestions.

Thanks!

Ruan

On Wed, Jul 3, 2013 at 6:32 AM, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:

> Hi everyone,
>
> I'm wondering if we have a standard way of storing the reading frames
> of DNA/RNA sequences used for creating protein sequences?
>
> In some cases, keeping track of the original reading frame may be
> desirable. (e.g. in TBLASTX alignments, where users want to know the
> reading frames of both the query and the hit sequences).
>
> I realize that it is possible to store strand information in
> SeqFeature objects. However, I am afraid that storing the strand /
> reading frame information in SeqFeatures of Seq objects with
> ProteinAlphabets may be misleading, as the strand information belongs
> to the DNA / RNA sequence that was used as the protein template, not
> the protein itself.
>
> On a related note, I'm also wondering whether the Seq object's
> translate method should have an argument to specify which reading
> frame we want to use to translate the sequence?
>
> This can be trivially solved using Python's convenient indexing and
> slicing system. However, indexing and/or slicing does not allow us to
> keep track of the original DNA/RNA reading frame (at least not the way
> it is implemented now).
>
> Let me know what you think :).
>
> Best regards,
> Bow
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
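Regarding the question in the quoted message about a reading-frame argument for translate(): the slicing approach can keep the frame simply by storing each slice under its frame number. The sketch below is hypothetical (`frame_templates` is not a Biopython function) and assumes BLAST-style frames and unambiguous DNA in upper or lower case; each template is trimmed to whole codons and could then be passed to `Seq.translate()`.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def frame_templates(dna):
    """Return {frame: codon-aligned template} for all six BLAST-style frames.

    Hypothetical helper. Each template is the slice a translation of that
    frame would consume, trimmed to a whole number of codons and keyed by
    its frame, so the frame is not lost after slicing.
    """
    reverse = dna.translate(_COMPLEMENT)[::-1]
    templates = {}
    for offset in range(3):
        for strand_seq, sign in ((dna, 1), (reverse, -1)):
            template = strand_seq[offset:]
            # Drop a trailing partial codon.
            template = template[: len(template) - len(template) % 3]
            templates[sign * (offset + 1)] = template
    return templates
```

For `"AATGGCCTAA"`, frame +2 maps to `"ATGGCCTAA"` and frame -1 to `"TTAGGCCAT"`.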