[Bioperl-l] Re: Frameshifts in alignments ... ?

Aaron J. Mackey <amackey@virginia.edu>
Tue, 3 Sep 2002 10:10:11 -0400 (EDT)


On Tue, 3 Sep 2002, Ewan Birney wrote:

> BTW - we should call them  Bio::Seq::EncodedSequence

Great ... (just out of curiosity, why not Bio::Seq::EncodedSeq ... or
just Bio::Seq::Encoded; is there a reason for the redundancy?)

> Remember that the "encoding" is in addition to the bases, i.e., one effectively
> has two "tracks":
>
>    CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC
>    ATGGGTGTATGTATTGTGTAAAAAGAATGTTAAGGTTGT---GTET

> I am happy to get into this. I would propose the following encodings:

> I could adapt genewise to directly output this stuff.
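The encoding track in the two-track example is mostly long runs of a single letter. A minimal sketch in plain Perl (no BioPerl dependency) of collapsing such a track into [type, start, end] runs, which is roughly the information a feature-based representation would need to keep. `encoding_to_runs` is a hypothetical helper, and the letters are treated as opaque labels because the proposed encoding alphabet is not spelled out here.

```perl
use strict;
use warnings;

# Collapse a per-column encoding track into [type, start, end] runs.
# Coordinates are 1-based, inclusive, and count alignment columns.
sub encoding_to_runs {
    my ($encoding) = @_;
    my @runs;
    while ($encoding =~ /((.)\2*)/g) {
        my ($run, $type) = ($1, $2);
        my $end = pos($encoding);    # offset just past the current run
        push @runs, [ $type, $end - length($run) + 1, $end ];
    }
    return @runs;
}

my $track = "CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC";
printf "%s %d-%d\n", @$_ for encoding_to_runs($track);
```

On the 46-column example this gives five runs: C 1-11, I 12-34, C 35-39, G 40-42 and C 43-46.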

I guess I'm not sure why we need an *internal* encoding like this; I would
argue that the various methods I proposed would be easier to implement on top
of the SeqFeature annotation representation (since the number of
gap/intron/frameshift locations should be small relative to the length of
the sequence).
Or do you just mean that this encoding should be available for dumping via
$obj->encoding() (and perhaps acceptable to a new() constructor)?

my $obj = Bio::Seq::EncodedSequence->new(
    -encoding => "CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC",
    -sequence => "ATGGGTGTATGTATTGTGTAAAAAGAATGTTAAGGTTGTGTET",
    -start    => 100,
    -end      => 128,
    -strand   => 1,
);
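To make the constructor/accessor question concrete, a minimal stand-alone sketch of a class that takes BioPerl-style `-name => value` arguments and hands the encoding back via `encoding()`. `EncodedSequenceSketch` is a hypothetical name: it does not inherit from any real BioPerl module, and it deliberately does not check that the two tracks have compatible lengths.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed class; not a BioPerl module.
package EncodedSequenceSketch;
use Carp qw(croak);

# Accept BioPerl-style "-name => value" arguments.
sub new {
    my ($class, %args) = @_;
    my %self;
    for my $key (qw(encoding sequence start end strand)) {
        $self{$key} = $args{"-$key"};    # keys arrive as "-encoding", etc.
    }
    for my $required (qw(encoding sequence)) {
        croak "-$required is required" unless defined $self{$required};
    }
    $self{strand} = 1 unless defined $self{strand};    # default: forward strand
    return bless \%self, $class;
}

# Hand back the two tracks unchanged.
sub encoding { return $_[0]{encoding} }
sub seq      { return $_[0]{sequence} }

package main;

my $obj = EncodedSequenceSketch->new(
    -encoding => "CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC",
    -sequence => "ATGGGTGTATGTATTGTGTAAAAAGAATGTTAAGGTTGTGTET",
    -start    => 100,
    -end      => 128,
    -strand   => 1,
);
printf "%s\n", $obj->encoding;
```

Storing the encoding verbatim keeps `encoding()` trivial; a SeqFeature-backed implementation would instead rebuild the string from its features on demand.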

There was also my "embedded" encoding (which is what we tend to see in
alignment outputs), with frameshift (/, \), intron-boundary ([...]) and
gap (-) characters, which I proposed could be obtained via as_string():

ATGGGT/GTATG[TATTGTGTAAAAAG]AATGT\TAAGGTTGT---GTET
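A minimal sketch of reading the embedded form back into a plain sequence plus a list of events, assuming frameshift marks consume no residue, bracketed intron residues remain part of the sequence, and gap dashes are not residues. `parse_embedded` is a hypothetical helper, not an existing BioPerl method.

```perl
use strict;
use warnings;

# Parse an "embedded" alignment string into the ungapped sequence plus events.
# Assumed semantics for this sketch:
#   '/' or '\'   frameshift between two residues (consumes no residue)
#   '[' ... ']'  intron; the bracketed residues stay in the sequence
#   '-'          alignment gap column; dropped from the sequence
# Positions are 1-based residue coordinates in the returned sequence.
sub parse_embedded {
    my ($str) = @_;
    my $seq = '';
    my @events;
    my $intron_start;
    for my $ch (split //, $str) {
        my $pos = length $seq;    # residues emitted so far
        if ($ch eq '/' || $ch eq '\\') {
            push @events, [ 'frameshift', $ch, $pos ];    # after residue $pos
        }
        elsif ($ch eq '[') {
            die "nested '['\n" if defined $intron_start;
            $intron_start = $pos + 1;
        }
        elsif ($ch eq ']') {
            die "unmatched ']'\n" unless defined $intron_start;
            push @events, [ 'intron', $intron_start, $pos ];
            undef $intron_start;
        }
        elsif ($ch eq '-') {
            if (@events && $events[-1][0] eq 'gap' && $events[-1][1] == $pos) {
                $events[-1][2]++;    # extend the current gap run
            }
            else {
                push @events, [ 'gap', $pos, 1 ];    # gap after residue $pos
            }
        }
        else {
            $seq .= $ch;
        }
    }
    die "unclosed '['\n" if defined $intron_start;
    return ($seq, \@events);
}

my ($seq, $events) = parse_embedded('ATGGGT/GTATG[TATTGTGTAAAAAG]AATGT\TAAGGTTGT---GTET');
print "$seq\n";
print join(' ', @$_), "\n" for @$events;
```

For the example string this recovers the same 43-residue sequence passed as -sequence in the constructor call above, with frameshifts after residues 6 and 30, an intron at residues 12-25, and a three-column gap after residue 39.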

I guess now I'm inching towards a Bio::SeqIO::encoded::wise,
Bio::SeqIO::encoded::tfastx, ... ?

> Are you keen to code this up Aaron... or hoping I would ?

I'm good to go, provided I understand the desired direction ... and I
do agree, TIMTOWTDI and all.

-Aaron

-- 
 Aaron J Mackey
 Pearson Laboratory
 University of Virginia
 (434) 924-2821
 amackey@virginia.edu