[Bioperl-l] Re: Frameshifts in alignments ... ?
Aaron J Mackey
Aaron J. Mackey <amackey@virginia.edu>
Tue, 3 Sep 2002 10:10:11 -0400 (EDT)
On Tue, 3 Sep 2002, Ewan Birney wrote:
> BTW - we should call them Bio::Seq::EncodedSequence
Great ... (just out of curiosity, why not Bio::Seq::EncodedSeq ... or
just Bio::Seq::Encoded; is there a reason for the redundancy?)
> Remember that the "encoding" is in addition to the bases, i.e., one effectively
> has two "tracks", being
>
> CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC
> ATGGGTGTATGTATTGTGTAAAAAGAATGTTAAGGTTGT---GTET
> I am happy to get into this. I would propose the following encodings:
> I could adapt genewise to directly output this stuff.
I guess I'm not sure why we need an *internal* encoding like this; I would
argue that the various methods I proposed would be easier via the
SeqFeature annotation representation (since relative to the length of the
sequence, the number of gap/intron/frameshift locations should be small).
Or do you just mean that this encoding should be available for dumping via
$obj->encoding() (and perhaps acceptable to a new() constructor)?
$obj = Bio::Seq::EncodedSequence->new(
    -encoding => "CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC",
    -sequence => "ATGGGTGTATGTATTGTGTAAAAAGAATGTTAAGGTTGTGTET",
    -start    => 100,
    -end      => 128,
    -strand   => 1,
);
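
To make the feature-versus-track comparison a bit more concrete, here is a rough sketch of collapsing a per-base encoding track into a short list of runs, which is roughly what a SeqFeature-style representation would store. The sub name encoding_to_runs and the [code, start, end] layout are made up for illustration and are not part of any existing BioPerl API; the input is a shortened toy track, not the one quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): collapse a per-base encoding
# track into runs of [code, start, end], with 1-based inclusive coordinates.
sub encoding_to_runs {
    my ($encoding) = @_;
    my @runs;
    # Match one character followed by any number of repeats of itself.
    while ($encoding =~ /((.)\2*)/g) {
        my $end   = pos($encoding);          # offset just past the run
        my $start = $end - length($1) + 1;   # first position of the run
        push @runs, [$2, $start, $end];
    }
    return @runs;
}

# Toy track: coding, intron, coding, gap, coding.
my @runs = encoding_to_runs("CCCCIIIIIICCGGGCC");
printf "%s %d..%d\n", @$_ for @runs;
```

For the 17-character toy track this yields five runs, of which only the I and G runs would need to be kept as features if C is treated as the default state.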
There was also my "embedded" encoding (which is what we tend to see in
alignment outputs), with frameshift (/, \), intron boundaries ([...]) and
gap characters, that I proposed could be obtained via as_string():
ATGGGT/GTATG[TATTGTGTAAAAAG]AATGT\TAAGGTTGT---GTET
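
Going the other way, a parser for this embedded form could be sketched as below. This is only an illustration under assumptions: parse_embedded is a hypothetical name, not an existing BioPerl method, and the markers are recorded as literal characters without deciding which of / and \ means a forward or a backward frameshift.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): split an "embedded" string into
# the bare residues plus a list of [marker, offset] pairs, where offset is the
# number of residues (gap characters included) seen before the marker.
sub parse_embedded {
    my ($embedded) = @_;
    my $residues = '';
    my @markers;
    for my $char (split //, $embedded) {
        if ($char =~ m{[/\\\[\]]}) {
            # Frameshift or intron-boundary marker: remember where it sits.
            push @markers, [$char, length $residues];
        }
        else {
            # Ordinary residue or gap character: keep it in the sequence.
            $residues .= $char;
        }
    }
    return ($residues, \@markers);
}

my ($residues, $markers) =
    parse_embedded('ATGGGT/GTATG[TATTGTGTAAAAAG]AATGT\TAAGGTTGT---GTET');
print "$residues\n";
print "$_->[0] after residue $_->[1]\n" for @$markers;
```

Stripping the markers from the example string recovers the gapped sequence track from Ewan's two-track example, with the four markers at offsets 6, 11, 25 and 30.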
I guess now I'm inching towards a Bio::SeqIO::encoded::wise,
Bio::SeqIO::encoded::tfastx, ... ?
> Are you keen to code this up Aaron... or hoping I would ?
I'm good to go, given that I understand the desired direction ... and I
do agree TIMTOWTDI and all.
-Aaron
--
Aaron J Mackey
Pearson Laboratory
University of Virginia
(434) 924-2821
amackey@virginia.edu