[Bioperl-l] Re: Frameshifts in alignments ... ?

Thu, 5 Sep 2002 03:05:19 -0400 (EDT)

On Thu, 5 Sep 2002, Ewan Birney wrote:

> On Wed, 4 Sep 2002, Aaron J Mackey wrote:
>
> >
> > package Bio::EncodedSeq;
>
> I think we should go for Bio::Seq::EncodedSeq
Ditto here, and ditto about being brave below!
>
>
> >
> > use strict;
> > use Bio::LocatableSeq;
> >
> > @ISA = qw(Bio::LocatableSeq);
> >
> > =head2 new
> >  Title   : new
> >  Usage   : $obj = Bio::EncodedSeq->new(-dnaseq   => "AGTACGTGTCATG",
> >                                        -encoding => "CCCCCCFCCCCCC",
> >                                        -id       => "myseq",
> >                                        -start    => 1,
> >                                        -end      => 13,
> >                                        -strand   => 1
> >                                       );
> >  Function: creates a new Bio::EncodedSeq object from a supplied DNA
> >            sequence
> >  Returns : a new Bio::EncodedSeq object
> >  Args    : dnaseq   - primary nucleotide sequence used to encode the
> >                       protein
> >            encoding - a string of characters (see Encoding Table)
> >                       describing each position of the sequence;
> >                       backwards frameshifts implied by the
> >                       encoding but not present in the sequence will be
> >                       added (as '-'s) to the sequence.  If not
> >                       supplied, it will be assumed that all positions
> >                       are coding (C).  Encoding may include
> >                       implicit phase encoding characters (e.g. "CCC")
> >                       and/or explicit encoding characters (e.g. "CDE").
> >                       Alternatively, encoding may be a hashref
> >                       datastructure, with encoding characters as keys
> >                       and Bio::LocationI objects (or arrayrefs of
> >                       Bio::LocationI objects) as values, e.g.:
> >                       { C => [ Bio::Location::Simple->new(1,9),
> >                                Bio::Location::Simple->new(11,13) ],
> >                         F => Bio::Location::Simple->new(10,10),
> >                       } # same as "CCCCCCCCCFCCC"
> >            id, start, end, strand - as with Bio::LocatableSeq; note
> >                       that the coordinates are relative to the
> >                       encoding DNA sequence, not the implicit protein
> >                       sequence.
> > =cut
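As a rough illustration of how the hashref form of the encoding argument relates to the string form, here is a minimal sketch. It is not the proposed implementation: encoding_from_locations() is a hypothetical helper, and plain [start, end] pairs stand in for the Bio::LocationI objects so that the snippet runs without BioPerl installed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the proposed module.
# %$spec maps an encoding character to one [start, end] pair or to an
# arrayref of such pairs (1-based, inclusive), standing in for
# Bio::LocationI objects. Uncovered positions default to coding ('C').
sub encoding_from_locations {
    my ($spec, $length) = @_;
    my @enc = ('C') x $length;
    for my $char (sort keys %$spec) {
        my $locs = $spec->{$char};
        # Accept a single pair as well as a list of pairs.
        $locs = [$locs] unless ref $locs->[0];
        for my $loc (@$locs) {
            my ($start, $end) = @$loc;
            die "location $start..$end out of range\n"
                if $start < 1 || $end > $length || $start > $end;
            $enc[$_ - 1] = $char for $start .. $end;
        }
    }
    return join '', @enc;
}

print encoding_from_locations(
    { C => [ [1, 9], [11, 13] ], F => [10, 10] }, 13
), "\n";    # prints CCCCCCCCCFCCC
```

The result matches the "same as" comment in the quoted example above.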
> >
> > =head2 encoding
> >  Title   : encoding
> >  Usage   : $obj->encoding("CCCCCC");
> >            $obj->encoding( -encoding => { I => $location } );
> >            $enc = $obj->encoding(-explicit => 1);
> >            $enc = $obj->encoding("CCCCCC", -explicit => 1);
> >            $enc = $obj->encoding(-location => $location,
> >                                  -explicit => 1 );
> >  Function: get/set the object's encoding, either globally or by location(s).
> >  Returns : the (possibly new) encoding string.
> >  Args    : encoding - see the encoding argument to the new() function.
> >            explicit - whether or not to return explicit phase
> >                       information in the coding (e.g. "CCC" becomes
> >                       "CDE", "III" becomes "IJK", etc); defaults to 0.
> >            location - optional; location to get/set the encoding.
> >                       Defaults to the entire sequence.
> > =cut
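The implicit-to-explicit conversion described for the -explicit flag could look roughly like the sketch below. The Encoding Table is not reproduced in this message, so two points are assumptions rather than anything stated in the proposal: coding phase (C, D, E) is carried across interruptions such as a frameshift, and intron phase (I, J, K) restarts with each intron run. explicit_encoding() is a hypothetical name, and the input is assumed to be purely implicit.

```perl
use strict;
use warnings;

# Hypothetical helper: expand implicit phase characters into explicit ones.
# Assumptions (not taken from the proposal):
#   - coding phase cycles C, D, E and continues across interruptions;
#   - intron phase cycles I, J, K and restarts at every new intron run;
#   - any other character is copied through unchanged.
sub explicit_encoding {
    my ($implicit) = @_;
    my @coding = qw(C D E);
    my @intron = qw(I J K);
    my ($cphase, $iphase) = (0, 0);
    my $prev = '';
    my $out  = '';
    for my $ch (split //, $implicit) {
        if ($ch eq 'C') {
            $out .= $coding[ $cphase++ % 3 ];
        }
        elsif ($ch eq 'I') {
            $iphase = 0 if $prev ne 'I';
            $out .= $intron[ $iphase++ % 3 ];
        }
        else {
            $out .= $ch;
        }
        $prev = $ch;
    }
    return $out;
}

print explicit_encoding("CCCCCCFCCCCCC"), "\n";    # prints CDECDEFCDECDE
```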
> >
> > =head2 cds
> >  Title   : cds
> >  Usage   : $cds = $obj->cds();
> >  Function: obtain the "spliced" DNA sequence, by removing any
> >            nucleotides that participate in a UTR, forward frameshift
> >            or intron, and replacing any unknown nucleotide implied by
> >            a backward frameshift or gap with N's.
> >  Returns : a Bio::EncodedSeq object, with an encoding consisting only
> >            of "CCCC..".
> >  Args    : none.
> > =cut
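To make the splicing rule for cds() concrete, here is a small sketch working on plain strings. Only C (coding), F (forward frameshift) and I (intron) appear in this message; the characters U for UTR and B for a backward frameshift are placeholders assumed purely for illustration, as is the function name spliced_cds(). Explicit phase characters are not handled.

```perl
use strict;
use warnings;

# Hypothetical sketch of the cds() rule on plain strings.
# Assumed encoding characters: 'U' (UTR) and 'B' (backward frameshift)
# are illustrative placeholders, not taken from the Encoding Table.
sub spliced_cds {
    my ($dna, $enc) = @_;
    die "sequence and encoding differ in length\n"
        unless length($dna) == length($enc);
    my $cds = '';
    for my $i (0 .. length($dna) - 1) {
        my $nt = substr($dna, $i, 1);
        my $e  = substr($enc, $i, 1);
        # Drop UTR, forward-frameshift and intron nucleotides.
        next if $e =~ /[UFI]/;
        # Unknown nucleotides (backward frameshift or gap) become N.
        $cds .= ($e eq 'B' || $nt eq '-') ? 'N' : $nt;
    }
    return $cds;
}

print spliced_cds("AGTACGTGTCATG", "CCCCCCFCCCCCC"), "\n";  # prints AGTACGGTCATG
```

With the sequence and encoding from the new() usage example, the nucleotide under the F is dropped and the remaining twelve bases form four codons.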
> >
> > =head2 translate
> >  Title   : translate
> >  Usage   : $prot = $obj->translate(@args);
> >  Function: obtain the protein sequence encoded by the underlying DNA
> >            sequence; same as $obj->cds()->translate(@args).
> >  Returns : a Bio::PrimarySeq object.
> >  Args    : same as the translate() function of Bio::PrimarySeqI
> > =cut
> >
> > =head2 seq
> >  Title   : seq
> >  Usage   : $protseq = $obj->seq();
> >  Function: obtain the raw protein sequence encoded by the underlying
> >            DNA sequence; this is the same as calling
> >            $obj->translate()->seq();
> >  Returns : a string of single-letter amino acid codes
> >  Args    : same as the seq() function of Bio::PrimarySeq; note that this
> >            function may not be used to set the protein sequence; see
> >            the dnaseq() function for that.
> > =cut
> >
> > =head2 dnaseq
> >  Title   : dnaseq
> >  Usage   : $dnaseq = $obj->dnaseq();
> >            $obj->dnaseq("ACGTGTCGT", "CCCCCCCCC");
> >            $obj->dnaseq(-dnaseq => "ATG",
> >                         -encoding => "CCC",
> >                         -location => $loc );
> >  Function: get/set the underlying DNA sequence; will overwrite any
> >            current DNA and/or encoding information present.
> >  Returns : a string of single-letter nucleotide codes, including any
> >            gaps implied by the encoding.
> >  Args    : dnaseq   - the DNA sequence to be used as a replacement
> >            encoding - the encoding of the DNA sequence (see the new()
> >                       constructor); defaults to all 'C'.
> >            location - optional, the location of the DNA sequence to
> >                       get/set; defaults to the entire sequence.
> > =cut
> >
> > [ and all the inherited Bio::LocatableSeq and Bio::PrimarySeqI
> > methods; note that the coordinates of those methods will refer only to
> > the underlying DNA sequence, not the implicit encoded protein sequence
> > - my next task will be to extend Ewan and Heikki's Bio::Coordinate
> > system to include Bio::Coordinate::EncodedPair so that conversions can
> > be made more easily ... any comments on that? ]
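On the coordinate question, the kind of conversion a Bio::Coordinate::EncodedPair might perform can be sketched on the encoding string alone. This is only an illustration under simplifying assumptions: dna_to_protein_pos() is a hypothetical function, only implicit 'C' counts as coding, and backward frameshifts are ignored.

```perl
use strict;
use warnings;

# Hypothetical sketch: map a 1-based DNA position to the 1-based residue
# it helps encode, or return nothing if the position is non-coding.
# Assumes a purely implicit encoding in which only 'C' is coding.
sub dna_to_protein_pos {
    my ($enc, $pos) = @_;
    die "position $pos out of range\n"
        if $pos < 1 || $pos > length $enc;
    return unless substr($enc, $pos - 1, 1) eq 'C';
    # Count the coding positions up to and including $pos.
    my $prefix = substr($enc, 0, $pos);
    my $coding = ($prefix =~ tr/C//);
    return int(($coding - 1) / 3) + 1;
}

for my $pos (1, 6, 7, 8) {
    my $aa = dna_to_protein_pos("CCCCCCFCCCCCC", $pos);
    printf "%d -> %s\n", $pos, defined $aa ? $aa : 'none';
}
# prints:
# 1 -> 1
# 6 -> 2
# 7 -> none
# 8 -> 3
```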
>
>
> You are a brave man. Look forward to seeing this in...
>
>
>
> >
> > thanks for reading,
> >
> > -Aaron
> >
> > --
> >  Aaron J Mackey
> >  Pearson Laboratory
> >  University of Virginia
> >  (434) 924-2821
> >  amackey@virginia.edu
> >
> >
> >
> >
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>.
> -----------------------------------------------------------------
>
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu