[Bioperl-l] calculate the frequency of occurrence of the most commonly observed amino acid at each position of multiple sequence alignment

Sat Feb 7 15:36:38 EST 2009

oops-bugs in that. Try

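> my $len = length($seqs[0]);
> my @residue_counts;
> my %h;
> foreach my $col (0..$len-1) {
>   %h = ();                          # reset the counts for this column
>   foreach my $seq (@seqs) {
>     $h{ substr($seq, $col, 1) }++;  # tally the residue at this column
>   }
>   push @residue_counts, {%h};       # store a copy of this column's counts
> }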
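
To get from the per-column counts to the actual answer (the most common
residue at each position and how often it occurs), something like the
following might do. It is only a sketch: the sub name most_common_per_column
and the sample sequences are made up for illustration, gap characters are
counted like any other residue, and ties are broken alphabetically, which
may or may not be what you want.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original thread): for each alignment
# column, return the most common residue and its relative frequency.
# Assumes the sequences are plain strings, aligned and of equal length.
sub most_common_per_column {
    my @seqs = @_;
    return () unless @seqs;
    my $len = length $seqs[0];
    my @result;
    for my $col (0 .. $len - 1) {
        my %count;
        $count{ substr($_, $col, 1) }++ for @seqs;
        # Sort residues by descending count; ties are broken alphabetically.
        my ($top) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        push @result, {
            residue => $top,
            freq    => $count{$top} / scalar(@seqs),
        };
    }
    return @result;
}

# Made-up example alignment, one string per sequence.
my @example = qw(MKVL MKIL MRVL);
my @cols = most_common_per_column(@example);
printf "%d\t%s\t%.2f\n", $_ + 1, $cols[$_]{residue}, $cols[$_]{freq}
    for 0 .. $#cols;
```

Sorting the keys is not the fastest way to find a maximum, but with at most
twenty-odd residues per column it hardly matters, and it keeps the tie-break
rule explicit.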

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Dylan Krishnan" <dylankrishnan at gmail.com>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Saturday, February 07, 2009 11:56 AM
Subject: Re: [Bioperl-l] calculate the frequency of occurrence of the most
commonly observed amino acid at each position of multiple sequence alignment

> Dylan- It's worth mentioning that the BioPerl method is very overhead-heavy;
> all the objects make it easy to just write a few lines, but it probably won't
> be the absolute fastest way to do what you want. Another path to follow would be
>
> # your seqs are plain strings in the array @seqs, aligned and of the same length
> my $len = length($seqs[0]);
> my @residue_counts;
> foreach my $col (0..$len-1) {
>   my %h = ();
>   foreach my $seq (@seqs) {
>     $h{ substr($seq, $col, 1) }++;
>   }
>   push @residue_counts, \%h;
> }
>
> Now, for each elt in @residue_counts (each elt is a reference to a hash), look
> for the key that has the maximum hash value. The snippet above is also worth
> working through for the educational value, esp. w/r to using hashes, which
> (IMHO) are one of the absolutely coolest things about Perl.
>
> cheers- MAJ
>  ----- Original Message ----- 
>  From: Dylan Krishnan
>  To: Mark A. Jensen
>  Cc: bioperl-l at lists.open-bio.org
>  Sent: Saturday, February 07, 2009 11:43 AM
>  Subject: Re: [Bioperl-l] calculate the frequency of occurrence of the most
>  commonly observed amino acid at each position of multiple sequence alignment
>
>
>  thanks mark!
>
>  the author's other approach is to load the alignment into a MS Excel worksheet
>  and use the "autofilter" procedure to count the occurrences of any residue
>  position of the alignment. the claim is "that excel is useful for this
>  purpose." sounds reasonable for 10 alignments but not 2000!
>
>  again, many thanks.
>
>
>  -dylan
>
>  On Sat, Feb 7, 2009 at 10:25 AM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
>    Dylan,
>
>    This is an extremely good exercise for anyone learning Perl to do
>    bioinformatics.
>    When you have done many exercises like this, you will see what people mean
>    when they say it is very straightforward.
>
>    Here are some hints:
>
>    Use the "entropy" scrap at
>    http://www.bioperl.org/wiki/Site_entropy_in_an_alignment .
>    You will convert the function entropy_by_column() into the function you
>    need.
>    Replace the line
>
>    $ent{$col} = entropy(values %res);
>
>    with a line you will write using the "hash key at max value" scrap, found
>    here: http://www.bioperl.org/wiki/Hash_key_at_the_max_value .
>
>    Happy coding!
>    Mark
>
>    ----- Original Message -----
>    From: "Dylan Krishnan" <dylankrishnan at gmail.com>
>    To: <bioperl-l at lists.open-bio.org>
>    Sent: Saturday, February 07, 2009 11:10 AM
>    Subject: [Bioperl-l] calculate the frequency of occurrence of the most
>    commonly observed amino acid at each position of multiple sequence alignment
>
>
>
>      I am new to perl but this is something I am seeking to do either through a
>      bioperl module or just perl. Apparently, this is quite "straightforward
>      using PERL," but I beg to differ.
>
>      Any assistance regarding this matter would be greatly appreciated.
>
>      Thanks!
>
>      -dylan
>
>      _______________________________________________
>      Bioperl-l mailing list
>      Bioperl-l at lists.open-bio.org
>      http://lists.open-bio.org/mailman/listinfo/bioperl-l