[Bioperl-l] calculate the frequency of occurrence of the most commonly observed amino acid at each position of multiple sequence alignment
Mark A. Jensen
maj at fortinbras.us
Sat Feb 7 15:36:38 EST 2009
Oops, bugs in that. Try
> my $len = length($seqs[0]);
> my @residue_counts;
> my %h;
> foreach (0..$len-1) {
>     %h = ();    # reset the counts for this column
>     foreach my $seq (@seqs) {
>         $h{ substr($seq, $_, 1) }++;
>     }
>     push @residue_counts, {%h};    # store a copy of the counts
> }
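The last step, picking the key with the maximum value at each position, could look something like the following. This is only a minimal sketch, assuming @seqs holds aligned plain strings of equal length; the sample sequences, the @most_common array and the alphabetical tie-break are assumptions made up for illustration, and gap characters are counted like any other residue.

```perl
use strict;
use warnings;

# Hypothetical example data: three aligned sequences of equal length.
my @seqs = ('MKTAY', 'MKSAY', 'MRTA-');

# Count the residues in each column, as in the snippet above.
my $len = length($seqs[0]);
my @residue_counts;
foreach my $pos (0 .. $len - 1) {
    my %h;
    $h{ substr($_, $pos, 1) }++ foreach @seqs;
    push @residue_counts, {%h};
}

# For each column, take the residue with the highest count and
# compute its frequency (count divided by the number of sequences).
my @most_common;
foreach my $counts (@residue_counts) {
    # Sort by descending count; break ties alphabetically so the
    # result is deterministic (an arbitrary choice for this sketch).
    my ($top) = sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b }
                keys %$counts;
    push @most_common,
        { residue => $top, freq => $counts->{$top} / scalar(@seqs) };
}

# Print position (1-based), most common residue, and its frequency.
foreach my $i (0 .. $#most_common) {
    printf "%d\t%s\t%.3f\n",
        $i + 1, $most_common[$i]{residue}, $most_common[$i]{freq};
}
```

Sorting the keys is simple rather than fast; for a 20-letter alphabet it hardly matters, but a single pass that tracks the running maximum would avoid the sort if speed becomes a concern.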
----- Original Message -----
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Dylan Krishnan" <dylankrishnan at gmail.com>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Saturday, February 07, 2009 11:56 AM
Subject: Re: [Bioperl-l] calculate the frequency of occurrence of the most
commonly observed amino acid at each position of multiple sequence alignment
> Dylan- It's worth mentioning that the BioPerl method is very overhead-heavy;
> all the objects make it easy to just write a few lines, but they probably
> won't be the absolute fastest way to do what you want. Another path to
> follow would be
>
> # your seqs are plain strings in the array @seqs, and are aligned and
> # the same length
> my $len = length($seqs[0]);
> my @residue_counts;
> foreach (0..$len-1) {
>     my %h = ();
>     foreach my $seq (@seqs) {
>         $h{ substr($seq, $_, 1) }++;
>     }
>     push @residue_counts, \%h;
> }
>
> Now, for each elt in @residue_counts (each elt is a reference to a hash),
> look for the key that has the maximum hash value. The snippet above is also
> worth working through for the educational value, esp. w/r to using hashes,
> which (IMHO) are one of the absolutely coolest things about Perl.
>
> cheers- MAJ
> ----- Original Message -----
> From: Dylan Krishnan
> To: Mark A. Jensen
> Cc: bioperl-l at lists.open-bio.org
> Sent: Saturday, February 07, 2009 11:43 AM
> Subject: Re: [Bioperl-l] calculate the frequency of occurrence of the most
> commonly observed amino acid at each position of multiple sequence alignment
>
>
> thanks mark!
>
> the author's other approach is to load the alignment into an MS Excel
> worksheet and use the "autofilter" procedure to count the occurrences of any
> residue at each position of the alignment. the claim is "that excel is useful
> for this purpose." sounds reasonable for 10 alignments but not 2000!
>
> again, many thanks.
>
>
> -dylan
>
> On Sat, Feb 7, 2009 at 10:25 AM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
> Dylan,
>
> This is an extremely good exercise for anyone learning Perl to do
> bioinformatics.
> When you have done many exercises like this, you will see what people mean
> when they say it is very straightforward.
>
> Here are some hints:
>
> Use the "entropy" scrap at
> http://www.bioperl.org/wiki/Site_entropy_in_an_alignment .
> You will convert the function entropy_by_column() into the function you
> need.
> Replace the line
>
> $ent{$col} = entropy(values %res);
>
> with a line you will write using the "hash key at max value" scrap, found
> here: http://www.bioperl.org/wiki/Hash_key_at_the_max_value .
>
> Happy coding!
> Mark
>
> ----- Original Message -----
> From: "Dylan Krishnan" <dylankrishnan at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Saturday, February 07, 2009 11:10 AM
> Subject: [Bioperl-l] calculate the frequency of occurrence of the most
> commonly observed amino acid at each position of multiple sequence alignment
>
>
>
> I am new to perl but this is something I am seeking to do either through a
> bioperl module or just perl. Apparently, this is quite "straightforward
> using PERL," but I beg to differ.
>
> Any assistance regarding this matter would be greatly appreciated.
>
> Thanks!
>
> -dylan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l