[Bioperl-l] calculate the frequency of occurrence of the most commonly observed amino acid at each position of multiple sequence alignment

Dylan Krishnan dylankrishnan at gmail.com
Sat Feb 7 15:51:39 EST 2009


Thanks Mark! I'm still working on this. As a newbie, I'm still digesting
your suggestions, but here is what I think I want to do for a multiple
sequence alignment:

1. find the total number of residues, n, in the alignment
2. find the total number of a specific residue, x, in the alignment
3. find the total number of times a residue, x, appears at a specific site
4. find the total number of sequences in the alignment

I initially thought about writing a single script to generate all these
parameters but now think four separate (read: unsophisticated and utterly
reductionist) scripts will do...
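
For what it's worth, the four quantities listed above can probably come out of a single pass over the sequences. The following is only a minimal sketch, assuming the alignment is already held as plain, equal-length strings in @seqs (as in Mark's snippet quoted below). The toy sequences, the residue 'K', the 0-based site index, and the choice to skip '-' gap characters when counting residues are all illustrative assumptions, not anything from the thread.

```perl
use strict;
use warnings;

# Hypothetical toy alignment: plain, equal-length strings ('-' marks a gap).
my @seqs = ('MKV-L', 'MKIAL', 'MRVAL');

my $residue = 'K';   # x: the residue of interest (illustrative choice)
my $site    = 1;     # 0-based column of interest (illustrative choice)

my $n_seqs      = scalar @seqs;   # 4. total number of sequences
my $n_residues  = 0;              # 1. total residues, n (gaps excluded here)
my $n_x         = 0;              # 2. occurrences of x anywhere in the alignment
my $n_x_at_site = 0;              # 3. occurrences of x at column $site

foreach my $seq (@seqs) {
    foreach my $col (0 .. length($seq) - 1) {
        my $char = substr($seq, $col, 1);
        next if $char eq '-';             # assumption: gaps are not residues
        $n_residues++;
        next unless $char eq $residue;
        $n_x++;
        $n_x_at_site++ if $col == $site;
    }
}

print "residues: $n_residues, $residue total: $n_x, ",
      "$residue at site $site: $n_x_at_site, sequences: $n_seqs\n";
```

If gaps should count toward n, dropping the `next if $char eq '-'` line would be the only change needed.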

I think your suggestions will clearly help me on this quest!

-dylan


On Sat, Feb 7, 2009 at 2:36 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

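> Oops, bugs in that. Try:
>
>> my $len = length($seqs[0]);
>> my @residue_counts;
>> my %h;
>> foreach my $col (0..$len-1) {
>>   %h = ();                           # reset the tallies for this column
>>   foreach my $seq (@seqs) {
>>     $h{ substr($seq, $col, 1) }++;   # count the residue at this column
>>   }
>>   push @residue_counts, {%h};        # store a copy of this column's counts
>> }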
>>
>
>
> ----- Original Message ----- From: "Mark A. Jensen" <maj at fortinbras.us>
> To: "Dylan Krishnan" <dylankrishnan at gmail.com>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Saturday, February 07, 2009 11:56 AM
> Subject: Re: [Bioperl-l] calculate the frequency of occurrence of
> the most commonly observed amino acid at each position of multiple sequence
> alignment
>
>
>
>> Dylan, it's worth mentioning that the BioPerl method is very
>> overhead-heavy; all the objects make it easy to write just a few lines,
>> but it probably won't be the absolute fastest way to do what you want.
>> Another path to follow would be
>>
>> # your seqs are plain strings in the array @seqs, aligned and all the same length
>> my $len = length($seqs[0]);
>> my @residue_counts;
>> foreach my $col (0..$len-1) {
>>   my %h = ();                        # fresh hash for each column
>>   foreach my $seq (@seqs) {
>>     $h{ substr($seq, $col, 1) }++;   # count the residue at this column
>>   }
>>   push @residue_counts, \%h;         # keep a reference to this column's counts
>> }
>>
>> Now, for each elt in @residue_counts (each elt is a reference to a hash),
>> look for the key that has the maximum hash value. The snippet above is
>> also worth working through for its educational value, esp. w/r to using
>> hashes, which (IMHO) are one of the absolutely coolest things about Perl.
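
The "key with the maximum hash value" step described just above might look something like the following. This is a minimal sketch only: the toy @seqs, the @most_common array, and the alphabetical tie-break are assumptions for illustration, and the frequency is taken as count divided by the number of sequences (so gap characters, if present, would be tallied like any other symbol).

```perl
use strict;
use warnings;

# Hypothetical toy alignment, standing in for the @seqs array above.
my @seqs = ('MKVAL', 'MKIAL', 'MRVAL', 'MKVGL');

# Per-column residue tallies, built the same way as in the snippet above.
my @residue_counts;
foreach my $col (0 .. length($seqs[0]) - 1) {
    my %h;
    $h{ substr($_, $col, 1) }++ foreach @seqs;
    push @residue_counts, \%h;
}

# For each column, pick the key with the largest count and its frequency.
my @most_common;
foreach my $col (0 .. $#residue_counts) {
    my $counts = $residue_counts[$col];
    # Sort by descending count, then alphabetically so ties are reproducible.
    my ($top) = sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b }
                keys %$counts;
    my $freq = $counts->{$top} / scalar @seqs;
    push @most_common, [ $top, $freq ];
    printf "column %d: %s %.2f\n", $col + 1, $top, $freq;
}
```

Sorting every column's keys is more work than a single max scan, but for 20-odd amino acid symbols per column the difference should be negligible and the code stays short.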
>>
>> cheers- MAJ
>>  ----- Original Message -----  From: Dylan Krishnan
>>  To: Mark A. Jensen
>>  Cc: bioperl-l at lists.open-bio.org
>>  Sent: Saturday, February 07, 2009 11:43 AM
>>  Subject: Re: [Bioperl-l] calculate the frequency of occurrence of the
>> most commonly observed amino acid at each position of multiple sequence
>> alignment
>>
>>
>>  Thanks Mark!
>>
>>  The author's other approach is to load the alignment into an MS Excel
>> worksheet and use the "autofilter" procedure to count the occurrences of any
>> residue at a position of the alignment. The claim is "that Excel is useful for
>> this purpose." Sounds reasonable for 10 alignments but not 2000!
>>
>>  again, many thanks.
>>
>>
>>  -dylan
>>
>>  On Sat, Feb 7, 2009 at 10:25 AM, Mark A. Jensen <maj at fortinbras.us>
>> wrote:
>>
>>   Dylan,
>>
>>   This is an extremely good exercise for anyone learning Perl to do
>> bioinformatics.
>>   When you have done many exercises like this, you will see what people
>> mean
>>   when they say it is very straightforward.
>>
>>   Here are some hints:
>>
>>   Use the "entropy" scrap at
>> http://www.bioperl.org/wiki/Site_entropy_in_an_alignment .
>>   You will convert the function entropy_by_column() into the function you
>> need.
>>   Replace the line
>>
>>   $ent{$col} = entropy(values %res);
>>
>>   with a line you will write using the "hash key at max value" scrap,
>> found
>>   here: http://www.bioperl.org/wiki/Hash_key_at_the_max_value .
>>
>>   Happy coding!
>>   Mark
>>
>>   ----- Original Message ----- From: "Dylan Krishnan" <
>> dylankrishnan at gmail.com>
>>   To: <bioperl-l at lists.open-bio.org>
>>   Sent: Saturday, February 07, 2009 11:10 AM
>>   Subject: [Bioperl-l] calculate the frequency of occurrence of the
>> most commonly observed amino acid at each position of multiple sequence
>> alignment
>>
>>
>>
>>     I am new to Perl, but this is something I am seeking to do either
>>     through a BioPerl module or just Perl. Apparently, this is quite
>>     "straightforward using PERL," but I beg to differ.
>>
>>     Any assistance regarding this matter would be greatly appreciated.
>>
>>     Thanks!
>>
>>     -dylan
>>
>>     _______________________________________________
>>     Bioperl-l mailing list
>>     Bioperl-l at lists.open-bio.org
>>     http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>


