[Bioperl-l] Information content of alignment

Shawn Hoon shawnh at stanford.edu
Thu May 13 11:02:10 EDT 2004


On May 13, 2004, at 6:32 AM, martin wrote:

> Hi Malay,
>
> Not quite sure what you mean by 'information content'.  You can access
> a single column (or a range of columns) of an alignment using the
> slice() function:
>
> $aln2 = $aln->slice(20, 30);
>
> which returns another AlignI object.  So something like:
>
> foreach (1..$aln->length){
> 	my $column=$aln->slice($_, $_);
> 	# $column is now an AlignI object
> 	# do something with it....
> }
>


I wrote something similar for Bio::Graphics::Pictogram, but I can't think
of anything in BioPerl right now that does this explicitly. It might be a
useful addition to SimpleAlign.
Continuing from the code above, once you have the slice for each column
you can count the residue frequencies and compute the entropy from them:
my %hash;
# slice() takes 1-based column coordinates
foreach my $pos (1..$aln->length){
	my $column = $aln->slice($pos,$pos);
	my $total = 0;
	foreach my $seq ($column->each_seq){
		# count each residue (gaps included) seen in this column
		$hash{$pos}{$seq->seq}++;
		$total++;
	}
	$hash{$pos}{'total'} = $total;
}
#calculate entropy
foreach my $pos(sort{$a<=>$b} keys %hash){
   my $ent = 0;
   foreach my $base(keys %{$hash{$pos}}){
     next if $base eq 'total';   # bookkeeping key, not a residue count
     my $freq = $hash{$pos}{$base}/$hash{$pos}{'total'};
     $ent += -1 * $freq*log2($freq);
   }
   print "Position $pos, entropy: $ent bits\n";
}

sub log2{
   my ($x) = @_;
   return 0 if $x==0;
   return log($x)/log(2);
}
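
The loop above gives the entropy of each column. If what you want is
information content, one common convention is to subtract the observed
entropy from the maximum possible entropy, log2(alphabet size), which is
2 bits for DNA. Below is a minimal sketch of that idea in plain Perl,
without BioPerl, working on a toy alignment held as equal-length strings.
The sub names column_entropy and column_information are hypothetical, the
choice to skip gap characters is an assumption, and no small-sample
correction is applied.

```perl
use strict;
use warnings;

# Hypothetical helper: Shannon entropy (in bits) of one alignment column,
# passed as a string of residues. Gap characters '-' and '.' are skipped
# (an assumption; count them instead if gaps are meaningful to you).
sub column_entropy {
    my ($column) = @_;
    my %count;
    my $total = 0;
    foreach my $res (split //, uc $column) {
        next if $res eq '-' or $res eq '.';
        $count{$res}++;
        $total++;
    }
    return unless $total;    # gap-only column: entropy is undefined
    my $ent = 0;
    foreach my $n (values %count) {
        my $freq = $n / $total;
        $ent -= $freq * log($freq) / log(2);
    }
    return $ent;
}

# Hypothetical helper: information content = maximum entropy minus
# observed entropy, where maximum entropy is log2(alphabet size).
sub column_information {
    my ($column, $alphabet_size) = @_;
    my $ent = column_entropy($column);
    return unless defined $ent;
    return log($alphabet_size) / log(2) - $ent;
}

# Toy alignment: each row is one sequence, all rows the same length.
my @rows = qw(ACGT ACGA ACTC AAGG);
foreach my $i (0 .. length($rows[0]) - 1) {
    my $column = join '', map { substr($_, $i, 1) } @rows;
    my $ic = column_information($column, 4);
    printf "Position %d, information: %.3f bits\n", $i + 1, $ic;
}
```

With a real alignment, the column string could be built from the
one-column slice by joining the seq() of each sequence. Use an alphabet
size of 20 for protein alignments.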


> you can get the documentation with
>
> % perldoc Bio::Align::AlignI
>
> If you let me know what you want to do with the column, maybe I can
> give some more advice.
>
> Cheers
>
> Martin
>
> 	
>
> On Wed, 2004-05-12 at 18:56, Malay wrote:
>> Hi Bioperlers:
>>
>> Pardon my ignorance. I could not clear my haze about the numerous
>> bioperl modules. I looked at the AlignI interface but could not find
>> the answer to my question:
>>
>> Is there any way to quickly calculate the information content of each
>> column of the alignment in bioperl?
>>
>> Any pointers or source code would be appreciated. Otherwise, I will
>> have to get my hands dirty.
>>
>> Cheers,
>>
>> Malay
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> -- 
> Martin Jones
> The Nematode Genomics Lab
> Institute of Cell, Animal and Population Biology
> University of Edinburgh
> King's Buildings
> West Mains Road
> Edinburgh, EH9 3JT
>
>


