[Bioperl-l] counting gaps in a column of alignment

Thu Jul 25 15:50:21 UTC 2013

The best and easiest way to count characters in Perl is using the
"transliterate" function, writing the characters in the search-list and
using an empty replace-list
(http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators), so it
searches for a list of characters but don’t modify the string. The tr///
function always returns the number of characters found in the string:
###
my $string = 'ACGTN---actgn';
my $count_A   = ($string =~ tr/Aa//); # <-- Note that by searching both
uppercase and lowercase characters, the search is made case insensitive
my $count_C   = ($string =~ tr/Cc//);
my $count_T   = ($string =~ tr/Tt//);
my $count_G   = ($string =~ tr/Gg//);
my $count_N   = ($string =~ tr/Nn//);
my $count_gap = ($string =~ tr/-//);

print "$count_A\t$count_C\t$count_T\t$count_G\t$count_N\t$count_gap\n";
exit;
###

Cheers,

Francisco J. Ossandon

-----Mensaje original-----
De: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] En nombre de sk sweety
Enviado el: jueves, 25 de julio de 2013 2:17
CC: Bioperl-l at lists.open-bio.org
Asunto: Re: [Bioperl-l] counting gaps in a column of alignment

Thanks for the reply.

I did try other regex like,
$col eq $gap_char
$col =~ /-/
$col =~ /\-/
$col = "-"

Nothing seems to recognize the '-' gap character. The program returns a
warning like this,

"MSG: Slice [5-5] of sequence [3/1-6] contains no residues. Sequence
excluded from the new alignment".
and doesn't seem to count the gaps in the column.

No worries, that was a typo while getting to the mail list ..have got the
correct one:)

On Thu, Jul 25, 2013 at 3:54 PM, Alexey Morozov <alexeymorozov1991 at gmail.com
> wrote:

> Why would you use a regexp (which by the way I cannot understand: what 
> are \Q and \E?) and not simply
>
> elsif($col eq $gap_char)
>
> $col is always one char long, so KISS.
> Also I hope I'll not sound like your school teacher picking all minor 
> bugs, but you have just \ and not \t in header printing. And %count 
> doubles your five $count_foo variables.
>
>
>
> 2013/7/25 subha kalyanamoorthy <sksweety24 at gmail.com>
>
> > Hi there,
> >
> > I am a new bioperl user.  I made a script to count the nucleotide 
> > composition in each column of the alignment. I am able to get the 
> > count
> for
> > the nucleotides, but not for the gap characters from the following
> program.
> > I would greatly appreciate any suggestions to correct this program.
> >
> > Thanks.
> >
> > #***************************My 
> > Program**********************************
> >
> > #!/bin/perl -w
> > use strict;
> > use warnings;
> >
> > use List::Util 'max';
> > use Bio::SimpleAlign;
> > use Bio::Align::AlignI;
> > use Bio::AlignIO;
> > use Bio::SeqIO;
> >
> > my $in= Bio::AlignIO->new( -file => "seq.fst", -format => "fasta");
> >
> > my $align = $in->next_aln();
> >
> > print "column\tA's\tT's\C's\G's\n";
> >
> > for (my $i = 1; $i <= $align->length; $i++) {
> >
> >  my %count;
> >
> >  my $seqs = $align->slice($i,$i);
> >
> > my $gap_char = $seqs->gap_char();
> >
> >     my $count_A=0;
> >     my $count_C=0;
> >     my $count_T=0;
> >     my $count_G=0;
> >     my $count_N=0;
> >     my $count_gap=0;
> >
> >     foreach my $seq ($seqs->each_seq) {
> >
> >     my $col=$seq->seq;
> >
> >         if ($col eq 'A'){
> >         $count_A++;
> >         }elsif ($col eq 'C'){
> >         $count_C++;
> >         }elsif ($col eq 'T'){
> >         $count_T++;
> >         }elsif ($col eq 'G'){
> >         $count_G++;
> >         }elsif ( $col eq 'N'){
> >         $count_N++;
> >         }elsif ($col =~ m/^\Q$gap_char\E$/){
> >         $count_gap++;
> > }
> >       $count{$seq->seq} += 1;
> >  }
> >
> > print"$i\t$count_A\t$count_T\t$count_C\t$count_G\n";
> >
> > }
> >
> > #*******************************************************************
> > ****` _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
>
> --
> Alexey Morozov,
> LIN SB RAS, bioinformatics group.
> Irkutsk, Russia.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l