[Bioperl-l] counting gaps in a column of alignment

Thu Jul 25 02:17:07 EDT 2013

Thanks for the reply.

I did try other tests besides the regex, for example:
$col eq $gap_char
$col =~ /-/
$col =~ /\-/
$col = "-"
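
As a sanity check, here is a minimal sketch of these tests run on their own, outside the alignment loop. It assumes the gap character is '-' (the Bio::SimpleAlign default) and that $col holds exactly one character; both values are hard-coded for illustration.

```perl
use strict;
use warnings;

# Assumption: the gap character is '-' and the column string is one char long.
my $gap_char = '-';
my $col      = '-';

# Record whether each comparison matches the one-character gap string.
# \Q...\E quotes regex metacharacters in the interpolated variable, so a
# gap character such as '.' would still be matched literally.
my %matches = (
    'eq'      => ( $col eq $gap_char )          ? 1 : 0,
    'plain'   => ( $col =~ /-/ )                ? 1 : 0,
    'escaped' => ( $col =~ /\-/ )               ? 1 : 0,
    'quoted'  => ( $col =~ m/^\Q$gap_char\E$/ ) ? 1 : 0,
);

# Note: $col = "-" is an assignment, not a comparison, so it is left out.
print "$_: $matches{$_}\n" for sort keys %matches;
```

If all four report a match here, the comparison itself is probably not the problem, and the gap is more likely never reaching the test.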

In my script none of them recognizes the '-' gap character. The program
prints a warning like this:

"MSG: Slice [5-5] of sequence [3/1-6] contains no residues. Sequence
excluded from the new alignment".

and the gaps in the column are not counted.
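
Going by the wording of the warning, a sequence whose one-column slice holds only a gap is excluded from the sliced alignment, so the gap would never reach the loop. Below is a minimal sketch of one way around that: read each column straight from the full aligned strings with substr instead of slicing. The sequences are made-up illustration data, count_column is a hypothetical helper, and the gap character is assumed to be '-'.

```perl
use strict;
use warnings;

# Made-up aligned sequences standing in for the strings that
# $seq->seq would return for each sequence of the full alignment.
my @aligned  = ( 'ACGT-A', 'AC-T-A', 'ACGTNA' );
my $gap_char = '-';    # assumption: the default gap character

# Hypothetical helper: tally every character found in one 1-based column.
sub count_column {
    my ( $column, $seqs ) = @_;
    my %count;
    for my $s (@$seqs) {
        my $char = uc( substr( $s, $column - 1, 1 ) );
        $count{$char}++;
    }
    return \%count;
}

my @symbols = ( qw(A T C G N), $gap_char );
print join( "\t", 'column', qw(A T C G N gap) ), "\n";

for my $i ( 1 .. length( $aligned[0] ) ) {
    my $count = count_column( $i, \@aligned );
    # Symbols that never occur in the column are reported as 0.
    my @cells = map { $count->{$_} // 0 } @symbols;
    print join( "\t", $i, @cells ), "\n";
}
```

In the real script the array could presumably be filled with something like map { $_->seq } $align->each_seq, so that gap-only positions stay in the strings. The single %count hash also replaces the separate $count_A, $count_C, ... variables.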

No worries about the header: the missing \t was a typo that crept in while
copying the script into the mail. My actual script has the correct one :)

On Thu, Jul 25, 2013 at 3:54 PM, Alexey Morozov <alexeymorozov1991 at gmail.com> wrote:

> Why would you use a regexp (which by the way I cannot understand: what are
> \Q and \E?) and not simply
>
> elsif($col eq $gap_char)
>
> $col is always one char long, so KISS.
> Also I hope I'll not sound like your school teacher picking all minor bugs,
> but you have just \ and not \t in header printing. And %count doubles your
> five $count_foo variables.
>
>
>
> 2013/7/25 subha kalyanamoorthy <sksweety24 at gmail.com>
>
> > Hi there,
> >
> > I am a new bioperl user.  I made a script to count the nucleotide
> > composition in each column of the alignment. I am able to get the count
> > for the nucleotides, but not for the gap characters from the following
> > program.
> > I would greatly appreciate any suggestions to correct this program.
> >
> > Thanks.
> >
> > #***************************My Program**********************************
> >
> > #!/bin/perl -w
> > use strict;
> > use warnings;
> >
> > use List::Util 'max';
> > use Bio::SimpleAlign;
> > use Bio::Align::AlignI;
> > use Bio::AlignIO;
> > use Bio::SeqIO;
> >
> > my $in= Bio::AlignIO->new( -file => "seq.fst", -format => "fasta");
> >
> > my $align = $in->next_aln();
> >
> > print "column\tA's\tT's\C's\G's\n";
> >
> > for (my $i = 1; $i <= $align->length; $i++) {
> >
> >  my %count;
> >
> >  my $seqs = $align->slice($i,$i);
> >
> > my $gap_char = $seqs->gap_char();
> >
> >     my $count_A=0;
> >     my $count_C=0;
> >     my $count_T=0;
> >     my $count_G=0;
> >     my $count_N=0;
> >     my $count_gap=0;
> >
> >     foreach my $seq ($seqs->each_seq) {
> >
> >     my $col=$seq->seq;
> >
> >         if ($col eq 'A'){
> >         $count_A++;
> >         }elsif ($col eq 'C'){
> >         $count_C++;
> >         }elsif ($col eq 'T'){
> >         $count_T++;
> >         }elsif ($col eq 'G'){
> >         $count_G++;
> >         }elsif ( $col eq 'N'){
> >         $count_N++;
> >         }elsif ($col =~ m/^\Q$gap_char\E$/){
> >         $count_gap++;
> > }
> >       $count{$seq->seq} += 1;
> >  }
> >
> > print"$i\t$count_A\t$count_T\t$count_C\t$count_G\n";
> >
> > }
> >
> > #***********************************************************************
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
>
> --
> Alexey Morozov,
> LIN SB RAS, bioinformatics group.
> Irkutsk, Russia.