[Bioperl-l] counting gaps in sequence data

Michael S. Robeson II popgen23 at mac.com
Fri Oct 15 00:55:34 EDT 2004


Wow, that seems to work pretty well. However, I am unsure of what the 
following line means:

push @{$gaptype{$gap}}, $-[0] + 2;

Especially the "$-[0] + 2" part of it. I understand that it is an 
array, but what is going on there is a little vague. Other than that, I 
pretty much understand the code. Also, not being able to match gaps at 
the end of a string will be a problem. I am currently working from what 
you've posted to see if I can fit in (using a character class, I 
suppose) a "\Z", "\z", or "$" to match any gaps at the end of a line.
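
For reference, here is a minimal sketch of what that expression does and of one way to pick up terminal gaps. Per perlvar, @- holds the start offsets of the last successful match, so $-[0] is the 0-based offset where the whole match begins. In the pattern quoted below, the match begins on the base just before the gap, so +1 steps onto the first dash and another +1 converts to a 1-based position. The find_gaps helper and the test string are made up for illustration, and matching whole runs of dashes with /-+/ is a swapped-in approach, not the pattern from the posted script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the posted script): returns a hash ref
# mapping gap length => array of 1-based start positions.
sub find_gaps {
    my ($seq) = @_;
    my %gaptype;
    # -+ grabs a whole run of dashes, so no flanking base is required and
    # gaps at the very start or end of the string are found too.
    while ($seq =~ /-+/g) {
        # $-[0] is the 0-based offset where the match starts;
        # $+[0] is the offset just past the end of the match.
        my $length = $+[0] - $-[0];
        # Here the match starts on the first dash itself, so only +1
        # (0-based to 1-based) is needed rather than +2.
        push @{ $gaptype{$length} }, $-[0] + 1;
    }
    return \%gaptype;
}

# Same as the "human" sequence below, plus a made-up terminal gap of length 2.
my $gaps = find_gaps("acgtt---cgatacg---acgact-----t--");
for my $length (sort { $a <=> $b } keys %$gaps) {
    print "Gap length $length beginning at positions: ",
          join(", ", @{ $gaps->{$length} }), "\n";
}
```

Because each run of dashes is consumed in one match, this also avoids looping over every gap length, and gaps longer than 5 are reported as well.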

-Cheers!
-Mike


On Oct 14, 2004, at 17:32, Barry Moore wrote:

> Mike-
>
> Something like this maybe?
>
> use strict;
> use warnings;
>
> my %seqs = (human => "acgtt---cgatacg---acgact-----t",
>            chimp => "acgtacgatac---actgca---ac",
>            mouse => "acgata---acgatcg----acgt");
>
> for my $seq (keys %seqs) { # Loop over the names in your sequence hash
>  print "\n\nThe $seq sequence has the following gaps:\n";
>  my %gaptype;
>  for my $gap (1..5) { # 5 or however large you want gaps to be counted
>    # Notice that this won't catch terminal gaps.
>    while ($seqs{$seq} =~ /[atgc]-{$gap}[atgc]/g) {
>      # This creates a hash of arrays.  The arrays hold the locations of
>      # the gaps, and the count of each gaptype is determined by the
>      # length of that array.
>      push @{$gaptype{$gap}}, $-[0] + 2;
>    }
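>    # The key only exists if at least one gap of this length was pushed.
>    # (defined @array is a fatal error in Perl 5.22 and later.)
>    if ($gaptype{$gap}) {
>      my $positions = join ", ", @{$gaptype{$gap}};
>      print "\tGap length $gap beginning at positions:\t$positions\n";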
>    }
>  }
> }
>
> Barry Moore
>
>
> Michael Robeson wrote:


