[Bioperl-l] alignable portion of a genome
Smithies, Russell
Russell.Smithies at agresearch.co.nz
Tue May 12 21:27:57 UTC 2009
Adding the mutations is a little hacky (and probably slow) but I think it works correctly.
The stats should work out OK but it's too early and I haven't had a coffee yet so can't be sure :-)
--Russell
============================
#!perl -w
my $seq = "atcgacgatcgaacgatcga";
my $debug = 0;
foreach ($seq =~ /(?=(\w{5}))/g){
$h++;
# add all the exact words to the hash
$hash{$_}++;
print "$_\n" if $debug;
# mutate words and add to hash
my at rr = mutate($_);
foreach (@rr){
print "$_\n" if $debug;
$h++;
$hash{$_}++;
}
}
# print out the hash counts & stats
foreach (keys %hash){
print "$_\t$hash{$_}\n" if $debug;
$singles++ if($hash{$_} eq 1);
}
print $singles/$h,"\n";
sub mutate{
my @array = split '',shift;
my @res = ();
my $rep = 'X';
for(my$i = 0; $i <= $#array; $i++){
my $old1 = $array[$i];
splice @array, $i, 1, $rep;
push @res, (join '', @array);
for(my$j = $i+1; $j <= $#array; $j++){
my $old2 = $array[$j];
splice @array, $j, 1, $rep;
push @res, (join '', @array);
splice @array, $j, 1, $old2;
}
splice @array, $i, 1, $old1;
}
return @res;
}
================================
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Tuesday, 12 May 2009 3:56 p.m.
> To: 'fadista'; 'Bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] alignable portion of a genome
>
> Perfect matches is easy:
>
> $seq = "atcgacgatcgaacgatcga";
>
> foreach ($seq =~ /(?=(\w{5}))/g){$h++; $hash{$_}++}
> foreach (keys %hash){ $singles++ if($hash{$_} eq 1)}
> print $singles/$h;
>
> Could probably be done with map as well.
> Counting the miss-matches might take a bit more thinking....
> Any ideas MAJ?
>
> --Russell
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of fadista
> > Sent: Monday, 11 May 2009 9:32 p.m.
> > To: Bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] alignable portion of a genome
> >
> >
> > Hi,
> >
> > I would like to know of a good and fast way that could help me calculate the
> > alignable portion of a genome (not human), given a reference sequence.
> > When I say alignable portion I mean that I want to know all the positions of
> > the genome that can be covered uniquely by reads of 36 bp and up to 2
> > mismatches.
> >
> > Some have advised me to work with Perl using the following strategy but I am
> > not a Perl user so if someone has already a script for this function, it
> > would be nice:
> >
> > "you could approach it by walking along the genome in a sliding window of
> > 36 nt, and hash the frequency of each 36 nt sequence that you encounter.
> > Then count how many of the 36 nt sequences had a frequency of exactly
> > one. Divide this by the total number of 36nt windows visited. This
> > should be do-able in about 20 lines of Perl."
> >
> >
> > Best regards and thanks in advance
> >
> > --
> > View this message in context: http://www.nabble.com/alignable-portion-of-a-
> > genome-tp23480025p23480025.html
> > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list