[Bioperl-l] alignable portion of a genome

Smithies, Russell Russell.Smithies at agresearch.co.nz
Tue May 12 21:27:57 UTC 2009


Adding the mutations is a little hacky (and probably slow) but I think it works correctly.
The stats should work out OK but it's too early and I haven't had a coffee yet so can't be sure :-)

--Russell

============================
#!perl -w

my $seq = "atcgacgatcgaacgatcga";
my $debug = 0;


foreach ($seq =~ /(?=(\w{5}))/g){
	$h++; 
	# add all the exact words to the hash
	$hash{$_}++;
	print "$_\n" if $debug;
	# mutate words and add to hash
	my at rr = mutate($_);
	foreach (@rr){
		print "$_\n" if $debug;
		$h++;
	 	$hash{$_}++;
	}
}


# print out the hash counts & stats
foreach (keys %hash){
	print "$_\t$hash{$_}\n" if $debug;
	$singles++ if($hash{$_} eq 1);
}
print $singles/$h,"\n";


sub mutate{
	my @array = split '',shift;
	my @res = ();
	my $rep = 'X';
	for(my$i = 0; $i <= $#array; $i++){
		my $old1 = $array[$i];
		splice @array, $i, 1, $rep;
		push @res, (join '', @array);
		for(my$j = $i+1; $j <= $#array; $j++){
			my $old2 = $array[$j];
			splice @array, $j, 1, $rep;
			push @res, (join '', @array);
			splice @array, $j, 1, $old2;
		}
		splice @array, $i, 1, $old1;
	}
	return @res;
}

================================

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Tuesday, 12 May 2009 3:56 p.m.
> To: 'fadista'; 'Bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] alignable portion of a genome
> 
> Perfect matches is easy:
> 
> 	$seq = "atcgacgatcgaacgatcga";
> 
> 	foreach ($seq =~ /(?=(\w{5}))/g){$h++; $hash{$_}++}
> 	foreach (keys %hash){ $singles++ if($hash{$_} eq 1)}
> 	print $singles/$h;
> 
> Could probably be done with map as well.
> Counting the miss-matches might take a bit more thinking....
> Any ideas MAJ?
> 
> --Russell
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of fadista
> > Sent: Monday, 11 May 2009 9:32 p.m.
> > To: Bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] alignable portion of a genome
> >
> >
> > Hi,
> >
> > I would like to know of a good and fast way that could help me calculate the
> > alignable portion of a genome (not human), given a reference sequence.
> > When I say alignable portion I mean that I want to know all the positions of
> > the genome that can be covered uniquely by reads of 36 bp and up to 2
> > mismatches.
> >
> > Some have advised me to work with Perl using the following strategy but I am
> > not a Perl user so if someone has already a script for this function, it
> > would be nice:
> >
> > "you could approach it by walking along the genome in a sliding window of
> > 36 nt, and hash the frequency of each 36 nt sequence that you encounter.
> > Then count how many of the 36 nt sequences had a frequency of exactly
> > one. Divide this by the total number of 36nt windows visited. This
> > should be do-able in about 20 lines of Perl."
> >
> >
> > Best regards and thanks in advance
> >
> > --
> > View this message in context: http://www.nabble.com/alignable-portion-of-a-
> > genome-tp23480025p23480025.html
> > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list