[Bioperl-l] alignable portion of a genome

Miguel Pignatelli miguel.pignatelli at uv.es
Tue May 12 04:45:46 EDT 2009


For mismatches, take a look at the CPAN module Text::LevenshteinXS  
which calculates the Levenshtein distance (edit distance) of two  
strings.
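Text::LevenshteinXS does this computation in C (XS). To show what is being computed, here is a minimal pure-Perl sketch of the standard dynamic-programming recurrence for edit distance. The subroutine name `levenshtein` is hypothetical and is not the module's interface.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical pure-Perl helper (NOT the Text::LevenshteinXS API):
# dynamic-programming edit distance, keeping only the previous row.
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);       # distance from "" to each prefix of $t
    for my $i (1 .. length $s) {
        my @cur = ($i);                # distance from first $i chars of $s to ""
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1,            # deletion
                           $cur[$j - 1] + 1,         # insertion
                           $prev[$j - 1] + $cost);   # substitution or match
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print levenshtein("kitten", "sitting"), "\n";   # 3
```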

For more information about Levenshtein distance:

http://en.wikipedia.org/wiki/Levenshtein_distance
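All the reads in the question are the same length (36 bp), and the question asks about mismatches. For that case, the Hamming distance between equal-length windows (substitutions only) may fit better than edit distance, which also counts insertions and deletions. The sketch below makes that assumption and is a brute-force illustration only. It calls a window uniquely alignable if no other window of the same sequence lies within the allowed number of mismatches. It considers the forward strand only, and it compares every pair of windows (quadratic time), so it is practical only for short test sequences. The helper names `hamming` and `unique_fraction_mm` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: number of mismatching positions between two
# equal-length strings. Stringwise XOR gives "\0" wherever they agree,
# and tr/\0//c counts the characters that are not "\0".
sub hamming {
    my ($s, $t) = @_;
    my $x = $s ^ $t;
    return ($x =~ tr/\0//c);
}

# Hypothetical helper: fraction of the $k-length windows of $seq that have
# no OTHER window within $max_mm mismatches (forward strand only).
# Brute force over all pairs: for illustration on short sequences only.
sub unique_fraction_mm {
    my ($seq, $k, $max_mm) = @_;
    my @win = map { substr($seq, $_, $k) } 0 .. length($seq) - $k;
    my $unique = 0;
    for my $i (0 .. $#win) {
        my $hit = 0;
        for my $j (0 .. $#win) {
            next if $i == $j;
            if (hamming($win[$i], $win[$j]) <= $max_mm) { $hit = 1; last }
        }
        $unique++ unless $hit;
    }
    return @win ? $unique / @win : 0;
}

print unique_fraction_mm("atcgacgatcgaacgatcga", 5, 1), "\n";   # 0.0625
```

With a mismatch allowance of 0, this gives the same result as the exact-match hash count in Russell's snippet (7/16 for his example sequence). For the original question, the call would look like `unique_fraction_mm($genome, 36, 2)`, although a whole genome would need an indexed approach rather than all-pairs comparison.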

M;


On 12/05/2009, at 5:55, Smithies, Russell wrote:

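> Perfect matches are easy:
>
> 	use strict; use warnings;
> 	my $seq = "atcgacgatcgaacgatcga";
> 	my %hash;
> 	my ($h, $singles) = (0, 0);
> 	# The zero-width lookahead captures every overlapping 5-mer
> 	# (use \w{36} for 36 bp reads).
> 	foreach ($seq =~ /(?=(\w{5}))/g) { $h++; $hash{$_}++ }
> 	foreach (keys %hash) { $singles++ if $hash{$_} == 1 }
> 	print $singles / $h, "\n";
>
> Could probably be done with map as well.
> Counting the mismatches might take a bit more thinking...
> Any ideas, MAJ?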
>
> --Russell
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of fadista
>> Sent: Monday, 11 May 2009 9:32 p.m.
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] alignable portion of a genome
>>
>>
>> Hi,
>>
>> I would like to know of a good, fast way to calculate the alignable
>> portion of a genome (not human), given a reference sequence. By
>> alignable portion I mean all the positions of the genome that can be
>> covered uniquely by 36 bp reads with up to 2 mismatches.
>>
>> Some have advised me to use Perl with the following strategy, but I am
>> not a Perl user, so if someone already has a script for this, it
>> would be very helpful:
>>
>> "you could approach it by walking along the genome in a sliding window of
>> 36 nt, and hash the frequency of each 36 nt sequence that you encounter.
>> Then count how many of the 36 nt sequences had a frequency of exactly
>> one. Divide this by the total number of 36 nt windows visited. This
>> should be doable in about 20 lines of Perl."
>>
>>
>> Best regards and thanks in advance
>>
>> --
>> View this message in context: http://www.nabble.com/alignable-portion-of-a-genome-tp23480025p23480025.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
