[Bioperl-l] alignable portion of a genome

Aaron Mackey ajmackey at gmail.com
Tue May 12 12:18:25 UTC 2009


A better idea than using Perl to count the mismatches is to actually
generate all the (unique) 36-mers from the reference genome as an artificial
set of "reads", and then use a program like Mosaik or maq to align them back
to the reference genome.  Both tools have means of then reporting the
coverage along the genome of uniquely aligned reads.  That way you can also
change the Mosaik/maq parameters to reflect your true read alignment
strategy.

-Aaron

On Mon, May 11, 2009 at 11:55 PM, Smithies, Russell <
Russell.Smithies at agresearch.co.nz> wrote:

> Perfect matches is easy:
>
>        $seq = "atcgacgatcgaacgatcga";
>
>        foreach ($seq =~ /(?=(\w{5}))/g){$h++; $hash{$_}++}
>        foreach (keys %hash){ $singles++ if($hash{$_} eq 1)}
>        print $singles/$h;
>
> Could probably be done with map as well.
> Counting the miss-matches might take a bit more thinking....
> Any ideas MAJ?
>
> --Russell
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of fadista
> > Sent: Monday, 11 May 2009 9:32 p.m.
> > To: Bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] alignable portion of a genome
> >
> >
> > Hi,
> >
> > I would like to know of a good and fast way that could help me calculate
> the
> > alignable portion of a genome (not human), given a reference sequence.
> > When I say alignable portion I mean that I want to know all the positions
> of
> > the genome that can be covered uniquely by reads of 36 bp and up to 2
> > mismatches.
> >
> > Some have advised me to work with Perl using the following strategy but I
> am
> > not a Perl user so if someone has already a script for this function, it
> > would be nice:
> >
> > "you could approach it by walking along the genome in a sliding window of
> > 36 nt, and hash the frequency of each 36 nt sequence that you encounter.
> > Then count how many of the 36 nt sequences had a frequency of exactly
> > one. Divide this by the total number of 36nt windows visited. This
> > should be do-able in about 20 lines of Perl."
> >
> >
> > Best regards and thanks in advance
> >
> > --
> > View this message in context:
> http://www.nabble.com/alignable-portion-of-a-
> > genome-tp23480025p23480025.html
> > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



More information about the Bioperl-l mailing list