[Bioperl-l] Counting Homopolymer regions
Abhishek Pratap
abhishek.vit at gmail.com
Mon Jan 12 14:06:13 EST 2009
Hi Heikki,
Thanks for the quick reply.
Just wondering: what happens if there are multiple homopolymeric regions in a
sequence/contig?
Thanks,
-Abhi
On Mon, Jan 12, 2009 at 8:33 AM, Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com> wrote:
> If you can load the sequence strings into memory, I'd use a regular
> expression to detect the homopolymers and then use the pos function to
> find the location of each hit:
>
>
> $s = "AGGGGGGGAAAAACGATCGGGGGGGTGTGGGGGCCCCCGTG";
> $min = 4;
>
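> # with /g, each pass resumes where the previous match ended
> while ( $s =~ /(A{$min,}|T{$min,}|G{$min,}|C{$min,})/g ) {
>     $end   = pos($s);                 # offset just past the match = 1-based end
>     $start = $end - length($1) + 1;   # 1-based start
>     print "$start, $end, $1\n";
> }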
>
>
> -Heikki
>
> 2009/1/9 Abhishek Pratap <abhishek.vit at gmail.com>:
> > Hello All
> >
> >
> > Is there a quick way to find the homopolymer stretches in the contigs and
> > also report their base start and end positions?
> >
> > Thanks,
> > -Abhi
> >
> > --
> > -----------------------------
> > Abhishek Pratap
> > Bioinformatics Software Engineer
> > Institute for Genome Sciences
> > School of Medicine, Univ of Maryland
> > 801, W. Baltimore Street, Baltimore, MD 21209
> > Ph: (+1)-410-706-2296
> > www.igs.umaryland.edu/
> >
> > Chair
> > RSG-Worldwide
> > ISCB-Student Council
> > http://iscbsc.org/rsg
> >
> > www.bioinfosolutions.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
>
> --
> -Heikki
> Heikki Lehvaslaiho - heikki lehvaslaiho gmail com
> http://kapkaupunki.blogspot.com/
>
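
On the multiple-regions question: as far as I can tell, the while loop in the
quoted reply already covers it. With the /g modifier, each pass resumes from
pos($s), so every non-overlapping run of at least $min bases is reported in
turn. Below is a minimal sketch of the same regex wrapped in a subroutine that
collects all hits. The name find_homopolymers, the hash keys, and the test
sequence are illustrative assumptions, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: return every homopolymer run of at least $min bases.
# Each hit is a hash ref with 1-based, inclusive start/end coordinates.
sub find_homopolymers {
    my ($seq, $min) = @_;
    my @hits;
    # Same pattern as in the quoted reply; /g makes the loop visit each match.
    while ( $seq =~ /(A{$min,}|T{$min,}|G{$min,}|C{$min,})/g ) {
        my $end   = pos($seq);              # offset just past the match
        my $start = $end - length($1) + 1;  # convert to 1-based start
        push @hits, { start => $start, end => $end, run => $1 };
    }
    return @hits;
}

# Made-up test sequence with two runs of four or more identical bases.
my @hits = find_homopolymers("AAAACGTTTTTTCG", 4);
for my $hit (@hits) {
    print join("\t", $hit->{start}, $hit->{end}, $hit->{run}), "\n";
}
```

For several contigs, the same subroutine could be called once per sequence
string, for example while looping over a hash keyed by contig name. Note that
the pattern is case-sensitive and ignores N, so lowercase or ambiguous bases
would need an extra alternative or the /i modifier.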