[Bioperl-l] Counting Homopolymer regions

Smithies, Russell Russell.Smithies at agresearch.co.nz
Mon Jan 19 20:56:36 UTC 2009


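You can also use the built-in regex variables and a back-reference to get the positions of the matches. Note that $-[0] is the 0-based offset of the start of the match and $+[0] is the offset just past its end (which equals the 1-based end position). Because \1 repeats a base that has already been captured once, the quantifier has to be $min - 1 to report runs of at least $min bases:

$rep = $min - 1;
print join(", ", $-[0], $+[0], $&),"\n" while ( $s =~ /([ACGT])\1{$rep,}/g);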
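For anyone who wants the same 1-based, inclusive coordinates as Heikki's loop, here is a minimal sketch of the back-reference approach wrapped in a sub. The name find_homopolymers, the $extra variable and the short test string are my own assumptions for illustration and are not from the thread; substr is used in place of $& only because $& can slow down other matches on older perls.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one [start, end, run]
# triple per homopolymer of at least $min bases, using 1-based inclusive
# coordinates.
sub find_homopolymers {
    my ($seq, $min) = @_;
    my $extra = $min - 1;    # \1 must repeat $min - 1 times after the captured base
    my @hits;
    while ($seq =~ /([ACGT])\1{$extra,}/g) {
        # $-[0] is the 0-based offset where the whole match starts;
        # $+[0] is the 0-based offset just past its end, which is also
        # the 1-based position of the last matched base.
        my $run = substr($seq, $-[0], $+[0] - $-[0]);
        push @hits, [ $-[0] + 1, $+[0], $run ];
    }
    return @hits;
}

# Made-up example sequence: GGGG at 3..6 and TTTTT at 7..11.
my $s = "ACGGGGTTTTTAC";
print join(", ", @$_), "\n" for find_homopolymers($s, 4);
```

With $min = 4 this should print "3, 6, GGGG" and "7, 11, TTTTT". The pattern is case-sensitive and ignores N or other ambiguity codes, so upper-case the sequence or widen the character class if your contigs need it.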

--Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho
> Sent: Tuesday, 13 January 2009 2:34 a.m.
> To: Abhishek Pratap
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Counting Homopolymer regions
> 
> If you can load the sequence strings into memory, I'd use a regular
> expression to detect the homopolymers and then use the pos function to
> find the location of hits:
> 
> 
> $s = "AGGGGGGGAAAAACGATCGGGGGGGTGTGGGGGCCCCCGTG";
> $min = 4;
> 
> while ( $s =~ /(A{$min,}|T{$min,}|G{$min,}|C{$min,})/g) {
>     $end = pos($s);
>     $start = $end - length($1) + 1;
>     print "$start, $end, $1 \n";
> }
> 
> 
>    -Heikki
> 
> 2009/1/9 Abhishek Pratap <abhishek.vit at gmail.com>:
> > Hello All
> >
> >
> > Is there a quick way to find the homopolymer stretches in the contigs and
> > also report their base start and end positions?
> >
> > Thanks,
> > -Abhi
> >
> > --
> > -----------------------------
> > Abhishek Pratap
> > Bioinformatics Software Engineer
> > Institute for Genome Sciences
> > School of Medicine, Univ of Maryland
> > 801, W. Baltimore Street, Baltimore, MD 21209
> > Ph: (+1)-410-706-2296
> > www.igs.umaryland.edu/
> >
> > Chair
> > RSG-Worldwide
> > ISCB-Student Council
> > http://iscbsc.org/rsg
> >
> > www.bioinfosolutions.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> 
> 
> --
>    -Heikki
> Heikki Lehvaslaiho - heikki lehvaslaiho gmail com
> http://kapkaupunki.blogspot.com/



