[Bioperl-l] Counting Homopolymer regions

Mon Jan 12 08:33:51 EST 2009

If you can load the sequence strings into memory, I'd use a regular
expression to detect the homopolymers and then use the pos function to
find the location of each hit:


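use strict;
use warnings;

my $s   = "AGGGGGGGAAAAACGATCGGGGGGGTGTGGGGGCCCCCGTG";
my $min = 4;    # shortest run to report

while ( $s =~ /(A{$min,}|T{$min,}|G{$min,}|C{$min,})/g ) {
    my $end   = pos($s);                # offset just past the match = 1-based end
    my $start = $end - length($1) + 1;  # 1-based start
    print "$start, $end, $1\n";
}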
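
If you need this more than once, a variant along these lines may be
handier. It is only a sketch: the helper name find_homopolymers is made
up for illustration, it assumes the sequence contains plain A/C/G/T (in
either case), and it uses a backreference so a single pattern covers all
four bases. The match boundaries come from Perl's built-in @- and @+
arrays instead of being computed from pos.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [start, end, run] for every homopolymer
# of at least $min bases; positions are 1-based and inclusive.
sub find_homopolymers {
    my ($seq, $min) = @_;
    die "min must be >= 1\n" if $min < 1;
    my $extra = $min - 1;    # one base plus at least $min-1 repeats of it
    my @hits;
    while ($seq =~ /(([ACGT])\2{$extra,})/gi) {
        # $-[0] is the 0-based offset where the whole match starts,
        # $+[0] the 0-based offset just past its end.
        push @hits, [ $-[0] + 1, $+[0], $1 ];
    }
    return @hits;
}

my $s = "AGGGGGGGAAAAACGATCGGGGGGGTGTGGGGGCCCCCGTG";
for my $hit (find_homopolymers($s, 4)) {
    print join(", ", @$hit), "\n";
}
```

Under /i the backreference also matches case-insensitively, so a
soft-masked run such as aaAA is reported as one homopolymer; drop the /i
if lowercase should be treated differently. Runs of N are not reported
because the character class only lists ACGT.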

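Since the question is about contigs rather than a single string, the
same regex can be run over each record of a FASTA file. Again only a
sketch: the small FASTA reader below is hand-rolled to keep the example
dependency-free and assumes well-formed FASTA; with BioPerl you would
normally let Bio::SeqIO do the reading (next_seq, then display_id and seq
on each sequence object). The names scan_fasta and report are
hypothetical, and the output columns (contig id, start, end, length,
base) are just one possible layout.

```perl
use strict;
use warnings;

my $min = 4;    # shortest run to report

# Hypothetical helper: print every homopolymer of >= $min bases in one
# record as tab-separated contig id, start, end, length, base
# (1-based, inclusive coordinates within the contig).
sub report {
    my ($id, $seq, $out) = @_;
    return unless defined $id;    # no record read yet
    while ($seq =~ /(A{$min,}|C{$min,}|G{$min,}|T{$min,})/gi) {
        my ($start, $end) = ($-[0] + 1, $+[0]);
        printf {$out} "%s\t%d\t%d\t%d\t%s\n",
            $id, $start, $end, $end - $start + 1, uc substr($1, 0, 1);
    }
}

# Hypothetical minimal FASTA reader: joins the sequence lines of each
# record and hands the complete record to report().
sub scan_fasta {
    my ($in, $out) = @_;
    my ($id, $seq);
    while (my $line = <$in>) {
        if ($line =~ /^>(\S*)/) {
            report($id, $seq, $out);    # flush the previous record
            ($id, $seq) = ($1, '');
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;          # drop newline, CR and stray spaces
            $seq .= $line;
        }
    }
    report($id, $seq, $out);            # flush the last record
}

if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    scan_fasta($fh, \*STDOUT);
    close $fh;
}
```

Run it as perl homopolymers.pl contigs.fa (both file names are
placeholders). Because sequence lines are joined before matching, a run
that wraps across a line break in the FASTA file is still reported as
one homopolymer.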

   -Heikki

2009/1/9 Abhishek Pratap <abhishek.vit at gmail.com>:
> Hello All
>
>
> Is there a quick way to find the homopolymer stretches in the contigs and
> also report their base start and end positions?
>
> Thanks,
> -Abhi
>
> --
> -----------------------------
> Abhishek Pratap
> Bioinformatics Software Engineer
> Institute for Genome Sciences
> School of Medicine, Univ of Maryland
> 801, W. Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
>
> Chair
> RSG-Worldwide
> ISCB-Student Council
> http://iscbsc.org/rsg
>
> www.bioinfosolutions.com
>


-- 
   -Heikki
Heikki Lehvaslaiho - heikki lehvaslaiho gmail com
http://kapkaupunki.blogspot.com/