[Bioperl-l] Counting Homopolymer regions
Heikki Lehvaslaiho
heikki.lehvaslaiho at gmail.com
Mon Jan 12 13:33:51 UTC 2009
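If you can load the sequence strings into memory, I'd use a regular
expression to detect the homopolymers and then use the pos function to
find the location of each hit (pos returns the offset just past the end
of the last match, which equals the 1-based end position):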
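use strict;
use warnings;

my $s   = "AGGGGGGGAAAAACGATCGGGGGGGTGTGGGGGCCCCCGTG";
my $min = 4;    # minimum run length to report
while ( $s =~ /(A{$min,}|T{$min,}|G{$min,}|C{$min,})/g ) {
    my $end   = pos($s);                  # offset just past the match
    my $start = $end - length($1) + 1;    # 1-based start of the run
    print "$start, $end, $1\n";
}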
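
If you need the hits as data rather than printed lines, the same idea
can be wrapped in a subroutine. What follows is only a minimal sketch
under a few assumptions: find_homopolymers is a hypothetical helper name
(not part of BioPerl), the sequence is upper-case ACGT, and a
backreference stands in for the four-way alternation. Coordinates are
read from Perl's built-in @- and @+ match-offset arrays instead of pos.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns one
# [start, end, run] triple per homopolymer of at least $min bases,
# using 1-based inclusive coordinates.
sub find_homopolymers {
    my ($seq, $min) = @_;
    my $extra = $min - 1;    # copies required after the first base
    my @hits;
    # ([ACGT]) captures one base; \1{$extra,} requires that same base
    # at least $extra more times.
    while ( $seq =~ /([ACGT])\1{$extra,}/g ) {
        # $-[0] is the 0-based start of the whole match,
        # $+[0] is the offset just past its end.
        my $run = substr( $seq, $-[0], $+[0] - $-[0] );
        push @hits, [ $-[0] + 1, $+[0], $run ];
    }
    return @hits;
}

my @hits = find_homopolymers( "AGGGGGGGAAAAACGATCGGGGGGGTGTGGGGGCCCCCGTG", 4 );
print join( ", ", @$_ ), "\n" for @hits;
```

To run this over contigs, you would call the subroutine once per
sequence string. Lower-case or ambiguous bases are not handled here; one
option is to pass uc($seq) and widen the character class.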
-Heikki
2009/1/9 Abhishek Pratap <abhishek.vit at gmail.com>:
> Hello All
>
>
> Is there a quick way to find the homopolymer stretches in the contigs and
> also report their base start and end positions?
>
> Thanks,
> -Abhi
>
> --
> -----------------------------
> Abhishek Pratap
> Bioinformatics Software Engineer
> Institute for Genome Sciences
> School of Medicine, Univ of Maryland
> 801, W. Baltimore Street, Baltimore, MD 21209
> Ph: (+1)-410-706-2296
> www.igs.umaryland.edu/
>
> Chair
> RSG-Worldwide
> ISCB-Student Council
> http://iscbsc.org/rsg
>
> www.bioinfosolutions.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
--
-Heikki
Heikki Lehvaslaiho - heikki lehvaslaiho gmail com
http://kapkaupunki.blogspot.com/