[Bioperl-l] Counting Homopolymer regions

Heikki Lehvaslaiho heikki.lehvaslaiho at gmail.com
Tue Jan 13 03:22:48 EST 2009


Dear Abhi,

I am not sure what you mean. Could you post a short sequence that has
a "multiple homopolymeric region"?

The script I posted detects every pure homopolymer run in the sequence,
not just the first one. You have to set a lower limit on the length,
collect the hits into a data structure (a hash, usually), and then
decide what to do with them (combine them, define a larger region, ...).
It all depends on what you want to accomplish.
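
A minimal sketch of that collection step, assuming the same regular
expression as the earlier script. The helper name homopolymer_runs and
the layout of the hash (base => list of [start, end, length]) are
illustrative choices only, not part of the original script or of BioPerl:

```perl
use strict;
use warnings;

# Hypothetical helper: collect every homopolymer run of at least $min
# bases into a hash keyed by base. Positions are 1-based and inclusive.
sub homopolymer_runs {
    my ($seq, $min) = @_;
    my %runs;    # base => [ [start, end, length], ... ]

    # /g in scalar context continues from the end of the previous match
    while ( $seq =~ /(A{$min,}|T{$min,}|G{$min,}|C{$min,})/g ) {
        my $end   = pos($seq);
        my $start = $end - length($1) + 1;
        my $base  = substr($1, 0, 1);
        push @{ $runs{$base} }, [ $start, $end, length($1) ];
    }
    return \%runs;
}

my $runs = homopolymer_runs("AGGGGGGGAAAAACGATCGGGGGGGTGTGGGGGCCCCCGTG", 4);

# Report the runs grouped by base, in order of appearance within each base
for my $base ( sort keys %$runs ) {
    for my $run ( @{ $runs->{$base} } ) {
        my ($start, $end, $len) = @$run;
        print "$base: $start-$end (length $len)\n";
    }
}
```

With the runs grouped like this, combining neighbouring hits or
defining a larger region becomes a loop over each base's list.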

    -Heikki

2009/1/12 Abhishek Pratap <abhishek.vit at gmail.com>:
> Hi Heikki
>
> Thanks for a quick reply.
>
> Just wondering what happens if there are multiple homopolymeric regions in a
> sequence/contig ?
>
> Thanks,
> -Abhi
>
> On Mon, Jan 12, 2009 at 8:33 AM, Heikki Lehvaslaiho
> <heikki.lehvaslaiho at gmail.com> wrote:
>>
>> If you can load the sequence strings into memory, I'd use a regular
>> expression to detect the homopolymers and then use the pos function
>> to find the location of the hits:
>>
>>
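>> use strict;
>> use warnings;
>>
>> my $s   = "AGGGGGGGAAAAACGATCGGGGGGGTGTGGGGGCCCCCGTG";
>> my $min = 4;    # minimum run length to report
>>
>> # /g in scalar context resumes after the previous match on each pass
>> while ( $s =~ /(A{$min,}|T{$min,}|G{$min,}|C{$min,})/g ) {
>>    my $end   = pos($s);                  # offset past the match = 1-based end
>>    my $start = $end - length($1) + 1;    # 1-based start of the run
>>    print "$start, $end, $1\n";
>> }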
>>
>>
>>   -Heikki
>>
>> 2009/1/9 Abhishek Pratap <abhishek.vit at gmail.com>:
>> > Hello All
>> >
>> >
>> > Is there a quick way to find the homopolymer stretches in the contigs
>> > and
>> > also report their base start and end positions.
>> >
>> > Thanks,
>> > -Abhi
>> >
>> > --
>> > -----------------------------
>> > Abhishek Pratap
>> > Bioinformatics Software Engineer
>> > Institute for Genome Sciences
>> > School of Medicine, Univ of Maryland
>> > 801, W. Baltimore Street, Baltimore, MD 21209
>> > Ph: (+1)-410-706-2296
>> > www.igs.umaryland.edu/
>> >
>> > Chair
>> > RSG-Worldwide
>> > ISCB-Student Council
>> > http://iscbsc.org/rsg
>> >
>> > www.bioinfosolutions.com
>> >
>>
>>
>>
>> --
>>   -Heikki
>> Heikki Lehvaslaiho - heikki lehvaslaiho gmail com
>> http://kapkaupunki.blogspot.com/
>
>
>



-- 
   -Heikki
Heikki Lehvaslaiho - heikki lehvaslaiho gmail com
http://kapkaupunki.blogspot.com/

