[Bioperl-l] Re: counting nucleotides in a sequence

Mark Wagner mark at lanfear.net
Wed Feb 19 10:23:44 EST 2003


> Like this:
>
> %perl -e '$str = "gatcgggtttaaggccctttt"; (@arr) = $str =~ /.{1,10}/g;
> print "@arr";'
>
> Remove \n from the sequence before you do the pattern match.
>
> Brian O.

> > I am interested in counting the nucleotides A, C, G and T
> > in my sequence for every 10 nt window.
> > So I need to write code that will scan the sequence in 10 nt windows,
> > like 1-10, 11-20.
> > Preferably I can then store the results in arrays and write these arrays
> > to a file.
> > Can someone help me do this using BioPerl code, please?
> > Thanks in advance for your help!
> >
> > Best wishes
> > Parvesh

Last weekend I was attempting to do something similar, and I was
disappointed in Perl's performance compared to C. Using the Benchmark
module, I compared techniques based on regular expressions, substr,
exploded strings, and unpack. My conclusion was that substr was the
best choice.
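
For the original question, a substr-based approach might look roughly like
the sketch below. This is not the code from the benchmark writeup; the
subroutine name window_counts and the shape of the returned records are
made up for illustration, and it uses plain Perl rather than any BioPerl
class. It assumes non-overlapping windows and treats the sequence as
case-insensitive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count A, C, G and T in consecutive, non-overlapping windows.
# window_counts is a hypothetical helper, not part of BioPerl.
# Returns a list of hash references, one per window.
sub window_counts {
    my ($seq, $size) = @_;
    $seq =~ s/\s+//g;    # remove newlines and other whitespace first
    $seq = uc $seq;      # count upper and lower case alike
    my @counts;
    for (my $pos = 0; $pos < length($seq); $pos += $size) {
        # substr returns a shorter string for the final partial window
        my $win = substr($seq, $pos, $size);
        push @counts, {
            start => $pos + 1,                # 1-based coordinates
            end   => $pos + length($win),
            A     => ($win =~ tr/A//),        # tr with an empty replacement
            C     => ($win =~ tr/C//),        # list only counts matches
            G     => ($win =~ tr/G//),
            T     => ($win =~ tr/T//),
        };
    }
    return @counts;
}

# Print one tab-separated line per window: range, then A, C, G, T counts.
my @rows = window_counts("gatcgggtttaaggccctttt", 10);
for my $r (@rows) {
    print join("\t", "$r->{start}-$r->{end}", @{$r}{qw(A C G T)}), "\n";
}
```

The rows are kept in an array, so they can be written to a file by
printing to an opened filehandle instead of STDOUT, or by redirecting the
script's output in the shell.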

I made a writeup of the whole ordeal at
<http://www.lanfear.net/~mark/string_bench/>. I'm interested in what
people think about my results.

-- 
Mark Wagner mark at lanfear.net

