[Bioperl-l] indexing conservation scores

Thu Dec 23 00:30:41 UTC 2010

On Wed, Dec 22, 2010 at 7:00 PM, Maxim <deeepersound at googlemail.com> wrote:

> Hi,
>
> Bio::DB::Fasta is a beautiful tool for fast access to sequences stored in
> large flat (FASTA) text files, and I really love it. Now I'd like to speed up
> the retrieval of data from large files that store conservation scores. The
> files I was able to find at UCSC are in fixedStep wiggle format, like:
>
>
Hi, Maxim.  Have you looked at this page?

http://genomewiki.ucsc.edu/index.php/Using_hgWiggle_without_a_database

Sean

> fixedStep chrom=chrYHet start=1 step=1
> 0.117
> 0.092
> 0.092
> 0.085
> 0.071
> 0.051
> 0.021
> 0.010
> 0.008
> 0.010
> 0.019
> 0.023
> 0.023
> 0.019
> ........
>
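Because the records in a fixedStep block are evenly spaced, the n-th score (counting from 0) belongs to position start + n*step, so the file line holding any position can be computed without scanning. A minimal Python sketch of that arithmetic, assuming each file holds a single fixedStep block whose header is line 1 (real UCSC files can contain several blocks); the function names are hypothetical:

```python
from typing import Tuple


def parse_fixedstep_header(line: str) -> Tuple[str, int, int]:
    """Pull chrom, start and step out of a 'fixedStep chrom=... start=... step=...' line."""
    # Skip the leading 'fixedStep' keyword; the rest are key=value pairs.
    fields = dict(tok.split("=", 1) for tok in line.split()[1:])
    return fields["chrom"], int(fields["start"]), int(fields["step"])


def line_range(start: int, step: int, first: int, last: int) -> Tuple[int, int]:
    """Return the 1-based file lines holding the scores for positions first..last.

    Line 1 is the header, so the score for position p sits on
    line (p - start) // step + 2.
    """
    if first < start or last < first:
        raise ValueError("requested range lies outside the block")
    if (first - start) % step or (last - start) % step:
        raise ValueError("positions must fall on the step grid")
    return (first - start) // step + 2, (last - start) // step + 2
```

With the header shown above (start=1 step=1), line_range(1, 1, 1, 5000) returns (2, 5001), the same range as the sed command quoted below.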
> Does anyone see a way to use the indexing mechanism of Bio::DB::Fasta to
> retrieve floating-point scores? I could reformat the wiggle file into a
> simple space-, tab- or comma-separated list of scores per chromosome.
>
> Any suggestions? Or is there already a module that handles this, and I have
> just overlooked it? Or would such an approach not be considerably faster
> than ordinary Unix commands such as
> sed -n '2,5001p' chrYHet.pp
> for retrieving the scores?
>
>
> Maxim
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
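The same arithmetic gives byte offsets, not just line numbers, once every score occupies the same number of bytes. That is the property Bio::DB::Fasta depends on (every line of an entry except the last must have the same length). A minimal sketch of the idea, swapping the text file for a packed binary file of 32-bit floats; the record layout and function names are assumptions for illustration, not an existing BioPerl module:

```python
import struct
from typing import Iterable, List

# Assumed on-disk layout: one little-endian 32-bit float per position and no
# header, so record i lives at byte offset i * 4.
RECORD = struct.Struct("<f")


def pack_scores(scores: Iterable[float], out_path: str) -> None:
    """Write the scores of one fixedStep block as consecutive fixed-size records."""
    with open(out_path, "wb") as out:
        for score in scores:
            out.write(RECORD.pack(score))


def read_scores(path: str, start: int, step: int, first: int, last: int) -> List[float]:
    """Return the scores for positions first..last by seeking straight to them."""
    if first < start or last < first:
        raise ValueError("requested range lies outside the block")
    i = (first - start) // step  # index of the first record wanted
    j = (last - start) // step   # index of the last record wanted
    with open(path, "rb") as fh:
        fh.seek(i * RECORD.size)  # jump directly; no lines before i are read
        data = fh.read((j - i + 1) * RECORD.size)
    return [value for (value,) in RECORD.iter_unpack(data)]
```

A 32-bit float keeps the three decimal places shown in the sample at 4 bytes per position, and the seek costs the same wherever the range starts, whereas sed has to read through every line before the requested range.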