[Bioperl-l] indexing conservation scores
cjfields at illinois.edu
Wed Dec 22 19:39:12 EST 2010
You could use a tied hash backed by BerkeleyDB or AnyDBM_File, or DBD::SQLite. Another option is to convert to BigWig and use Lincoln's Bio::DB::BigFile tools (note that the installation process for this is a little tricky).
Also, +1 to Sean's suggestion (don't rely completely on bioperl to implement everything :)
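To make the SQLite idea concrete, here is a minimal sketch in Python using the standard sqlite3 module (the same table layout would work from Perl through DBD::SQLite). The table name, column names, and the helpers parse_fixed_step, build_index, and fetch are made up for illustration, and the parser assumes plain fixedStep input only.

```python
import sqlite3

def parse_fixed_step(lines):
    """Yield (chrom, position, score) tuples from fixedStep wiggle lines."""
    chrom, pos, step = None, 0, 1
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "#")):
            continue
        if line.startswith("fixedStep"):
            # e.g. "fixedStep chrom=chrYHet start=1 step=1"
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
            continue
        yield chrom, pos, float(line)
        pos += step

def build_index(conn, lines):
    """Load scores into a table keyed on (chrom, pos); hypothetical schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores ("
        "chrom TEXT, pos INTEGER, score REAL, PRIMARY KEY (chrom, pos))"
    )
    with conn:  # commit once for the whole load
        conn.executemany(
            "INSERT OR REPLACE INTO scores VALUES (?, ?, ?)",
            parse_fixed_step(lines),
        )

def fetch(conn, chrom, start, end):
    """Return scores for 1-based inclusive positions start..end."""
    cur = conn.execute(
        "SELECT score FROM scores WHERE chrom = ? AND pos BETWEEN ? AND ? "
        "ORDER BY pos",
        (chrom, start, end),
    )
    return [row[0] for row in cur]

# Tiny in-memory demo with invented scores
demo = ["fixedStep chrom=chrYHet start=1 step=1", "0.1", "0.2", "0.3", "0.4"]
conn = sqlite3.connect(":memory:")
build_index(conn, demo)
print(fetch(conn, "chrYHet", 2, 3))
```

For a real file you would pass an open file handle instead of the demo list and connect to a file on disk rather than ":memory:". The primary key on (chrom, pos) is what makes range lookups fast.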
On Dec 22, 2010, at 6:00 PM, Maxim wrote:
> Bio::DB::Fasta is a beautiful tool for fast access to sequences present in
> large flat text (fasta) files and I really love it. Now I'd like to speed up
> the retrieval of data from large files that store conservation scores. The
> files that I was able to find at UCSC have fixed step wiggle format, like
> fixedStep chrom=chrYHet start=1 step=1
> Does anyone see a way to use the indexing mechanism of
> Bio::DB::Fasta to retrieve floating-point numbers? I could
> reformat the wiggle file into a simple space-, tab-, or comma-separated list of
> scores per chromosome.
> Are there suggestions? Or is there already a module that takes care of my
> problem that I have just overlooked?
> Or would such an approach not be considerably faster than plain Unix commands such as
> sed -n '2,5001p' chrYHet.pp
> to retrieve the scores?
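On the question of speed: sed has to scan the file from the top, so the cost grows with the start position. If every score is stored in a fixed number of bytes, the byte offset of any position can be computed directly and one seek replaces the scan. This is a rough sketch in Python of that idea, not how Bio::DB::Fasta itself is implemented. It assumes one binary file per chromosome, a single fixedStep block with start=1 and step=1, and 4-byte floats; pack_scores and fetch_scores are hypothetical names.

```python
import os
import struct
import tempfile

# Assumption: one 4-byte little-endian float per position
RECORD = struct.Struct("<f")

def pack_scores(scores, path):
    """Write scores as fixed-width binary records, one per position."""
    with open(path, "wb") as out:
        for score in scores:
            out.write(RECORD.pack(score))

def fetch_scores(path, start, end):
    """Return scores for 1-based inclusive positions start..end."""
    count = end - start + 1
    with open(path, "rb") as fh:
        # The offset is computed, not searched for
        fh.seek((start - 1) * RECORD.size)
        data = fh.read(count * RECORD.size)
    return [value for (value,) in RECORD.iter_unpack(data)]

# Demo with invented scores that are exactly representable in 32 bits
path = os.path.join(tempfile.mkdtemp(), "chrYHet.bin")
pack_scores([0.5, 0.25, 0.125, 0.75, 1.0], path)
print(fetch_scores(path, 2, 4))
```

Storing 32-bit floats loses some precision compared with the text file; switching the format to "<d" gives 8-byte doubles at twice the file size.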