[Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format)

Peter Cock p.j.a.cock at googlemail.com
Tue Nov 8 15:41:15 UTC 2011


On Tue, Nov 8, 2011 at 3:38 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> That discussion concluded that random access into simple GZIP files
> was not practical, but BGZF (used in BAM) was worth looking into.
> I wrote some proof of principle code back then:
> http://lists.open-bio.org/pipermail/biopython/2010-June/006555.html
>
> I have recently polished that old code up, and done some
> benchmarking (using some reasonably large FASTA, Swiss,
> and UniProt-XML files). Please read this blog post:
> http://blastedbio.blogspot.com/

Here is a more precise link to my BGZF post:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
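
For anyone who wants the gist without following the link, here is a minimal sketch of why blocked compression allows random access where a plain GZIP stream does not. This is not the code from the post or from Biopython. The names write_blocked_gzip and read_at, and the separate list of block offsets, are hypothetical and used only for illustration. Real BGZF records each block's compressed size inside a gzip extra field, which this sketch does not attempt. The assumption here is simply that the data is compressed in independent fixed-size blocks, so a read only needs to decompress the blocks it touches.

```python
import gzip
import io


def write_blocked_gzip(data, block_size=65536):
    """Compress data as concatenated gzip members, one per block.

    Returns (blob, starts), where starts[i] is the byte offset in blob
    at which compressed block i begins. (Hypothetical helper.)
    """
    out = io.BytesIO()
    starts = []
    for i in range(0, len(data), block_size):
        starts.append(out.tell())
        out.write(gzip.compress(data[i:i + block_size]))
    return out.getvalue(), starts


def read_at(blob, starts, block_size, offset, length):
    """Return `length` uncompressed bytes starting at uncompressed `offset`.

    Only the blocks overlapping the requested range are decompressed.
    (Hypothetical helper.)
    """
    block_index, within = divmod(offset, block_size)
    result = b""
    while length > 0 and block_index < len(starts):
        start = starts[block_index]
        if block_index + 1 < len(starts):
            end = starts[block_index + 1]
        else:
            end = len(blob)
        chunk = gzip.decompress(blob[start:end])
        piece = chunk[within:within + length]
        result += piece
        length -= len(piece)
        block_index += 1
        within = 0  # later blocks are read from their beginning
    return result
```

Because the output is just a series of gzip members back to back, the whole blob can still be decompressed as an ordinary gzip stream, which is the same backwards-compatibility property that makes BGZF attractive.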

Peter


