[Biopython-dev] BGZF support, was Re: Biopython 1.60 plans and beyond
Peter Cock
p.j.a.cock at googlemail.com
Tue Apr 24 11:58:10 EDT 2012
On Fri, Apr 20, 2012 at 11:35 PM, Andrew Sczesnak
<andrew.sczesnak at med.nyu.edu> wrote:
> Peter,
>
> My colleague was writing some code using MafIndex and commented how long it
> took her to download, decompress and index the human multiz alignments from
> UCSC. It seems like it'd be great to keep the files compressed... perhaps if
> the code works well enough we can convince UCSC to host bgzip'd copies (or
> maybe make them available on one of our institutions' servers).
That does sound good - it is a perfect example of where BGZF is a more
useful alternative to standard GZIP. Some numbers on how much of a
size penalty it imposes would help though...
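One rough way to get such numbers, as a sketch only: compress the same data once as a single gzip stream and once in independent blocks, then compare the sizes. The block size and the helper names below are assumptions for illustration. Real BGZF blocks also carry an extra header subfield, and the file ends with an empty EOF block, so the true penalty would be slightly higher than this estimate.

```python
import gzip

# Assumption: uncompressed payload per block, just under 64 KiB.
BLOCK_SIZE = 65280


def gzip_size(data: bytes) -> int:
    """Size in bytes of the data compressed as one ordinary gzip stream."""
    return len(gzip.compress(data))


def blocked_size(data: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Approximate BGZF size: compress each block as its own gzip member."""
    total = 0
    for start in range(0, len(data), block_size):
        total += len(gzip.compress(data[start:start + block_size]))
    return total


def size_penalty(data: bytes) -> float:
    """Ratio of blocked size to single-stream size (1.0 means no penalty)."""
    return blocked_size(data) / gzip_size(data)
```

Running `size_penalty(open("example.maf", "rb").read())` on a real alignment file (the filename here is hypothetical) would give a first estimate of the overhead.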
> Is I.J. interested in joining the community? I'd like to look into adding
> BGZF to MafIO and wouldn't want to duplicate I.J.'s effort. If not, could
> you put me in touch?
Perhaps he's just busy at the moment (BCC'd again)?
It should be easy enough to follow the BGZF changes to Bio/SeqIO/_index.py
and I'm willing to do this myself for MAF (while going over your index work -
something I want to do anyway). The only potential catch is avoiding offset
arithmetic, since BGZF virtual offsets cannot meaningfully be added or
subtracted.
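To illustrate why offset arithmetic is a problem, here is a minimal standalone sketch. A BGZF virtual offset packs the compressed block's start position into the upper 48 bits and the position within the decompressed block into the lower 16 bits. The helper functions are written here for illustration and do not depend on Biopython.

```python
def make_virtual_offset(block_start: int, within_block: int) -> int:
    """Pack a compressed block start and an in-block offset into 64 bits."""
    if not 0 <= within_block < 2**16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2**48:
        raise ValueError("block start must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset: int) -> tuple:
    """Unpack a virtual offset into (block_start, within_block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


# A record starting near the end of the first block (compressed offset 0).
record = make_virtual_offset(0, 65000)

# Naively adding a record length of 1000 bytes overflows the low 16 bits
# and claims a block starting at compressed offset 1, which is not a real
# block boundary.
bogus = split_virtual_offset(record + 1000)
```

Comparing two virtual offsets still gives the right file order, but lengths and end positions have to come from reading through the handle rather than from adding or subtracting offsets.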
Peter