[Biopython-dev] BGZF support, was Re: Biopython 1.60 plans and beyond

Peter Cock p.j.a.cock at googlemail.com
Thu May 24 05:18:33 EDT 2012


On Thu, May 24, 2012 at 6:52 AM, Artem Tarasov
<lomereiter at googlemail.com> wrote:
> Hi all,
>
> it's a good point that many line-based formats need some sort of compression
> with indexing, and BGZF is good enough in that sense.

BGZF doesn't have to be used with line-based formats, anything
with sequential records would work (like BAM files, of course). I've not
tried it to see how well it compresses, but SFF files in BGZF should
work too, as another example.
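To show that BGZF does not care about line structure, here is a minimal sketch in pure Python using only the standard library (zlib, struct, gzip). It is not Biopython's code. It follows the BGZF block layout as I understand it from the SAM/BAM specification: each block is a gzip member with a "BC" extra subfield holding the block size, and the file ends with an empty EOF block. The helper names `bgzf_block` and `bgzf_compress` and the 0xFF00 per-block input size are assumptions for illustration. Records are allowed to straddle block boundaries, as they do in BAM.

```python
import gzip
import struct
import zlib

# Sketch of the BGZF block layout: gzip members carrying a "BC" extra field
# that records the compressed block size. Helper names are hypothetical.
MAX_INPUT = 0xFF00  # assumed uncompressed bytes per block; keeps blocks < 64 KiB
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000"
)  # the standard empty 28-byte end-of-file block


def bgzf_block(data: bytes) -> bytes:
    """Compress data (at most MAX_INPUT bytes) into one BGZF block."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib header
    cdata = comp.compress(data) + comp.flush()
    bsize = 18 + len(cdata) + 8 - 1  # BSIZE field = total block size minus 1
    header = struct.pack("<BBBBIBBHBBHH",
                         31, 139, 8, 4,  # gzip magic, deflate, FEXTRA flag
                         0, 0, 255,      # MTIME, XFL, OS (unknown)
                         6, 66, 67, 2,   # XLEN, subfield "BC", SLEN
                         bsize)
    trailer = struct.pack("<II", zlib.crc32(data), len(data))  # CRC32, ISIZE
    return header + cdata + trailer


def bgzf_compress(payload: bytes) -> bytes:
    """Split a byte stream of sequential records into BGZF blocks."""
    blocks = [bgzf_block(payload[i:i + MAX_INPUT])
              for i in range(0, len(payload), MAX_INPUT)]
    return b"".join(blocks) + BGZF_EOF


# Fixed-size binary records (no newlines at all): uint32 id + float64 value.
records = [struct.pack("<Id", i, i * 0.5) for i in range(50_000)]
payload = b"".join(records)
compressed = bgzf_compress(payload)

# A BGZF file is also an ordinary multi-member gzip file.
assert gzip.decompress(compressed) == payload
```

Because every block is a complete gzip member, any normal gzip tool can still read the whole file; the BC field only adds the ability to jump between blocks without decompressing everything before them.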

>> So far, I think Artem's BGZF implementation is entirely in D; I may just
>> add Ruby support for BGZF separately.
>
> The only problem I see with that approach is that it's hardly possible to
> get parallel compression with MRI. But overall I tend to agree with Clayton.
> Firstly, it's hard to abstract out a common interface right now without
> first writing some code and looking at it. Secondly, there are still problems
> with D shared library support. We were assured by the GDC developers that
> these will be solved soon, but at the moment the situation is far from perfect.

My BGZF code is pure Python (using C zlib via Python's zlib library),
and does not currently tackle parallel compression or decompression.
There has been recent work in samtools for this.

We don't need parallel compression/decompression of BGZF for it to
be useful.
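As a rough illustration of how far single-threaded Python plus zlib gets you, here is a sketch of random access via BAM-style "virtual offsets": the compressed file offset of a block's start shifted left 16 bits, combined with the offset into that block's decompressed data. The function names are hypothetical, the sketch assumes the BC subfield is the only gzip extra field (XLEN == 6), and it does not read across block boundaries. The small `_bgzf_block` helper exists only to build demo input.

```python
import io
import struct
import zlib

HEADER = "<BBBBIBBHBBHH"  # 18-byte BGZF header, little-endian, no padding


def _bgzf_block(data: bytes) -> bytes:
    # Hypothetical demo helper: wrap data in one BGZF block.
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    cdata = comp.compress(data) + comp.flush()
    header = struct.pack(HEADER, 31, 139, 8, 4, 0, 0, 255, 6,
                         66, 67, 2, len(cdata) + 25)  # BSIZE = total - 1
    return header + cdata + struct.pack("<II", zlib.crc32(data), len(data))


def make_virtual_offset(block_start: int, within_block: int) -> int:
    # High 48 bits: compressed offset; low 16 bits: offset in decompressed block.
    return (block_start << 16) | within_block


def read_block(handle, block_start: int) -> bytes:
    """Decompress the BGZF block beginning at block_start using only zlib."""
    handle.seek(block_start)
    (id1, id2, _, flag, _, _, _, xlen,
     si1, si2, _, bsize) = struct.unpack(HEADER, handle.read(18))
    if (id1, id2, si1, si2, xlen) != (31, 139, 66, 67, 6) or not flag & 4:
        raise ValueError("not a BGZF block this sketch can parse")
    cdata = handle.read(bsize + 1 - 18 - 8)
    crc, isize = struct.unpack("<II", handle.read(8))
    data = zlib.decompress(cdata, -15)  # raw deflate
    if zlib.crc32(data) != crc or len(data) != isize:
        raise ValueError("BGZF block failed CRC/length check")
    return data


def read_at(handle, voffset: int, n: int) -> bytes:
    """Return up to n bytes at a virtual offset (stays within one block)."""
    block_start, within = voffset >> 16, voffset & 0xFFFF
    return read_block(handle, block_start)[within:within + n]


blocks = [_bgzf_block(b"first block\n"), _bgzf_block(b"record A\nrecord B\n")]
handle = io.BytesIO(b"".join(blocks))
voff = make_virtual_offset(len(blocks[0]), 9)  # "record B" starts at byte 9
print(read_at(handle, voff, 8))  # b'record B'
```

An index only needs to store such virtual offsets per record; looking one up costs a single seek and the decompression of one block of at most 64 KiB, which is why parallelism is a nice extra rather than a requirement.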

Peter

