[Biopython-dev] Fwd: [VCFtools-spec] The BCF2 quick reference document is up on the 1000G wiki

Peter Cock p.j.a.cock at googlemail.com
Mon Jul 9 07:33:44 EDT 2012


On Mon, Jul 9, 2012 at 12:27 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
> Thanks for the heads up. I'm excited about BCF2 and am hopeful it'll
> help with some of the painful parts of VCF, like subsetting large files
> by samples. There is also a page about it on the Broad wiki with more details:
>
> http://www.broadinstitute.org/gsa/wiki/index.php/BCF2
>
> In terms of the representation, this stays close to VCF so shouldn't
> change a lot of the API people see. The main changes would be on the
> backend side where we'd like to be able to swap in and out BCF2 and VCF
> (and GVF) transparently with no visible change to the programmer.
>
> Brad

Yes - that's what we should be aiming for, much like the SAM/BAM
duality which has worked really well for sequence alignments.
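
As a rough illustration of that kind of transparency, a reader could sniff
the format from the first few bytes before choosing a backend. This is only
a sketch: the function name sniff_variant_format is hypothetical (it is not
a Biopython API), and it relies on BGZF output being readable as ordinary
gzip, so the standard library is enough here.

```python
import gzip
import io


def sniff_variant_format(handle):
    """Guess 'bcf' or 'vcf' from a seekable binary handle (hypothetical helper)."""
    start = handle.read(3)
    handle.seek(0)
    if start[:2] == b"\x1f\x8b":  # gzip magic, also used by BGZF blocks
        start = gzip.GzipFile(fileobj=handle).read(3)
        handle.seek(0)
    if start == b"BCF":
        return "bcf"
    if start[:2] == b"##":  # VCF meta-information lines begin with '##'
        return "vcf"
    raise ValueError("unrecognised variant file format")


# Tiny in-memory stand-ins for real files.
plain_vcf = b"##fileformat=VCFv4.1\n"
vcf_kind = sniff_variant_format(io.BytesIO(plain_vcf))
gz_vcf_kind = sniff_variant_format(io.BytesIO(gzip.compress(plain_vcf)))
bcf_kind = sniff_variant_format(io.BytesIO(gzip.compress(b"BCF\x02\x01")))
```

The caller would then hand the rewound handle to whichever parser matches,
so user code never has to care which format it was given.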

Note that, like BAM, both BCF and BCF2 are compressed with BGZF,
support for which we included in Biopython 1.60 (the Bio.bgzf
module). That can be combined with the Python struct module to
parse the binary data. With a little more effort the same code can
support both Python 2 and 3; see the SFF code for pointers, or ask me.
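
To make that concrete, here is a minimal sketch of reading the start of a
BCF2 stream with struct. The layout used (the bytes "BCF", one byte each
for major and minor version, a little-endian uint32 header length, then
NUL-terminated header text) is my assumption based on the later BCF2.1
specification, so check it against the current quick reference. The
function name read_bcf2_header is hypothetical, and the demo uses an
in-memory gzip stream as a stand-in for a real BGZF file.

```python
import gzip
import io
import struct


def read_bcf2_header(handle):
    """Return (major, minor, header_text) from a decompressed BCF2 stream."""
    magic = handle.read(5)
    if len(magic) != 5 or magic[:3] != b"BCF":
        raise ValueError("not a BCF file")
    # Two unsigned bytes: major and minor version (assumed layout).
    major, minor = struct.unpack("<BB", magic[3:5])
    # Little-endian unsigned 32-bit length of the header text.
    (l_text,) = struct.unpack("<I", handle.read(4))
    text = handle.read(l_text)
    return major, minor, text.rstrip(b"\x00").decode("ascii")


# Build a tiny fake BCF2 stream in memory rather than assuming a file exists.
header = b"##fileformat=VCFv4.1\n#CHROM\tPOS\n\x00"
raw = b"BCF" + struct.pack("<BB", 2, 1) + struct.pack("<I", len(header)) + header
with gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(raw))) as handle:
    major, minor, text = read_bcf2_header(handle)
```

For a real file you would open it with Bio.bgzf.BgzfReader(filename, "rb")
instead of gzip, which also gives random access via BGZF virtual offsets.
Working with bytes literals and struct throughout is most of what is needed
to keep this running on both Python 2 and 3.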

Peter

