[Biopython-dev] Merging the GFF3 and VCF branches

Peter Cock p.j.a.cock at googlemail.com
Thu Jun 4 10:44:35 UTC 2015

This would be great to have merged - pathological test cases
and interconversion too :)

Did we settle on a plan for parent/child relationships in
SeqFeature objects (beyond deprecating sub_features,
which has been replaced by CompoundLocation)?
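
To make the parent/child question concrete, here is a minimal sketch of
how GFF3 ID/Parent attributes define a feature hierarchy, stored in
sqlite in the spirit of the gffutils approach mentioned below. It uses
only the Python standard library. The table layout, the function names
(parse_attributes, load, children) and the sample records are all
assumptions made for illustration; this is not the gffutils schema and
not a Biopython API.

```python
import sqlite3

# Hypothetical GFF3 records (nine tab-separated columns), invented for
# this sketch: one gene, one mRNA child, two exon grandchildren.
GFF3_LINES = [
    "chr1\texample\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chr1\texample\tmRNA\t1\t100\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\texample\texon\t1\t40\t.\t+\t.\tID=exon1;Parent=mrna1",
    "chr1\texample\texon\t60\t100\t.\t+\t.\tID=exon2;Parent=mrna1",
]


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> list of values."""
    attrs = {}
    for pair in column9.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value.split(",")
    return attrs


def load(lines):
    """Load features and their Parent links into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features "
        "(id TEXT PRIMARY KEY, type TEXT, start_pos INTEGER, end_pos INTEGER)"
    )
    conn.execute("CREATE TABLE relations (parent TEXT, child TEXT)")
    for line in lines:
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        feature_id = attrs["ID"][0]  # assumes every record carries an ID
        conn.execute(
            "INSERT INTO features VALUES (?, ?, ?, ?)",
            (feature_id, cols[2], int(cols[3]), int(cols[4])),
        )
        # A feature may list several parents, so store one row per link.
        for parent in attrs.get("Parent", []):
            conn.execute(
                "INSERT INTO relations VALUES (?, ?)", (parent, feature_id)
            )
    return conn


def children(conn, parent_id):
    """Return the IDs of the direct children of parent_id, by start position."""
    rows = conn.execute(
        "SELECT f.id FROM relations r JOIN features f ON f.id = r.child "
        "WHERE r.parent = ? ORDER BY f.start_pos",
        (parent_id,),
    )
    return [row[0] for row in rows]
```

With conn = load(GFF3_LINES), calling children(conn, "gene1") gives the
mRNA and children(conn, "mrna1") gives the two exons. Keeping the links
in a separate relations table means the hierarchy lives outside the
feature records themselves, which is one possible answer to where
parent/child information should go once sub_features is gone.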


On Thu, Jun 4, 2015 at 10:54 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Eric;
> Thanks for looking at this. +1 on getting Lenna's work in and I'll let
> her comment on that compared to the current state of VCF support in
> pysam and PyVCF. For GFF, I'd actually rather see
> integration/collaboration with Ryan's gffutils:
> https://github.com/daler/gffutils
> It uses sqlite to organize the data and is much better engineered than
> my GFF work. He took all my pathological test cases and made them work,
> and it also has initial biopython integration:
> https://github.com/daler/gffutils/blob/master/gffutils/biopython_integration.py
> The main work would be to take some of the scripts in bcbio-gff that
> folks find useful, like the GFF/GenBank conversion through SeqIO, and
> port these over. This is something I have wanted to do for a while but
> never got around to. What does everyone think?
> Brad
>> Biopythoneers,
>> I am interested in improving Biopython's support for genomic data, namely
>> through merging the existing GFF3 and VCF branches.
>> Where we last left off, Brad's GFF branch was available on a fork:
>> http://biopython.org/wiki/GFF_Parsing
>> https://github.com/chapmanb/bcbb/tree/master/gff
>> When this branch was submitted to Biopython, in 2009 or so, there was a
>> subtle conflict with the way nested annotations were represented as
>> SeqFeatures in Biopython. Peter tested several possible resolutions to this
>> issue on branches, the last of which appears to be f_loc5:
>> https://github.com/peterjc/biopython/tree/f_loc5
>> For GSoC 2012, Lenna developed a VCF parser and genomic coordinate mapper
>> compatible with Peter's SeqFeature updates (actually the f_loc4 branch, I
>> guess?) and Brad's GFF parser:
>> http://biopython.org/wiki/Google_Summer_of_Code#Representation_and_manipulation_of_genomic_variants
>> http://arklenna.tumblr.com/post/29808300789/and-the-summer-ends
>> https://github.com/lennax/biopython/
>> What would it take to merge all of this once-recent work into Biopython?
>> Are the SeqFeature CompoundLocation changes satisfactory and ready to merge
>> into the mainline? Are we willing to make this compatibility break? If not,
>> should we instead add another class/module to support the new behavior
>> (BetterSeqFeature)?
>> Happy to help,
>> Eric
>> _______________________________________________
>> Biopython-dev mailing list
>> Biopython-dev at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev
