[Biopython-dev] [Biopython] Update: call for Google Summer of Code project ideas
Eric Talevich
eric.talevich at gmail.com
Thu Mar 1 18:30:19 UTC 2012
2012/3/1 Peter Cock <p.j.a.cock at googlemail.com>
> 2012/3/1 Eric Talevich <eric.talevich at gmail.com>:
> >
> > Here's one semi-coherent project idea that could fly:
> >
> > Overhaul Biopython's parsing infrastructure for protein
> > primary, secondary and tertiary structures
> >
> > - Refactor PDBParser and parse_pdb_header to allow parsing
> > amino-acid sequences from SEQRES lines (header) and ATOM
> > records (body) without building the PDB structure object,
> > i.e. without using numpy
> > - Write a pure-Python replacement for parsing mmCIF files.
> > (The module MMCIF2Dict already does almost all the work;
> > lex+yacc just manages a fairly simple state machine for
> > recognizing comments, special sub-sections, etc.)
> > - Wrap the parsers for PDB, PDBML and mmCIF under a common
> > I/O interface under the Bio.Struct namespace
> > - Add parsing support for protein secondary structures,
> > based on the relevant PDB records or (perhaps) DSSP
> > output. (Note that João did some work on this already.)
>
> Do you think you could mentor that? One serious downside
> would be even more work on PDB-related code, which will
> make future merging even harder. We do need to tackle the
> GSoC backlog as a priority.
>
I would serve if called upon, but I think it's best if we set this one
aside for E&J SoC (JESoC?) rather than GSoC this year.
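
For what it's worth, the SEQRES half of the first bullet is small enough
to sketch without numpy or the structure object. This is only an
illustration, not existing Biopython code: the function name
seqres_sequences and the lookup table are made up here, and the column
slices assume the fixed-width PDB layout for SEQRES records (chain ID in
column 12, residue names in columns 20-70).

```python
from collections import defaultdict

# Standard three-letter to one-letter amino-acid codes.
# Anything not listed (modified residues, nucleotides) becomes "X".
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(lines):
    """Return {chain_id: one-letter sequence} from SEQRES header lines.

    Hypothetical helper: reads only the SEQRES records and never
    touches ATOM records or builds a structure object.
    """
    chains = defaultdict(list)
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]                   # column 12
        for resname in line[19:70].split():   # columns 20-70
            chains[chain_id].append(THREE_TO_ONE.get(resname, "X"))
    return dict((cid, "".join(seq)) for cid, seq in chains.items())


# Build two well-formed SEQRES lines with the fixed-width layout.
demo_lines = [
    "SEQRES %3d %s %4d  %s" % (1, "A", 5, "MET LYS THR ALA GLY"),
    "SEQRES %3d %s %4d  %s" % (1, "B", 3, "SER MSE VAL"),
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
]
demo_result = seqres_sequences(demo_lines)
```

The same loop over ATOM records would need to track residue numbers and
insertion codes to avoid counting a residue once per atom, which is why
I've left that half out of the sketch.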
>
> > Variants
> > --------
> >
> > So, from the Biopython 1.60 thread:
> >
> > - James Casbon has offered to merge PyVCF into Biopython, right?
> > - BCF, the binary form of VCF (via blocked gzip), may also
> > be worthwhile to support
> > - GVF, the Genome Variation Format, appears to be intended
> > to be competitive with VCF. It's probably at least as well
> > thought-out as VCF, sight unseen. It's based on GFF.
> >
> > Synthesizing the above, we have a GSoC project that looks like:
> >
> > - Help merge PyVCF into Biopython (w/ James's support -- I
> > don't mean to volunteer him for this in absentia)?
> > - Write a GVF parser that emits the same object type as
> > PyVCF, potentially also using existing GFF code
> > - Time permitting, look into blocked gzip support for VCF
> > (BCF), also looking at SAM/BAM for inspiration and
> > reusable code.
>
> Sounds interesting - who might be willing to mentor it?
>
Does someone feel comfortable asking James for his thoughts on this?
I'm not especially well qualified to mentor this, though I could assist as
a secondary mentor if needed. Any other Biopython devs/users well
acquainted with VCF/PyVCF?
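
To make the GVF bullet a bit more concrete, here is a rough sketch of
what "emit the same object type" could mean. Everything named here is an
assumption for illustration: VariantRecord is a stand-in namedtuple with
VCF-style field names, not PyVCF's actual record class, and
parse_gvf_line is a made-up helper. It relies only on GVF being
nine-column, tab-separated GFF3 with Reference_seq and Variant_seq
attributes.

```python
from collections import namedtuple

# Stand-in for a VCF-like record; field names follow the VCF columns.
VariantRecord = namedtuple("VariantRecord", "CHROM POS ID REF ALT INFO")


def parse_gvf_line(line):
    """Turn one GVF feature line into a VCF-style record (sketch only)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, so_type, start, end, score, strand, phase, attrs = cols

    # GFF3 attributes are semicolon-separated key=value pairs.
    info = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            info[key] = value

    ref = info.pop("Reference_seq", None)
    alt_raw = info.pop("Variant_seq", None)
    alt = alt_raw.split(",") if alt_raw else []
    # Whatever attributes remain are kept as INFO-like key/value pairs.
    return VariantRecord(seqid, int(start), info.pop("ID", None), ref, alt, info)


demo_line = "\t".join([
    "chr16", "samtools", "SNV", "49291141", "49291141", ".", "+", ".",
    "ID=ID_1;Variant_seq=A,G;Reference_seq=G;Genotype=heterozygous",
])
demo_record = parse_gvf_line(demo_line)
```

A real parser would also have to handle the GVF pragma lines (the "##"
header) and percent-encoded attribute values, and decide how multi-base
features map onto a single POS; the existing GFF code may already cover
some of that.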