[Biopython-dev] [Biopython] Update: call for Google Summer of Code project ideas

Rodrigo Faccioli rodrigo.faccioli at gmail.com
Thu Mar 1 13:44:14 EST 2012


Hi,

Although I'm not enough of a specialist to be a mentor, I do have
experience implementing the reading of the SEQRES section in PDBParser.
In fact, I have already implemented it and would be glad to share it
with the Biopython project.

Best regards,

--
Rodrigo Antonio Faccioli
Ph.D Student in Electrical Engineering
University of Sao Paulo - USP
Engineering School of Sao Carlos - EESC
Department of Electrical Engineering - SEL
Intelligent System in Structural Bioinformatics
http://laips.sel.eesc.usp.br
Phone: 55 (16) 3373-8739
Curriculum Lattes - http://lattes.cnpq.br/1025157978990218
Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5
Personal Blog - http://rodrigofaccioli.blogspot.com/



On Thu, Mar 1, 2012 at 3:30 PM, Eric Talevich <eric.talevich at gmail.com> wrote:

> 2012/3/1 Peter Cock <p.j.a.cock at googlemail.com>
>
> > 2012/3/1 Eric Talevich <eric.talevich at gmail.com>:
> > >
> > > Here's one semi-coherent project idea that could fly:
> > >
> > > Overhaul Biopython's parsing infrastructure for protein
> > > primary, secondary and tertiary structures
> > >
> > > - Refactor PDBParser and parse_pdb_header to allow parsing
> > >   amino-acid sequences from SEQRES lines (header) and ATOM
> > >   records (body) without building the PDB structure object,
> > >   i.e. without using numpy
> > > - Write a pure-Python replacement for parsing mmCIF files.
> > >   (The module MMCIF2Dict already does almost all the work;
> > >   lex+yacc just manages a fairly simple state machine for
> > >   recognizing comments, special sub-sections, etc.)
> > > - Wrap the parsers for PDB, PDBML and mmCIF under a common
> > >   I/O interface under the Bio.Struct namespace
> > > - Add parsing support for protein secondary structures,
> > >   based on the relevant PDB records or (perhaps) DSSP
> > >   output. (Note that João did some work on this already.)
> >
> > Do you think you could mentor that? One serious downside
> > would be even more work on PDB related code which will
> > make future merging even harder. We do need to tackle the
> > GSoC backlog as a priority.
> >
>
> I would serve if called upon, but I think it's best if we set this one
> aside for E&J SoC (JESoC?) rather than GSoC this year.
>
>
> >
> > > Variants
> > > --------
> > >
> > > So, from the Biopython 1.60 thread:
> > >
> > > - James Casbon has offered to merge PyVCF into Biopython, right?
> > > - BCF, the binary form of VCF (via blocked gzip), may also
> > >   be worthwhile to support
> > > - GVF, the Genome Variation Format, appears to be intended
> > >   to be competitive with VCF. It's probably at least as well
> > >   thought-out as VCF, sight unseen. It's based on GFF.
> > >
> > > Synthesizing the above, we have a GSoC project that looks like:
> > >
> > > - Help merge PyVCF into Biopython (w/ James's support -- I
> > >   don't mean to volunteer him for this in absentia)?
> > > - Write a GVF parser that emits the same object type as
> > >   PyVCF, potentially also using existing GFF code
> > > - Time permitting, look into blocked gzip support for VCF
> > >   (BCF), also looking at SAM/BAM for inspiration and
> > >   reusable code.
> >
> > Sounds interesting - who might be willing to mentor it?
> >
>
> Does someone feel comfortable asking James for his thoughts on this?
>
> I'm not especially well qualified to mentor this, though I could assist as
> a secondary mentor if needed. Any other Biopython devs/users well
> acquainted with VCF/PyVCF?
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


