[Biopython] GSoC project application: Representation and manipulation of genomic variants: Some questions

Mic mictadlo at gmail.com
Tue Mar 27 04:33:08 UTC 2012


Hello,
pysam 0.5 added VCF parsing:
http://code.google.com/p/pysam/downloads/detail?name=pysam-0.5.tar.gz&can=2&q=
What is the difference between pysam's VCF parser and PyVCF?

On Mon, Mar 26, 2012 at 9:07 PM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Chaitanya;
> Thanks for the interest and specific questions.
>
> > 1. For the implementation of variants what would be better, to create
> > a new SeqVariant class from scratch or to extend the SeqFeature class
> > to accommodate variants? I guess a separate class would be better.
>
> My preference would be to see how far the SeqFeature class can take you
> before implementing a new class. It should be general enough to handle
> variant data, but the bigger challenge might be designing a lightweight
> representation that is compatible with existing SeqFeatures.
>
> > 2. While looking at the Biopython wiki I came across an implementation
> > of GFF at
> > https://github.com/chapmanb/bcbb/tree/master/gff
> > As GVF is an extension of GFF3, this module could be used for reading
> > GVFs too. Is this module a good starting point for adding GVF support?
>
> That would be perfect. We're hoping to merge this into the Biopython
> code base before the next release. There is also an existing VCF parser
> we'd love to use here:
>
> https://github.com/jamescasbon/PyVCF
>
> > 3. I've been going through the VCF documentation. SNPs, insertions
> > and deletions can be represented just as they are in VCF: the
> > object would have a start position, the length of the reference
> > sequence (no need to store the sequence itself) and a list of
> > alternate sequence objects. I still have to look into SVs
> > (structural variants), rearrangements and imprecise variant
> > information, so this representation is only for SNPs and small
> > indels. GVF has a very similar format for small indels and SNPs,
> > except that it provides an extra end position column, which is not
> > required if we have the reference sequence.
>
> This sounds good. My general suggestion is to start writing your
> proposal as soon as possible. A concrete first draft will help with more
> detailed comments. The wiki has good information on the project plan:
>
> http://open-bio.org/wiki/Google_Summer_of_Code#When_you_apply
>
> and the NESCent wiki has some examples of well-written proposals from
> previous years:
>
> http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2012#Writing_your_application
>
> One of the key aspects is having a detailed week-by-week outline of your
> plans for the summer.
>
> Thanks again for the interest,
> Brad
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

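For reference, the GVF reading discussed in point 2 of the quoted message could look roughly like the sketch below. It does not use the bcbb GFF parser or PyVCF; it only splits one tab-separated, GFF3-style line with the standard library. The function name parse_gvf_line and the sample line are made up for illustration, and Variant_seq and Reference_seq are the attribute tags that GVF adds on top of GFF3.

```python
def parse_gvf_line(line):
    """Split one GVF (GFF3-style) data line into a dictionary.

    Hypothetical helper for illustration; not part of Biopython or bcbb.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, so_type, start, end, score, strand, phase, attr_text = cols

    # Column 9 holds key=value pairs separated by ';'.
    # Multiple values for one key are separated by ','.
    attributes = {}
    for pair in attr_text.split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, _, value = pair.partition("=")
        attributes[key] = value.split(",")

    return {
        "seqid": seqid,
        "source": source,
        "type": so_type,
        "start": int(start),  # GFF3 coordinates are 1-based, inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# Made-up example line: a single-nucleotide variant on chr16.
example = ("chr16\tsamtools\tSNV\t49291141\t49291141\t.\t+\t.\t"
           "ID=ID_1;Variant_seq=A,G;Reference_seq=G;")
record = parse_gvf_line(example)
print(record["type"], record["attributes"]["Variant_seq"])
```

A real implementation would also need to handle comment and pragma lines (those starting with '#') and URL-escaped attribute values, which this sketch ignores.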

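The representation proposed in point 3 of the quoted message (a start position, the reference length, and a list of alternate sequences) might be sketched as follows. This is only one possible reading of that description, not an existing Biopython class: the name SmallVariant, the 0-based start coordinate, and the kind() helper are assumptions made for illustration. It shows why a stored end column is redundant once the reference length is known.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SmallVariant:
    """Hypothetical container for a SNP or small indel (illustration only)."""

    start: int        # assumed 0-based start on the reference
    ref_length: int   # number of reference bases covered; sequence not stored
    alts: List[str] = field(default_factory=list)  # alternate sequences

    @property
    def end(self) -> int:
        # The end is derived, so it does not need its own field.
        return self.start + self.ref_length

    def kind(self, alt: str) -> str:
        """Classify one alternate allele by comparing lengths."""
        if self.ref_length == 1 and len(alt) == 1:
            return "SNP"
        if len(alt) > self.ref_length:
            return "insertion"
        if len(alt) < self.ref_length:
            return "deletion"
        return "substitution"


snp = SmallVariant(start=99, ref_length=1, alts=["T"])
# VCF-style deletion: the reference covers 3 bases, the alternate keeps 1.
deletion = SmallVariant(start=199, ref_length=3, alts=["A"])
print(snp.end, snp.kind("T"))            # 100 SNP
print(deletion.end, deletion.kind("A"))  # 202 deletion
```

Structural variants, rearrangements and imprecise breakpoints would not fit this shape, which matches the caveat in the original message.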
