[Biopython] GSoC project application: Representation and manipulation of genomic variants: Some questions

Brad Chapman chapmanb at 50mail.com
Sun Apr 1 20:07:31 UTC 2012


Chaitanya;
Thanks for the additional work on this; it's looking great. I left
specific comments in-line, but my general suggestion is to keep expanding
and clarifying the timeline. Up-front work on a detailed timeline makes
the summer work much easier and also makes for a stronger proposal.
Thanks again,

Brad

> I have uploaded a second draft incorporating the changes. Please
> provide comments on my proposal.
> Thanks,
> Chaitanya
> 
> On Fri, Mar 30, 2012 at 6:43 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> >
> > Chaitanya;
> > Thanks for making this available. It's a great start; the next step
> > is to make your project plan much more detailed. I left specific
> > comments in-line in the proposal. Let us know when you have a
> > revised version and we can work on it more. Thanks again,
> > Brad
> >
> >> Here's the google doc link, I have made it editable too.
> >>
> >> https://docs.google.com/document/d/12N1aEzagMZ8akc1mrfP4MxHdILT2wapjENJOoxZBIh0/edit
> >>
> >> On Wed, Mar 28, 2012 at 6:13 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> >> >
> >> > Chaitanya;
> >> > The easiest way to work on your proposal is to write it in a
> >> > public Google Doc and then share with the list. I don't yet have access
> >> > to all of the Melange GSoC project and I'd imagine others who might
> >> > have thoughts are in the same boat. As a side benefit it's also much
> >> > easier to collaborate on editing and notes.
> >> >
> >> > Brad
> >> >
> >> >> Hi,
> >> >> I have uploaded the first draft of my project proposal. I will add
> >> >> more sections to the project plan in a day or two. Just wanted to have
> >> >> the initial draft up. I hope to write a better proposal with your
> >> >> feedback.
> >> >>
> >> >> Regards,
> >> >> Chaitanya
> >> >>
> >> >> On Mon, Mar 26, 2012 at 4:37 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> >> >> >
> >> >> > Chaitanya;
> >> >> > Thanks for the interest and specific questions.
> >> >> >
> >> >> >> 1. For the implementation of variants what would be better, to create
> >> >> >> a new SeqVariant class from scratch or to extend the SeqFeature class
> >> >> >> to accommodate variants? I guess a separate class would be better.
> >> >> >
> >> >> > My preference would be to see how far the SeqFeature class can take you
> >> >> > before implementing a new class. It should be general enough to handle
> >> >> > variant data, but the bigger challenge might be designing a lightweight
> >> >> > representation that is compatible with existing SeqFeatures.
> >> >> >
> >> >> >> 2. While looking at the Biopython wiki I came across an implementation
> >> >> >> of GFF at
> >> >> >> https://github.com/chapmanb/bcbb/tree/master/gff
> >> >> >> As GVF is an extension of GFF3, this module could be used for reading
> >> >> >> GVF files too. Would it be a good starting point to modify for GVF
> >> >> >> support?
> >> >> >
> >> >> > That would be perfect. We're hoping to merge this into the Biopython
> >> >> > code base before the next release. There is also an existing VCF parser
> >> >> > we'd love to use here:
> >> >> >
> >> >> > https://github.com/jamescasbon/PyVCF
> >> >> >
> >> >> >> 3. I've been going through the VCF documentation. SNPs, insertions
> >> >> >> and deletions can be represented just as they are in VCF: the
> >> >> >> object would have a start position, the length of the reference
> >> >> >> sequence (no need to store the sequence itself) and a list of
> >> >> >> alternate sequence objects. I still have to look into SVs
> >> >> >> (structural variants), rearrangements and imprecise variant
> >> >> >> information, so this representation is only for SNPs and small
> >> >> >> indels. GVF has a very similar format for small indels and SNPs,
> >> >> >> except that it provides an extra end position column, which is not
> >> >> >> required if we have the reference sequence.
> >> >> >
> >> >> > This sounds good. My general suggestion is to start writing your
> >> >> > proposal as soon as possible. A concrete first draft will help with more
> >> >> > detailed comments. The wiki has good information on the project plan:
> >> >> >
> >> >> > http://open-bio.org/wiki/Google_Summer_of_Code#When_you_apply
> >> >> >
> >> >> > and the NESCent wiki has some examples of well-written proposals from
> >> >> > previous years:
> >> >> >
> >> >> > http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2012#Writing_your_application
> >> >> >
> >> >> > One of the key aspects is having a detailed week-by-week outline of your
> >> >> > plans for the summer.
> >> >> >
> >> >> > Thanks again for the interest,
> >> >> > Brad
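
The representation described in question 3 of the thread (a start position, the reference length, and a list of alternate sequences) could be sketched roughly as follows. This is only an illustration under stated assumptions: `SmallVariant` and `from_vcf_fields` are hypothetical names, not part of Biopython or PyVCF, and positions are assumed to be 1-based as in the VCF POS column. It shows how the GVF-style end coordinate can be derived instead of stored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SmallVariant:
    """Hypothetical container for a SNP or small indel (not a Biopython class)."""
    chrom: str
    start: int        # assumed 1-based position of the first reference base
    ref_length: int   # number of reference bases covered; the bases are not stored
    alts: List[str] = field(default_factory=list)

    @property
    def end(self) -> int:
        """1-based inclusive end, derived from start and reference length."""
        return self.start + self.ref_length - 1

    def kind(self, alt: str) -> str:
        """Classify one alternate allele by comparing its length to the reference."""
        if len(alt) == self.ref_length:
            return "SNP" if self.ref_length == 1 else "substitution"
        return "insertion" if len(alt) > self.ref_length else "deletion"


def from_vcf_fields(chrom: str, pos: str, ref: str, alt: str) -> SmallVariant:
    """Build a SmallVariant from the CHROM, POS, REF and ALT columns of a VCF line."""
    return SmallVariant(chrom, int(pos), len(ref), alt.split(","))


# Usage with made-up values: a 3-base reference with two alternate alleles.
variant = from_vcf_fields("20", "1234567", "GTC", "G,GTCT")
print(variant.start, variant.end, [variant.kind(a) for a in variant.alts])
```

Structural variants, rearrangements and imprecise breakpoints are deliberately left out, matching the scope stated in the question.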


