[Biopython] GSoC project application: Representation and manipulation of genomic variants: Some questions

Chaitanya Talnikar chaitanya.talnikar at iitb.ac.in
Sun Apr 1 05:42:43 EDT 2012


I have uploaded a second draft incorporating the changes. Please
provide comments on my proposal.
Thanks,
Chaitanya

On Fri, Mar 30, 2012 at 6:43 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Chaitanya;
> Thanks for making this available. It's a great start and you need to
> work from here on being much more detailed in your project plan. I left
> specific comments in-line in the proposal. Let us know when you have a
> revised version and we can work more. Thanks again,
> Brad
>
>> Here's the google doc link, I have made it editable too.
>>
>> https://docs.google.com/document/d/12N1aEzagMZ8akc1mrfP4MxHdILT2wapjENJOoxZBIh0/edit
>>
>> On Wed, Mar 28, 2012 at 6:13 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> >
>> > Chaitanya;
>> > The easiest way to work on your proposal is to write it in a
>> > public Google Doc and then share with the list. I don't yet have access
>> > to all of the Melange GSoC project and I'd imagine others who might
>> > have thoughts are in the same boat. As a side benefit it's also much
>> > easier to collaborate on editing and notes.
>> >
>> > Brad
>> >
>> >> Hi,
>> >> I have uploaded the first draft of my project proposal. I will add
>> >> more sections to the project plan in a day or two. Just wanted to have
>> >> the initial draft up. I hope to write a better proposal with your
>> >> feedback.
>> >>
>> >> Regards,
>> >> Chaitanya
>> >>
>> >> On Mon, Mar 26, 2012 at 4:37 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> >> >
>> >> > Chaitanya;
>> >> > Thanks for the interest and specific questions.
>> >> >
>> >> >> 1. For the implementation of variants, which would be better: to create
>> >> >> a new SeqVariant class from scratch or to extend the SeqFeature class
>> >> >> to accommodate variants? I guess a separate class would be better.
>> >> >
>> >> > My preference would be to see how far the SeqFeature class can take you
>> >> > before implementing a new class. It should be general enough to handle
>> >> > variant data, but the bigger challenge might be designing a lightweight
>> >> > representation that is compatible with existing SeqFeatures.
>> >> >
>> >> >> 2. While looking at the Biopython wiki I came across an implementation
>> >> >> of GFF at
>> >> >> https://github.com/chapmanb/bcbb/tree/master/gff
>> >> >> As GVF is an extension of GFF3, this module could be used for reading
>> >> >> GVF files too. Is this module a good starting point to extend for GVF?
>> >> >
>> >> > That would be perfect. We're hoping to merge this into the Biopython
>> >> > code base before the next release. There is also an existing VCF parser
>> >> > we'd love to use here:
>> >> >
>> >> > https://github.com/jamescasbon/PyVCF
>> >> >
>> >> >> 3. I've been going through the VCF documentation. SNPs, insertions
>> >> >> and deletions can be represented just as they are in VCF: the
>> >> >> object would have a start position, the length of the reference
>> >> >> sequence (no need to store the sequence itself) and a list of
>> >> >> alternate sequence objects. I still have to look into SVs
>> >> >> (structural variants), rearrangements and imprecise variant
>> >> >> information, so this representation is only for SNPs and small
>> >> >> indels. GVF has a very similar format for small indels and SNPs,
>> >> >> except that it provides an extra end position column, which is not
>> >> >> required if we have the reference sequence.
>> >> >
>> >> > This sounds good. My general suggestion is to start writing your
>> >> > proposal as soon as possible. A concrete first draft will help with more
>> >> > detailed comments. The wiki has good information on the project plan:
>> >> >
>> >> > http://open-bio.org/wiki/Google_Summer_of_Code#When_you_apply
>> >> >
>> >> > and the NESCent wiki has some examples of well-written proposals from
>> >> > previous years:
>> >> >
>> >> > http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2012#Writing_your_application
>> >> >
>> >> > One of the key aspects is having a detailed week-by-week outline of your
>> >> > plans for the summer.
>> >> >
>> >> > Thanks again for the interest,
>> >> > Brad
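
The representation described in question 3 above (a start position, the
length of the reference sequence, and a list of alternate sequences) can be
sketched roughly as follows. This is only an illustration under stated
assumptions: SmallVariant and from_vcf_fields are hypothetical names, not
part of Biopython or PyVCF, and the sketch covers SNPs and small indels only.
It assumes 1-based positions as in the VCF POS column and shows why the extra
GVF end column can be derived rather than stored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SmallVariant:
    """Hypothetical container for a SNP or small indel (not a Biopython class)."""

    start: int        # 1-based position of the first reference base (VCF POS)
    ref_length: int   # number of reference bases covered; the bases are not stored
    alts: List[str]   # alternate allele sequences

    @property
    def end(self) -> int:
        """1-based inclusive end position, derived from start and ref_length."""
        return self.start + self.ref_length - 1


def from_vcf_fields(pos: int, ref: str, alt: str) -> SmallVariant:
    """Build a SmallVariant from the POS, REF and ALT columns of one VCF record.

    ALT may hold several comma-separated alleles; only the length of REF is kept.
    """
    return SmallVariant(start=pos, ref_length=len(ref), alts=alt.split(","))


# A SNP with two alternate alleles and a two-base deletion.
snp = from_vcf_fields(100, "A", "G,T")
deletion = from_vcf_fields(200, "ACG", "A")
print(snp.end, deletion.end)
```

Whether such a structure should live in its own class or be carried by a
SeqFeature, as suggested in the reply to question 1, is left open here.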

