[Biopython] GSoC project application: Representation and manipulation of genomic variants: Some questions

Chaitanya Talnikar chaitanya.talnikar at iitb.ac.in
Wed Mar 28 06:19:04 EDT 2012


Here's the Google Doc link; I have made it editable too.

https://docs.google.com/document/d/12N1aEzagMZ8akc1mrfP4MxHdILT2wapjENJOoxZBIh0/edit

On Wed, Mar 28, 2012 at 6:13 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Chaitanya;
> The easiest way to work on your proposal is to write it in a
> public Google Doc and then share it with the list. I don't yet have access
> to all of the Melange GSoC project and I'd imagine others who might
> have thoughts are in the same boat. As a side benefit it's also much
> easier to collaborate on editing and notes.
>
> Brad
>
>> Hi,
>> I have uploaded the first draft of my project proposal. I will add
>> more sections to the project plan in a day or two. Just wanted to have
>> the initial draft up. I hope to write a better proposal with your
>> feedback.
>>
>> Regards,
>> Chaitanya
>>
>> On Mon, Mar 26, 2012 at 4:37 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> >
>> > Chaitanya;
>> > Thanks for the interest and specific questions.
>> >
>> >> 1. For the implementation of variants, which would be better: creating
>> >> a new SeqVariant class from scratch, or extending the SeqFeature class
>> >> to accommodate variants? I guess a separate class would be better.
>> >
>> > My preference would be to see how far the SeqFeature class can take you
>> > before implementing a new class. It should be general enough to handle
>> > variant data, but the bigger challenge might be designing a lightweight
>> > representation that is compatible with existing SeqFeatures.
>> >
>> >> 2. While looking at the Biopython wiki I came across an implementation
>> >> of GFF at
>> >> https://github.com/chapmanb/bcbb/tree/master/gff
>> >> As GVF is an extension of GFF3, this module could be used for reading
>> >> GVF files too. Would it be a good starting point to modify for GVF support?
>> >
>> > That would be perfect. We're hoping to merge this into the Biopython
>> > code base before the next release. There is also an existing VCF parser
>> > we'd love to use here:
>> >
>> > https://github.com/jamescasbon/PyVCF
>> >
>> >> 3. I've been going through the VCF documentation. SNPs, insertions
>> >> and deletions can be represented just as they are in VCF: the
>> >> object would have a start position, the length of the reference
>> >> sequence (no need to store the sequence itself) and a list of
>> >> alternate sequence objects. I still have to look into structural
>> >> variants (SVs), rearrangements and imprecise variant information, so
>> >> this representation is only for SNPs and small indels. GVF has a very
>> >> similar format for small indels and SNPs; it just provides an extra
>> >> end position column, which is not required if we have the reference
>> >> sequence.
>> >
>> > This sounds good. My general suggestion is to start writing your
>> > proposal as soon as possible. A concrete first draft will help with more
>> > detailed comments. The wiki has good information on the project plan:
>> >
>> > http://open-bio.org/wiki/Google_Summer_of_Code#When_you_apply
>> >
>> > and the NESCent wiki has some examples of well-written proposals from
>> > previous years:
>> >
>> > http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2012#Writing_your_application
>> >
>> > One of the key aspects is having a detailed week-by-week outline of your
>> > plans for the summer.
>> >
>> > Thanks again for the interest,
>> > Brad
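
To make point 3 above concrete, here is a rough sketch of the representation described there: a start position, the length of the reference allele (the reference sequence itself is not stored), and a list of alternate alleles. The class name SmallVariant, its field names, and the 0-based start convention are all assumptions made for illustration; none of this is existing Biopython API, and it only covers SNPs and small indels written the VCF way (indels carry their anchoring base).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SmallVariant:
    """Hypothetical lightweight record for a SNP or small indel."""

    chrom: str
    start: int        # assumed 0-based start position on the reference
    ref_length: int   # length of the reference allele; sequence not stored
    alts: List[str] = field(default_factory=list)  # alternate alleles

    @property
    def end(self) -> int:
        # Exclusive end, derived from start and reference length
        # instead of being stored as a separate column.
        return self.start + self.ref_length

    def kind(self, alt: str) -> str:
        # Classify one alternate allele by comparing its length
        # with the length of the reference allele.
        if len(alt) == self.ref_length:
            return "SNP" if self.ref_length == 1 else "substitution"
        return "insertion" if len(alt) > self.ref_length else "deletion"


snp = SmallVariant("chr1", 99, 1, ["G"])
deletion = SmallVariant("chr1", 199, 3, ["A"])
print(snp.end, snp.kind("G"))
print(deletion.end, deletion.kind("A"))
```

Deriving the end from the start and the reference length is what makes a separate end column unnecessary for these small variants. Structural and imprecise variants would need more than this.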

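On point 2: since GVF is an extension of GFF3, a GVF record is a nine-column, tab-separated line whose last column holds key=value attributes such as Variant_seq and Reference_seq. The sketch below uses only the standard library to show that layout. It is not the API of the bcbb GFF parser or of PyVCF, the function name parse_gvf_line is made up for illustration, and the sample line is invented. It also skips the URL-decoding of attribute values that GFF3 allows.

```python
def parse_gvf_line(line):
    """Split one GVF/GFF3 feature line into a plain dictionary (illustrative only)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, vtype, start, end, score, strand, phase, attr_text = cols

    # Attributes are ';'-separated key=value pairs; values may be comma lists.
    attributes = {}
    for pair in attr_text.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value.split(",")

    return {
        "seqid": seqid,
        "source": source,
        "type": vtype,
        "start": int(start),  # GFF3 coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# Invented sample record, used only to exercise the function.
example = "chr16\texample\tSNV\t49291141\t49291141\t.\t+\t.\tID=ID_1;Variant_seq=A,G;Reference_seq=G"
record = parse_gvf_line(example)
print(record["type"], record["start"], record["attributes"]["Variant_seq"])
```

For a record like this one, end - start + 1 equals the length of Reference_seq, which is why the end column is redundant when the reference allele is known.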
