[Biopython] GSoC project application: Representation and manipulation of genomic variants: Some questions

Chaitanya Talnikar chaitanya.talnikar at iitb.ac.in
Sun Mar 25 18:25:25 UTC 2012


Hi again,

I have a few questions on this topic.
1. For the implementation of variants, which would be better: creating
a new SeqVariant class from scratch, or extending the SeqFeature class
to accommodate variants? I guess a separate class would be better.
2. While looking at the Biopython wiki I came across an implementation
of GFF at
https://github.com/chapmanb/bcbb/tree/master/gff
As GVF is an extension of GFF3, this module could be used for reading
GVF files too. Would it be a good starting point to modify for GVF
support?
3. I've been going through the VCF documentation. SNPs, insertions
and deletions can be represented just as they are in VCF: the
object would have a start position, the length of the reference
sequence (no need to store the sequence itself) and a list of alternate
sequence objects. I still have to look into SVs (structural variants),
rearrangements and imprecise variant information, so this representation
is only for SNPs and small indels. GVF has a very similar format for
small indels and SNPs, except that it provides an extra end position
column, which is not required if we have the start position and the
length of the reference sequence.
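
To make point 3 concrete, here is a minimal sketch of the representation I
have in mind. Everything in it is an assumption for discussion: SeqVariant
is only a proposed name and not an existing Biopython class, the field
names are placeholders, alternate alleles are plain strings rather than
sequence objects, and positions are taken to be 1-based as in VCF. It only
covers SNPs and small indels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SeqVariant:
    """Hypothetical container for a SNP or small indel (not a Biopython class)."""

    start: int        # 1-based position of the first reference base, as in VCF POS
    ref_length: int   # number of reference bases covered; the bases are not stored
    alts: List[str] = field(default_factory=list)  # alternate allele sequences

    @property
    def end(self) -> int:
        """1-based inclusive end, derived from start and ref_length."""
        return self.start + self.ref_length - 1

    def kind(self, alt: str) -> str:
        """Classify one alternate allele by comparing its length to ref_length."""
        if len(alt) == self.ref_length:
            return "SNP" if self.ref_length == 1 else "substitution"
        return "insertion" if len(alt) > self.ref_length else "deletion"


# A SNP at position 100, and a deletion written VCF-style (REF of 3 bases, ALT of 1).
snp = SeqVariant(start=100, ref_length=1, alts=["T"])
deletion = SeqVariant(start=200, ref_length=3, alts=["A"])
```

Since the end property is computed from start and ref_length, the GVF end
column would not need to be stored separately for these simple cases.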

Regards,
Chaitanya Talnikar
Undergraduate Student
Department of Chemical Engineering
IIT Bombay


