[Biopython] GSOC 2012: Representation and manipulation of genomic variants

Sun Mar 25 15:59:41 EDT 2012

On Sun, Mar 25, 2012 at 8:13 PM, Chris Mitchell <chris.mit7 at gmail.com> wrote:
> Hey everyone,
>
> I'm interested in undertaking this project.  I'm currently a PhD student in
> Biochemical, Cellular, & Molecular Biology at Johns Hopkins School of
> Medicine, and I've been a hobby programmer for several years.  I primarily
> code in Python and C++.

Great - and your background sounds good.

> I'm currently working on large -omics based data (whole genome alignments,
> RNA-Seq) so I have a flavor of what formats end users will encounter (I've
> worked with Illumina & Complete Genomics RNA-Seq and genome assemblies, and
> Affy arrays for SNPs/CNVs) and more importantly, I know how the end user
> will want to utilize the data.  By far the biggest hurdle I see is
> arranging several types of data representations into a universal reference
> frame (for instance BAM files being 0-based, SAM being 1-based, CG VCF
> files being 0-based, closed intervals versus half-open, etc.).

That's easy - we're Python programmers, therefore any parsed
data structure should be converted to use Python counting.
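
To illustrate what "Python counting" could mean in practice, here is a
minimal sketch, assuming it means zero-based, half-open intervals (the
same convention as Python slices). The function names are hypothetical
and are not part of Biopython; a real parser would apply whichever
conversion matches the input format's own convention.

```python
def one_based_closed_to_python(start, end):
    """Convert a 1-based, closed interval [start, end] to a
    0-based, half-open interval [start - 1, end).

    Hypothetical helper for illustration only.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts: the closed end at position N is the same
    # boundary as the half-open end at index N.
    return start - 1, end


def python_to_one_based_closed(start, end):
    """Inverse conversion, e.g. for writing coordinates back out."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


# Bases 3 through 5 (1-based, inclusive) of a toy sequence.
seq = "ACGTACGT"
py_start, py_end = one_based_closed_to_python(3, 5)
subseq = seq[py_start:py_end]  # plain slicing works once converted
```

With everything stored this way, the interval length is simply
end - start, and adjacent intervals share a boundary value without
overlapping.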

Peter