[Biopython] GSOC 2012: Representation and manipulation of genomic variants
Peter Cock
p.j.a.cock at googlemail.com
Sun Mar 25 15:59:41 EDT 2012
On Sun, Mar 25, 2012 at 8:13 PM, Chris Mitchell <chris.mit7 at gmail.com> wrote:
> Hey everyone,
>
> I'm interested in undertaking this project. I'm currently a PhD student in
> Biochemical, Cellular, & Molecular Biology at Johns Hopkins School of
> Medicine, and I've been a hobby programmer for several years. I primarily
> code in Python and C++.
Great - and your background sounds good.
> I'm currently working with large -omics data (whole genome alignments,
> RNA-Seq), so I have a feel for the formats end users will encounter (I've
> worked with Illumina & Complete Genomics RNA-Seq and genome assemblies, and
> Affy arrays for SNPs/CNVs) and, more importantly, I know how end users
> will want to use the data. By far the biggest hurdle I see is arranging
> several types of data representation into a universal reference frame
> (for instance BAM files being 0-based, SAM being 1-based, CG VCF files
> being 0-based, closed intervals versus half-open, etc.).
That's easy: we're Python programmers, so any parsed data structure
should be converted to use Python counting (0-based, half-open, as in
Python slicing).
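As a rough illustration of what "convert to Python counting" could mean in practice, here is a minimal sketch that normalises an interval from one of three common conventions into 0-based, half-open coordinates, so that `seq[start:end]` selects the region and `end - start` is its length. The function name `to_python_interval` and the convention labels are hypothetical, chosen for this example only; they are not part of Biopython's API.

```python
"""Sketch: normalise genomic intervals to Python counting (0-based, half-open).

to_python_interval() and the convention labels are hypothetical names for
illustration; they are not an existing Biopython API.
"""

# Each converter maps (start, end) in an input convention to Python slice
# coordinates (start, end), where the interval length is end - start.
CONVERTERS = {
    # 1-based, both ends inclusive (e.g. GFF/GTF feature coordinates)
    "1-based-closed": lambda start, end: (start - 1, end),
    # 0-based, both ends inclusive
    "0-based-closed": lambda start, end: (start, end + 1),
    # 0-based, end exclusive (e.g. BED): already Python counting
    "0-based-half-open": lambda start, end: (start, end),
}


def to_python_interval(start, end, convention):
    """Return (start, end) converted to 0-based, half-open coordinates."""
    try:
        convert = CONVERTERS[convention]
    except KeyError:
        raise ValueError(f"unknown coordinate convention: {convention!r}") from None
    py_start, py_end = convert(start, end)
    # Reject intervals that cannot be valid slices of a sequence
    if py_start < 0 or py_end < py_start:
        raise ValueError(f"invalid interval {start}-{end} for {convention}")
    return py_start, py_end


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    # The same four bases ("CGTA") described in three different conventions
    for s, e, conv in [(2, 5, "1-based-closed"),
                       (1, 4, "0-based-closed"),
                       (1, 5, "0-based-half-open")]:
        py_start, py_end = to_python_interval(s, e, conv)
        print(conv, (py_start, py_end), seq[py_start:py_end])
```

Once every parser emits coordinates this way, downstream code only has to deal with a single convention, and plain slicing of the sequence just works.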
Peter