[GSoC] GSoC python variant update
James Casbon
casbon at gmail.com
Tue May 8 04:57:57 EDT 2012
On 8 May 2012 01:24, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Lenna;
> This all looks great for a top-level overview of the classes. This
> should give you sufficient flexibility to work on the different file
> types. Another approach is to avoid some of the inheritance and have
> parse/write dispatch to VCF- or GVF-specific classes based on the
> filetype:
>
> if filetype == "vcf":
>     variant_handler = PyVCFVariants()
> elif filetype == "gvf":
>     variant_handler = GVFVariants()
> variant_handler.parse(*args)
>
> Avoiding layers can be nice to simplify the architecture, as long as it
> gives you the flexibility you need.
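Brad's dispatch idea can also be written as a lookup table instead of an if/elif chain. This is a minimal sketch only: PyVCFVariants and GVFVariants are the placeholder names from the quoted snippet, implemented here as hypothetical stubs, and the parse(path) signature is an assumption.

```python
# Minimal sketch: dispatch to a format-specific handler without a shared base class.
# PyVCFVariants and GVFVariants are hypothetical stand-ins for the real handlers.


class PyVCFVariants:
    """Hypothetical VCF handler (would wrap a real VCF parser)."""

    def parse(self, path):
        return f"parsed VCF from {path}"


class GVFVariants:
    """Hypothetical GVF handler."""

    def parse(self, path):
        return f"parsed GVF from {path}"


# Map file types to handler classes; supporting a new format means adding one entry.
HANDLERS = {
    "vcf": PyVCFVariants,
    "gvf": GVFVariants,
}


def get_handler(filetype):
    """Return a handler instance for filetype, or raise ValueError if unsupported."""
    try:
        return HANDLERS[filetype.lower()]()
    except KeyError:
        raise ValueError(f"unsupported variant file type: {filetype!r}") from None


def parse_variants(filetype, path):
    # Duck typing: any handler with a parse() method works; no inheritance needed.
    return get_handler(filetype).parse(path)
```

Because handlers are looked up by name, the calling code never needs a common base class; each handler only has to provide the methods the caller actually uses.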
Hi Lenna,
This looks like a good start, but I agree with Brad that layers of
inheritance aren't always the best way to proceed in Python.
Specific feedback: why does the Variant class have parse/write methods
when you state that you will use adaptation from the general variation
class to the actual parser? I'm also slightly worried that this design
could be quite slow with the volume of data you get from a VCF
file.
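On the speed concern, one common mitigation is to stream records lazily and keep each record small, rather than building a rich object graph for the whole file up front. The sketch below assumes tab-separated VCF text, parses only the eight fixed columns, and uses a hypothetical iter_vcf_records function; it is not PyVCF's API.

```python
from collections import namedtuple

# Lightweight, immutable record: namedtuples avoid a per-instance __dict__.
VcfRecord = namedtuple("VcfRecord", "chrom pos id ref alt qual filter info")


def iter_vcf_records(lines):
    """Lazily yield VcfRecord objects from VCF text lines (e.g. an open file).

    Only the eight fixed columns are parsed; INFO is kept as a raw string
    and sample columns are ignored, so per-record work stays small.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
        yield VcfRecord(
            chrom=chrom,
            pos=int(pos),
            id=None if vid == "." else vid,
            ref=ref,
            alt=alt.split(","),  # multiple ALT alleles are comma-separated
            qual=None if qual == "." else float(qual),
            filter=filt,
            info=info,
        )
```

Because iter_vcf_records is a generator, it can consume an open file line by line, so memory use stays flat regardless of file size.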
As for the points in your blog post...
I have plenty of data; do we know of any SNP callers capable of
creating GVF files? If so, I can give you both formats.
The simplest variant workflows would be to filter and then score on
some metric. Filtering removes noise, and a quality threshold is the
simplest filter. The choice of metric depends on the experimental setup:
for case/control, a Fisher's exact test is quite easy, and for a single
population, a Hardy-Weinberg equilibrium (HWE) test is fairly simple.
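The filter-then-score workflow can be sketched with the standard library alone (Python 3.8+ for math.comb). This is a minimal illustration, not a prescribed pipeline: the dict keys, the QUAL cutoff of 30, and scoring case/control alternate-vs-reference allele counts with a two-sided Fisher's exact test are all assumptions, and the HWE check uses the chi-square goodness-of-fit approximation rather than an exact HWE test.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x (margins fixed).
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table with the same margins that is no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def hwe_chisq_pvalue(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def filter_then_score(variants, min_qual=30.0):
    """Drop low-quality calls, then score the rest with a case/control Fisher test.

    Each variant is a dict with hypothetical keys: 'id', 'qual' and allele
    counts 'case_alt', 'case_ref', 'ctrl_alt', 'ctrl_ref'. min_qual=30 is
    an illustrative threshold, not a recommendation.
    """
    kept = [v for v in variants if v["qual"] is not None and v["qual"] >= min_qual]
    return [
        (v["id"], fisher_exact_two_sided(v["case_alt"], v["case_ref"],
                                         v["ctrl_alt"], v["ctrl_ref"]))
        for v in kept
    ]
```

For real analyses, scipy.stats.fisher_exact provides a tested implementation of Fisher's exact test; the hand-rolled version here only shows what the test computes.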
Hope this helps,
--
James
http://casbon.me/