[Biopython-dev] [Biopython] Google Summer of Code 2014 - Call for project ideas

Mon Feb 3 17:37:41 UTC 2014

Hi Peter,

My idea was to have a QC/preprocessing module inside Biopython, which
could then be integrated with the rest of the NGS tool wrappers.
You are right, though: this functionality is already part of fastQC,
and replicating it might not be a good idea.

As for limma, I had these things in mind:
1. Correct me if I am wrong, but Biopython only supports Affymetrix
data, right? My idea was to build parsers for GenePix, Agilent, etc.
2. Add other methods for within- and between-array normalisation, plus
MA and volcano plots.
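
To make item 2 concrete, here is a minimal sketch of the M and A values
behind an MA plot for two-channel data, with simple median-centring
standing in for a within-array normalisation step. The function names
are hypothetical and the intensities are made up; this is neither
limma's nor Biopython's API, only an illustration using the standard
library.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return (M, A) lists for paired two-channel intensities.

    M is the log2 ratio of the channels and A is their mean log2
    intensity. Hypothetical helper, not part of any existing library.
    """
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a


def median_center(m):
    """Shift M values so their median is zero (a very simple normalisation)."""
    centre = median(m)
    return [x - centre for x in m]


# Made-up intensities for three spots, chosen as powers of two.
red = [1024.0, 512.0, 256.0]
green = [256.0, 256.0, 256.0]

m, a = ma_values(red, green)
m_norm = median_center(m)
```

Plotting m_norm against a would give the MA plot; a real port would
need intensity-dependent methods such as loess rather than a single
median shift.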

Yes, it is like reinventing the wheel, but I have been thinking of
porting this to Python myself. It might not be a good fit for a GSoC
project, however.

Saket

On 3 February 2014 12:31, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Mon, Feb 3, 2014 at 4:22 AM, Saket Choudhary <saketkc at gmail.com> wrote:
>>
>> I would like to propose a QC module for NGS & microarray data:
>> essentially fastQC [1] and limma [2], respectively, ported to
>> Biopython.
>>
>> [1] http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
>> [2] http://bioconductor.org/packages/devel/bioc/html/limma.html
>
> Hi Saket,
>
> What did you have in mind for 'porting' fastQC? Recreating it in
> Python alone doesn't seem like a sensible use of time & effort.
> Are there particular functions etc you think make sense to have
> available as a library of code?
>
> For limma, the linear model side would fall nicely under the SciPy
> ecosystem, e.g. http://scikit-learn.org/stable/modules/linear_model.html
> However, Biopython's existing microarray support could do
> with some love.
>
> Peter