[Biopython] Feature selection techniques modules

Sean Davis sdavis2 at mail.nih.gov
Sun Feb 6 21:35:09 UTC 2011


On Sun, Feb 6, 2011 at 3:37 PM, chris dimitrakopoulos <
dimitrakopoul at gmail.com> wrote:

> Hello everyone,
>
> I am an MSc student at the University of Patras, Greece, in the research
> field of Bioinformatics. I recently became a member of the OBF and I
> appreciate the open source work of your OBF project.
>
> I had a discussion with Mr. Robert Buels about this year's GSoC, because I
> intend to apply and I found that OBF would be the organization most
> suitable for me. I have been browsing the projects announced in previous
> years and found them very interesting. As this year's potential projects
> have not been announced yet, I wanted to share an idea of mine, briefly
> describe what I am thinking of doing, and ask whether you think it is a
> good idea and worth submitting as an application after March 28.
>
> Well, I think that feature selection (FS) techniques have become a very
> important issue in many bioinformatics applications. In many cases (like
> protein interaction prediction), you have to find a way to collect the set
> of features that leads to the best classification performance. I looked in
> the Biopython libraries and I didn't find anything related to applying FS
> techniques to a dataset of features (like t-test, ANOVA, Wilcoxon, CFS,
> etc.). Hence, I think that creating a library focused on FS techniques
> would be a good idea. Moreover, that library can have a hierarchical
> structure, as there are different types of FS techniques: filter, wrapper
> and embedded techniques. Furthermore, each type is divided into further
> groups (e.g. filter methods are divided into univariate and multivariate
> methods, according to whether feature dependencies are considered).
>
> Some of the methods I am thinking of implementing are:
>
> T-test, ANOVA, Gamma, bivariate methods, CFS and MRMR, which are some
> well-known filter feature selection techniques.
> In wrapper and embedded methods, classifiers are used in the process of
> feature selection, so we have techniques based on genetic algorithms,
> random forests, logistic regression, decision tree learners, Bayesian
> classifiers, etc. In this case, the existing Biopython modules
> Bio.LogisticRegression, Bio.GA and Bio.NaiveBayes could be used.
>
>
Hi, Chris.

You might want to look at the Rpy project.  All of the above machine
learning and feature selection algorithms (and many more) are implemented in
R and can be wrapped fairly easily in Python using Rpy.
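
For a quick feel of what a univariate filter method does, here is a minimal
pure-Python sketch that ranks features by the absolute Welch t statistic
between two classes. It is only an illustration, not an existing Biopython or
R interface: the names welch_t and rank_features are hypothetical, the class
labels are assumed to be 0 and 1, and the small data set is made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (hypothetical helper)."""
    # Standard error from the two sample variances; each sample needs >= 2 values.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Assumption: treat a zero-variance comparison as uninformative.
        return 0.0
    return (mean(a) - mean(b)) / se


def rank_features(X, y):
    """Return feature indices sorted by decreasing |t| between classes 1 and 0."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Made-up data: rows are samples, columns are features.
X = [
    [5.1, 0.2, 3.0],
    [4.9, 0.1, 3.4],
    [5.0, 0.3, 2.9],
    [1.0, 0.2, 3.0],
    [1.2, 0.1, 3.2],
    [0.9, 0.3, 2.8],
]
y = [1, 1, 1, 0, 0, 0]

ranking = rank_features(X, y)
print(ranking)
```

On this toy data the first column separates the classes cleanly and the second
column is identical in both classes, so the ranking comes out as [0, 2, 1]. A
real filter would also want p-values and multiple-testing correction, which is
where the R implementations help.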

Sean



> More information on the techniques I describe can be found at the
> following links:
>
> http://bioinformatics.oxfordjournals.org/content/23/19/2507.full.pdf+html
>
> http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=3570EDE4C7E11AAE7CA5F727800DC58A?doi=10.1.1.37.4643&rep=rep1&type=pdf
>
> New functions computing the above measures can be created. The calculation
> can be done between vectors of features, between a feature vector and the
> output vector, or even on large datasets (with many features) read from a
> file, on which we want to perform feature selection.
>
> I am sending you this email to briefly express my idea. Please let me
> know what you think about it and whether it is worth proposing as one of
> my student applications in GSoC 2011 to the Open Bioinformatics Foundation.
> If you want me to give you any further details about my thinking, just ask
> me! :-)
>
> Looking forward to hearing from you,
> Chris Dim


