[Biopython-dev] Proposal for GSoC 2017

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 2 12:35:58 UTC 2017


On Thu, Mar 2, 2017 at 11:48 AM, Sourav Singh <ssouravsingh12 at gmail.com> wrote:
> Hello Everyone,
>
> I am looking to propose a project for GSoC 2017 under BioPython.
>

Great.

> I have written my project proposal below. If anyone would be interested in
> mentoring me on the project, it would be great.
>
> Project Title- Add support for LLVM/ CUDA kernels to BioPython using Numba.
>

Sadly, even if I had time, this is not an area I could mentor for GSoC.

> About Project-
>
> Currently Biopython has support for PyPy compiler, but the support for PyPy
> is not proper since Biopython depends on NumPy for certain functionalities,
> and NumPy has been ported to PyPy compiler.

I don't quite understand this sentence.

The PyPy team have got a lot of NumPy working nicely under PyPy, and
we do need to review how many of the parts of Biopython that use NumPy
only from Python (not from C code), e.g. Bio.PDB, will now work there.

Code like Bio.Cluster uses NumPy at the C level, which remains a bigger
hurdle for use with PyPy.

> The aim of this project is to add support for LLVM compiler and if needed,
> support for GPUs through Numba.
>
> Approach-
>
> I am currently trying to undertake some pilot tests on kNN module of
> Biopython and benchmark the results accordingly. The project would involve
> adding support for LLVM using Numba for certain specific modules in
> Biopython which can benefit highly with the speedup. If needed, Support for
> CUDA kernels can also be added to Biopython.
>
> Knowledge required-
>
> 1) Programming skills in Python
> 2) Knowledge of BioPython internals.
> 3) Knowledge of LLVM workings
> 4) Knowledge of CUDA.
>
> Difficulty-
>
> Medium to Hard depending on the kind of module being worked on.
>
> Regards,
>
> Sourav

Other than the relatively small Bio/kNN.py code, which other bits of
Biopython are you thinking about? The kNN module is problematic
in that it doesn't really have a current maintainer, who would be a
natural candidate for mentoring work in this area.

Since it seems you are focusing on numerical analysis here, you might
find a more satisfying project with SciPy or scikit-learn - or indeed with
PyPy themselves?

https://github.com/scipy/scipy/wiki/GSoC-2017-project-ideas
https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-(GSOC)-2017
http://pypy.readthedocs.io/en/latest/project-ideas.html

Peter

