[Biopython-dev] GEO database and bio-python

Peter biopython at maubp.freeserve.co.uk
Thu Mar 25 08:22:40 EDT 2010


On Thu, Mar 25, 2010 at 11:58 AM, Nicolas Rapin
<nicolas.rapin at bric.ku.dk> wrote:
> Dear all,
>
> I just started with Python, and have been using Biopython quite a lot lately. It's a nice package
> and very convenient. Oh, and I'm also new on the mailing list...

Great, and welcome :)

> I need access to a lot of data from GEO, and I noticed that it might be
> a good idea to have the database locally. This led me to write a little class
> that can download the compressed files from NCBI (the GSE/GPLxxx_family.tgz
> files) and parse the MINiML XML they contain, together with the
> actual data stored in the compressed files. In the end I have a nicely organized
> HDF5 file that I can use for data mining.
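
A minimal sketch of the archive-reading step described above, using only the Python standard library. It assumes that a family archive holds one MINiML `.xml` file plus tab-separated `.txt` data tables, and that MINiML elements live in the namespace shown below; check both against the files you actually download. The function name `read_family_archive` is hypothetical, and writing the result to HDF5 (for example with h5py) is left out.

```python
import io
import tarfile
import xml.etree.ElementTree as ET

# Assumed MINiML namespace; verify against the downloaded XML.
MINIML_NS = {"m": "http://www.ncbi.nlm.nih.gov/geo/info/MINiML"}


def read_family_archive(tgz_bytes):
    """Hypothetical helper: read a *_family.tgz archive held in memory.

    Returns (samples, tables): samples maps each Sample iid (GSM accession)
    to its title; tables maps each .txt member name to its rows, split on tabs.
    """
    samples = {}
    tables = {}
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            if member.name.endswith(".xml"):
                # Assumption: the MINiML metadata is the archive's .xml file.
                root = ET.fromstring(data)
                for sample in root.findall("m:Sample", MINIML_NS):
                    title = sample.findtext("m:Title", default="", namespaces=MINIML_NS)
                    samples[sample.get("iid")] = title.strip()
            elif member.name.endswith(".txt"):
                # Assumption: data tables are plain tab-separated text.
                text = data.decode("utf-8")
                tables[member.name] = [line.split("\t") for line in text.splitlines() if line]
    return samples, tables
```

Reading from bytes rather than a path keeps the parser independent of how the archive was fetched, so the same function works for a file on disk (`open(path, "rb").read()`) or a freshly downloaded response body.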

Have you looked at the existing Bio.GEO module? It hasn't got an active
maintainer at the moment, and in some ways it is rather simplistic. I found that
Sean Davis' GEOquery package for R/Bioconductor was much more
complete.

> I wondered if that would be of interest for Biopython.

This sounds like a useful addition.

> If yes, how do I contribute ?

First of all, we use the public mailing lists to discuss things. In terms of
code, starting a branch on github would let you show us what you are working on
and make it easier to eventually merge things. See
http://biopython.org/wiki/GitUsage

Peter


