[Biopython-dev] GEO database and bio-python

Sean Davis sdavis2 at mail.nih.gov
Thu Mar 25 08:29:52 EDT 2010


On Thu, Mar 25, 2010 at 7:58 AM, Nicolas Rapin <nicolas.rapin at bric.ku.dk> wrote:
> Dear all,
>
> I just started with Python, and have been using Biopython quite a lot lately. It's a nice package, and is very convenient. Oh, and I'm also new to the mailing list...
>
> I need access to a lot of data from GEO, and I noticed that it might be a good idea to have the database locally, which led me to write a little class that can download the compressed files from NCBI (the GSE/GPLxxx_family.tgz files) and parse the MINiML flavour of XML they contain, together with the actual data that is in the compressed files. In the end I have a nicely organized HDF5 file that I can use for data mining.
>
> I wondered whether that would be of interest for Biopython.

Hi, Nico.

Not a direct answer to your question, but have a look at the
Bioconductor package GEOmetadb.  (There is also an online version.)
We have parsed all of GEO metadata into a SQLite database and made it
available within R.  However, the SQLite database can also be used
standalone, and Python has built-in support for SQLite through the
sqlite3 module (in the standard library since Python 2.5).

http://gbnci.abcc.ncifcrf.gov/geo/
http://gbnci.abcc.ncifcrf.gov/geo/GEOmetadb.sqlite.gz
http://watson.nci.nih.gov/bioc_mirror/packages/2.6/bioc/html/GEOmetadb.html
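
For anyone who wants to try the standalone route from Python, here is a
minimal sketch using only the standard library.  It assumes you have
already downloaded GEOmetadb.sqlite.gz to the working directory; the
local file names and the helper function names are made up for
illustration.  It deliberately does not assume any table or column
names, and just asks SQLite what the database contains.

```python
import gzip
import os
import shutil
import sqlite3


def decompress_gz(src_path, dest_path):
    """Decompress a gzip file (e.g. GEOmetadb.sqlite.gz) to dest_path."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def table_columns(conn, table):
    """Return the column names of one table, in declaration order."""
    # PRAGMA statements cannot take bound parameters, so only accept
    # names that really exist in this database, then quote the name.
    if table not in list_tables(conn):
        raise ValueError("no such table: %r" % table)
    quoted = '"' + table.replace('"', '""') + '"'
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute("PRAGMA table_info(%s)" % quoted)]


if __name__ == "__main__":
    gz_path = "GEOmetadb.sqlite.gz"  # assumed local download location
    db_path = "GEOmetadb.sqlite"     # assumed name for the unpacked file
    if os.path.exists(gz_path) and not os.path.exists(db_path):
        decompress_gz(gz_path, db_path)
    if os.path.exists(db_path):
        conn = sqlite3.connect(db_path)
        for name in list_tables(conn):
            print(name, table_columns(conn, name))
        conn.close()
```

Once you know which tables and columns are there, ordinary SQL through
conn.execute() is all you need.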

Also, as for the data: if you are inclined to use R (or rpy2) for
anything, the GEOquery package can download and parse all the record
types in GEO into R objects, and the number of tools for analysis of
microarray data in R/Bioconductor is enormous.

http://watson.nci.nih.gov/bioc_mirror/packages/2.6/bioc/html/GEOquery.html

Sorry for the advertisement-like email....

Sean

> If yes, how do I contribute?
>
>
> best,
>
> Nico
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


