[Biopython-dev] GEO database and bio-python
Sean Davis
sdavis2 at mail.nih.gov
Thu Mar 25 12:29:52 UTC 2010
On Thu, Mar 25, 2010 at 7:58 AM, Nicolas Rapin <nicolas.rapin at bric.ku.dk> wrote:
> Dear all,
>
> I just started with Python and have been using Biopython quite a lot lately. It's a nice, very convenient package. Oh, and I'm also new to the mailing list...
>
> I need access to a lot of data from GEO, and I noticed that it might be a good idea to have the database locally. That led me to write a little class that downloads the compressed files from NCBI (the GSE/GPLxxx_family.tgz files) and parses the MINiML XML they contain, together with the actual data in the compressed files. In the end I have a nicely organized HDF5 file that I can use for data mining.
>
> I wondered whether that would be a fit for Biopython.
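The download-and-parse step described above could look roughly like the following minimal Python sketch. It is not Nico's class: `read_family_samples` is a hypothetical helper, and the MINiML namespace URI, the `iid` attribute and the `<Title>` child element are assumptions about the family XML layout that should be checked against a real file. It only extracts sample IDs and titles from the XML and skips the other archive members (such as the per-sample data tables); writing the result to HDF5 is left out.

```python
import tarfile
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI used by MINiML files; verify against the xmlns
# attribute of a real *_family.xml before relying on it.
MINIML_NS = "{http://www.ncbi.nlm.nih.gov/geo/info/MINiML}"


def read_family_samples(tgz_path):
    """Return {sample_id: title} from the MINiML XML inside a family .tgz.

    Hypothetical helper. Assumes <Sample> elements carry an 'iid' attribute
    and a <Title> child; non-XML members of the archive are ignored.
    """
    samples = {}
    with tarfile.open(tgz_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Only parse regular files that look like the MINiML XML.
            if not (member.isfile() and member.name.endswith(".xml")):
                continue
            fh = tar.extractfile(member)
            root = ET.parse(fh).getroot()
            # iter() walks the whole tree, so nesting depth does not matter.
            for sample in root.iter(MINIML_NS + "Sample"):
                title = sample.findtext(MINIML_NS + "Title", default="")
                samples[sample.get("iid")] = title.strip()
    return samples
```

Streaming members straight out of the archive with `extractfile` avoids unpacking large family files to disk just to read their metadata.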
Hi, Nico.
Not a direct answer to your question, but have a look at the
Bioconductor package GEOmetadb. (There is also an online version.)
We have parsed all of the GEO metadata into a SQLite database and made it
available within R. However, the SQLite database can also be used
standalone, and recent versions of Python have built-in support for
SQLite (the sqlite3 module).
http://gbnci.abcc.ncifcrf.gov/geo/
http://gbnci.abcc.ncifcrf.gov/geo/GEOmetadb.sqlite.gz
http://watson.nci.nih.gov/bioc_mirror/packages/2.6/bioc/html/GEOmetadb.html
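As a rough illustration of using the SQLite file standalone from Python, here is a minimal sketch using only the standard library (`gzip`, `shutil`, `sqlite3`). The table `gse` with columns `gse` and `title` is an assumption about the GEOmetadb schema, and `gunzip` and `series_titles` are hypothetical helper names.

```python
import gzip
import shutil
import sqlite3


def gunzip(gz_path, out_path):
    """Decompress a .gz file (e.g. GEOmetadb.sqlite.gz) to out_path."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams, so the file need not fit in memory
    return out_path


def series_titles(conn, pattern):
    """Return (gse, title) rows whose title matches a SQL LIKE pattern.

    ASSUMPTION: a table 'gse' with columns 'gse' and 'title'. Inspect the
    real schema with: SELECT sql FROM sqlite_master WHERE name = 'gse';
    """
    cur = conn.execute(
        "SELECT gse, title FROM gse WHERE title LIKE ? ORDER BY gse",
        (pattern,),
    )
    return cur.fetchall()
```

Typical use would be to download GEOmetadb.sqlite.gz once, call `gunzip` on it, open the result with `sqlite3.connect`, and pass that connection to `series_titles` with a pattern such as `"%leukemia%"`. The `?` placeholder lets sqlite3 quote the pattern safely instead of building the SQL string by hand.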
Also, as for the data, if you are inclined to use R for anything (or
rpy2), the GEOquery package can download and parse all the record
types in GEO into objects within R, and the number of tools for
microarray data analysis in R/Bioconductor is enormous.
http://watson.nci.nih.gov/bioc_mirror/packages/2.6/bioc/html/GEOquery.html
Sorry for the advertisement-like email...
Sean
> If so, how do I contribute?
>
>
> best,
>
> Nico