[Biopython] GEO profiles retrieval

Paulo Nuin nuin at genedrift.org
Wed Apr 17 18:45:20 UTC 2013


Hi everyone

This is a longish question about some data retrieval we are trying to implement on GEO Profiles. I don't know if this can be done programmatically (with or without Biopython), but I already have some parts working using Python and Biopython. What we are trying to achieve:

- we are building a pipeline where we initially want to see if the gene in question (let's say PTEN) is over- or under-expressed in certain conditions.
- using an eSearch URL/procedure I can get an XML file with all the profile IDs for PTEN
- in order to get more information about each profile, I can use an eSummary URL/procedure that returns an XML file for each profile
- with these profiles we then want to check the gene expression level in each sample subgroup of the study and see if the gene is under- or over-expressed, or if there's no change between the groups.
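
For reference, a minimal sketch of the eSearch/eSummary steps above, using only the Python standard library. It assumes the standard E-utilities endpoints and the "geoprofiles" database name; the function names, the bare "PTEN" search term and the retmax value are illustrative, not necessarily what our pipeline uses.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Base address of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="geoprofiles", retmax=100):
    """Build an eSearch URL returning an XML list of record IDs for `term`."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS}/esearch.fcgi?{query}"


def esummary_url(ids, db="geoprofiles"):
    """Build an eSummary URL returning XML summaries for the given IDs."""
    query = urlencode({"db": db, "id": ",".join(str(i) for i in ids)})
    return f"{EUTILS}/esummary.fcgi?{query}"


def parse_id_list(esearch_xml):
    """Extract the text of every <Id> element from an eSearch XML result."""
    root = ET.fromstring(esearch_xml)
    return [el.text for el in root.iter("Id")]


def fetch(url):
    """Download a URL and return its body as text (needs network access)."""
    with urlopen(url) as handle:
        return handle.read().decode("utf-8")
```

Usage would be along the lines of ids = parse_id_list(fetch(esearch_url("PTEN"))) followed by fetch(esummary_url(ids)). Biopython's Bio.Entrez.esearch and Bio.Entrez.esummary wrap the same two endpoints.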

The problem I have is that the profile XML file has no information about sample annotation or about gene expression in each sample. As a workaround, starting from the eSummary XML, I can get to this page for the profile:

http://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS2877:1441937_s_at

using the GDS and probe ID found in the XML. Again, there's no easy way to extract the sample grouping/annotation from this page, although it's quite straightforward to extract the gene expression levels for each sample. What I want to find is:

- a way to get sample grouping/annotation for a specific GDS, that would give me the sample IDs that I could correlate to an expression value
- an eSearch, eSummary, eFetch, or any other URL that would give me expression values per sample, with each sample ID annotated to a group
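
To make the workaround mentioned above concrete, the profile page URL can be assembled from the two values taken from the eSummary XML. This is only a sketch: the function name is hypothetical, and the GDS accession and probe ID are passed in as arguments because the XML field names they come from are not shown here.

```python
from urllib.parse import quote

# Address of the GEO profile graph page, as in the URL quoted above.
PROFILE_GRAPH = "http://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi"


def profile_graph_url(gds, probe_id):
    """Build the profile page URL from a GDS accession and a probe ID."""
    # Escape each part separately so the ':' separator stays literal.
    return f"{PROFILE_GRAPH}?ID={quote(gds, safe='')}:{quote(probe_id, safe='')}"
```

For example, profile_graph_url("GDS2877", "1441937_s_at") reproduces the URL given earlier in this message.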

Thanks in advance for any help, ideas and comments.

Paulo
