[Biopython-dev] Bio.Geo for NCBI's GEO microarray SOFT files

Peter biopython-dev at maubp.freeserve.co.uk
Sat Dec 10 13:39:13 EST 2005


I've just been looking at the Bio.Geo module by Katharine Lindner, 
contributed back in 2002, which is meant to parse NCBI's Gene Expression 
Omnibus (GEO) microarray data files.

http://www.ncbi.nlm.nih.gov/geo/

Is anyone using Bio.Geo at the moment?

The NCBI seem to call these SOFT files (*.soft), and the format is 
documented here:

http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html#SOFTformat
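As I read that page, each SOFT line is identified by its first character: 
"^" starts an entity line (e.g. "^DATASET = ..."), "!" an attribute line, 
"#" a column description, and anything else is a tab-delimited data row. 
Here is a minimal sketch of a line classifier, assuming exactly that 
convention (the function name classify_soft_line is my own, not part of 
Bio.Geo):

```python
def classify_soft_line(line):
    """Classify one SOFT line by its leading character (assumed convention).

    Returns (kind, key, value). For '^', '!' and '#' lines, key/value come
    from splitting on the first '='; value is None if there is no '='.
    For data rows, key is None and value is the list of tab-separated fields.
    """
    line = line.rstrip("\r\n")
    if line.startswith("^"):
        kind = "entity"
    elif line.startswith("!"):
        kind = "attribute"
    elif line.startswith("#"):
        kind = "column"
    else:
        # Anything without a recognised prefix is treated as table data.
        return "data", None, line.split("\t")
    key, sep, value = line[1:].partition("=")
    return kind, key.strip(), value.strip() if sep else None
```

For example, "^DATASET = GDS1" would come back as ("entity", "DATASET", 
"GDS1"), while a bare marker like "!dataset_table_begin" gives a key with 
no value.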

Apparently in 2005 they began switching to a revised file format. New 
format files are here:

ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/

Old format files are here:

ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_old/
ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_old_gz/
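Since most of these are gzip-compressed, any test harness will need to 
read both plain and compressed files. This is a small sketch, assuming a 
".gz" suffix is what marks the compressed ones (open_soft is just a 
hypothetical helper name):

```python
import gzip


def open_soft(path):
    """Open a SOFT file as text, decompressing it if the name ends in .gz.

    Assumption: compressed files (soft_gz, soft_old_gz) use a .gz suffix.
    """
    if path.endswith(".gz"):
        # "rt" gives a text-mode stream, so callers can iterate over lines.
        return gzip.open(path, "rt")
    return open(path, "r")
```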

As far as I can tell, neither the "old" nor the "new" files work with 
Bio.Geo, so there may have been another format change between 2002 and 2005.

In addition, the 2005 change introduces new marker lines before and after 
the actual data table:

!dataset_table_begin
!dataset_table_end

These are definitely not supported in the current Martel grammar for GEO 
files.
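Handling the new markers outside Martel looks straightforward. This is a 
rough sketch, assuming the first line after !dataset_table_begin is a 
tab-delimited header row (ID_REF, IDENTIFIER, sample columns) and every 
following line up to !dataset_table_end is a tab-delimited data row. 
read_dataset_table is a hypothetical name, not an existing Bio.Geo function:

```python
def read_dataset_table(lines):
    """Return (header, rows) from between the dataset table markers.

    Assumes the first line inside the block is the column header row.
    Raises ValueError if no complete begin/end block is found.
    """
    header = None
    rows = []
    inside = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not inside:
            # Skip entity/attribute lines until the table starts.
            inside = line == "!dataset_table_begin"
            continue
        if line == "!dataset_table_end":
            return header, rows
        fields = line.split("\t")
        if header is None:
            header = fields
        else:
            rows.append(fields)
    raise ValueError("no complete !dataset_table_begin/!dataset_table_end block")
```

It takes any iterable of lines, so it should work on a plain file object 
or on gzip.open(path, "rt") for the compressed downloads.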

Peter


