[Biopython-dev] Bio.Geo for NCBI's GEO microarray SOFT files
Peter
biopython-dev at maubp.freeserve.co.uk
Sat Dec 10 13:39:13 EST 2005
I've just been looking at the Bio.Geo module by Katharine Lindner,
contributed back in 2002, which should parse the NCBI's Gene Expression
Omnibus (GEO) microarray data files.
http://www.ncbi.nlm.nih.gov/geo/
Is anyone using Bio.Geo at the moment?
The NCBI seem to call these SOFT files (*.soft), and the format is
documented here:
http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html#SOFTformat
Apparently in 2005 they began switching to a revised file format.
New format files are here:
ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/
Old format files here:
ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_old/
ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_old_gz/
As far as I can tell, neither the "old" nor the "new" versions work in
Bio.Geo, so there may have been another format change between 2002 and 2005.
In addition, the 2005 change introduces new marker lines before and after
the actual data:
!dataset_table_begin
!dataset_table_end
These are definitely not supported in the current Martel grammar for GEO
files.
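For illustration, here is a rough Python sketch of how those two marker
lines could be handled with plain line-by-line parsing rather than Martel.
It assumes the rows between the markers are tab-separated. The function
name extract_dataset_table and the sample lines are made up for this
example; this is not what Bio.Geo currently does.

```python
def extract_dataset_table(lines):
    """Collect the rows found between the !dataset_table_begin and
    !dataset_table_end marker lines.

    Assumption: each row inside the table is tab-separated.
    """
    rows = []
    in_table = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("!dataset_table_begin"):
            in_table = True
        elif line.startswith("!dataset_table_end"):
            in_table = False
        elif in_table and line:
            rows.append(line.split("\t"))
    return rows


# Hypothetical sample input, not taken from a real GEO file.
sample = [
    "!dataset_title = hypothetical example",
    "!dataset_table_begin",
    "ID_REF\tIDENTIFIER\tGSM0001",
    "1000_at\tGENE_A\t12.5",
    "!dataset_table_end",
]

for row in extract_dataset_table(sample):
    print(row)
```

Lines outside the two markers are simply skipped here, so the header
lines would still need their own handling.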
Peter