[BioSQL-l] Affymetrix SQL for PostgreSQL
Allen Day
allenday at ucla.edu
Thu May 1 16:54:02 EDT 2003
Hi,
Things are going well with the Chado/RAD merger. So far I've managed to
port the table and view create statements from Oracle over to PostgreSQL,
and the table creates are also portable to MySQL using SQL::Translator.
Last week I loaded some of the Affymetrix MAGE-ML files containing all the
database cross-reference info for their probesets. This week I've started
to gather our protocol data, which is a prerequisite to loading any real
data.
So... I can't yet give you an opinion on the RAD schema from a data
analyst's point of view. From the loading and schema porting experience
I've had so far, though, it seems that both the Chado and RAD teams have
put a lot of thought into creating clear schemata.
Hopefully within a month or so I'll have some expression values loaded
into Chado/RAD and will be starting to use the db for analysis, and can
give some better feedback.
> The way I could envision a different design of a gene expression model
> in BioSQL is as a warehouse star-schema, where there'd be essentially
> one (or very few) analytical data tables, and all the rest is hosted by
> the existing biosql tables (i.e., mostly the term table). It would be
> understood then that people would host their expression data in another
> schema, and the biosql table(s) would be used as a warehouse only.
Ah, okay. You could certainly strip the RAD schema down. Right now the
Chado port is ~50 tables with a handful of views.
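For illustration only, here is a minimal sketch of what such a stripped-down,
warehouse-style layout might look like. It uses Python's built-in sqlite3
module so it runs anywhere; the tables `term`, `bioentry` and
`expression_fact` are simplified, hypothetical stand-ins and not the actual
BioSQL, Chado or RAD definitions.

```python
import sqlite3


def build_star_schema(conn):
    """Create two small dimension tables and one fact table (all hypothetical)."""
    conn.executescript(
        """
        CREATE TABLE term (
            term_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE bioentry (
            bioentry_id INTEGER PRIMARY KEY,
            accession   TEXT NOT NULL UNIQUE
        );
        -- The single analytical table: one row per (probe, condition) value.
        CREATE TABLE expression_fact (
            bioentry_id       INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
            condition_term_id INTEGER NOT NULL REFERENCES term(term_id),
            value             REAL NOT NULL,
            PRIMARY KEY (bioentry_id, condition_term_id)
        );
        """
    )


def mean_by_condition(conn):
    """Average expression value per condition term, as a dict."""
    rows = conn.execute(
        """
        SELECT t.name, AVG(f.value)
        FROM expression_fact AS f
        JOIN term AS t ON t.term_id = f.condition_term_id
        GROUP BY t.name
        ORDER BY t.name
        """
    ).fetchall()
    return dict(rows)


def demo():
    """Load a few made-up values and summarize them."""
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    conn.executemany("INSERT INTO term VALUES (?, ?)",
                     [(1, "control"), (2, "treated")])
    conn.executemany("INSERT INTO bioentry VALUES (?, ?)",
                     [(1, "probe_A"), (2, "probe_B")])
    conn.executemany("INSERT INTO expression_fact VALUES (?, ?, ?)",
                     [(1, 1, 10.0), (2, 1, 20.0), (1, 2, 30.0), (2, 2, 50.0)])
    return mean_by_condition(conn)
```

The point of the layout is that every analytical query is a join from the
one fact table out to the descriptive tables, so the raw array data can stay
in a separate, fuller schema.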
-Allen
> Sounds great. Here are a few comments as for my $0.02 ...
>
> There are probably as many expression data schemas out there as labs
> hosting expression data. There aren't that many big efforts making a
> generalizing attempt, but there are some (GEO, ArrayExpress, GeneX,
> RAD, SMD, and I'm sure a couple more).
>
> If gene expression tables go into the 'official' BioSQL (everyone can -
> and many will - have his/her own, extended or whatever, build), a design
> that attempts to be generic and technology agnostic would be most
> attractive to me.
>
> Gene expression has never been within the scope of BioSQL before, so
> I'd prefer to take as much advantage of existing open-source schemas as
> possible, since then the reality check has already happened and the
> software support may come with it.
>
> Lately GMOD/Chado faced a similar situation, and Allen, who I believe
> took the lead on that project, settled on integrating the respective
> parts of GUS/RAD.
>
> Allen, how did that work out? Could we just build on your work and RAD?
>
> Marc, what made you decide to disregard the big expression schemas? (No
> offense whatsoever, I'm just curious.)
>
> The way I could envision a different design of a gene expression model
> in BioSQL is as a warehouse star-schema, where there'd be essentially
> one (or very few) analytical data tables, and all the rest is hosted by
> the existing biosql tables (i.e., mostly the term table). It would be
> understood then that people would host their expression data in another
> schema, and the biosql table(s) would be used as a warehouse only.
>
> -hilmar
>
> On Thursday, May 1, 2003, at 12:08 PM, Marc Colosimo wrote:
>
> >
> > Since I couldn't easily find a good schema, I made my own based on
> > Affymetrix's GATC schema. My hope is that, as I develop it, it will
> > use parts of BioSQL to handle the non-array stuff (taxon, sequence
> > databases, etc.). I only have a few tables made and they are not
> > normalized (one, I think, is actually best left denormalized). Oh, I
> > am keeping the MIAME stuff in mind.
> >
> > I have one script that is almost finished that loads in CEL files. I
> > just have a few complex regexes to make/debug and need to add support
> > for bulk loading on a local machine (piping it to psql). Now that I
> > have played around with DBI, loading CDF files is next.
> >
> > If people are interested in the code to try it out, let me know.
> >
> > Marc
> >
>
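As a rough illustration of the "pipe it to psql" bulk-loading idea Marc
mentions, the sketch below turns the intensity rows of a CEL file into a
PostgreSQL COPY stream. It is not Marc's script. It assumes the text
(version 3) CEL layout, with an [INTENSITY] section whose rows are
X, Y, MEAN, STDV, NPIXELS; it uses plain whitespace splitting rather than
regexes; and the table name `cel_intensity` and its columns are
hypothetical.

```python
def cel_intensity_rows(lines):
    """Yield (x, y, mean, stdv, npixels) tuples from the [INTENSITY] section.

    Assumes a text-format CEL file: section headers in square brackets,
    key=value metadata lines, then whitespace-separated data rows.
    """
    in_section = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # Entering a new section; only [INTENSITY] is of interest here.
            in_section = (line == "[INTENSITY]")
            continue
        if not in_section or not line or "=" in line:
            # Skip other sections, blank lines, and key=value metadata.
            continue
        x, y, mean, stdv, npixels = line.split()
        yield int(x), int(y), float(mean), float(stdv), int(npixels)


def write_copy_stream(rows, out, table="cel_intensity"):
    """Write a COPY ... FROM STDIN statement, tab-separated rows, and the
    end-of-data marker to the file-like object `out`."""
    out.write(f"COPY {table} (x, y, mean, stdv, npixels) FROM STDIN;\n")
    for row in rows:
        out.write("\t".join(str(v) for v in row) + "\n")
    out.write("\\.\n")  # a backslash-period line ends COPY data in psql
```

Wrapped in a small script that opens the CEL file and writes to standard
output, the result could be piped straight into psql, which avoids
row-by-row INSERTs through DBI for large chips.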