[BioSQL-l] Microarrays and BioSQL

Marc Colosimo mcolosim at brandeis.edu
Wed Apr 2 15:50:38 EST 2003


On Wed, 2 Apr 2003, Hilmar Lapp wrote:

> 
> On Wednesday, April 2, 2003, at 10:47  AM, Marc Colosimo wrote:
> 
> > Each gene has at least 11 oligos for it. They have the same name, like
> > 171720_x_at. I have files for the target sequence (500bp), the sequence
> > of each oligo and their positions, and a file that has descriptions of
> > the probes. At a minimum I have 13 items, each with the same name.
> >
> 
> The way I did this here is to treat Affy probesets as bioentries (you 
> read in the target sequences in FASTA format), with the individual 
> probes (oligos) being features on the probeset (you read those in from 
> the tab file and then associate them by look-up in memory while you're 
> loading). Note that the name for oligos is artificial, since they 
> really have no identifier (and neither do seqfeatures). I leave those 
> probeset bioentries pretty bare otherwise, since they are nothing more 
> than that: expression reporters.
> 
> I then associate the target sequence bioentries with the (fully 
> annotated) transcript (also a bioentry) they supposedly target via a 
> bioentry_relationship. Note that this is computed content and is 
> subject to change according to your current state of knowledge (about 
> transcripts), and there are different algorithms for how to actually 
> establish that relationship (e.g., just take Affy's annotation, or 
> blast against UniGene, or map both UniGene and target sequences to the 
> genome and then go for co-location; the first one is the easiest but 
> also the worst because it is dated - we chose to recompute ourselves).
> 
> The question which protein a transcript encodes we solve through 
> another (computed) bioentry to bioentry relationship. You get the idea.
> 
> This works pretty nicely for us. It is in fact one of the reasons I 
> like biosql.
> 

That seems much easier in some ways. I was trying to keep as much of the 
original information intact as possible. One reason I want (and I'm sure 
you also want) to keep the oligos is so that I can see why some matches 
give the signals that they do. I'm guessing that you can't release the 
code for that, which would save me time and debugging.
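
For reference, the loading scheme described in the quoted message 
(probesets as bioentries, oligos as features attached by in-memory 
look-up) might be sketched roughly as below. This is not Hilmar's code 
and it does not touch a real BioSQL database: the classes Bioentry and 
ProbeFeature, the functions read_fasta and attach_probes, the 
three-column tab layout (probeset name, position, oligo sequence), and 
the "<probeset>.probeN" naming are all assumptions made for 
illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class ProbeFeature:
    """One oligo, kept as a feature on its probeset (hypothetical stand-in
    for a seqfeature row)."""
    name: str       # artificial name; the oligos have no identifier of their own
    start: int      # position on the target sequence (assumed column)
    sequence: str


@dataclass
class Bioentry:
    """Hypothetical, minimal stand-in for a bioentry row."""
    name: str
    sequence: str
    features: List[ProbeFeature] = field(default_factory=list)


def read_fasta(lines: Iterable[str]) -> Dict[str, Bioentry]:
    """Read FASTA target sequences into bioentries keyed by probeset name."""
    entries: Dict[str, Bioentry] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                entries[name] = Bioentry(name, "".join(chunks))
            # Use the first word of the header as the probeset name.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        entries[name] = Bioentry(name, "".join(chunks))
    return entries


def attach_probes(entries: Dict[str, Bioentry],
                  tab_lines: Iterable[str]) -> List[str]:
    """Attach each oligo to its probeset by dictionary look-up.

    Assumes tab-separated lines of: probeset name, position, oligo sequence.
    Returns the probeset names that had no matching target sequence.
    """
    unmatched: List[str] = []
    counts: Dict[str, int] = {}
    for raw in tab_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        probeset, position, oligo = line.split("\t")
        entry = entries.get(probeset)
        if entry is None:
            unmatched.append(probeset)
            continue
        counts[probeset] = counts.get(probeset, 0) + 1
        # All oligos share the probeset name, so number them to tell them apart.
        feature_name = f"{probeset}.probe{counts[probeset]}"
        entry.features.append(ProbeFeature(feature_name, int(position), oligo))
    return unmatched
```

Keeping the oligos as features this way preserves their sequences and 
positions, so individual probes can still be inspected later. Writing 
the resulting objects into the actual bioentry and seqfeature tables is 
left out of the sketch.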

Thanks,
Marc


