What are regions for? (was Re: [DAS2] DAS intro)

Helt,Gregg Gregg_Helt at affymetrix.com
Mon Dec 5 12:13:16 EST 2005



> -----Original Message-----
> From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org]
> On Behalf Of Andrew Dalke
> Sent: Tuesday, November 29, 2005 4:02 PM
> To: DAS/2
> Subject: What are regions for? (was Re: [DAS2] DAS intro)
> 
> Ed:
> > I understand this as talking about coordinates in general, not the
> > <region> elements or "pos" attributes in the spec.  Suzi specifically
> > mentions chromosomes and contigs; one can definitely be backwards with
> > respect to the other. But top-level regions in an assembly would
> > probably all be chromosomes or all be contigs, rather than a mixture.
> 
> I'm trying to figure out when people use the /region.

	Okay, for now ignore the whole issue of assembly.  The need for
something like /region doesn't depend on different levels of assembly.
I do think handling assembly information is necessary, but that's for a
different post.
	In the current spec the ".../region" query is the only way to
_efficiently_ discover the set of sequences that can be used for
region/sequence-based filters in feature queries.  Pretty much any
client that wants to restrict feature queries by sequence needs to use
it.  Now you _can_ determine this same info via an unqualified
".../sequence" query but then you're retrieving all the residues for
each sequence -- this is about as inefficient as you can get.
	Another alternative to the current approach would be to combine
/region and /sequence into one type of query, but to add modifiers
(format param?) that specify what to return:

	.../sequence?format=x-das-regions  (or something similar)
	.../sequence?format=fasta

We would need to specify at least these two different formats to allow
for both efficient retrieval of minimal information about the set of
seqs and retrieval of sequence residues.
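
To make the combined-query idea concrete, here is a rough Python sketch of
how a client might build the two requests.  This is only an illustration,
not anything in the spec: the "format" parameter and the "x-das-regions"
value are just the suggestions floated above, and the base URL and function
name are made up for the example.

```python
from urllib.parse import urlencode

# Hypothetical format values: both are only proposals from this post,
# not part of any agreed DAS/2 specification.
FORMAT_REGIONS = "x-das-regions"   # minimal info about each sequence, no residues
FORMAT_FASTA = "fasta"             # full sequence residues


def sequence_query(base_url, fmt):
    """Build a .../sequence URL that asks for one particular format.

    base_url is the versioned-source prefix (the "..." in the examples
    above); its exact shape is an assumption here.
    """
    return base_url.rstrip("/") + "/sequence?" + urlencode({"format": fmt})


if __name__ == "__main__":
    base = "http://example.org/das/genome/some-source"  # placeholder URL
    # Cheap discovery step: which sequences can feature filters refer to?
    print(sequence_query(base, FORMAT_REGIONS))
    # Expensive step, only when residues are actually wanted.
    print(sequence_query(base, FORMAT_FASTA))
```

The point of the split is that a client would issue the first request once
to learn which sequences exist, and only issue the second when it really
needs residues.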

...
> My questions, to summarize, are:
>    - why do we need a /region space when we can
>        1. point directly to a sequence (for chromosome regions) and/or
>        2. point to a "contig" or "assembly" or "region" feature type
>                (for other regions)
> 
>    - When would someone have regions which have more than one of
>       contigs, ESTs and chromosomes?  Especially given that this
>       is the genome spec, so chromosome-level info is known, at
>       least enough for a rough assembly.
> 
> In other words, what are regions for?
> 

I'm really only addressing question 1.1 here.  As I said before, I think
assembly is a separate issue.

	gregg




