[DAS2] Sequence retrieval proposal

Steve Chervitz Steve_Chervitz at affymetrix.com
Wed Dec 7 13:56:02 EST 2005


Here's a proposal regarding sequence retrievals that apparently never made
it to the list. (I was compiling a list of agenda items for next week's spec
discussion when I noticed I sent this message only to myself...)

Steve

------ Forwarded Message
From: Steve Chervitz <Steve_Chervitz at affymetrix.com>
Date: Mon, 14 Nov 2005 13:38:44 -0800
To: Steve Chervitz <Steve_Chervitz at affymetrix.com>
Subject: Re: [DAS2] DAS/2 weekly meeting notes for 14 Nov 05

From the notes of today's meeting (14 Nov 05):

> LS: When you request versioned source from a server, it should say what
> assembly coords it's working on and give a uri for that. In this case
> there's no guarantee you can do a 'get' on that URI.
> We want to say:
> 1- what is unique uri for assembly (everyone agrees to share this)
> 2- das URL for how to fetch it (some server's region url - trusted,
> faithful copy with what is at ncbi). Diff servers could assert that
> you can fetch it from various places.

This raises another issue we didn't discuss: How about allowing some
way to verify that the sequence data received from a given reference
server are in fact faithful copies?

Use case 1: Validate a given reference server as providing correct
sequence data for a specific assembly (either the entire assembly or a
specific chromosome).

Use case 2: Verify that the sequence or subsequence I received from a
specific sequence request is correct and complete.

Case #1 requires that the official source of the assembly (or some other
trusted reference server) publish a checksum for each complete sequence
it provides (e.g., each full-length chromosome of each assembly).
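
As a rough illustration of case #1, here is a minimal sketch of how a
client might check a full-length sequence fetched from a reference
server against a published checksum. Everything here is an assumption
for illustration: the choice of MD5, the normalization (header dropped,
whitespace removed, residues uppercased), and the function names are
hypothetical and not part of the current spec.

```python
import hashlib


def fasta_sequence(text):
    """Return the residues of a single-record FASTA string.

    The header line and all whitespace are dropped and the residues are
    uppercased, so that line wrapping and case do not affect the checksum.
    (This normalization is an assumption, not something the spec defines.)
    """
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        lines = lines[1:]
    return "".join(line.strip() for line in lines).upper()


def sequence_checksum(seq):
    """Hex MD5 digest of the normalized residues (MD5 is an assumed choice)."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


def matches_published(fasta_text, published_checksum):
    """True if the fetched sequence hashes to the published checksum."""
    computed = sequence_checksum(fasta_sequence(fasta_text))
    return computed == published_checksum.strip().lower()
```

For the comparison to mean anything, the publisher and the client would
have to agree on the normalization rules, which is why they are spelled
out in the sketch rather than hashing the raw response bytes.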

Case #2 requires the ability to encode a checksum in a sequence
response. But there are two separate issues here: (1) validating the
data transfer for the request, and (2) validating the correctness of
the sequence or subsequence itself with respect to the original
assembled sequence.

The first issue of case #2 is already supported in the current spec,
if the request specifies a format that incorporates a checksum (e.g.,
sequence/chr21?format=GCG). However, a server might not support that
format and yet still be able to supply checksums. The second issue of
case #2 is covered only for responses from trusted reference servers.
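
To make the distinction between the two issues concrete, here is a
small sketch. It assumes a checksum declared by the server over the
response body (MD5 is an arbitrary choice here), a locally held trusted
copy of the full sequence, and 0-based half-open coordinates. None of
these are defined by the spec, and the function names are hypothetical.

```python
import hashlib


def transfer_ok(body, declared_checksum):
    """Issue 1: did the response body arrive intact?

    Compares the digest of the bytes received with the checksum the
    server declared for this response.
    """
    return hashlib.md5(body).hexdigest() == declared_checksum.lower()


def content_ok(received, trusted_full, start, end):
    """Issue 2: is the subsequence a faithful copy of the assembly?

    Compares the received residues with the slice [start, end) of a
    trusted full-length sequence (0-based, half-open: an assumption).
    """
    return received.upper() == trusted_full[start:end].upper()
```

A response can pass the first check and still fail the second: a server
holding a corrupted copy of the assembly would compute a perfectly valid
checksum over the wrong residues.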

To consider:

1. What do folks think about adding facilities for sequence data
   validation to the DAS/2 retrieval spec (i.e., adding an optional
   checksum attribute in the REGION response)?

2. What do folks think about specifying a DAS2XML format for sequence
   requests (text/x-das-sequence+xml)? In addition to permitting an
   optional checksum attribute to address the above use case, it would
   add some consistency and flexibility to the spec, since at present,
   the default sequence response format is the only one that is not under
   our control (currently it's text/x-fasta).
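
To give a feel for what item 2 might look like in practice, here is a
sketch of a client verifying an XML sequence response that carries an
optional checksum attribute. The element name (SEQUENCE), the attribute
names, and the "md5:<hex>" value syntax are all invented for this
example; no such format has been defined yet.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_response(seq_id, residues):
    """Build a hypothetical XML sequence response with a checksum attribute."""
    digest = hashlib.md5(residues.encode("ascii")).hexdigest()
    elem = ET.Element("SEQUENCE", {"id": seq_id, "checksum": "md5:" + digest})
    elem.text = residues
    return ET.tostring(elem, encoding="unicode")


def verify_response(xml_text):
    """Check the residues against the declared checksum.

    Returns True or False when a checksum is present, and None when the
    (optional) attribute is absent.
    """
    elem = ET.fromstring(xml_text)
    declared = elem.get("checksum")
    if declared is None:
        return None
    algorithm, _, digest = declared.partition(":")
    if algorithm != "md5":
        raise ValueError("unsupported checksum algorithm: " + algorithm)
    # Ignore any whitespace introduced by line wrapping inside the element.
    residues = "".join((elem.text or "").split())
    return hashlib.md5(residues.encode("ascii")).hexdigest() == digest
```

Naming the algorithm inside the attribute value is one possible design;
it would let servers change algorithms later without altering the
format itself.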

Steve

------ End of Forwarded Message



