[DAS2] Sequence retrieval proposal
Thomas Down
td2 at sanger.ac.uk
Thu Dec 8 05:33:16 EST 2005
On 7 Dec 2005, at 23:22, Andrew Dalke wrote:
>
>> 2. What do folks think about specifying a DAS2XML format for sequence
>> requests (text/x-das-sequence+xml)? In addition to permitting an
>> optional checksum attribute to address the above use case, it would
>> add some consistency and flexibility to the spec, since at present,
>> the default sequence response format is the only one that is not
>> under our control (currently it's text/x-fasta).
>
> As a consumer of this sort of data, I don't want to write another
> parser. It isn't just the parsing part - it's the effort of mapping
> to my program's data model.
>
> There's already a huge number of existing sequence file formats.
> What would another provide? Are some of them already extensible?
>
> Several of those formats are designed and developed by people involved
> with DAS. If it's important, extend GAME or GFF.

Do GAME or GFF have a sequence representation? I thought they were
both primarily feature-table formats (right now I'm having trouble
finding the GAME documentation though...).

The problem I have with FASTA format (other than the tendency of many
data providers to overload the header line) is that there's no
explicit marker for the alphabet and encoding of the sequence data.
This is pretty nasty for codebases like BioJava, which want to present
a richer view of sequence data than just a String. I'd certainly be in
favour of a nice XML format that made alphabet information explicit.
The DAS 1.5 DASSEQUENCE document has a moltype attribute which
supports this, at least for the three most important cases (DNA, RNA,
protein); there's no standards-compliant way to add other alphabets,
though.
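
As a rough illustration of what an explicit alphabet marker buys a
client, here is a minimal Python sketch. It assumes a document shaped
like the DAS 1.5 DASSEQUENCE response (a SEQUENCE element carrying a
moltype attribute). The ALPHABETS table, its simplified symbol sets, the
"RNA" key, and the read_sequence helper are all hypothetical names made
up for this example, not part of any DAS spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical moltype -> allowed symbols table (simplified on purpose:
# no full IUPAC ambiguity codes, just N/X as catch-alls).
ALPHABETS = {
    "DNA": set("ACGTN"),
    "RNA": set("ACGUN"),
    "Protein": set("ACDEFGHIKLMNPQRSTVWYX"),
}


def read_sequence(xml_text):
    """Return (moltype, residues) from a DASSEQUENCE-style document."""
    root = ET.fromstring(xml_text)
    seq = root.find("SEQUENCE")
    if seq is None:
        raise ValueError("no SEQUENCE element found")
    moltype = seq.get("moltype")
    if moltype not in ALPHABETS:
        raise ValueError(f"unsupported moltype: {moltype!r}")
    # Drop whitespace/newlines and normalise case before checking symbols.
    residues = "".join((seq.text or "").split()).upper()
    unexpected = set(residues) - ALPHABETS[moltype]
    if unexpected:
        raise ValueError(f"symbols not in {moltype} alphabet: {sorted(unexpected)}")
    return moltype, residues


# Assumed example document; the attribute values are invented.
EXAMPLE = """<DASSEQUENCE>
  <SEQUENCE id="chr1" start="1" stop="12" moltype="DNA" version="1.0">
    atttcttggcgt
  </SEQUENCE>
</DASSEQUENCE>"""

if __name__ == "__main__":
    print(read_sequence(EXAMPLE))
```

Because the alphabet arrives with the data, the client can pick the
right symbol set up front rather than guessing it from the residues, and
an unknown moltype fails loudly instead of being silently misread.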
I guess an alternative, more classically RESTful, way of doing things
might be with MIME types:

    Content-Type: application/fasta; sequence-alphabet=DNA; sequence-encoding=IUPAC

I admit I'd prefer the XML though...
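
For comparison, the MIME-parameter route could be consumed along these
lines. This is only a sketch: application/fasta and the
sequence-alphabet / sequence-encoding parameters are the proposal from
this message, not registered names, and sequence_params is a
hypothetical helper. The parsing itself uses Python's standard
email.message.Message.

```python
from email.message import Message


def sequence_params(content_type):
    """Split a Content-Type value into media type plus sequence parameters."""
    msg = Message()
    msg["Content-Type"] = content_type
    return {
        "media_type": msg.get_content_type(),
        # get_param returns None when the parameter is absent.
        "alphabet": msg.get_param("sequence-alphabet"),
        "encoding": msg.get_param("sequence-encoding"),
    }


if __name__ == "__main__":
    header = "application/fasta; sequence-alphabet=DNA; sequence-encoding=IUPAC"
    print(sequence_params(header))
```

A client receiving plain text/x-fasta with no parameters would get None
for both fields and be back to guessing the alphabet from the residues.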
Thomas.