[DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006
Andrew Dalke
dalke at dalkescientific.com
Mon Oct 23 12:19:03 EDT 2006
On Oct 9, 2006, at 6:30 PM, Steve Chervitz wrote:
> [A] Ask andrew about feature group assembly resolution, if any.
As far as I know there was no resolution.
As it last stood, the problem is as follows. Consider a complex
annotation with a single parent A and a single child B.
There are several ways to represent this:
Option 1:
<FEATURE uri="A" part="B"/>
<FEATURE uri="B" parent="A"/>
This is the current spec. Parents point to children and children to
parents. This differs from the GFF style, where only the children
have a parent reference. My hope was to assemble complex annotations
while reading the data from the remote server.
In practice this streaming assembly proved hard to implement. The
algorithm is non-trivial for complex structures so most people will
do the assembly only after reading all features. Also, there's a
possible error when parents don't list all children or vice versa,
and likely most clients won't fully validate, so a top-down and a
bottom-up assembly may give different results for the same server.
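
A rough sketch of the validation a client would need for Option 1, to
make the mismatch concrete. This is not from the spec; the function
name and the in-memory layout (uri -> (parents, parts)) are made up
for illustration.

```python
def inconsistent_links(features):
    """Return (parent, child) links asserted in one direction only.

    `features` maps a feature URI to a (parents, parts) pair of URI lists.
    This layout is an assumption for the sketch, not the DAS/2 format.
    """
    top_down = set()    # links asserted by a parent listing a part
    bottom_up = set()   # links asserted by a child listing a parent
    for uri, (parents, parts) in features.items():
        for part in parts:
            top_down.add((uri, part))
        for parent in parents:
            bottom_up.add((parent, uri))
    # A link present in only one set is exactly the case where top-down
    # and bottom-up assembly would disagree.
    return sorted(top_down ^ bottom_up)


consistent = {"A": ([], ["B"]), "B": (["A"], [])}
broken = {"A": ([], ["B", "C"]), "B": (["A"], []), "C": ([], [])}
print(inconsistent_links(consistent))
print(inconsistent_links(broken))
```

A client that skips this check and assembles from only one direction
will silently accept the second input.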
Option 2:
<FEATURE uri="A"/>
<FEATURE uri="B" parent="A"/>
This is the GFF style. Its main limitation is poor support for
streaming uses, such as showing partial results while downloading and
converting to/from other formats. In both cases this is because parent
nodes may (and do) occur after child nodes, and there's no way to know
that all children have been seen.
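
For reference, assembly under Option 2 after reading everything is
short. The sketch below assumes the features have already been parsed
into (uri, parents) pairs; that layout and the function name are
illustrative only.

```python
from collections import defaultdict


def assemble(records):
    """Build a parent -> children map from (uri, parents) pairs.

    Works regardless of order, but only once every record has been read,
    which is the streaming limitation described above.
    """
    children = defaultdict(list)
    seen = set()
    for uri, parents in records:
        seen.add(uri)
        for parent in parents:
            children[parent].append(uri)
    # Parents that were referenced but never appeared in the input.
    dangling = sorted(p for p in children if p not in seen)
    roots = [uri for uri, parents in records if not parents]
    return dict(children), roots, dangling


# The child "B" arrives before its parent "A".
print(assemble([("B", ["A"]), ("A", [])]))
```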
Neither Option 1 nor Option 2 makes it easy to detect cycles or
multi-rooted structures.
Variation: require that children are listed after parents.
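
The cycle and multi-root checks can be sketched as follows, working
from a child -> parents map. The function names and the map layout are
assumptions for illustration, not anything defined by DAS/2.

```python
def find_cycle(parents):
    """Return a list of URIs forming a cycle, or None if there is none.

    `parents` maps each feature URI to a list of its parent URIs.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def visit(uri, path):
        state[uri] = GREY
        path.append(uri)
        for parent in parents.get(uri, ()):
            colour = state.get(parent, WHITE)
            if colour == GREY:          # back onto the current path
                return path[path.index(parent):]
            if colour == WHITE:
                found = visit(parent, path)
                if found:
                    return found
        path.pop()
        state[uri] = BLACK
        return None

    for uri in parents:
        if state.get(uri, WHITE) == WHITE:
            found = visit(uri, [])
            if found:
                return found
    return None


def roots_of(uri, parents):
    """Return the set of parentless ancestors reachable from `uri`."""
    found, seen, stack = set(), set(), [uri]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        ups = parents.get(node, ())
        if not ups:
            found.add(node)
        stack.extend(ups)
    return found


print(find_cycle({"A": ["B"], "B": ["A"]}))
print(roots_of("C", {"A": [], "B": [], "C": ["A", "B"]}))
```

A feature whose `roots_of` result has more than one entry belongs to a
multi-rooted structure. With the parents-first variation, a reader
could instead reject any parent reference to a URI it has not yet
seen, which rules out cycles without a separate pass.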
Option 3:
<FEATURE-GROUP>
  <FEATURE uri="A"/>
  <FEATURE uri="B" parent="A"/>
</FEATURE-GROUP>
That is, put all features which are part of the same feature group into
a single element. This is essentially like the ### "no forward
references" token in GFF3.
It's cumbersome because either there are two kinds of element
("FEATURE-GROUP" and "FEATURE") under the root or there are a lot of
FEATURE-GROUPs containing a single feature. There's still the need for
cycle detection and for checking that the parent/part relationships
are valid.
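
What Option 3 buys is that a streaming reader can hand off each group
as soon as its closing tag arrives. A minimal sketch with Python's
standard ElementTree follows; the enclosing <FEATURES> root element
and the function name are assumptions made so the example parses.

```python
import io
import xml.etree.ElementTree as ET


def iter_groups(stream):
    """Yield each FEATURE-GROUP as a list of (uri, parent) pairs.

    A group is yielded when its end tag is read, so the caller never
    has to wait for the whole document.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "FEATURE-GROUP":
            yield [(f.get("uri"), f.get("parent"))
                   for f in elem.findall("FEATURE")]
            elem.clear()  # drop the finished group's children


doc = (b'<FEATURES>'
       b'<FEATURE-GROUP><FEATURE uri="A"/>'
       b'<FEATURE uri="B" parent="A"/></FEATURE-GROUP>'
       b'<FEATURE-GROUP><FEATURE uri="X"/></FEATURE-GROUP>'
       b'</FEATURES>')
for group in iter_groups(io.BytesIO(doc)):
    print(group)
```

The second group shows the cumbersome case: a lone feature still needs
its own wrapper.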
Option 4:
<FEATURE uri="A">
  <FEATURE uri="B"/>
</FEATURE>
Break the DAG into a tree structure (a spanning tree). In this case
"B" is a child of "A". For a more complex structure where "C" is a
child of "A" and "B",
<FEATURE uri="A">
  <FEATURE uri="B">
    <FEATURE uri="C" parent="A"/>
  </FEATURE>
</FEATURE>
This doesn't fit well with relational databases. There's still the need
to check for cycles but it's much simpler.
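
To show how a reader would recover the full DAG from the nested form,
here is a small sketch using the standard ElementTree module. It
treats the enclosing element as an implicit parent and the "parent"
attribute as one extra parent URI, as in the example above; the
function name is made up.

```python
import xml.etree.ElementTree as ET


def parent_links(xml_text):
    """Map each feature URI to its parents in a nested FEATURE document."""
    links = {}

    def walk(elem, enclosing):
        uri = elem.get("uri")
        parents = [] if enclosing is None else [enclosing]
        extra = elem.get("parent")      # explicit non-tree parent, if any
        if extra is not None and extra not in parents:
            parents.append(extra)
        links[uri] = parents
        for child in elem.findall("FEATURE"):
            walk(child, uri)

    walk(ET.fromstring(xml_text), None)
    return links


nested = """
<FEATURE uri="A">
  <FEATURE uri="B">
    <FEATURE uri="C" parent="A"/>
  </FEATURE>
</FEATURE>
"""
print(parent_links(nested))
```

The nesting itself cannot contain a cycle, so only the explicit
"parent" attributes need checking.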
Given the feedback I've heard, the use cases for streaming the data are
not seen as important. Hence I'm willing to go with #2 (GFF-style,
children point to parents) and have nothing like the
no-forward-references of GFF3.
Andrew
dalke at dalkescientific.com