[DAS2] Brief summary of DAS/BioSapiens workshops from a DAS/2 perspective
Helt,Gregg
Gregg_Helt at affymetrix.com
Mon Mar 5 12:30:10 EST 2007
Summary of DAS & Feature Classification workshops, February 26-28 2007,
Hinxton
DAS Developers Workshop:
http://www.sanger.ac.uk/Users/ap3/dasworkshop.html
BioSapiens Feature Type Classification Workshop:
http://www.ebi.ac.uk/~hhe/tmp/BioSapiensFeatureMeeting.htm
DAS1 clients discussed:
Dasty2, JalView, VectorBase, IGB, Pepper, Spice, ProView,
Ensembl ContigView, ...
DAS1 servers discussed:
PFam, Ensembl, ProServer, Sisyphus, ...
DAS1 extensions:
Gene DAS
Protein DAS
Alignment DAS
Structure DAS
3D-EM DAS
Interaction DAS
MaDAS (writeback?)
"simple" DAS
DAS/2
BioSapiens Overview: http://www.biosapiens.info
Large-scale genome/protein annotation, 25 institutions from 14
countries across Europe participating
Currently 23 DAS servers within BioSapiens project serving 69 DAS
sources.
4 servers appear to be down (21 sources fail features query)
See http://www.biosapiens.info/page.php?page=biosapiensdir for more
DAS server stats
Major concerns for Ensembl / Sanger / BioSapiens that I think we've
addressed well in DAS/2:
Gene DAS
Protein DAS
Alignment DAS
"simple" DAS
Major concerns for Ensembl / Sanger / BioSapiens that surprised me:
A) In general the use of a smaller subset of DAS1 than expected
Many BioSapiens DAS servers don't support "entry_points" query
(64 fail|NA)
Many BioSapiens DAS servers don't support "types" query (49
fail|NA)
In DAS1, features themselves can carry most of the types
info
Some BioSapiens DAS servers don't support "features" query
parameters (only the features query with no params)
Many BioSapiens clients don't use "entry_points" query, "types"
query, or any feature filters (always get all features for a given
segment)
BioSapiens protein annotation almost exclusively uses flat
(one-level) features
very little or no use of "group" attribute to make two-level features
example: disulfide bond annotation - relies on rendering or prior
knowledge to differentiate
Ensembl DAS servers are in general serving one type per source
These simplifications of clients and servers are reinforcing
each other
If people are using only a subset of DAS1, does this mean that DAS/2
might be too complex?
But with these simplifications, the complexity is getting pushed
into other places
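To make the "complexity pushed elsewhere" point concrete, here is a rough Python sketch of what a client ends up doing when a server only answers the bare features query: it derives the type list, the per-type filter, and the two-level grouping itself from the returned features. The sample XML follows the DAS1 DASGFF layout (FEATURE / TYPE / START / END / GROUP), but the server URL, IDs and coordinates are invented for illustration, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Invented DAS1-style features response (not from any real server).
SAMPLE_DASGFF = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features?segment=P00001">
    <SEGMENT id="P00001" start="1" stop="200">
      <FEATURE id="ss1a" label="disulfide end 1">
        <TYPE id="disulfide bond">disulfide bond</TYPE>
        <METHOD id="curated">curated</METHOD>
        <START>12</START>
        <END>12</END>
        <GROUP id="ss1"/>
      </FEATURE>
      <FEATURE id="ss1b" label="disulfide end 2">
        <TYPE id="disulfide bond">disulfide bond</TYPE>
        <METHOD id="curated">curated</METHOD>
        <START>87</START>
        <END>87</END>
        <GROUP id="ss1"/>
      </FEATURE>
      <FEATURE id="dom1" label="domain">
        <TYPE id="Pfam domain">Pfam domain</TYPE>
        <METHOD id="hmm">hmm</METHOD>
        <START>30</START>
        <END>120</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Flatten every FEATURE element into a small dict."""
    root = ET.fromstring(xml_text)
    features = []
    for feat in root.iter("FEATURE"):
        group = feat.find("GROUP")
        features.append({
            "id": feat.get("id"),
            "type": feat.find("TYPE").get("id"),
            "start": int(feat.findtext("START")),
            "end": int(feat.findtext("END")),
            # GROUP is optional; flat (one-level) features have none.
            "group": group.get("id") if group is not None else None,
        })
    return features


def type_counts(features):
    """Stand-in for a "types" query: count features per type id."""
    return Counter(f["type"] for f in features)


def filter_by_type(features, type_id):
    """Stand-in for a server-side type filter on the features query."""
    return [f for f in features if f["type"] == type_id]


def group_features(features):
    """Rebuild two-level features from GROUP ids (None = ungrouped)."""
    groups = defaultdict(list)
    for f in features:
        groups[f["group"]].append(f["id"])
    return dict(groups)


features = parse_features(SAMPLE_DASGFF)
print(type_counts(features))
print(group_features(features))
```

None of this is hard, but every client has to repeat it, and it only works after fetching all features for the segment.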
B) Data overload
Number of servers, sources, types
Ensembl: will have 1000s of sources soon
Redundancy concerns
example: Pfam domain
Many sources with same / similar annotation type - "Pfam domain"
Slight differences in feature ranges
Which is the authority?
Is there a way to help clients decide which can be combined?
Mirrors
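One possible client-side heuristic for the redundancy problem, sketched in Python: flag features of the same type from different sources whose ranges nearly coincide, so a client could offer to collapse them. This is an illustration only, not something proposed at the workshop; the Feature tuple, the source names, and the 0.9 threshold are all assumptions.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    source: str   # hypothetical DAS source name
    type_id: str
    start: int    # 1-based, inclusive
    end: int


def overlap_fraction(a, b):
    """Shared range length divided by the length of the longer feature."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    longer = max(a.end - a.start + 1, b.end - b.start + 1)
    return shared / longer


def near_duplicates(features, threshold=0.9):
    """Pairs from different sources, same type, nearly identical ranges."""
    pairs = []
    for i, a in enumerate(features):
        for b in features[i + 1:]:
            if (a.source != b.source
                    and a.type_id == b.type_id
                    and overlap_fraction(a, b) >= threshold):
                pairs.append((a, b))
    return pairs


# Invented coordinates for illustration.
feats = [
    Feature("sourceA", "Pfam domain", 30, 120),
    Feature("sourceB", "Pfam domain", 32, 118),
    Feature("sourceC", "Pfam domain", 150, 200),
]
for a, b in near_duplicates(feats):
    print(a.source, b.source, round(overlap_fraction(a, b), 3))
```

This still says nothing about which source is the authority; it only identifies candidates for merging.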
C) Feature Classification / Ontology issues
SO currently inadequate for describing protein annotation
developing PAO (Protein Annotation Ontology)
types proliferation
example: one feature type for each PFam domain?
~9K PFam-A domains
If you look at PFam-B (PRODOM that don't overlap PFam-A),
then ~70K / 450K more (>2 proteins in family / not)
If not in a unique type, where does that information go?
Need multiple ontology terms to describe a single type?
------------------------------------------------------------------------
DAS WishList (last session of DAS workshop, people listed desired
improvements on whiteboard)
Multi-level features (Gregg)
Multi-level stylesheets (Ed)
Caching (last-modified, if-modified-since, TTL)
Provenance of features from other sources (features from different
sources with same IDs? types?)
Large analysis / Scalability
1000s of seqs + 1000s sources + types ?
More queries: feature types / date
Entry point support
Encryption support
Stats-query interface -- count # of features of type for a source
ID ref external (URI / URN)
Proper error / exception handling
Asynchronous requests
process
batches
Better Stylesheets
Mapping servers
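For the caching item in the list above, the client side of last-modified / if-modified-since is just a standard HTTP conditional GET. A minimal Python sketch follows, assuming the server sets Last-Modified and honors If-Modified-Since (which is the wish, not a statement about current DAS servers); the URL and function names are hypothetical.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, last_fetch_epoch):
    """Attach If-Modified-Since (HTTP date, GMT) for the last fetch time."""
    req = urllib.request.Request(url)
    req.add_header("If-Modified-Since",
                   formatdate(last_fetch_epoch, usegmt=True))
    return req


def fetch_if_changed(url, last_fetch_epoch, timeout=10):
    """Return the response body, or None if the server answers 304."""
    req = build_conditional_request(url, last_fetch_epoch)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the cached copy is still valid
            return None
        raise


# Building the request does not touch the network.
req = build_conditional_request("http://example.org/das/demo/features", 0)
print(req.get_header("If-modified-since"))
```

A TTL would work the other way around: the server states up front how long a response may be reused, so the client can skip the request entirely.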
We've discussed most of these wishlist issues before while developing
DAS/2, though we certainly haven't completely solved all of them...