[DAS2] Notes from the weekly DAS/2 teleconference, 6 Mar 2006
Chervitz, Steve
Steve_Chervitz at affymetrix.com
Mon Mar 13 10:38:58 EST 2006
[These are notes from last week's meeting. -Steve]
Notes from the weekly DAS/2 teleconference, 6 Mar 2006
$Id: das2-teleconf-2006-03-06.txt,v 1.1 2006/03/13 15:41:03 sac Exp $
Note taker: Steve Chervitz
Attendees:
Affy: Steve Chervitz, Ed E., Gregg Helt
CSHL: Lincoln Stein
Sanger: Thomas Down
Dalke Scientific: Andrew Dalke
UC Berkeley: Nomi Harris
UCLA: Brian O'Connor
Action items are flagged with '[A]'.
These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org
DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.
Agenda:
-------
upcoming Code Sprint, March 13-17 at Affymetrix
status reports
coordinate system resolution via COORDINATES element
features with multiple locations vs. alignments
features with multiple parents
???
[ Some trouble with passcode for teleconf - hopefully fixed ]
TD: The coordinate system issues are ones we were hoping to discuss
with Andreas, who won't make it today.
GH: We can push this off till next week.
Code Sprint
-------------
LS: At Sanger Mon-Tue for the Ensembl SAB meeting; able to participate
from Tue afternoon to Fri evening.
AD: Planning to come to Affy
BO: Allen and I are planning to come up to Emeryville
GH: For payment, submit expenses to Affy.
Hotels? Marriott or Woodfin. Will send out recommendations today.
NH: Planning to attend at Affy Mon-Tue and Thu.
[A] Ed will look into accounts for Andrew and Brian (internet access)
GH: Plan on a 9-10am phone teleconference daily. Gregg can pick people
up from the hotel.
GH: Goals/deliverables for this code sprint?
LS: Write a DAS/2 client for BioPerl and plug it into GBrowse.
All I need is a working server.
AD: Writing writeback and locks, improving the validator.
NH: Apollo, the registry, and feature types. Wrote a writer; plan to
test it in AD's validator.
GH: Keep working on the DAS/2 client for IGB at Affy. Hoping by then to
have an Affy DAS/2 server up and running.
SC: Can help get it up
GH: Can we put one in our DMZ, so it's publicly accessible, at least
for the code sprint?
[A] Steve will look into setting up a publicly accessible Affy DAS/2
test server
TD: Working on getting an Ensembl das/2 server up.
GH: Java middleware on top of BioJava?
TD: Yes. Using the BioJava-to-Ensembl bridges.
EE: Getting IGB to use style sheets.
AD: And/or using a proper style sheet system, if you decide what I put
in there is not good enough.
BO: Looking for something to do. Hoping to start on writeback.
Helping separate out the IGB model layer. Finished the RPM packages in
the last code sprint; this is pretty much done.
GH: Guess Allen will be working on the biopackages server.
BO: Waiting on spec for writeback.
AD: My writeup specifies how they do writeback at Sanger, and it
overlaps well with Lincoln's proposal. See that.
GH: Need to tighten up the read-only spec. A fair number of things to
resolve.
AD: Have a partial draft of the 3rd version. Planning to update it
before the next sprint, with examples so people can get a feel for how
things go together.
GH: My agenda items: a coordinate system resolution mechanism to match
annotations on the same genome coming from different servers.
[A] Gregg will wait for Andreas to join in before discussing coordinate
issues.
GH: Features with multiple locations (see the email Gregg sent to the
list today with examples). The current spec says that if you use more
than one coordinate system, you can have features with multiple
locations. Is this what we want to say?
GH: Allen's server has features with more than one location on the
same coordinate system. Do we want to allow or disallow this? If we
disallow it, how?
AD: Possible use case for alignments.
GH: In the BioPerl feature model, locations can have multiple parts.
Features with multiple locations feel similar to that. Do you have
multiple children, each with a location, or do you use the align
element?
LS: Prefers children. That's what SO ended up doing after much
arguing. Makes it easier.
GH: Enforce it with the ontology, e.g., an alignment hit has alignment
HSPs.
This forces the client to understand the ontology.
LS: Consider that each HSP will have its own scores attached to it and
a different cigar line, so you end up with multiple children anyway.
An impoverished type of alignment. You can use a cigar line to
indicate mismatches. You can have a single HSP with a cigar line to
indicate gaps: only one child, and you don't need multiple locations.
GH: Looking for a use case of multiple locations with PCR products...
My main concern is how much semantic knowledge clients need in order to
understand these things. Nothing in the spec restricts multiple
locations.
AD: Won't client just get the multiple children and not care about
types?
GH: I guess a simple client could do that. It disturbs me that it's up
to the server whether to use multiple locations, children, or
alignments.
Will send an example.
LS: Yes, this is a vague area. There should be a best-practices
section in the spec.
For example: a single match feature from beginning to end, with HSP
children that account for the major gaps, and a cigar line within each
HSP to cover minor gaps. Each HSP can be given an alignment score.
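The best-practices layout LS describes (one match feature spanning the whole hit, HSP children separated by the major gaps, an optional cigar line per HSP for minor gaps, and a score per HSP) can be sketched as a small data model. This is only an illustration, not DAS/2 spec content: the class names `Hsp` and `Match`, their fields, and the use of half-open genomic coordinates are hypothetical choices.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hsp:
    # Hypothetical HSP child; coordinates are half-open [start, end) on the genome.
    start: int
    end: int
    score: float                  # per-HSP alignment score
    cigar: Optional[str] = None   # minor gaps within this HSP, if any


@dataclass
class Match:
    # Hypothetical parent "match" feature; major gaps fall between its HSPs.
    hsps: List[Hsp]

    def span(self) -> Tuple[int, int]:
        """The match runs from the earliest HSP start to the latest HSP end."""
        if not self.hsps:
            raise ValueError("a match needs at least one HSP")
        return (min(h.start for h in self.hsps), max(h.end for h in self.hsps))
```

For example, `Match([Hsp(100, 200, 55.0, "60M1D39M"), Hsp(500, 650, 80.0)]).span()` gives `(100, 650)`: the major gap between 200 and 500 is implied by the two HSPs, while the one-base gap inside the first HSP lives only in its cigar line.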
GH: The main difference between location and alignment is the cigar
string, and the cigar string is optional.
If we're allowed to use locations to designate alignments...
LS: How about we consolidate location and alignment: location gets
an optional cigar, and we do away with alignment. This generalizes
location to allow for gaps.
TD: Example: aligning an EST to the genome. It falls into several
blocks of exact or near-exact matches. If location has a cigar line,
you could serve it up as a single feature.
GH: You can do this since cigar can represent arbitrary length gaps.
TD: Neat and compact way to do it. Does this scare anyone?
GH: Sounds reasonable.
AD: Let's do it. I'll put in examples of best practices.
[A] Consolidate location and alignment in spec, loc has optional cigar
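To illustrate TD's EST example: given a location's genomic start and an optional cigar string, a client can recover the aligned blocks itself, so one feature can stand in for several exon-like pieces. The sketch below assumes SAM-style cigar syntax (e.g. `50M1000N30M`) with only the `M`, `I`, `D`, and `N` operations; the spec's actual cigar dialect was not settled in this discussion, and the function name `cigar_blocks` is hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: SAM-style cigar, one integer length followed by one op letter.
_CIGAR_OP = re.compile(r"(\d+)([MIDN])")


def cigar_blocks(ref_start: int, cigar: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) genomic blocks covered by M operations."""
    if not re.fullmatch(r"(?:\d+[MIDN])+", cigar):
        raise ValueError(f"unsupported cigar string: {cigar!r}")
    blocks: List[Tuple[int, int]] = []
    pos = ref_start
    for length_str, op in _CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "M":
            # Aligned bases consume the genome; merge with the previous block
            # when they are contiguous (e.g. after an insertion in the EST).
            if blocks and blocks[-1][1] == pos:
                blocks[-1] = (blocks[-1][0], pos + length)
            else:
                blocks.append((pos, pos + length))
            pos += length
        elif op in ("D", "N"):
            # Genomic bases with no EST counterpart: skip over them.
            pos += length
        # "I" consumes only the EST, so the genomic position does not move.
    return blocks
```

For instance, `cigar_blocks(1000, "50M1000N30M2I20M")` yields `[(1000, 1050), (2050, 2100)]`: two blocks separated by a 1000-base gap, with the small insertion absorbed into the second block. Because a single length can be arbitrarily large, this matches GH's point that a cigar can represent gaps of any size.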
GH: Features with multiple parents. Need examples to test. This is a
question for people putting up servers: will anyone have these?
TD: Ensembl might do this, e.g. an exon shared between several
transcripts. It's a toss-up between multiple parents and multiple
copies of the same exon. I think multiple parents is the way to do it.
LS: FlyBase uses multiple parents for exons in this way.
TD: The Ensembl db has a many-to-many relationship between transcripts
and exons.
GH: The spec says: if you have a child in the feature document, you
have to include its parent; if you have a parent, you must include its
children. As long as this plays nicely with that requirement, I'm OK
with it.
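One way to see that multiple parents can coexist with the inclusion rule GH quotes is a small check that a feature document is closed under parent/child links, applied to TD's shared-exon case. This is a hypothetical sketch, not spec or server code: the `Feature` class, its `uri`/`parents`/`children` fields, and both helper functions are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Feature:
    # Hypothetical in-memory feature; parents/children hold feature URIs.
    uri: str
    parents: List[str] = field(default_factory=list)
    children: List[str] = field(default_factory=list)


def missing_relatives(features: List[Feature]) -> List[str]:
    """URIs referenced as a parent or child but absent from the document."""
    present = {f.uri for f in features}
    missing: List[str] = []
    for f in features:
        for rel in f.parents + f.children:
            if rel not in present and rel not in missing:
                missing.append(rel)
    return missing


def unreciprocated(features: List[Feature]) -> List[Tuple[str, str]]:
    """(parent, child) pairs where only one side records the link."""
    by_uri = {f.uri: f for f in features}
    bad: List[Tuple[str, str]] = []
    for f in features:
        for p in f.parents:
            if p in by_uri and f.uri not in by_uri[p].children:
                bad.append((p, f.uri))
        for c in f.children:
            if c in by_uri and f.uri not in by_uri[c].parents:
                bad.append((f.uri, c))
    return bad
```

With transcripts T1 (exons E1, E2) and T2 (exons E1, E3), exon E1 lists both T1 and T2 as parents, and both checks come back empty. If T2 is left out of the document, `missing_relatives` reports `["T2"]`, which is exactly the situation the inclusion rule forbids.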
GH: Anyone else see things that need to be ironed out in spec?
AD: Not yet
NH: We should write a paper about DAS/2. This will help get more
people using it and increase the success of the spec.
GH: Agreed -- good idea. We have lots of text in grant about the
philosophy of das/2.
NH: Can pull text from these places. Publish at a conference perhaps?
ISMB, CSB2006
GH: PLoS Bioinformatics?
NH: Conference would be nice, to involve people in discussion.
AD: Poster session is available for ISMB.
NH: Prefers a conference talk. A paper will require something more
finished and stable. A poster is too much work for little payoff.
AD: Ann L complains that the only paper to cite for DAS is an old
reference. Wants an updatable, citable paper.
NH: CSB will publish a proceedings.
Genome informatics at CSHL (they don't publish though).
NH/GH: What's the best conference to get published in these days?
LS: ISMB
NH: We missed deadline for it.
LS: Biocurators meeting?
NH: Can ask Sima about it. Another one: Computational Genomics
(TIGR-sponsored). Not published: you submit abstracts and they select
talks. Held around Halloween in Baltimore. If conference proceedings
are published, you can't submit the same work as a paper, so we could
go that way and get double mileage out of it.
GH: Sounds good to get something ready for a paper rather than a
conference. Did presentations at BOSC and Genome Informatics last year.
[A] Nomi will help get paper ready for PLoS (after code sprint)
AD: Can do a poster for ISMB/BOSC in Brazil, if I end up going.
NH: ISMB deadline is 10 May, so we should get going on it
GH: The continuation grant submission has in theory been reviewed, but
we haven't heard back. It may take another month to get the score
back. Final word?
LS: Have you checked ERA Commons? They may update it there before you
get the note.