[DAS2] Notes from DAS/2 code sprint #2, day five, 17 Mar 2006

Sun Mar 19 23:54:36 EST 2006

$Id: das2-teleconf-2006-03-17.txt,v 1.2 2006/03/20 05:05:22 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  Dalke Scientific: Andrew Dalke (at Affy)
  UCLA: Allen Day, Brian O'Connor (at Affy)

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

Agenda: 

* Status reports
* Writeback progress

Status reports:
---------------

gh: This is the last mtg of code sprint. For the status reports, focus
on where you are at and what you are hoping to accomplish post-sprint.

gh: working on version of affy server that impls das/2 v300 spec for
all xml responses. sample responses passed andrew's validation.
steve rolled it out to public server.

updated igb client to handle v300 xml.
worked more on server to impl v300 query syntax using full uri for
type segment, segment separate from overlaps and inside.
only impls a subset of the feature query. requires one and only one
segment, type, insides.
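
A minimal sketch of how the overlaps and inside range filters could differ. The half-open integer ranges and the reading of "inside" as "fully contained in the query range" are assumptions for illustration, not taken from the v300 spec text. Span, overlaps and inside are hypothetical names.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A half-open range [start, end) on one segment (assumed convention)."""
    start: int
    end: int


def overlaps(feature: Span, query: Span) -> bool:
    # True when the feature shares at least one position with the query range.
    return feature.start < query.end and query.start < feature.end


def inside(feature: Span, query: Span) -> bool:
    # True when the feature lies entirely within the query range.
    return query.start <= feature.start and feature.end <= query.end


query = Span(100, 200)
straddling = Span(150, 250)   # crosses the right edge of the query
contained = Span(120, 180)    # fully within the query
```

Under these assumptions the straddling feature passes an overlaps filter but not an inside filter, while the contained feature passes both.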

hoping to do for rest of sprint and after:
1. supporting name feat filters in igb client
2. remove restrictions from the server
3. making sure new version of server gets rolled out,
4. roll out jar for this version of igb. maybe put on genoviz sf site for
testing purposes.

bo: looked at xml docs that andrew checked in, updating ucla templates
on server, not rolled out to biopackages.net, waiting to make rpm,
hoping to do code cleanup in igb.
getting andrew's help running validator on local copy of server.

gh: igb would like to support v300, but having one server at v200+
(ucla) and one at v300 (affy) complicates things. so getting your server good to
go would be my priority.

bo: code clean up involves assay and ontology interface.

gh: we're planning an igb release at end of march. as long as the code
is clean by then it's ok.

aday: code cleanup, things removed from protocol. exporting data
matrices from assay part of server.
validate sources document w/r/t v300 validator. work with brian to
make sure everything is updated to v300. probably working on filter
query, since we now treat things as names not full uri's.

ad: what extra config info do you need in server for that? can you get
it from the http headers?
gh: mine is being promiscuous, just name of type will work. might give
the wrong thing back, but for data we're serving back now, it can't be
wrong.

ad: how much trouble does the uri handling cause for you?

gh: has to be full uri of the type, doing otherwise is not an option
(in the spec).
ad: you could just use name internally, then put together full uri
when you go to the outside world.
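
A rough sketch of andrew's suggestion: keep short names inside the server and expand them to full uris only at the boundary. The base uri and the function names type_uri and type_name are made up for illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical base URI under which this server publishes its types.
TYPE_BASE = "http://example.org/das2/genome/type/"


def type_uri(name: str) -> str:
    # Expand an internal short name into the full URI shown to clients.
    return TYPE_BASE + quote(name, safe="")


def type_name(uri: str) -> str:
    # Reduce an incoming full URI back to the internal short name.
    if not uri.startswith(TYPE_BASE):
        raise ValueError(f"not a type URI served here: {uri}")
    return unquote(uri[len(TYPE_BASE):])
```

Rejecting uris outside the server's own base is one way to avoid the "might give the wrong thing back" case gregg mentions.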

ad: I updated comments in schema definitions, updated query lang
spec. string searches are substring searches not word-substring
searches. 
abc = whole field must be equal
*abc = suffix match
abc* = prefix match

previously said it was word match, but that's too complicated on
server.
worked with gregg to pin down what inside search means.
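
The three string-match forms listed above can be sketched as one small function. Case-sensitive comparison is an assumption, and matches is a hypothetical name. Patterns other than the three forms in these notes are not handled specially.

```python
def matches(pattern: str, value: str) -> bool:
    """Apply the three match forms from the notes to a whole field value."""
    if pattern.startswith("*"):
        # "*abc": the field must end with "abc".
        return value.endswith(pattern[1:])
    if pattern.endswith("*"):
        # "abc*": the field must start with "abc".
        return value.startswith(pattern[:-1])
    # "abc": the whole field must be equal.
    return value == pattern
```

None of the three forms needs word splitting, which fits the point that word matching was too complicated on the server.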

I'm thinking about the possibility of a validating proxy server,
configure das client to go through proxy before outside world, the
server would sniff everything going by.
Support for proxies can enable lots of sorts of things w/o needing
additional config for each client.

gh: how do you do proxy in java? i.e., redirect all network calls to a
proxy.
bo: there's a way to set proxy options via the system object in the
java vm. can show you some examples of this.
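
In java the standard http.proxyHost and http.proxyPort system properties are the usual way to do what brian describes. The same idea, sending all of a client's http requests through one proxy, is sketched here in python. The proxy address and the name make_proxied_opener are hypothetical.

```python
import urllib.request

# Hypothetical address of a local validating proxy.
PROXY_URL = "http://localhost:8080"


def make_proxied_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    # Route every http request made through this opener via the proxy.
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_proxied_opener()
# opener.open(some_das_url) would now be sent to the proxy first.
```

The client code that issues requests stays unchanged. Only the opener construction knows about the proxy.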

aday: performance.
gh: current webstart based igb works with the existing public das/2
server, [comment pertaining to: the new version of igb and a new
version of the affy das/2 server].

ad: when will we get reference names from lincoln?
gh: should have happened yesterday. poke him about this.
would be really nice to be able to overlay annotations!

The current version of igb can turn off v300 options, and then it can
load stuff from the ucla server. The version of igb in cvs now can hit
both biopackages.net and affy server in the dmz. and there's
hardwiring to get things to overlay. temporary patch.

ee: two things:
1. style sheets. info from andrew yesterday. looking over that. will
   discuss questions w/ andrew.
2. making sure that when we do a new release of igb in a couple of
   weeks (when I'm not here) that it will go smoothly. go over w/
   gregg, steve. lots of testing.
made changes in parser code, should still work.

sc: I updated jars for das/1 not das/2 on netaffxdas.affymetrix.com.
ee: it's the das/1 I'm most concerned about.

sc: installed and updated gregg's new das/2 server on a publicly
accessible machine (separate box from the production das/1 and das/2
servers on netaffxdas.affymetrix.com).
Also spent time loading data for new affy arrays (mouse, rat
exons). this required lots of memory, had to disable support for some
other arrays. [gregg's das servers load all annotations into memory at
start up, hence the big memory requirements for arrays with lots of
probe sets.]

[A] gregg optimize affy das server memory reqts for exon arrays.

gh: we've gotten a lot done this week. I think we have a stable spec.

gh: serving alignments, no cigars, but blat alignment to genome as
coords on mrna and coords on the genome. igb doesn't use it yet, but
it's there.
ad: xid in region elements.
gh: we haven't exercised the xids. so 'link' in das/1 is equivalent to
xid in das/2?
ad: yes. i believe
gh: if you have links in das/1. without links it can build links from
feature id using a template. This is used for building links from
within IGB back to netaffx, for example.
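
A small sketch of the fallback gregg describes: use a link supplied with the feature when there is one, otherwise build one from the feature id and a template. The template url, the "{id}" placeholder and the name build_link are invented for illustration.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical template; "{id}" marks where the feature id is substituted.
LINK_TEMPLATE = "http://example.org/probeset?id={id}"


def build_link(feature_id: str,
               explicit_link: Optional[str] = None,
               template: str = LINK_TEMPLATE) -> str:
    # Prefer a link supplied with the feature itself.
    if explicit_link:
        return explicit_link
    # Otherwise fill the feature id into the template, url-escaped.
    return template.replace("{id}", quote(feature_id, safe=""))
```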

Topic: Writebacks
-----------------

gh: writebacks haven't been mentioned at all this week.
ad: we need people committed to writing a server to implement it.
gh: we decided that since ed griffith would be working on it at
Sanger, we wouldn't worry about it for ucla server.
bo: we started prototyping. locking mechanism. persisting part of a
mage document. the spec changed after that. andrew's delta model. a
little different from what we were prototyping.
actual persistence will be done in the assay portion of our server.
gh: grant focuses on write back for genome portion, and this was a big
chunk of the grant. ends in end of may or june.

ad: delta model was: here's a list of add, delete, modify in one
document. An issue was if you change an existing record, do you give
it a new identifier?
gh: you never modify something with an existing id, just make a new
one, new id, with a pointer back to old one. Ed Griffith said this a
month ago. I like this idea. but we were told we cannot make this
requirement on the database. but very few dbs will be writeback, so
it's not affecting all servers.

ad: making new uris, client has to know the new uri for the old
one. needs to return a mapping document.
if network crashes partway through, client won't know what the mapping is and
will be lost.
gh: server doesn't know if client got it. it could act(?) back.
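
A minimal in-memory sketch of the "never modify, always make a new record" idea and of the old-to-new mapping the client would need back. The uri scheme and the names Feature and Store are hypothetical. A real server would persist this in its database.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Feature:
    uri: str
    start: int
    end: int
    replaces: Optional[str] = None  # pointer back to the superseded record


class Store:
    def __init__(self) -> None:
        self.features: Dict[str, Feature] = {}
        self._ids = itertools.count(1)

    def _new_uri(self) -> str:
        # Hypothetical URI scheme for newly minted records.
        return f"http://example.org/das2/feature/{next(self._ids)}"

    def add(self, start: int, end: int,
            replaces: Optional[str] = None) -> Feature:
        feature = Feature(self._new_uri(), start, end, replaces)
        self.features[feature.uri] = feature
        return feature

    def modify(self, old_uri: str, start: int, end: int) -> Dict[str, str]:
        # Never edit in place: mint a new record that points at the old one.
        if old_uri not in self.features:
            raise KeyError(old_uri)
        new = self.add(start, end, replaces=old_uri)
        # This old-to-new mapping is what the client must receive.
        return {old_uri: new.uri}
```

Because the old record stays in place and the new one carries a back pointer, a client that lost the mapping could still recover it by re-reading the region, which is close to what ed and andrew suggest further down.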

gh: if a response from http server dies, server has no way to know.
ad: There could be a proxy in the middle, or isp's proxy server. The
server sent it successfully to the proxy, but never made it to the
client. 

gh: how is this dealt with for commits into relational dbs? same thing
applies 
ad: don't know
ee: could ask for everything in this region.
ad: have a new element that says 'i used to be this'.
bo: you do an insert in a db, to get last pk that was issued. client
talks back to server, give me last feature uri that was provisioned on
my connection. so the client is in control.

sc: it's up to client to get confirmation from server. If it failed to
get the response after sending in the modification request, it could
request that the server send it again.

ad: (drawing on whiteboard) two stage strategy, get a transaction state.

     post "get transaction url"
    <---------------
    post (put?) to transaction URL
    ------------->
    can do multiple (if identical)
       ---------->
       ---------->
    Get was successful and here's transformation info
    <---------------

ad: server can hold transformation info for some timespan in case
client needs to re-fetch.
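
The whiteboard exchange can be sketched as a toy in-memory server: the client first asks for a transaction id, then posts to it, and may repeat an identical post safely until it has seen the result. Everything here (TransactionServer, begin, commit, the retention period) is a hypothetical illustration, not a proposed API.

```python
import time
import uuid
from typing import Callable, Dict, Set, Tuple


class TransactionServer:
    def __init__(self, retention_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._retention = retention_seconds
        self._clock = clock
        self._open: Set[str] = set()
        # txn id -> (payload, mapping, expiry time)
        self._results: Dict[str, Tuple[str, dict, float]] = {}

    def begin(self) -> str:
        # Stage one: hand the client a fresh transaction id.
        txn = uuid.uuid4().hex
        self._open.add(txn)
        return txn

    def commit(self, txn: str, payload: str,
               apply: Callable[[str], dict]) -> dict:
        # Stage two: apply the payload once; identical retries get the
        # stored mapping back instead of being applied again.
        self._expire()
        if txn in self._results:
            stored_payload, mapping, _ = self._results[txn]
            if stored_payload != payload:
                raise ValueError("retry must repeat the original payload")
            return mapping
        if txn not in self._open:
            raise KeyError("unknown or expired transaction")
        mapping = apply(payload)
        self._open.discard(txn)
        expiry = self._clock() + self._retention
        self._results[txn] = (payload, mapping, expiry)
        return mapping

    def _expire(self) -> None:
        # Forget results that have been held longer than the retention time.
        now = self._clock()
        expired = [t for t, (_, _, exp) in self._results.items() if exp <= now]
        for txn in expired:
            del self._results[txn]
```

A client that never saw the response just posts the same document again to the same transaction. After the retention time the server forgets the result and the retry fails.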

gh: I'm more interested in getting a server up than a client
regarding writeback. complex parts of the client are already
implemented (apollo).

gh: locks are region based not feature based.
ad: can't lock...

gh: we can talk about how to trigger ucla locking mechanism.
bo: did the flock transactional locking suggested in the perl
cookbook. mage document has content. server locks an id using flock,
(for assay das).
gh: to lock a region on the genome, lock on all ids for features in
this region.
bo: make a file containing all the ids that are locked. flock this
file. 

ad: file locking is fraught with problems. why not keep it in the
database and let the db lock it for you? don't let perl + file system
do it for you. there could be fs problems. nfs isn't good at that. a
database is much more reliable.
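
A sketch of andrew's alternative, using sqlite only because it ships with python. The table name, column names and function names are made up. The primary key makes the database refuse a second lock on the same feature id, and the surrounding transaction makes a region lock all-or-nothing, as in gregg's "lock on all ids for features in this region".

```python
import sqlite3
from typing import Iterable


def open_lock_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_lock ("
        " feature_id TEXT PRIMARY KEY, owner TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def lock_region(conn: sqlite3.Connection, owner: str,
                feature_ids: Iterable[str]) -> bool:
    # Lock every feature id in the region, or none of them.
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.executemany(
                "INSERT INTO feature_lock (feature_id, owner) VALUES (?, ?)",
                [(fid, owner) for fid in feature_ids],
            )
        return True
    except sqlite3.IntegrityError:
        # Some id was already locked; the rollback dropped the partial set.
        return False


def unlock(conn: sqlite3.Connection, owner: str) -> None:
    # Release every lock held by this owner.
    with conn:
        conn.execute("DELETE FROM feature_lock WHERE owner = ?", (owner,))
```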

bo: I went with perl flock mechanism since you could have other
non-database sources (though so far it's all db).

[A] steve, allen send brian code tips regarding locking.

gh: putting aside pushing large data chunks into the server, for
curation it's ok if protocol is a little error prone, since the
curator-caused errors will be much more likely/common.

ad: UK folks haven't done any writeback work as far as I know.
gh: they haven't billed us in 2 years. Tony Cox is contact, ed
griffith is main developer.
ad: andreas and thomas are not funded by this grant or the next one.
gh: they are already funded by other means.

ad: if someone wants to change an annotation should they need to get
a lock first or can it work like cvs? do it if it can, get lock,
release lock in one transaction.
ee: that's my preference.
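
One way to read the cvs-like option is as an optimistic commit: the client states which revision it started from, and the server accepts the change only if that revision is still current. This is an illustrative sketch with hypothetical names (Annotations, Conflict). In a real server the check and the write would sit inside one database transaction.

```python
from typing import Dict, Optional, Tuple


class Conflict(Exception):
    """Raised when the record changed since the client last read it."""


class Annotations:
    def __init__(self) -> None:
        # feature id -> (revision, value); revision 0 means "never written"
        self._rows: Dict[str, Tuple[int, Optional[str]]] = {}

    def read(self, feature_id: str) -> Tuple[int, Optional[str]]:
        return self._rows.get(feature_id, (0, None))

    def commit(self, feature_id: str, base_revision: int, value: str) -> int:
        # Check and write in one step: no separate lock request is needed.
        current, _ = self._rows.get(feature_id, (0, None))
        if current != base_revision:
            raise Conflict(f"{feature_id} is at revision {current}")
        self._rows[feature_id] = (current + 1, value)
        return current + 1
```

If conflicts are as rare as ed expects, most commits succeed on the first try and only the losing client has to re-read and retry.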

ad: if every feature has its own id, you know if it's...

ee: some servers might not have any writeback facility at
all. conflicts will be rare.

[A] ask ed/tony on whether they plan to have any writeback facility

gh: ed g wanted to work on client to do writeback, don't know who
would work on a server there.
ad: someone else, can't remember - roy?
gh: unless we hear back from sanger, the highest priority for ucla
folks after updating server for v300, is working server-side
writeback. 

gh: spec freeze is for the read portion. the writeback portion will
have to change as needed.
ad: and arithmetic? ;-)