<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>MOBY Use Case Wormbase</title>
<meta http-equiv="content-type"
content="text/html; charset=ISO-8859-1">
<meta name="author" content="Fiona Cunningham">
</head>
<body>
<br>
<h1>Wormbase Use Cases </h1>
<h2><A NAME= Wormbase>Name: </a>Wormbase enzyme information.</h2>
<P><B>Scenario Reference: </B><A HREF='fc_wb_scenarios.html#ScenarioWBenz'> Wormbase enzyme information. </a></P>
<P><B>Problem: </B>Find all the enzymes on a particular chromosome. </P>
<hr width="10%" align="left">
<h2> Background Knowledge: </h2>
<p>Wormbase is an online database for the genome and biology of C. elegans <A HREF="http://www.wormbase.org">(www.wormbase.org)</A>.</p>
<P> Gene Ontology (GO) terms <A HREF = 'http://www.geneontology.org'> (www.geneontology.org) </A> are designed to allow high-level groupings of sequences based on function rather than sequence identity.</p>
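<p>As a rough illustration of grouping by function, the sketch below walks a toy term hierarchy to collect every sequence annotated at or below a general term. The hierarchy is deliberately simplified (one parent per term), and the sequence names other than AC3.2 are made up for the example; none of this is real GO or Wormbase data.</p>
<pre>
```python
# Toy is_a hierarchy (child to parent); a simplified assumption, not real GO data.
PARENT = {
    "glucuronosyltransferase activity": "transferase activity",
    "kinase activity": "transferase activity",
    "transferase activity": "catalytic activity",
    "catalytic activity": None,
}

# Hypothetical annotations; only the name AC3.2 appears in this document.
ANNOTATIONS = {
    "AC3.2": "glucuronosyltransferase activity",
    "GENE-A": "kinase activity",
    "GENE-B": "catalytic activity",
}


def ancestors(term):
    """Return the term followed by every term above it in the hierarchy."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def sequences_under(general_term):
    """List sequences annotated with general_term or any of its descendants."""
    return sorted(
        name for name, term in ANNOTATIONS.items()
        if general_term in ancestors(term)
    )


print(sequences_under("transferase activity"))
```
</pre>
<p>Choosing a term further up the toy hierarchy, such as "catalytic activity", widens the group to every annotated sequence.</p>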
<hr width="10%" align="left">
<h2> Primary Actors: </h2>
<ol>
<li>A biologist</li>
<li>A worm sequence data provider such as Wormbase
        <a HREF='http://www.wormbase.org'>(http://www.wormbase.org/)</a>
        </li>
</ol>
<hr width="10%" align="left">
<h2>Other Actors: </h2>
<hr width="10%" align="left">
<h2>Initial State and Preconditions: </h2>
<p> The service provider must have C. elegans genome and enzyme information.</p>
<hr width="10%" align="left">
<h2> End Result: </h2>
<p>
Success: The biologist receives a list of enzymes on a particular chromosome with their features.
<br>
Fail: The biologist does not obtain the required information because the biologist cannot find the correct data provider, the data provider does not have the information, or the biologist cannot formulate the query correctly. </p>
<hr width="10%" align="left">
<h2> Current Workflow: </h2>
<ol>
        <li>Search to find a database containing C. elegans genome information. </li>
        <li>Find the link to the relevant search page.</li>
        <li>Read the literature on the search page to determine whether it can do the query.</li>
        <li>Enter the name of the query enzyme in the search box (the Search box on the Wormbase home page).</li>
        <li>The data provider returns a list of hits.</li>
        <li>To see if the hits returned can be brought under one grouping, click on one of the Gene Ontology (GO) terms.</li>
        <li>This will return the GO term and its hierarchy.</li>
        <li>Click on a more general GO term further up the hierarchy that should represent the whole group. The GO provider should return a summary of the number of sequences that are connected to this term.</li>
        <li>Click on the number of sequences to expand the list.</li>
        <li>Cut and paste the list into a text file (WormBase does not
have tools for cross-comparing and processing big lists of sequences).</li>
        <li>Extract the gene names (e.g. "AC3.2") from the accompanying text (e.g. "UDP-glucuronosyltransferase") either by hand or with a Perl or other script.</li>
        <li>Find the search tool on Wormbase that will list all the CDSs on a particular chromosome (Genome Dumper on Wormbase).</li>
        <li>Scan the CDSs (or use the Unix grep command) and select those that are in the list of enzyme gene names.</li>
        <li>This will leave a list of chromosome-specific gene names for the enzyme.</li>
        <li>Find an advanced search page (Batch Retrieval on Wormbase) to get more details about the genes.</li>
        <li>Cut and paste the entire list of genes into the "Genes or Loci" window and select the buttons corresponding to the information you require (e.g. Locus, Gene, RNAi, Brief Identification, Prominent Motifs, etc.).</li>
        <li>Submit the query; the data provider should return a table showing the information it has for these features.</li>
</ol>        
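<p>The extraction and filtering steps above can be sketched as a small script. This is a minimal sketch under stated assumptions: the pasted text is assumed to have the gene name as the first whitespace-separated token on each line, the chromosome CDS list is assumed to be one name per entry, and every name other than AC3.2 is made up for illustration.</p>
<pre>
```python
def gene_names(pasted_text):
    """Take the first whitespace-separated token of each non-empty line."""
    names = set()
    for line in pasted_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def on_chromosome(enzyme_names, chromosome_cds):
    """Keep the CDS names that exactly match a name in the enzyme list."""
    return [cds for cds in chromosome_cds if cds in enzyme_names]


# Made-up sample input; the line layout is an assumption about the pasted text.
pasted = """AC3.2 UDP-glucuronosyltransferase
GENE-A UDP-glucuronosyltransferase
"""
chromosome_cds = ["GENE-X", "AC3.2", "GENE-Y"]

print(on_chromosome(gene_names(pasted), chromosome_cds))
```
</pre>
<p>Exact matching on whole names is used here so that a short name cannot match inside a longer one, which a plain substring search with grep could do.</p>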
<br>
<hr width="10%" align="left">
<h2> Existing Workflow Limitations: </h2>
<p> The biologist must know the names of the appropriate databases and where they reside, or do a Google, PubMed or similar search to find them. He may find an existing database only to discover that it does not support his query.</p>
<p> Different sections of the query require different search pages in Wormbase. This can be confusing and time-consuming. </p>
<P> WormBase does not currently have particularly useful embedded tools for cross-comparing and processing big lists of sequences.</P>
<hr width="10%" align="left"><br>
<h2>Existing Workflow Exemplars: </h2>
<P>The web-based nature of the data means that the information is kept up to date and the biologist does not need to maintain the UI.</p>
<br>
<hr width="10%" align="left">
<h2> MOBY Workflow: </h2>
<ol>
        <li>The biologist asks MOBY which are the known service providers that could answer his query, given an input of a locus name.</li>
        <li>MOBY directs the biologist straight to the search page and informs the biologist which format the input data must have.</li>
</ol>
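<p>The lookup described in this workflow might look like the following toy registry query. This is only a sketch: the <code>Service</code> record, the <code>REGISTRY</code> list, the input format strings and the second provider are all hypothetical and are not part of any MOBY interface.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    provider: str       # who offers the service
    input_type: str     # kind of data the service accepts
    input_format: str   # how the biologist must format the input
    search_page: str    # where the biologist is directed


# Hypothetical registry entries; formats and the second provider are placeholders.
REGISTRY = [
    Service("Wormbase", "locus name", "one locus name per line",
            "http://www.wormbase.org/"),
    Service("ExampleDB", "protein sequence", "FASTA",
            "http://example.org/search"),
]


def find_services(input_type):
    """Return every registered service that accepts the given input type."""
    return [s for s in REGISTRY if s.input_type == input_type]


for service in find_services("locus name"):
    print(service.provider, service.search_page, service.input_format)
```
</pre>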
<hr width="10%" align="left">
<h2>MOBY Workflow Limitations: </h2>
<p>MOBY must keep up to date with changes in the UI and data presented by the database. </p>
<hr width="10%" align="left">
<h2> MOBY Workflow Exemplars: </h2>
<P>The advantage of having MOBY here is that the user would no longer have to search for a service provider, hunt through and digest the search pages, or work out whether there is a way to formulate his query. All of this could be done at the MOBY level.</P>
<hr width="10%" align="left">
<h2> Discussion: </h2>
<hr width="10%" align="left">
<h2> Priority:</h2>
<p>Mandatory. MOBY should have an index and a description of each possible database site and be able to direct the biologist straight to the search page that best fits his/her needs.</P>
<hr width="10%" align="left">
<h2> Key References: </h2>
<p>
Wormbase <a HREF='http://www.wormbase.org/'>(http://www.wormbase.org/) </A></p>
<hr>
</body>
</html>