<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <title>MOBY GetSNPs use case</title>

  <meta http-equiv="content-type"

 content="text/html; charset=ISO-8859-1">

  <meta name="author" content="Fiona Cunningham">

</head>

  <body>

<br>

<h1>Haplotype Mapping Project Use Cases </h1>

<hr>

<h2> <A NAME= GetSNPsForGene>  Name: GetSNPsForGenes</a></h2>

<P><B>Scenario Reference: </B><a HREF='fc_hapmap_scenarios.html'>Find SNPs in specific locations</a></P>

<P><B>Problem: </B>Find the SNP(s) that are near a given gene.</P>

&nbsp;<br>

<hr width="10%" align="left"> 

<p>  </p>

<p> </p>

<h2>   Background Knowledge: </h2>

<p>SNPs (single nucleotide polymorphisms) are single base variations in DNA that occur in the population at a frequency of greater than 1%.  </p>

<p>As well as being useful genetic markers, they have been associated with disease. The SNP Consortium and dbSNP currently hold over a million SNPs.  With the allele frequency project finished, a new Haplotype Mapping project has just been launched to characterise the haplotype blocks in several different human populations.</p>

<hr width="10%" align="left">

<h2> Primary&nbsp; Actors: </h2>

<ol>

        <li>A biologist</li>

        <li>A known SNP data provider such as dbSNP 

<a HREF='http://www.ncbi.nlm.nih.gov/SNP/'>(http://www.ncbi.nlm.nih.gov/SNP/)</a>

        or The SNP Consortium (TSC) <a HREF='http://snp.cshl.org/'>

(http://snp.cshl.org/)</a>

        </li>

</ol>

<p><b>Other&nbsp; Actors: </b>

&nbsp;&lt;an exhaustive list of other entities which must participate in

the flow of events to achieve this result&gt;</p>

<hr width="10%" align="left">

<h2>Initial State and preconditions: &nbsp;</h2>

<p> The biologist has one or two gene names, STS markers, or genomic positions. </p>

<p> The SNP provider has a database of SNPs, their genomic positions, and the other associated information (e.g. the type of amino acid change) necessary to satisfy the query.</p>
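
<p>The positional part of such a query can be sketched as follows. This is only an illustration under stated assumptions: the <code>Snp</code> record, the <code>snps_near_gene</code> function, the 10 kb default flank used to define "near", and the example identifiers and coordinates are all hypothetical and do not come from dbSNP or TSC.</p>

<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    snp_id: str      # made-up identifier, not a real dbSNP or TSC accession
    chromosome: str
    position: int    # 1-based genomic coordinate


def snps_near_gene(snps, chromosome, gene_start, gene_end, flank=10_000):
    """Return the SNPs on `chromosome` that lie within `flank` bases of the gene."""
    window_start = max(1, gene_start - flank)
    window = range(window_start, gene_end + flank + 1)  # inclusive at both ends
    return [
        snp for snp in snps
        if snp.chromosome == chromosome and snp.position in window
    ]


# Hypothetical data, for illustration only.
EXAMPLE_SNPS = [
    Snp("snp_a", "7", 95_000),
    Snp("snp_b", "7", 120_500),
    Snp("snp_c", "7", 250_000),
    Snp("snp_d", "12", 101_000),
]

# A gene assumed to span 100,000 to 110,000 on chromosome 7.
nearby = snps_near_gene(EXAMPLE_SNPS, "7", 100_000, 110_000)
```
</pre>

<p>How wide the flanking window should be is a choice for the biologist; the default here is arbitrary.</p>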

<hr width="10%" align="left">

<h2>    End Result: &nbsp;</h2>    

<p>The biologist gets a list of the SNPs that fulfill his criteria, together with their characteristics. </p>

<hr width="10%" align="left">

<h2>    Existing Workflow: </h2>

     <ol>

        <li>The biologist browses to a known SNP database.</li>

        <li>The biologist finds the link to the relevant search page.</li>

        <li>The biologist reads the literature on the search page to determine whether he can do the query he wants.  If not, he has to find another SNP database and start again.</li>

        <li>The biologist works out how to fill in the CGI form to search using the correct criteria (e.g. by gene name, chromosomal positions, STS markers).  If he wants only SNPs that have allele frequency data or that cause a non-synonymous amino acid change (or other types of change), he enters these criteria as well.</li>

        <li>A web page is returned with a list of SNPs that match the search specifications.  Each SNP links to a page with the requested information.</li>

        <li>On the TSC website, the biologist can select a subset of the SNPs and request a table dump of the information, or browse each SNP's genomic region individually.</li>

     </ol>        
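
<p>The optional criteria in step 4 amount to a filter over the returned records. The sketch below shows that filter in isolation; the <code>SnpRecord</code> fields, the <code>filter_snps</code> function, and the example records are assumptions made for illustration, not the schema or interface of either database.</p>

<pre>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnpRecord:
    snp_id: str                        # made-up identifier
    consequence: str                   # e.g. "synonymous" or "non-synonymous"
    allele_frequency: Optional[float]  # None when no frequency data is held


def filter_snps(records, require_frequency=False, consequence=None):
    """Keep the records that satisfy every criterion that was switched on."""
    selected = []
    for record in records:
        if require_frequency and record.allele_frequency is None:
            continue  # the user asked for SNPs with allele frequency data only
        if consequence is not None and record.consequence != consequence:
            continue  # the user asked for one specific type of change
        selected.append(record)
    return selected


# Hypothetical records, for illustration only.
EXAMPLE_RECORDS = [
    SnpRecord("snp_a", "non-synonymous", 0.12),
    SnpRecord("snp_b", "synonymous", 0.30),
    SnpRecord("snp_c", "non-synonymous", None),
]

coding_with_frequency = filter_snps(
    EXAMPLE_RECORDS, require_frequency=True, consequence="non-synonymous"
)
```
</pre>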

<br>

<hr width="10%" align="left">

<h2>    Existing Workflow Limitations: </h2>

<p>  The biologist must know the names of the SNP databases and where they reside, or do a Google, PubMed, or similar search, to find them. </p>

<p> The SNP Consortium and dbSNP currently have very different search page layouts.  For the novice, it is time-consuming to find and understand how these pages are laid out.  They also hold different data: the TSC has more specific SNP data, but dbSNP makes it easier to link to the wealth of other scientific information stored on the NIH website.  As a result, a biologist might need to switch back and forth between the two, depending on the set of queries he wishes to run. A further complication is that the two sites return their results in different formats.</p>

<hr width="10%" align="left"><br>

<h2>Existing Workflow Exemplars: </h2>

<P>The web-based nature of the data means that the information is kept up to date and the biologist does not need to maintain the UI.</p>

<hr width="10%" align="left">

<h2>    MOBY Workflow: </h2>

<ol>

        <li>The biologist asks MOBY which are the known SNP databases.</li>

        <li>The biologist asks MOBY which of the SNP databases can perform a query given his input data type.</li>

        <li>MOBY directs the biologist straight to the search page and tells him what format the input data must have.</li>

        <li>The rest of the query proceeds as before.</li>

</ol>
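
<p>The first three steps above can be read as lookups against an index of SNP services. The following is a minimal sketch of that idea, not the MOBY API: <code>SnpService</code>, <code>REGISTRY</code>, <code>known_databases</code> and <code>services_for_input</code> are hypothetical names, and the accepted input types and format notes listed for each database are invented for illustration. Only the two site URLs are taken from this document.</p>

<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpService:
    name: str
    search_url: str             # page the biologist is directed to
    accepted_inputs: frozenset  # input data types the search accepts
    input_format: str           # note on the format the input must have


# Illustrative entries: accepted inputs and format notes are assumptions.
REGISTRY = [
    SnpService(
        "dbSNP",
        "http://www.ncbi.nlm.nih.gov/SNP/",
        frozenset({"gene_name", "sts_marker"}),
        "one name per query",
    ),
    SnpService(
        "The SNP Consortium",
        "http://snp.cshl.org/",
        frozenset({"gene_name", "genomic_position"}),
        "chromosome plus start and end coordinates",
    ),
]


def known_databases(registry):
    """Step 1: list the names of the known SNP databases."""
    return [service.name for service in registry]


def services_for_input(registry, input_type):
    """Steps 2 and 3: find the services that accept this input data type."""
    return [s for s in registry if input_type in s.accepted_inputs]


for service in services_for_input(REGISTRY, "genomic_position"):
    print(service.name, service.search_url, service.input_format)
```
</pre>

<p>Keeping the format note alongside each entry is what lets the index answer step 3 without the biologist having to read each search page.</p>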

<hr width="10%" align="left">            

<h2>MOBY Workflow Limitations: &nbsp;</h2>        

<p>MOBY must keep up to date with changes in the UI and data presented by each SNP database. </p>

<hr width="10%" align="left">

<h2>    MOBY Workflow Exemplars: &nbsp;</h2>

<P>The advantage of having MOBY here is that the user would no longer have to search for the SNP databases, hunt for and digest the search pages, and work out whether there is a way to formulate his query.  All this could be done at the MOBY level.</P>

<hr width="10%" align="left">

<h2>     Discussion: &nbsp;</h2>

<hr width="10%" align="left">

<h2>     Priority:</h2>

<p>Mandatory. MOBY should have an index and a description of each possible database site, and be able to direct the biologist straight to the search page that best fits his/her needs.</P>

<hr width="10%" align="left">

<h2>   Key References:        </h2>

<p>

dbSNP: <a HREF='http://www.ncbi.nlm.nih.gov/SNP/'>(http://www.ncbi.nlm.nih.gov/SNP/)</a>

     </p>

<p> The SNP Consortium: <a HREF='http://snp.cshl.org/'>(http://snp.cshl.org/) </A></p>

</body>

</html>