[BioSQL-l] how to quickly retrieve feature sequences

Gang Wu gwu at molbio.mgh.harvard.edu
Tue Jun 15 13:12:36 EDT 2004


Just forgot to attach the SQL.

=========================================
ATTACHMENT 1
=========================================
CREATE TABLE `term_relationship_term` (
  `term_relationship_id` int(11) NOT NULL default '0',
  `term_id` int(11) NOT NULL default '0',
  PRIMARY KEY  (`term_relationship_id`,`term_id`),
  UNIQUE KEY `term_relationship_id` (`term_relationship_id`),
  UNIQUE KEY `term_id` (`term_id`)
) ENGINE=InnoDB;
========================================

Gang


-----Original Message-----
From: biosql-l-bounces at portal.open-bio.org
[mailto:biosql-l-bounces at portal.open-bio.org]On Behalf Of Gang Wu
Sent: Tuesday, June 15, 2004 12:38 PM
To: biosql-l at open-bio.org
Subject: [BioSQL-l] how to quickly retrieve feature sequences


Hi,

I just loaded the five Arabidopsis thaliana GenBank genome files into my
sequence database (BioSQL 1.38). My question is: how can I efficiently
retrieve all gene sequences from the database? I tried to do that by joining
the seqfeature, seqfeature_qualifier_value, location, term and biosequence
tables, but it turned out to be extremely slow (see the attached SQL; two
records take about 20 seconds on my Dell PowerEdge 2650 with dual 2.6 GHz
Xeons). Does anyone have a better way to do it?
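A minimal sketch of a leaner query, runnable against an in-memory SQLite database. It assumes a heavily simplified subset of the BioSQL tables, with column names (type_term_id, start_pos, end_pos, strand) taken from later BioSQL releases that may not match the 1.38 schema, plus made-up sample rows. The idea is to let the database cut each gene out of its parent sequence with SUBSTR, so only gene-sized pieces are returned, and to leave seqfeature_qualifier_value out of the join unless qualifier values are actually needed.

```python
import sqlite3

# Hypothetical, heavily simplified subset of the BioSQL tables. Column names
# follow later BioSQL releases and may differ from the 1.38 schema.
SCHEMA = """
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE biosequence (bioentry_id INTEGER PRIMARY KEY, seq TEXT);
CREATE TABLE seqfeature (seqfeature_id INTEGER PRIMARY KEY,
                         bioentry_id INTEGER, type_term_id INTEGER);
CREATE TABLE location (location_id INTEGER PRIMARY KEY, seqfeature_id INTEGER,
                       strand INTEGER, start_pos INTEGER, end_pos INTEGER);
"""

# Cut each gene's span out of its parent sequence inside the database
# (SUBSTR is 1-based: start position, then length), so the full
# chromosome string is never sent to the client.
GENE_SEQS = """
SELECT f.seqfeature_id,
       l.strand,
       SUBSTR(b.seq, l.start_pos, l.end_pos - l.start_pos + 1) AS subseq
FROM seqfeature f
JOIN term t        ON t.term_id = f.type_term_id
JOIN location l    ON l.seqfeature_id = f.seqfeature_id
JOIN biosequence b ON b.bioentry_id = f.bioentry_id
WHERE t.name = 'gene'
ORDER BY f.seqfeature_id
"""


def gene_sequences(conn):
    """Return (seqfeature_id, strand, subsequence) rows for all 'gene' features."""
    return conn.execute(GENE_SEQS).fetchall()


def build_demo_db():
    """Build a tiny in-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO term VALUES (?, ?)", [(1, "gene"), (2, "CDS")])
    conn.execute("INSERT INTO biosequence VALUES (1, 'ACGTACGTAC')")
    # Feature 10 is a gene, feature 11 is a CDS (filtered out by the query).
    conn.executemany("INSERT INTO seqfeature VALUES (?, ?, ?)",
                     [(10, 1, 1), (11, 1, 2)])
    conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?)",
                     [(100, 10, 1, 2, 5), (101, 11, 1, 1, 3)])
    return conn


if __name__ == "__main__":
    print(gene_sequences(build_demo_db()))  # [(10, 1, 'CGTA')]
```

MySQL also provides SUBSTR/SUBSTRING with the same 1-based (string, position, length) arguments. Whether this is fast in practice likely depends on indexes on the join columns (for example location.seqfeature_id and seqfeature.type_term_id). Minus-strand genes still come back as forward-strand text and need reverse-complementing, and features with multi-part locations come back as several rows.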

The only faster approach I can think of (in Java or another language) is to
pull all gene location information, pull the related sequence from the
biosequence table, then iterate through the gene location list and extract
the corresponding substring of the sequence. But this is not very attractive
to me, since I would have to write my own sequence-extraction code for every
application. Is it possible to extend or modify the BioSQL schema to serve
this purpose better?
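The client-side approach described above can be sketched in Python as follows. This is a hypothetical helper, not part of BioSQL or any Bio* toolkit: it assumes locations arrive as (feature_id, bioentry_id, start, end, strand) tuples with 1-based inclusive coordinates, and that `fetch_sequence` is a caller-supplied function (for example one running `SELECT seq FROM biosequence WHERE bioentry_id = ?`). Grouping by bioentry means each chromosome sequence is fetched only once, however many genes it carries.

```python
from collections import defaultdict

# Complement table for DNA; characters not listed (e.g. other IUPAC codes)
# pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_features(locations, fetch_sequence):
    """Cut feature subsequences out of their parent sequences.

    locations: iterable of (feature_id, bioentry_id, start, end, strand) with
        1-based, inclusive start/end (assumed to match the database convention).
    fetch_sequence: hypothetical callable mapping a bioentry_id to its full
        sequence; it is called once per bioentry, not once per feature.
    """
    # Group locations by parent sequence so each sequence is fetched once.
    by_entry = defaultdict(list)
    for loc in locations:
        by_entry[loc[1]].append(loc)

    result = {}
    for bioentry_id, locs in by_entry.items():
        seq = fetch_sequence(bioentry_id)  # one round trip per chromosome
        for feature_id, _, start, end, strand in locs:
            piece = seq[start - 1:end]  # 1-based inclusive -> Python slice
            result[feature_id] = reverse_complement(piece) if strand == -1 else piece
    return result


if __name__ == "__main__":
    chromosomes = {1: "ACGTACGTAC"}  # made-up sample sequence
    locs = [(10, 1, 2, 5, 1), (11, 1, 1, 3, -1)]
    print(extract_features(locs, chromosomes.__getitem__))
    # {10: 'CGTA', 11: 'CGT'}
```

Because the extraction logic lives in one function that takes the fetcher as a parameter, different applications could reuse it and only swap in their own way of reading locations and sequences.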

My understanding is that many downstream applications would only be
interested in certain pieces of the whole genome sequences, and there must be
an efficient way to do that. If everyone has to invent their own method,
BioSQL might be a little too limited. Any ideas on this?

Gang

_______________________________________________
BioSQL-l mailing list
BioSQL-l at open-bio.org
http://open-bio.org/mailman/listinfo/biosql-l


