[BioSQL-l] how to quickly retrieve feature sequences
Hilmar Lapp
hlapp at gnf.org
Sun Jun 20 09:21:28 EDT 2004
Gang,
Do you want to do this in a high-throughput fashion? If not, you could
use bioperl and bioperl-db as the language binding and then use the
bioperl object model to retrieve the information.
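
For the high-throughput case, the client-side approach outlined in the
quoted message below (fetch the gene locations, fetch the parent
sequence once, then slice it in code) could look roughly like the
following. This is only a minimal Python sketch, not part of BioSQL or
bioperl-db. The names GeneLocation and extract_features are
hypothetical, and it assumes 1-based inclusive start/end coordinates, a
strand of 1 or -1, and a single location per feature (no split
locations); check these assumptions against your own data.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple

# Complement table for plain DNA letters; ambiguity codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class GeneLocation(NamedTuple):
    """Hypothetical record holding one feature location (assumed layout)."""
    feature_id: int
    start: int   # assumed 1-based, inclusive
    end: int     # assumed 1-based, inclusive
    strand: int  # assumed 1 = forward, -1 = reverse


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_features(
    parent_seq: str, locations: Iterable[GeneLocation]
) -> Iterator[Tuple[int, str]]:
    """Yield (feature_id, subsequence) for each location on parent_seq.

    The parent sequence is held in memory once; each feature is a slice.
    """
    for loc in locations:
        # Convert 1-based inclusive coordinates to a Python slice.
        sub = parent_seq[loc.start - 1:loc.end]
        if loc.strand == -1:
            sub = reverse_complement(sub)
        yield loc.feature_id, sub


if __name__ == "__main__":
    # Toy data standing in for one biosequence row and two gene locations.
    chromosome = "ATGCCGTAAGGCT"
    genes = [GeneLocation(1, 1, 6, 1), GeneLocation(2, 4, 9, -1)]
    for feature_id, seq in extract_features(chromosome, genes):
        print(feature_id, seq)
```

The point of the sketch is that the parent sequence is read from the
database once per chromosome rather than once per gene, so the per-gene
cost is just a string slice.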
I'm away from my desk for a week, so I won't be able to elaborate
further until the week after next.
-hilmar
On Tuesday, June 15, 2004, at 09:38 AM, Gang Wu wrote:
> Hi,
>
> I just loaded the 5 Arabidopsis thaliana GenBank genome files into my
> sequence database (BioSQL 1.38). My question is: how can I efficiently
> retrieve all gene sequences from the database? I tried to do that by
> joining the seqfeature, seqfeature_qualifier_value, location, term and
> biosequence tables, but it turned out to be extremely slow (see the
> attached SQL; 2 records take about 20 seconds on my Dell PowerEdge 2650
> with dual 2.6 GHz Xeons). Does anyone have a better way to do it?
>
> The only faster way I can think of (in Java or another language) is to
> pull all gene location info, pull the related sequence from the
> biosequence table, then loop through the gene location list and
> retrieve the matching substring of the sequence. But this does not seem
> attractive to me, since for each application I would have to write
> code to pull the sequences myself. Is it possible to extend/modify the
> BioSQL schema to serve this purpose better?
>
> My understanding is that a lot of subsequent applications would only be
> interested in certain pieces of the whole genome sequences, and there
> must be an efficient way to do that. If everyone has to invent their
> own method, BioSQL might be a little bit too limited. Any ideas on
> this?
>
> Gang
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------