[BioSQL-l] how to quickly retrieve feature sequences

Hilmar Lapp hlapp at gnf.org
Sun Jun 20 09:21:28 EDT 2004


Gang,

Do you need to do this in high throughput? If not, you could use 
bioperl with bioperl-db as the language binding and then retrieve the 
information through the bioperl object model.

I'm away from my desk for a week, so I won't be able to elaborate 
further until the week after next.

	-hilmar

On Tuesday, June 15, 2004, at 09:38  AM, Gang Wu wrote:

> Hi,
>
> I just loaded the 5 Arabidopsis thaliana GenBank genome files into my
> sequence database (BioSQL 1.38). My question is: how can I efficiently
> retrieve all gene sequences from the database? I tried joining the
> seqfeature, seqfeature_qualifier_value, location, term and biosequence
> tables, but it turned out to be extremely slow (see the attached SQL; 2
> records take about 20 seconds on my Dell PowerEdge 2650 with dual 2.6 GHz
> Xeons). Does anyone have a better way to do it?
>
> The only faster approach I can think of (in Java or another language) is
> to pull all gene location info, pull the related sequence from the
> biosequence table, then loop through the gene location list and take the
> corresponding substring of the sequence. But this is not attractive to
> me, since for each application I would have to write my own code to
> pull the sequences. Is it possible to extend or modify the BioSQL schema
> to serve this purpose better?
>
> My understanding is that a lot of downstream applications are only
> interested in certain pieces of the whole genome sequence, so there
> must be an efficient way to retrieve them. If everyone has to invent
> their own method, BioSQL might be a little too limited. Any ideas on
> this?
>
> Gang
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
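
The client-side approach Gang describes in the quoted message (fetch the
gene locations, fetch the parent sequence once, then slice out each
feature) can be sketched roughly as follows. This is only an illustration,
not BioSQL or bioperl code: the function names are hypothetical, each
location is assumed to be a single span with 1-based inclusive start/end
and a strand of 1 or -1, and the sample data are made up to stand in for
rows fetched from the location and biosequence tables.

```python
# Hypothetical helpers; not part of BioSQL, bioperl, or bioperl-db.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_features(sequence, locations):
    """Yield (feature_id, subsequence) for each location.

    Assumes each location is (feature_id, start, end, strand) with
    1-based inclusive coordinates and strand 1 or -1. Features made of
    several spans (e.g. joined exons) are not handled here.
    """
    for feature_id, start, end, strand in locations:
        sub = sequence[start - 1:end]  # convert to Python's 0-based slice
        if strand == -1:
            sub = reverse_complement(sub)
        yield feature_id, sub


# Made-up data standing in for one biosequence row and two location rows.
genome = "ATGCCGTAAGGCTTA"
genes = [("geneA", 1, 9, 1), ("geneB", 10, 15, -1)]
result = dict(extract_features(genome, genes))
```

The parent sequence is read once and every feature is a cheap in-memory
slice, which avoids the per-row substring work inside a large SQL join.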



