[Bioperl-l] bioperl based database infrastructure for directed graphs

Sendu Bala bix at sendu.me.uk
Wed Jan 9 13:59:08 UTC 2008


Robson Francisco de Souza wrote:
> Before starting, I would like to know if the BioSQL and Chado schemata
> have accelerators for querying intervals among billions of features
> and feature relationships (some examples using these databases would
> also help, if they show that these databases are efficient for such tasks).
> If these or other databases are not as suitable as Bio::DB::SeqFeature
> for feature retrieval based on interval overlap and attributes,

I'm using Bio::DB::SeqFeature for that purpose, but just a warning: I 
found that with millions of features it made a db that was too large in 
terms of disc space and too slow in terms of query time. I had to hack 
out its storage of feature objects in the db, instead generating feature 
objects on request from the stored attributes. Doing this turned out to 
be faster than simply unfreezing certain kinds of feature objects!
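
As a rough illustration of the idea above (this is not the actual Bio::DB::SeqFeature code or schema), here is a minimal Python/SQLite sketch that stores only plain attribute columns and builds feature objects on request, rather than storing and unfreezing serialized objects. The table layout, column names, `Feature` class and helper functions are all hypothetical, chosen just for this example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical lightweight feature, rebuilt from stored columns."""
    seq_id: str
    start: int
    end: int
    strand: int
    primary_tag: str
    source: str


def build_store(rows):
    """Create an in-memory table holding plain attributes, no frozen objects."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature ("
        " seq_id TEXT, start_pos INTEGER, end_pos INTEGER,"
        " strand INTEGER, primary_tag TEXT, source TEXT)"
    )
    # A simple composite index; a real store would likely need something
    # smarter (e.g. binning) for very large feature counts.
    conn.execute("CREATE INDEX feature_loc ON feature (seq_id, start_pos, end_pos)")
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def features_overlapping(conn, seq_id, start, end, source=None):
    """Return Feature objects overlapping [start, end], built on request."""
    sql = (
        "SELECT seq_id, start_pos, end_pos, strand, primary_tag, source"
        " FROM feature WHERE seq_id = ? AND start_pos <= ? AND end_pos >= ?"
    )
    params = [seq_id, end, start]
    if source is not None:
        # Optional retrieval by source.
        sql += " AND source = ?"
        params.append(source)
    sql += " ORDER BY start_pos"
    return [Feature(*row) for row in conn.execute(sql, params)]


if __name__ == "__main__":
    store = build_store([
        ("chr1", 100, 200, 1, "gene", "ensembl"),
        ("chr1", 300, 400, -1, "exon", "havana"),
        ("chr2", 150, 250, 1, "gene", "ensembl"),
    ])
    for feat in features_overlapping(store, "chr1", 150, 250):
        print(feat)
```

The point of the sketch is only that the database holds no serialized objects: each row is a handful of scalar columns, and an object is constructed only for the rows a query actually returns.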

(I also had to hack in support for retrieval by source, a patch that 
Lincoln hasn't gotten back to me about yet.)

While I can't answer your main questions, I wish you good luck with your 
project and ask that you keep us posted on what you achieve.


