[Bioperl-l] bioperl based database infrastructure for directed graphs

Wed Jan 9 15:00:38 UTC 2008

On Jan 9, 2008, at 7:59 AM, Sendu Bala wrote:

> Robson Francisco de Souza wrote:
>> Before starting, I would like to know if the BioSQL and Chado schemata
>> have accelerators for querying intervals among billions of features
>> and feature relationships (some examples using these databases would
>> also help, if they show that these databases are efficient for such
>> tasks).
>> If these or other databases are not as suitable as Bio::DB::SeqFeature
>> for feature retrieval based on interval overlap and attributes,
>
> I'm using Bio::DB::SeqFeature for that purpose, but just a warning:  
> I found that with millions of features it made a db that was too  
> large in terms of disc space and too slow in terms of query time. I  
> had to hack out its storage of feature objects in the db, instead  
> generating feature objects on request from the stored attributes.  
> Doing this turned out to be faster than simply unfreezing certain  
> kinds of feature objects!

Would these be Bio::SeqFeature::Annotated objects? If so, I bet Storable
is storing the OntologyStore object information along with each feature
(which argues for refactoring the FeatureIO/Bio::SeqFeature::Annotated
stuff in 1.7).
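
To illustrate the suspicion, here is a rough sketch in Python, with pickle
standing in for Perl's Storable. The class names are made up for the example
and are not the BioPerl classes. The point is only that a serializer follows
references, so a feature that points at a large shared object drags a copy
of it into every stored blob.

```python
import pickle


class OntologyStore:
    """Hypothetical stand-in for a large, shared term-lookup object."""

    def __init__(self, n_terms):
        self.terms = {f"SO:{i:07d}": f"term_{i}" for i in range(n_terms)}


class Feature:
    """Hypothetical feature that may hold a reference to the shared store."""

    def __init__(self, seq_id, start, end, store=None):
        self.seq_id = seq_id
        self.start = start
        self.end = end
        self.store = store


shared = OntologyStore(10_000)

# Same coordinates, but one feature references the shared store.
heavy = Feature("chr1", 100, 200, store=shared)
light = Feature("chr1", 100, 200)

# pickle serializes everything reachable from the object, store included.
heavy_size = len(pickle.dumps(heavy))
light_size = len(pickle.dumps(light))
print(f"with store: {heavy_size} bytes, without: {light_size} bytes")
```

If something similar is happening here, dropping the reference before
freezing (or not freezing the object at all) would shrink each row
considerably.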

Not sure what can be done about that beyond your hack, though it might
be worth exploring whether Bio::DB::SeqFeature::Store could make storing
the object instance optional.
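
For what it's worth, the general shape of that approach (keep only plain
attributes in the table and build feature objects on request) might look
something like the sketch below. It is in Python with SQLite purely for
illustration; the table layout, the function names and the optional source
filter are all assumptions for the example, not the Bio::DB::SeqFeature
schema or API.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical lightweight feature built from stored columns."""
    seq_id: str
    start: int
    end: int
    source: str


def open_store():
    """Create an in-memory table holding attributes only, no frozen objects."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE feature ("
        "seq_id TEXT, start_pos INTEGER, end_pos INTEGER, source TEXT)"
    )
    db.execute("CREATE INDEX feature_loc ON feature (seq_id, start_pos, end_pos)")
    return db


def store(db, feature):
    """Write the feature's attributes as one row."""
    db.execute(
        "INSERT INTO feature VALUES (?, ?, ?, ?)",
        (feature.seq_id, feature.start, feature.end, feature.source),
    )


def overlapping(db, seq_id, start, end, source=None):
    """Rebuild features that overlap [start, end], optionally by source."""
    sql = (
        "SELECT seq_id, start_pos, end_pos, source FROM feature "
        "WHERE seq_id = ? AND start_pos <= ? AND end_pos >= ?"
    )
    args = [seq_id, end, start]
    if source is not None:
        sql += " AND source = ?"
        args.append(source)
    sql += " ORDER BY start_pos"
    return [Feature(*row) for row in db.execute(sql, args)]
```

A plain index like this is not a real interval accelerator (a binning
scheme or an R-tree would be the usual next step for very large tables),
but it shows the store-attributes, rebuild-on-request idea.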

> (I also had to hack in support for retrieval by source, a patch that  
> Lincoln hasn't gotten back to me about yet.)
>
> While I can't answer your main questions, I wish you good luck with  
> your project and request that you keep us posted with what you  
> achieve.

You can always try Lincoln on the GBrowse list as well.  I would say  
go ahead and commit the patch if it isn't a big deal.

chris