[Bioperl-l] bioperl based database infrastructure for directed graphs

Wed Jan 9 16:52:21 UTC 2008

[cc-d to gmod-schema]

Chado does have some views and PostgreSQL functions for interval-based
retrieval. AFAIK there are no accelerators for deep feature graphs,
as most Chado users have relatively shallow gene-model/SO feature
graphs. It may not be too hard to extend the cvterm code to do this,
depending on the characteristics of your graphs (the closure of
feature neighbourhood graphs may be particularly large).
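
To make the point about closure size concrete, here is a minimal Python sketch of materializing the transitive closure of a feature relationship graph. It is not Chado code: the function name, the (subject, object) edge tuples and the feature names are all assumptions made for illustration.

```python
from collections import defaultdict

def transitive_closure(edges):
    """Return {(subject, object): shortest path length} for every reachable pair.

    `edges` is an iterable of (subject, object) pairs, e.g. (exon, transcript).
    This layout is hypothetical, not taken from any existing schema.
    """
    parents = defaultdict(set)
    for subj, obj in edges:
        parents[subj].add(obj)

    closure = {}
    for start in list(parents):
        # Breadth-first walk upward from `start`, recording the first
        # (and therefore shortest) distance at which each node is reached.
        frontier, seen, distance = {start}, {start}, 0
        while frontier:
            distance += 1
            next_frontier = set()
            for node in frontier:
                for parent in parents.get(node, ()):
                    if parent not in seen:
                        seen.add(parent)
                        closure[(start, parent)] = distance
                        next_frontier.add(parent)
            frontier = next_frontier
    return closure

# Shallow gene-model graph: exons -> transcript -> gene.
shallow = [("exon1", "mRNA1"), ("exon2", "mRNA1"), ("mRNA1", "geneA")]
shallow_paths = transitive_closure(shallow)

# Deep graph: a chain of 100 features, each linked to the next one.
chain = [(i, i + 1) for i in range(99)]
chain_paths = transitive_closure(chain)

print(len(shallow), len(shallow_paths))  # 3 edges, 5 closure rows
print(len(chain), len(chain_paths))      # 99 edges, 4950 closure rows
```

The shallow graph barely grows when closed, while the chain of n features produces n(n-1)/2 closure rows, which is why a precomputed path table can become very large for deep or neighbourhood-style graphs.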

On Jan 9, 2008, at 5:20 AM, Robson Francisco de Souza wrote:

> Hello All!
>
> Greetings to everybody, and happy new year to those following a
> Western calendar!
>
> I'm starting a new project to store and analyze distinct sets of
> sequence annotation data which are related in a way suitable for
> representation in a directed (e.g. transcript splicing) or undirected
> (e.g. gene product interaction) graph. Analysis will require frequent
> queries based on interval overlaps, feature neighbourhood, annotation
> and, most importantly, feature relationships and stored paths.
>
> At first, I thought of building an entirely new database structure to
> store project-specific data (e.g. alternative splicing or protein
> interaction), but as I have some experience with Lincoln's
> Bio::DB::SeqFeature::Store, I'm now considering extending it for the
> purpose of storing graphs describing relationships among features.
>
> I'm aware that some other bioperl-related databases, specifically
> BioSQL and Chado, do have components which might be suitable for
> storing all or some of these data but, since Lincoln's feature storage
> and interval binning implementations in
> Bio::DB::SeqFeature::Store::mysql are clean, simple and very fast,
> perhaps extending it in a seemingly modular way is desirable. A good
> extension to Lincoln's database could include tables like
> feature_relationship and feature_path, for edges and transitive
> closures (just like in BioSQL), and feature_stored_path, for exclusion
> of biologically irrelevant paths in DAGs, like certain splicing
> isoforms. These tables could be used to store sequence assemblies or
> EST alignments efficiently, including scaffolds inferred by connecting
> contigs.
>
> Before starting, I would like to know if the BioSQL and Chado schemata
> do have accelerators for querying intervals among billions of features
> and feature relationships (some examples using these databases would
> also help, if they show that these databases are efficient for such
> tasks). If these or other databases are not as suitable as
> Bio::DB::SeqFeature for feature retrieval based on interval overlap
> and attributes, then again I might consider extending
> Bio::DB::SeqFeature and contributing such extensions back to bioperl...
>
> Any thoughts?
>
> Best regards,
> Robson
>
> PS: sorry if anyone gets two copies of this post, but it took me some
> time to realize my new e-mail wasn't subscribed to bioperl-l...
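
For readers unfamiliar with the interval binning mentioned in the quoted message, the general idea can be sketched in Python as below. This is a generic hierarchical binning scheme and not the actual Bio::DB::SeqFeature::Store::mysql implementation: the tier widths, the class and function names, and the closed non-negative integer coordinates are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed bin widths, smallest first; a real schema chooses its own tiers.
BIN_SIZES = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
CATCH_ALL = (len(BIN_SIZES), 0)  # bin for features too long for any tier

def assign_bin(start, end):
    """Return (tier, index) of the smallest bin fully containing [start, end]."""
    for tier, size in enumerate(BIN_SIZES):
        if start // size == end // size:
            return tier, start // size
    return CATCH_ALL

class BinnedIndex:
    """In-memory stand-in for a feature table with an indexed bin column."""

    def __init__(self):
        self.bins = defaultdict(list)

    def add(self, name, start, end):
        self.bins[assign_bin(start, end)].append((start, end, name))

    def overlapping(self, start, end):
        """Return sorted names of features overlapping [start, end]."""
        # Per tier, only the bins touched by the query range can hold hits.
        candidates = [
            (tier, index)
            for tier, size in enumerate(BIN_SIZES)
            for index in range(start // size, end // size + 1)
        ]
        candidates.append(CATCH_ALL)
        hits = []
        for key in candidates:
            for f_start, f_end, name in self.bins.get(key, ()):
                if f_start <= end and f_end >= start:
                    hits.append(name)
        return sorted(hits)

index = BinnedIndex()
index.add("exonA", 1200, 1800)
index.add("geneB", 900, 15000)
index.add("exonC", 52000, 52400)

print(index.overlapping(1500, 2500))    # ['exonA', 'geneB']
print(index.overlapping(52100, 52200))  # ['exonC']
```

In a relational setting the (tier, index) pair would typically be stored as an indexed column, so an overlap query only scans the handful of bins that the query range touches rather than the whole feature table.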