[BioSQL-l] ontology for transitive closure table
Hilmar Lapp
hlapp at gnf.org
Thu Mar 27 18:00:35 EST 2003
So, I've tried solving this straightforwardly on the OntologyI API
level, exploding the graph into an array of Bio::Ontology::Path objects
that I can then serialize, but I'm afraid that given that this is still
a transitive closure computation which has time complexity O(V^3), and
given your results, this is probably just not going to complete in
finite time. So, unless someone has a flash of brilliance (I don't),
I'm going to table this until maybe your DBIx::Graph can do the job.
Until then, we need to either have a method on the OntologyAdaptor that
asks the driver to do this in SQL, or have a completely separate
script, both of which requires the client to explicitly trigger the
operation. Sadly, MySQL doesn't have stored procedures nor triggers, so
taking care of it that way would be neat but would leave MySQL behind.
The post-processing thing you mention is not going to be very easy. I'm
afraid this whole ontology support is like a huge button reading 'press
here to tickle the next bug'.
How did the Biojava guys solve this again? Do you update the path table
after every single manipulation of the relationships in an ontology? Or
is there a button you have to press in order to get the path table
back in sync? Have you tried computing the transitive closure for GO in
memory? What do you do with paths that have both is-a and part-of?
Should we by default add a relationship (is-a,is-a,part-of)?
-hilmar
On Thursday, March 27, 2003, at 12:38 PM, Aaron J Mackey wrote:
>
> On Thu, 27 Mar 2003, Hilmar Lapp wrote:
>
>>>> my $dbigraph = new DBIx::Graph @params;
>>>> my $graph = $dbigraph->graph;
>>>>
>>>> my $tc = $graph->transitive_closure;
>>>
>>> So this bit now works for me (including being able to access all
>>> various
>>> "paths" described by a given transitive closure edge, useful for
>>> determining predicate type of the new edge).
>>
>> what is $tc? a reference to an array?
>
> $tc is a Graph::Directed object.
>
>> Do you have some place where one can checkout the code?
>
> No, not yet (dbix-graph is on sourceforge, but is currently empty)
>
>> Also, Graph.pm
>> has
>>
>> $G->has_path($u, $v, ...)
>>
>> Return true if the graph $G has the cycle defined by the
>> vertices $u, $v, ..., false otherwise.
>>
>> Didn't this help? Maybe it didn't because you need to enumerate all
>> vertices along the path then? (I haven't used that method)
>
> No, that doesn't help with the speed issue.
>
>> How do you deal with paths that have heterogeneous predicates, i.e.,
>> both is-a and part-of? In Singapore we defined that the predicate of
>> the path is the GCD, but there is no common denominator defined yet
>> for
>> those two predicates. Did you define one behind the scenes (like
>> related-to)?
>
> The changes I've made to Graph::Directed are to simply store the set of
> vertices that were traversed (i.e. the actual "path") to generate a new
> edge in the transitive closure; of course, there may be multiple paths,
> and they're all available to you. You may now do whatever
> introspection
> you like to "label" the edges as you wish. This was basic step #1 to
> get
> to what you're talking about (which may involve looking at all possible
> paths between A and B, and deciding a) whether the edge should exist at
> all and b) which "labels" the edge should receive). So a
> Bio::Whatever::RelationshipI manager might use DBIx::Graph and
> Graph::Directed to a) fetch the vertices and edges from a database,
> and b)
> compute the "raw" transitive closure and would then go off and
> "post-process" the tc graph before saying "I'm done, save this away".
>
>>>> Slightly OT, but if I were to implement our own Graph.pm-like object
>>>> via
>>>> an Inline::C-wrapped graph library, would people be keen on that?
>
>> The problem is its more a hassle for you than for people, because if
>> someone can't compile the code on his specific platform, or hits a
>> dynlink error, or hits a segfault, then most likely it is going to be
>> you chasing down subtle allocation/de-allocation bugs, or stupid
>> OS/perl version differences or whatever.
>
> I'll remind you of what I said:
>
>>> But, computing the tc via
>>> Graph::Directed on GO isa/partof is extraordinarily slow (I've waited
>>> more
>>> than 4 hours now; by comparison, an SQL-only solution finishes in
>>> less
>>> than 20 minutes).
>
> I think people are going to have bigger problems if we promise them a
> solution that only works for graphs that don't include GO. It may
> compile
> just fine on WeirdOS, but it certainly won't finish.
>
> What I'm envisioning is a drop-in replacement for the pure-perl
> Graph.pm
> that only implements Transitive_Closure_Floyd_Warshall, but in C. You
> can
> go grab it (and try to compile it) if you need it, but you get to start
> out by using Graph.pm
>
> Put another way, I *have* to write this thing for my own use, and will
> go
> to the extra trouble of packaging a wrapper-class to switch between the
> two options if people foresee their own needs as similar.
>
> Your arguments about maintaining a C package are equally troublesome
> for
> perl packages (people on foreign machines with different library
> versions
> having problems that you need to address), so I don't really see any
> difference there.
>
> -Aaron
>
> --
> Aaron J Mackey
> Pearson Laboratory
> University of Virginia
> (434) 924-2821
> amackey at virginia.edu
>
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
More information about the BioSQL-l
mailing list