HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings
Formal Metadata
Title: HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings
Number of Parts: 155
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI: 10.5446/42846
SIGMOD 2019, talk 17 of 155
Transcript: English (auto-generated)
00:04
The subject of this paper is HyperBench. It is a benchmark that we have assembled to evaluate algorithms for structural decomposition methods over hypergraphs. In this talk, I am going to focus mostly on the empirical findings.
00:20
So in computer science, we have three important problems: the evaluation of Boolean conjunctive queries in the database area, constraint satisfaction problems in AI, and the homomorphism problem. When we take a careful look at these three problems, we can easily see that they are essentially the same.
00:42
Basically, they are all the evaluation of a first-order formula using only existential quantification and conjunction. These three problems also have another thing in common: we can use a hypergraph to represent their structure. So let's look at an example using
01:03
a conjunctive query. When we have such a conjunctive query, expressed here as a Datalog rule, we can transform the query into a hypergraph: for each variable, we introduce a node in the hypergraph, and for each atom, we introduce a hyperedge.
01:23
The hyperedge, of course, groups together the nodes representing the variables of the atom. From this kind of structure, we can already see some important properties of the query, such as acyclicity; in this case, we can see that the query has a cycle.
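As an illustration of the acyclicity check mentioned here (this is not code from the paper), the classic GYO reduction decides alpha-acyclicity of a hypergraph: repeatedly delete vertices that occur in only one hyperedge and hyperedges subsumed by another hyperedge; the hypergraph is acyclic exactly when everything gets eliminated. The representation as tuples of variable names is a hypothetical choice:

```python
def gyo_acyclic(hyperedges):
    """GYO reduction: repeatedly strip vertices that occur in only one
    edge and edges subsumed by another edge; the hypergraph is
    alpha-acyclic iff everything gets eliminated."""
    edges = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # count in how many edges each vertex occurs
        count = {}
        for e in edges:
            for v in e:
                count[v] = count.get(v, 0) + 1
        # drop vertices that occur in exactly one edge
        for e in edges:
            lone = {v for v in e if count[v] == 1}
            if lone:
                e -= lone
                changed = True
        # drop empty edges and edges contained in another edge
        # (the index tiebreak keeps one copy of duplicate edges)
        kept = []
        for i, e in enumerate(edges):
            subsumed = any(
                j != i and (e < f or (e == f and j < i))
                for j, f in enumerate(edges)
            )
            if not e or subsumed:
                changed = True
            else:
                kept.append(e)
        edges = kept
    return not edges
```

On a query whose hypergraph contains a cycle, such as three pairwise-overlapping edges forming a triangle, the reduction gets stuck and the test reports the query as cyclic.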
01:42
This is very important for the complexity of conjunctive query answering, because we know that in the general case, answering a conjunctive query over a database is NP-complete. But there is a very well-known tractable fragment: that of acyclic conjunctive queries.
02:02
And so, of course, we are interested in finding even larger tractable fragments, and we can do this by exploiting the structure of the hypergraph, or equivalently of the query. The first method that I want to introduce is that of tree decompositions.
02:21
So let's consider a more interesting query and its associated hypergraph. A tree decomposition is basically a tree whose nodes are labeled by sets of vertices (or equivalently, variables), which we call bags, and which must satisfy some conditions.
02:41
The first one is that the variables that appear together in one atom must be covered by some bag. So, for instance, if we look at the first atom, atom A, all of its variables are covered, or contained, here.
03:01
The same holds for the second atom, for the third, and so on for all the atoms. Another important condition is the connectedness condition, which says that if we focus our attention on a single variable and look at the bags in which this variable is contained,
03:22
these bags must form a connected subtree of the tree decomposition. Again, for instance, if we focus on variable X, we have this path here on the left. If we focus on variable F, we again have a connected subtree in the tree decomposition, and this must hold
03:41
for all the variables appearing in the query. An important notion that we associate with a decomposition is its width. The width, in the case of tree decompositions, is just the maximum size of a bag, minus one for a technical reason.
04:01
So in this case, we have nine variables here, and therefore the width is eight. The width tells us how difficult it is to answer the query when using this particular decomposition as a guide.
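The two conditions and the width just described can be checked mechanically. Here is a minimal sketch (hypothetical helper, not from the paper) that validates a candidate tree decomposition and returns its width:

```python
def td_width(bags, tree_edges, hyperedges):
    """Validate the two tree-decomposition conditions from the talk and
    return the width (largest bag size minus one), or None if invalid.
    bags: dict mapping tree node -> set of variables;
    tree_edges: pairs of tree nodes; hyperedges: the atoms' variable sets."""
    # condition 1: the variables of each atom fit inside some bag
    for e in hyperedges:
        if not any(set(e) <= bag for bag in bags.values()):
            return None
    # adjacency of the decomposition tree
    adj = {n: set() for n in bags}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    # condition 2 (connectedness): the bags containing each variable
    # must form a connected subtree
    for v in set().union(*bags.values()):
        nodes = {n for n, bag in bags.items() if v in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for m in adj[stack.pop()]:
                if m in nodes and m not in seen:
                    seen.add(m)
                    stack.append(m)
        if seen != nodes:
            return None
    return max(len(bag) for bag in bags.values()) - 1
```

For a two-bag decomposition with bags {x, y} and {y, z} joined by one tree edge, both conditions hold and the width is 1.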
04:20
Tree decompositions are well-suited for the case in which we consider normal graphs. But if we want to improve this notion over hypergraphs, we can use additional power: we can use the hyperedges themselves. So in this case, I want to show you
04:42
a generalized hypertree decomposition for this query. We can basically take this tree decomposition and add an edge labeling to the bags: we add sets of atoms such that the atoms cover the bags. For instance, here the variables
05:03
appearing in the root can be covered by the atom J, and the variables appearing in the child of the root can be covered by the combination of two atoms, A and B, and so on and so forth. And so now we can
05:21
slightly modify our notion of width and count the maximum number of atoms that we need to cover the bags. In this case, two atoms are sufficient to cover the bags of the decomposition.
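The modified width notion can be sketched the same way. In this hypothetical representation (not from the paper), each bag carries a set of atom indices, and the generalized hypertree width of the decomposition is the size of the largest such set, provided the chosen atoms really cover each bag:

```python
def ghd_width(bags, covers, hyperedges):
    """Width of a generalized hypertree decomposition as described in
    the talk: each bag carries a set of atom (hyperedge) indices whose
    variables, taken together, must cover the bag; the width is the
    size of the largest such set. Returns None if a bag is uncovered."""
    for node, bag in bags.items():
        covered = set()
        for i in covers[node]:
            covered |= set(hyperedges[i])
        if not bag <= covered:
            return None  # the chosen atoms do not cover this bag
    return max(len(covers[n]) for n in bags)
```

For example, if one bag is covered by a single atom and another bag needs two atoms together, the width of the decomposition is 2, mirroring the J versus A-plus-B situation on the slide.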
05:42
So we can see that we have an improvement from the tree decomposition to the generalized hypertree decomposition, meaning that when we use the generalized hypertree decomposition, we can, roughly speaking, evaluate the query faster. And we can focus on a specific class
06:01
of generalized hypertree decompositions that satisfy an additional special condition. The special condition says that if there are variables that could be covered by an atom at some point in the tree, and these variables disappear, then those variables must not appear
06:23
anywhere else in the subtree rooted at this node. This helps us in the computation of such decompositions, which are called hypertree decompositions. Again, the notion of width here is the maximum number of atoms that we need to cover the bags. I should also define what it means to compute
06:44
the width of a hypergraph. The width of a hypergraph is the minimum width over all its tree, hypertree, or generalized hypertree decompositions, depending on the notion that we consider. By the way we have defined them, we can easily see that the generalized hypertree width
07:03
is the best we can hope for among these three notions that I have defined. And we have a nice result: if we consider classes of conjunctive queries for which the width is bounded,
07:21
and when I say width, I mean either treewidth, hypertree width, or generalized hypertree width, then answering those conjunctive queries is tractable. So a natural problem that arises here is checking whether a hypergraph has low width.
07:41
The check problem is defined in this way: we fix a notion of width in which we are interested, and we fix a constant k that is our upper bound for the width. The algorithm shall say yes if the width is bounded by k,
08:01
and also output a decomposition that witnesses that the width is bounded by k, or say no otherwise. It is really important to note that here we fix the notion of width and the upper bound, and what changes is the hypergraph.
08:22
For the notions that we have defined, the complexity of the check problem differs. In the case of treewidth and hypertree width, checking for bounded width is tractable, but in the case of generalized hypertree width, the problem is NP-complete.
08:44
Nevertheless, last year at PODS, a paper was presented which says that we can restrict our attention to certain classes of hypergraphs with some structural restrictions. For these classes, the problem of checking
09:03
bounded generalized hypertree width is tractable. The first class is that of bounded degree, where the degree is the maximum number of edges incident on a single node; in such a hypergraph it would be two, three, or four.
09:21
The other class considers the maximum intersection of any pair of edges. In such a hypergraph, this intersection would have value one, two, or three. And there is also a more general tractable class
09:42
that says we can consider not only the intersection between any pair of edges, but fix a number c and consider the maximum size of the intersection of any c edges. And for all these classes, that is, hypergraphs satisfying the
10:02
bounded degree property, the bounded intersection property, or the bounded multi-intersection property, checking for bounded generalized hypertree width is tractable. So in our paper, we focused on some practical questions.
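The three structural parameters just named are straightforward to measure on a given hypergraph. A minimal sketch (hypothetical helper names, not from the paper):

```python
from itertools import combinations

def degree(hyperedges):
    """Maximum number of hyperedges incident to any single vertex."""
    count = {}
    for e in hyperedges:
        for v in set(e):
            count[v] = count.get(v, 0) + 1
    return max(count.values())

def multi_intersection_width(hyperedges, c):
    """Largest size of the common intersection of any c distinct
    hyperedges; c = 2 gives the plain (pairwise) intersection width."""
    edges = [set(e) for e in hyperedges]
    return max(len(set.intersection(*combo))
               for combo in combinations(edges, c))
```

For the three edges {a, b, c}, {b, c, d}, {c, d, e}, the degree is 3 (vertex c), the pairwise intersection width is 2 (for example {b, c}), and the 3-wise intersection width is 1 (just {c}).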
10:20
We know all these theoretical results, and we want to see whether we can actually use these properties for efficient computation of decompositions. Here I want to answer four questions. First, do the hypergraphs of real-world instances have low degree or low intersection width? And do these hypergraphs have low hypertree width?
10:46
Also, can we use these properties to efficiently, or practically, compute generalized hypertree decompositions? And once we can compute hypertree width and generalized hypertree width, we want to compare them:
11:00
how do they relate to each other in practice? Do they differ, and if so, by how much? First of all, we had to collect a set of conjunctive queries and CSP instances from actual real-world cases. We have grouped our instances into three classes.
11:21
The first one is that of conjunctive queries, coming from public benchmarks and query logs. We also have a class of CSPs, in which we put CSP instances coming from public benchmarks, together with some instances that we used in previous tests.
11:42
And we have a class of random CQs and CSPs that we use for comparison, to see what the differences actually are between real-world instances and synthetic instances. For the first experiment, we wanted to check whether degree and intersection width are low in practice.
12:03
Here I show you the percentage of instances for which the various properties are low, where by low I mean less than or equal to five. We can see that, in general, all our classes have low intersection width
12:21
for the intersection of two, three, or four hyperedges. When it comes to the degree, we can see that synthetic instances have high degree, while conjunctive queries in particular have relatively low degree. Sorry, I think I first said that this is low,
01:03
but actually I mean that random instances have high degree. For the second question, we wanted to check whether hypertree width is low in practice. Again, by low I mean less than or equal to five.
13:00
Here I show you the number and the percentage of instances for which the hypertree width is low. For the class of conjunctive queries, all of them have hypertree width less than or equal to three, so it is very, very low. For constraint satisfaction problems,
13:21
almost 60% of the instances have low width, while for the random instances we have slightly worse results, but almost half of them also have low width. Then we come to the actual computation
13:40
of generalized hypertree decompositions. We wanted to exploit the bounded intersection property to write efficient algorithms, and we have implemented several algorithms using different strategies. With these algorithms, within a timeout of one hour,
14:01
we managed to compute the exact generalized hypertree width for more than half of the instances of the dataset. When I say exact generalized hypertree width, I mean that we have not only an upper bound but also a lower bound, and therefore we know exactly the value
14:22
of the width for those hypergraphs. Also, by using the generalized hypertree width as a lower bound for the hypertree width, we managed to compute the exact hypertree width for even more instances than we were able to
14:41
using only the hypertree decomposition algorithm. Coming to the last questions: are generalized hypertree width and hypertree width similar? We know from the theory that the hypertree width can be at most three times greater than the generalized hypertree width,
15:01
but is this always the case? Well, we focused on the instances for which we know the generalized hypertree width, and we found that in 99% of the cases these two values are equal. Only for 16 instances is the hypertree width different from the generalized hypertree width.
15:25
In our paper, we have more results. In particular, we have a correlation analysis between hypergraph properties, in which we show that low degree and low intersection do not automatically imply low width, and thus the properties that I have mentioned
15:44
are not trivial in this sense. We also have a completely new algorithm to compute generalized hypertree decompositions, and this algorithm proved to be particularly effective for the computation of negative instances.
16:00
Using this algorithm, we can establish lower bounds on the generalized hypertree width. We also have approximation algorithms for fractional hypertree width, in which we take an existing hypertree decomposition and try to improve it fractionally by adding fractional edge covers. All of the results and the dataset
16:23
are available at the webpage shown here. You can navigate through the dataset, check the widths of the hypergraphs, and download the instances if you want to test your own hypergraph
16:40
decomposition algorithms. So, to conclude, what did we do? We have collected a big set of conjunctive queries and CSPs and performed an extensive set of experiments, and we have found that degree and intersection width are low and usable in practice.
17:01
The hypertree width is also low for most of the instances, and the algorithms for computing generalized hypertree decompositions are effective and efficient, in the sense that they can be used to actually compute at least some lower and upper bounds. Also, for most of the instances,
17:20
hypertree width and generalized hypertree width coincide in practice. For future work, we would like to implement the other properties in our algorithms for generalized hypertree decompositions, find the exact hypertree width and generalized hypertree width for the rest of the instances,
17:41
and finally, extend our dataset. That is all, thank you.