On the Enumeration Complexity of Unions of Conjunctive Queries
Video in TIB AVPortal:
On the Enumeration Complexity of Unions of Conjunctive Queries
Formal Metadata
Title 
On the Enumeration Complexity of Unions of Conjunctive Queries

Title of Series  
Author 

License 
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2019

Language 
English

Content Metadata
Subject Area  
Abstract 
We study the enumeration complexity of Unions of Conjunctive Queries (UCQs). We aim to identify the UCQs that are tractable in the sense that the answer tuples can be enumerated with a linear preprocessing phase and a constant delay between every successive tuples. It has been established that, in the absence of self joins and under conventional complexity assumptions, the CQs that admit such an evaluation are precisely the freeconnex ones. A union of tractable CQs is always tractable. We generalize the notion of freeconnexity from CQs to UCQs, thus showing that some unions containing intractable CQs are, in fact, tractable. Interestingly, some unions consisting of only intractable CQs are tractable too. The question of a finding a full characterization of the tractability of UCQs remains open. Nevertheless, we prove that for several classes of queries, freeconnexity fully captures the tractable UCQs.

00:00
Data management
Query language
Kolmogorov complexity
Multiplication sign
Magnetooptical drive
Query language
Enumerated type
Mathematical optimization
00:16
Scheduling (computing)
Broadcast programming
Algorithm
Consistency
Relational database
Projective plane
Electronic mailing list
Database
Performance appraisal
Query language
Personal digital assistant
Query language
Table (information)
00:48
Complexity class
Scheduling (computing)
Logical constant
Broadcast programming
Table (information)
Algorithm
Consistency
Kolmogorov complexity
Electronic mailing list
Set (mathematics)
Database
Data model
Performance appraisal
Query language
Electronic meeting system
Query language
Resultant
01:29
Logical constant
Complexity class
Complex (psychology)
Clique problem
Boolean algebra
Table (information)
Multiplication sign
Hypercube
Data model
Preprocessor
Performance appraisal
Term (mathematics)
Query language
Spacetime
output
Multiplication
Mathematical optimization
Linear map
Logical constant
Kolmogorov complexity
Preprocessor
Query language
Linearization
Speech synthesis
output
Table (information)
Matrix (mathematics)
02:22
Complexity class
Boolean algebra
Clique problem
Dichotomy
Query language
Numbering scheme
Data structure
Multiplication
Matrix (mathematics)
Hypercube
02:52
Complexity class
NPhard
Dichotomy
Query language
Personal digital assistant
Personal digital assistant
Content (media)
03:40
NPhard
Constraint (mathematics)
Query language
Personal digital assistant
Equivalence relation
04:25
NPhard
Query language
Personal digital assistant
Personal digital assistant
Quicksort
Equivalence relation
Equivalence relation
04:53
NPhard
Category of being
NPhard
Query language
Personal digital assistant
Isomorphieklasse
Personal digital assistant
Gradient
Projective plane
Bound state
Mereology
06:03
NPhard
Query language
Multiplication sign
Personal digital assistant
06:26
Logical constant
Multiplication sign
Uniqueness quantification
Gradient
Letterpress printing
Black box
Streaming media
Neuroinformatik
Preprocessor
Process (computing)
Roundness (object)
Query language
Queue (abstract data type)
Linearization
Queue (abstract data type)
Table (information)
Spacetime
08:52
NPhard
Proof theory
NPhard
Dichotomy
Query language
Content (media)
09:14
Logical constant
Point (geometry)
NPhard
Slide rule
Context awareness
Boolean algebra
Untere Schranke
Wage labour
Multiplication sign
1 (number)
Computational complexity theory
Quadratic equation
Preprocessor
Causality
Atomic number
Matrix (mathematics)
Reduction of order
Codierung <Programmierung>
Multiplication
Theory of relativity
Military base
Bound state
Database
Line (geometry)
Variable (mathematics)
P (complexity)
Subject indexing
Message passing
Query language
Linearization
output
Right angle
Table (information)
12:16
NPhard
Multiplication
NPhard
Cube (algebra)
Matrix (mathematics)
Resultant
12:43
Context awareness
Algorithm
Freeware
Graph (mathematics)
Information
Multiplication sign
Forcing (mathematics)
Combinational logic
Database
Variable (mathematics)
Variable (mathematics)
Connected space
Subset
Category of being
Network topology
Query language
Network topology
Free variables and bound variables
Homomorphismus
Vertex (graph theory)
Freeware
Homomorphismus
Condition number
14:49
Logical constant
Group action
Freeware
Logical constant
Multiplication sign
Lemma (mathematics)
Combinational logic
Enumerated type
Variable (mathematics)
Variable (mathematics)
Number
Neuroinformatik
Preprocessor
Mathematics
Process (computing)
Query language
Personal digital assistant
Linearization
Homomorphismus
Stability theory
16:17
Area
NPhard
Raw image format
NPhard
Information
Online help
Multiplication sign
Projective plane
Content (media)
Mereology
Variable (mathematics)
Subset
Dichotomy
Query language
Personal digital assistant
Different (Kate Ryan album)
Function (mathematics)
Matrix (mathematics)
Multiplication
Extension (kinesiology)
18:02
NPhard
Proxy server
Online help
Multiplication sign
1 (number)
Variable (mathematics)
Axonometric projection
Subset
Process (computing)
Hardy space
Query language
Personal digital assistant
Isomorphieklasse
Query language
Queue (abstract data type)
Extension (kinesiology)
18:34
Latent heat
Query language
Personal digital assistant
Sheaf (mathematics)
Data storage device
Spacetime
Bound state
Reduction of order
Spacetime
00:03
so let's start by understanding their goal the goal is to understand which
00:08
unions of conjunctive queries can be answered
00:11
with what can be seen as optimal a finegrained time guarantees
00:17
so 1st of all it's understand the queries that were standing and this is a standard relational database that describe spots in this case so we have a table for the tutorials and we have a table for the schedule and we can ask this conjunctive query that that though joins these 2
00:35
tables and projects out the title and this table gives us a list of speakers per day also assume we have another table with the research docks and we can
00:48
ask a similar conjunctive query that joins the research talks with the schedule and gives us a list of speakers of the research stocks per day alright if we're interested in the union of all of these answers we just want the speakers then we can ask the union of conjunctive queries and the answers to that the union of the answers to individual conducted courts the the the queries were studying and we want to know how how efficiently can we solve such queries because this is the class we want
01:18
to handle so we should mention the settings where using setting similar to that to previous results were standing we're extending that their RAM on also we can
01:30
access look tables in constant time we're talking in our data complexity terms so that queries considerate of constants I and now what's the optimal time guarantees we need to read the entire input before we know if there's a 1st answer or not and we want to print all of the solution I so we need constant time 4 and so this is why we define mightily the in the class of all problems all queries that can be answered with linear preprocessing time before 1st answer and then constant delay between the answers no for this work we got we we focus on our time complexity and we ignore speech we want to know which queries are in this class see for the congenital queries we already
02:19
have a known answer other queries that are in this class are exactly the queries
02:24
that have a structure that is called free conic and I'm going to explain what free on means half later and so not and I worked with to decide academy the foreign we extended it for for wit when their functional
02:38
dependencies in the scheme and we thought what would be the next entrant interesting step and we decided to look into use accuse because it's it's an interesting class of queries it's equivalent to 2 positive original under wraps
02:52
so and now we have our goal of formal I so we can talk about it we want to know which unions of conjunctive queries are in this class LC's so when to stop when I mean that the query is easy I mean that it's in the really and when I say that it's hard I mean that it's not
03:10
let's start with an overview of of of of of over Warwick and then I'm going to have to go into some explanations in more detail cases of unions of conjunctive queries the 1st case is that all queries are easy and quite easy to show that if every query in the union is easy then and the entire Union is if every queries and delay seeing the entire Union isn't it singing so this case is always now it gets
03:42
interesting where some query union is hot if you just jot down the 1st query that comes into your mind when you think about that a union of a hard querying an easy query it is it will most likely be a hard but this also be easy
04:02
so the easy for the simple fact that some unions that contain a hard query are equivalent to unions that don't contain hard query have for that if you consider this example of the the 1st query is hard and the 2nd queries easy but the 1st query is basically just like the 2nd query with another constraints on 3 so
04:25
every answer to the 1st query is also an answer to the 2nd query in the entire Union is the largest equivalence to to the 2nd query which is easy so the entire Union is easy so this is not that interesting to to discuss this sort of cases and it makes sense to say OK
04:44
what happens with nonredundant union and so it is sometimes easy but what happens if you have a nonredundant union that contains a hard query and this is
04:55
where we were surprised to find out that some nonredundant unions that contain a hard query are in fact is and I'm going to show you such an example later in the talk OK so we have and queries unions which antiquary we know that if they contain a high grade they can be easily they can be hard and we realized some easy cases we found and structural property that tells us when it's easy and now we want to know are all cases hard or are we still missing something that's why we want to look at lower part and the 1st place where we considered lower bounds is I'll where
05:35
we thought it would be easy is to show that or on a union that contains only hard queries and we try to show that the Union that contains all the only hard queries is always hard and we didn't manage to do this for all unions that contain all hard queries we we got stuck where we have queries that equivalent up to a different projection and later we realize that we couldn't show that it's always hard because that's not
06:04
correct so there this unions of conjunctive queries where every query independently in the Union in itself is hard but the entire Union is easy and passes
06:17
so if I have enough time I'm going to show you an example for this too so now let's go into the details why is the union of
06:27
2 easy queries always so I'm going to add all just mentioned that there is a more elegant way of doing this that 1 and then what I'm doing that doesn't use a lot of space but I still want to show you the way that I'm going to show you because it builds up to the more complicated things that we will see later I we don't even need to know how we answer each individual query assume you have 2 black boxes that give you the answer to the 1st query and the answer to the 2nd quiz now you can combine this computation to get the answer to the Union I you can alternate between the steps 1 step that gives you an answer for the 1st query that the 2nd grade in the 1st grade in this inquiry and so on no wonder problems when we do this of the 1st problem is that we can have duplicate here for example you can see that is both an answer for the 1st query entered the 2nd and if we know what I just mentioned you will get a C 8 so 1 thing that you can do is to save a table with all of the answers that you've already seen whenever you see an answer you check if it's there and if it is there you just ignore it and move on know what is the problem we have and was the decay is not always constant so we have in the beginning this out twice linear delay before we get caught and the constant stream of answers but also if if you could begin in the beginning we produce a lot of unique answers and then for a long time we produce audiences that we've seen the delay is not constant so in this so we can show that this gives us something that is just as good as constant delay answers because we can DAB answers that we we see and still get linear preprocessing time and constantly so whenever we see an answer we don't print it immediately we put it in a Q of answers that are ready to be printed and only after we finish an entire round of producing answers we take something out of the queue and print now I want to go through our all all of their details but I think it's it if you think about it for a while you'll be convinced that Our whenever we access you it's not empty but it doesn't give us duplicates and that we get to every processing time and constantly so you the the
08:51
queries is easy now what happens
08:55
when there is a hard queries so 1st of all I want you to explain why our intuition I didn't work why is the union that contains a harm queries not necessarily hard so I'm going to show you where the
09:08
hardness professors and for this we 1st need to understand the hardness proof of a single conjunctive queries so this is an conjunctive query that's hard
09:16
it's a cyclic but it's not our free contexts and I know I didn't give you the definition yet it will come later but telling you this is the hardware and so the the conditional lower bounds that we have and we use the assumption that we cannot multiply to win matrices in quadratic time and what we do is we encode this matrix multiplication problem into database so in the 1st relation with point all of the sense of the ones in the 1st matrix and in the 2nd table here we put all the in the sense of the ones in that 2nd matrix and this query computes exactly the in bases of the ones in answer because for example we have 1 in index 1 you'd still these ones in nice 1 2 and 2 to so the contradiction that this queries easy we can solve it with linear preprocessing time and constantly what near linear in the size of the input so these tables here and their size is bound by in and we have constantly labor answer how many answers can we have at most n squared if this query the we can solve matrix multiplication and quadratic time we assume we cannot do that so this query is hard if it if you it'll make use of the Bayesian but we can't so apart really message from the slide the Connes tractability in this up half what we call a free pass here where x and y appear in the same atom in lines that appear in the same atoms but why is projected out why is not 1 of the 3 variables and this is their hardness cause because we're only allowed to use constant time per and not constant time for explanation of answer and here for example we have this answer 1 tool for other reasons as well but we're not we don't have enough time to go over all of the explanations OK now what happens within the Union now this is a
11:29
union where the 1st queries harder because we have this that this free pass over here where the Y is not free and the 2nd query not only so so light why is so so it is still hard right we can reduce matrix multiplication like before so we can encode inmates with modification in R 1 and R 2 like the we can give 3 some values that don't interfere with the reduction and the answers to Q 1 by exactly matrix multiplication answers know FIL that's so here when we have a union out we just just before the quick the union is easy we don't get all of the answers to Q 1 independently efficiency we get all of the answers to Q 1 union
12:18
with the answer to Q 2 and q till can have killed it cubic number
12:26
of answers so we can using this union compute matrix multiplication but we do it too slow and this doesn't contradict any assumptions so the hardness results do not hold with any so it's
12:45
not necessarily hard when is it easy and so with this we need to understand the definition of free context but I don't have much time so I'll just go over this brief free context is more specific than a cyclic and acyclic where have a joint tree isolated a traitor describes a query we have a node for every action and if we look at every variable independently then we get the connected I'm afraid context also have another condition that means that there is a sub tree that contains exactly the free variables this is more or less the definition because of this query here is also free context but we don't exactly have the force of a market in the property we are also allowed to have subset so if we have a subset of the existing notes so if we add another node with why then I we do get a proper online whenever we have such a graph that describes a query we have an efficient algorithm to solve it so I what happens with this example that we've seen before the joke queries are evaluated over the same database instances and since the 2nd queries easy we can use it to solve the easy query no here we have something that we call a body homomorphism because it's like the homomorphism where the variables of the secondary meant to the variables of the 1st query habits we ignore the the head or in the standard homomorphism the finish so the 2nd a a query is easy and and the of free variables so the 2nd query after we can get it gives us all possible combinations of assignments to a B and C now sense we have this body
14:35
homomorphism in the world of the in the terminology of Q 1 this means all possible assignments are to X Y and Z this gives us information about the joy of R 1 and R 2 so we can say that
14:51
Q to provide these combinations to Q 1 and we can use it when we want to compute Q 1 so we can have stored in a stable of all the answers to Q 2 and we can access it as another action to Q 1 and now this query is frequent so this almost gives us what we want now we know that we can have linear reprocessing them constantly and then we can have again linear preprocessing time constantly but not really preprocessing it's in the middle of a computation for i We use and that we like to call the teachers because it tells us how much were allowed to cheat without actually cheating so it tells us that if an racial problem can be solved with that where the delay is usually constants but the constant number of times it's the near there is there are almost no duplicates but every answer you can get a constant number of times and we can do something similar to what is what I showed in the case of easy queries are and gets linear processing constantly algorithm with no duplicates so this extends what we've seen for art museum EasyCASE case now using
16:00
this changes I this union is now is and we we define when a union can become easy when a union is easy it's when every wary of the Union can become easy by adding variables that are provided by other queries
16:18
now I don't believe I have time to to thoroughly explain the case of hard and hot so I'm just going to give you the intuition and if you want to hear more about it will be happy to discuss it later so what happens in the case of hard and hard queries I have this is an example of 2 queries as I said before the cases that can be easy when the 2 queries have isomorphic body have so it can be seen if you rename the variables as just 1 query with different projections 1 query with 2 hands and these 2 queries are hard the 1st query is hard because we have this free path here where Y is not a free variable but in the 2nd queried this area is not hard because all of the variables are free but in the 2nd queried this part is hard because you inside our free but W is projected out so each wary has a hard part that's easy in the query and the 2 queries can basically help each other so I want to give you all of the details but the idea is that you can find a subset of the answers to Q 2 that give you enough information to cover for them the part that's hard in Q 1 and then you use this information to find all of the answers to Q 1 and then the 1 is the information that you can use for it you to this is an
17:49
example so but you have you 1st ignore that complicated part of 2 2 when you just computes as some sorry
18:02
a subset you you imagine there is a subset of the free variable that is is equal to compute you that only a subset of the answers to Q to use them to compute Q 1 and the used fuel to compute all of the answers to Q 2 again all including the ones you made including the ones you've already seen and then we can I use user jitters them I get to say that this can be done with the in every processing time and constantly so hard queries can also provide
18:27
variables and we have an exact characterization for some of the cases I have for example to hard queries so to conclude it
18:39
seen that unions that contain hard queries can be sometimes easy easy even if they contain only we formalize when queries can help each other meet each other easy within a union and we've seen some lower bound and we don't have a complete characterization yet in fact the last section of our paper is full of examples of specific queries like the 1 we have here where we don't know if this specific queries easy or hard so our for future work it will be interesting to reach a full characterization and will also be interesting to inspect the space I can we use less space in art store all of the answers that we see thank
19:21
you