On the Enumeration Complexity of Unions of Conjunctive Queries

Video in TIB AV-Portal: On the Enumeration Complexity of Unions of Conjunctive Queries

Formal Metadata

On the Enumeration Complexity of Unions of Conjunctive Queries
Title of Series
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
We study the enumeration complexity of Unions of Conjunctive Queries (UCQs). We aim to identify the UCQs that are tractable in the sense that the answer tuples can be enumerated with a linear preprocessing phase and a constant delay between every successive tuples. It has been established that, in the absence of self joins and under conventional complexity assumptions, the CQs that admit such an evaluation are precisely the free-connex ones. A union of tractable CQs is always tractable. We generalize the notion of free-connexity from CQs to UCQs, thus showing that some unions containing intractable CQs are, in fact, tractable. Interestingly, some unions consisting of only intractable CQs are tractable too. The question of a finding a full characterization of the tractability of UCQs remains open. Nevertheless, we prove that for several classes of queries, free-connexity fully captures the tractable UCQs.
Data management Query language Kolmogorov complexity Multiplication sign Magneto-optical drive Query language Enumerated type Mathematical optimization
Scheduling (computing) Broadcast programming Algorithm Consistency Relational database Projective plane Electronic mailing list Database Performance appraisal Query language Personal digital assistant Query language Table (information)
Complexity class Scheduling (computing) Logical constant Broadcast programming Table (information) Algorithm Consistency Kolmogorov complexity Electronic mailing list Set (mathematics) Database Data model Performance appraisal Query language Electronic meeting system Query language Resultant
Logical constant Complexity class Complex (psychology) Clique problem Boolean algebra Table (information) Multiplication sign Hypercube Data model Preprocessor Performance appraisal Term (mathematics) Query language Spacetime output Multiplication Mathematical optimization Linear map Logical constant Kolmogorov complexity Preprocessor Query language Linearization Speech synthesis output Table (information) Matrix (mathematics)
Complexity class Boolean algebra Clique problem Dichotomy Query language Numbering scheme Data structure Multiplication Matrix (mathematics) Hypercube
Complexity class NP-hard Dichotomy Query language Personal digital assistant Personal digital assistant Content (media)
NP-hard Constraint (mathematics) Query language Personal digital assistant Equivalence relation
NP-hard Query language Personal digital assistant Personal digital assistant Quicksort Equivalence relation Equivalence relation
NP-hard Category of being NP-hard Query language Personal digital assistant Isomorphieklasse Personal digital assistant Gradient Projective plane Bound state Mereology
NP-hard Query language Multiplication sign Personal digital assistant
Logical constant Multiplication sign Uniqueness quantification Gradient Letterpress printing Black box Streaming media Neuroinformatik Preprocessor Process (computing) Roundness (object) Query language Queue (abstract data type) Linearization Queue (abstract data type) Table (information) Spacetime
NP-hard Proof theory NP-hard Dichotomy Query language Content (media)
Logical constant Point (geometry) NP-hard Slide rule Context awareness Boolean algebra Untere Schranke Wage labour Multiplication sign 1 (number) Computational complexity theory Quadratic equation Preprocessor Causality Atomic number Matrix (mathematics) Reduction of order Codierung <Programmierung> Multiplication Theory of relativity Military base Bound state Database Line (geometry) Variable (mathematics) P (complexity) Subject indexing Message passing Query language Linearization output Right angle Table (information)
NP-hard Multiplication NP-hard Cube (algebra) Matrix (mathematics) Resultant
Context awareness Algorithm Freeware Graph (mathematics) Information Multiplication sign Forcing (mathematics) Combinational logic Database Variable (mathematics) Variable (mathematics) Connected space Subset Category of being Network topology Query language Network topology Free variables and bound variables Homomorphismus Vertex (graph theory) Freeware Homomorphismus Condition number
Logical constant Group action Freeware Logical constant Multiplication sign Lemma (mathematics) Combinational logic Enumerated type Variable (mathematics) Variable (mathematics) Number Neuroinformatik Preprocessor Mathematics Process (computing) Query language Personal digital assistant Linearization Homomorphismus Stability theory
Area NP-hard Raw image format NP-hard Information Online help Multiplication sign Projective plane Content (media) Mereology Variable (mathematics) Subset Dichotomy Query language Personal digital assistant Different (Kate Ryan album) Function (mathematics) Matrix (mathematics) Multiplication Extension (kinesiology)
NP-hard Proxy server Online help Multiplication sign 1 (number) Variable (mathematics) Axonometric projection Subset Process (computing) Hardy space Query language Personal digital assistant Isomorphieklasse Query language Queue (abstract data type) Extension (kinesiology)
Latent heat Query language Personal digital assistant Sheaf (mathematics) Data storage device Spacetime Bound state Reduction of order Spacetime
so let's start by understanding their goal the goal is to understand which
unions of conjunctive queries can be answered
with what can be seen as optimal a fine-grained time guarantees
so 1st of all it's understand the queries that were standing and this is a standard relational database that describe spots in this case so we have a table for the tutorials and we have a table for the schedule and we can ask this conjunctive query that that though joins these 2
tables and projects out the title and this table gives us a list of speakers per day also assume we have another table with the research docks and we can
ask a similar conjunctive query that joins the research talks with the schedule and gives us a list of speakers of the research stocks per day alright if we're interested in the union of all of these answers we just want the speakers then we can ask the union of conjunctive queries and the answers to that the union of the answers to individual conducted courts the the the queries were studying and we want to know how how efficiently can we solve such queries because this is the class we want
to handle so we should mention the settings where using setting similar to that to previous results were standing we're extending that their RAM on also we can
access look tables in constant time we're talking in our data complexity terms so that queries considerate of constants I and now what's the optimal time guarantees we need to read the entire input before we know if there's a 1st answer or not and we want to print all of the solution I so we need constant time 4 and so this is why we define mightily the in the class of all problems all queries that can be answered with linear preprocessing time before 1st answer and then constant delay between the answers no for this work we got we we focus on our time complexity and we ignore speech we want to know which queries are in this class see for the congenital queries we already
have a known answer other queries that are in this class are exactly the queries
that have a structure that is called free conic and I'm going to explain what free on means half later and so not and I worked with to decide academy the foreign we extended it for for wit when their functional
dependencies in the scheme and we thought what would be the next entrant interesting step and we decided to look into use accuse because it's it's an interesting class of queries it's equivalent to 2 positive original under wraps
so and now we have our goal of formal I so we can talk about it we want to know which unions of conjunctive queries are in this class LC's so when to stop when I mean that the query is easy I mean that it's in the really and when I say that it's hard I mean that it's not
let's start with an overview of of of of of over Warwick and then I'm going to have to go into some explanations in more detail cases of unions of conjunctive queries the 1st case is that all queries are easy and quite easy to show that if every query in the union is easy then and the entire Union is if every queries and delay seeing the entire Union isn't it singing so this case is always now it gets
interesting where some query union is hot if you just jot down the 1st query that comes into your mind when you think about that a union of a hard querying an easy query it is it will most likely be a hard but this also be easy
so the easy for the simple fact that some unions that contain a hard query are equivalent to unions that don't contain hard query have for that if you consider this example of the the 1st query is hard and the 2nd queries easy but the 1st query is basically just like the 2nd query with another constraints on 3 so
every answer to the 1st query is also an answer to the 2nd query in the entire Union is the largest equivalence to to the 2nd query which is easy so the entire Union is easy so this is not that interesting to to discuss this sort of cases and it makes sense to say OK
what happens with non-redundant union and so it is sometimes easy but what happens if you have a non-redundant union that contains a hard query and this is
where we were surprised to find out that some non-redundant unions that contain a hard query are in fact is and I'm going to show you such an example later in the talk OK so we have and queries unions which antiquary we know that if they contain a high grade they can be easily they can be hard and we realized some easy cases we found and structural property that tells us when it's easy and now we want to know are all cases hard or are we still missing something that's why we want to look at lower part and the 1st place where we considered lower bounds is I'll where
we thought it would be easy is to show that or on a union that contains only hard queries and we try to show that the Union that contains all the only hard queries is always hard and we didn't manage to do this for all unions that contain all hard queries we we got stuck where we have queries that equivalent up to a different projection and later we realize that we couldn't show that it's always hard because that's not
correct so there this unions of conjunctive queries where every query independently in the Union in itself is hard but the entire Union is easy and passes
so if I have enough time I'm going to show you an example for this too so now let's go into the details why is the union of
2 easy queries always so I'm going to add all just mentioned that there is a more elegant way of doing this that 1 and then what I'm doing that doesn't use a lot of space but I still want to show you the way that I'm going to show you because it builds up to the more complicated things that we will see later I we don't even need to know how we answer each individual query assume you have 2 black boxes that give you the answer to the 1st query and the answer to the 2nd quiz now you can combine this computation to get the answer to the Union I you can alternate between the steps 1 step that gives you an answer for the 1st query that the 2nd grade in the 1st grade in this inquiry and so on no wonder problems when we do this of the 1st problem is that we can have duplicate here for example you can see that is both an answer for the 1st query entered the 2nd and if we know what I just mentioned you will get a C 8 so 1 thing that you can do is to save a table with all of the answers that you've already seen whenever you see an answer you check if it's there and if it is there you just ignore it and move on know what is the problem we have and was the decay is not always constant so we have in the beginning this out twice linear delay before we get caught and the constant stream of answers but also if if you could begin in the beginning we produce a lot of unique answers and then for a long time we produce audiences that we've seen the delay is not constant so in this so we can show that this gives us something that is just as good as constant delay answers because we can DAB answers that we we see and still get linear preprocessing time and constantly so whenever we see an answer we don't print it immediately we put it in a Q of answers that are ready to be printed and only after we finish an entire round of producing answers we take something out of the queue and print now I want to go through our all all of their details but I think it's it if you think about it for a while you'll be convinced that Our whenever we access you it's not empty but it doesn't give us duplicates and that we get to every processing time and constantly so you the the
queries is easy now what happens
when there is a hard queries so 1st of all I want you to explain why our intuition I didn't work why is the union that contains a harm queries not necessarily hard so I'm going to show you where the
hardness professors and for this we 1st need to understand the hardness proof of a single conjunctive queries so this is an conjunctive query that's hard
it's a cyclic but it's not our free contexts and I know I didn't give you the definition yet it will come later but telling you this is the hardware and so the the conditional lower bounds that we have and we use the assumption that we cannot multiply to win matrices in quadratic time and what we do is we encode this matrix multiplication problem into database so in the 1st relation with point all of the sense of the ones in the 1st matrix and in the 2nd table here we put all the in the sense of the ones in that 2nd matrix and this query computes exactly the in bases of the ones in answer because for example we have 1 in index 1 you'd still these ones in nice 1 2 and 2 to so the contradiction that this queries easy we can solve it with linear preprocessing time and constantly what near linear in the size of the input so these tables here and their size is bound by in and we have constantly labor answer how many answers can we have at most n squared if this query the we can solve matrix multiplication and quadratic time we assume we cannot do that so this query is hard if it if you it'll make use of the Bayesian but we can't so apart really message from the slide the Connes tractability in this up half what we call a free pass here where x and y appear in the same atom in lines that appear in the same atoms but why is projected out why is not 1 of the 3 variables and this is their hardness cause because we're only allowed to use constant time per and not constant time for explanation of answer and here for example we have this answer 1 tool for other reasons as well but we're not we don't have enough time to go over all of the explanations OK now what happens within the Union now this is a
union where the 1st queries harder because we have this that this free pass over here where the Y is not free and the 2nd query not only so so light why is so so it is still hard right we can reduce matrix multiplication like before so we can encode inmates with modification in R 1 and R 2 like the we can give 3 some values that don't interfere with the reduction and the answers to Q 1 by exactly matrix multiplication answers know FIL that's so here when we have a union out we just just before the quick the union is easy we don't get all of the answers to Q 1 independently efficiency we get all of the answers to Q 1 union
with the answer to Q 2 and q till can have killed it cubic number
of answers so we can using this union compute matrix multiplication but we do it too slow and this doesn't contradict any assumptions so the hardness results do not hold with any so it's
not necessarily hard when is it easy and so with this we need to understand the definition of free context but I don't have much time so I'll just go over this brief free context is more specific than a cyclic and acyclic where have a joint tree isolated a traitor describes a query we have a node for every action and if we look at every variable independently then we get the connected I'm afraid context also have another condition that means that there is a sub tree that contains exactly the free variables this is more or less the definition because of this query here is also free context but we don't exactly have the force of a market in the property we are also allowed to have subset so if we have a subset of the existing notes so if we add another node with why then I we do get a proper online whenever we have such a graph that describes a query we have an efficient algorithm to solve it so I what happens with this example that we've seen before the joke queries are evaluated over the same database instances and since the 2nd queries easy we can use it to solve the easy query no here we have something that we call a body homomorphism because it's like the homomorphism where the variables of the secondary meant to the variables of the 1st query habits we ignore the the head or in the standard homomorphism the finish so the 2nd a a query is easy and and the of free variables so the 2nd query after we can get it gives us all possible combinations of assignments to a B and C now sense we have this body
homomorphism in the world of the in the terminology of Q 1 this means all possible assignments are to X Y and Z this gives us information about the joy of R 1 and R 2 so we can say that
Q to provide these combinations to Q 1 and we can use it when we want to compute Q 1 so we can have stored in a stable of all the answers to Q 2 and we can access it as another action to Q 1 and now this query is frequent so this almost gives us what we want now we know that we can have linear reprocessing them constantly and then we can have again linear preprocessing time constantly but not really preprocessing it's in the middle of a computation for i We use and that we like to call the teachers because it tells us how much were allowed to cheat without actually cheating so it tells us that if an racial problem can be solved with that where the delay is usually constants but the constant number of times it's the near there is there are almost no duplicates but every answer you can get a constant number of times and we can do something similar to what is what I showed in the case of easy queries are and gets linear processing constantly algorithm with no duplicates so this extends what we've seen for art museum EasyCASE case now using
this changes I this union is now is and we we define when a union can become easy when a union is easy it's when every wary of the Union can become easy by adding variables that are provided by other queries
now I don't believe I have time to to thoroughly explain the case of hard and hot so I'm just going to give you the intuition and if you want to hear more about it will be happy to discuss it later so what happens in the case of hard and hard queries I have this is an example of 2 queries as I said before the cases that can be easy when the 2 queries have isomorphic body have so it can be seen if you rename the variables as just 1 query with different projections 1 query with 2 hands and these 2 queries are hard the 1st query is hard because we have this free path here where Y is not a free variable but in the 2nd queried this area is not hard because all of the variables are free but in the 2nd queried this part is hard because you inside our free but W is projected out so each wary has a hard part that's easy in the query and the 2 queries can basically help each other so I want to give you all of the details but the idea is that you can find a subset of the answers to Q 2 that give you enough information to cover for them the part that's hard in Q 1 and then you use this information to find all of the answers to Q 1 and then the 1 is the information that you can use for it you to this is an
example so but you have you 1st ignore that complicated part of 2 2 when you just computes as some sorry
a subset you you imagine there is a subset of the free variable that is is equal to compute you that only a subset of the answers to Q to use them to compute Q 1 and the used fuel to compute all of the answers to Q 2 again all including the ones you made including the ones you've already seen and then we can I use user jitters them I get to say that this can be done with the in every processing time and constantly so hard queries can also provide
variables and we have an exact characterization for some of the cases I have for example to hard queries so to conclude it
seen that unions that contain hard queries can be sometimes easy easy even if they contain only we formalize when queries can help each other meet each other easy within a union and we've seen some lower bound and we don't have a complete characterization yet in fact the last section of our paper is full of examples of specific queries like the 1 we have here where we don't know if this specific queries easy or hard so our for future work it will be interesting to reach a full characterization and will also be interesting to inspect the space I can we use less space in art store all of the answers that we see thank