We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Testability of Homomorphism Inadmissibility: Property Testing Meets Database Theory

00:00

Formale Metadaten

Titel
Testability of Homomorphism Inadmissibility: Property Testing Meets Database Theory
Serientitel
Anzahl der Teile
155
Autor
Lizenz
CC-Namensnennung 3.0 Deutschland:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.
Identifikatoren
Herausgeber
Erscheinungsjahr
Sprache

Inhaltliche Metadaten

Fachgebiet
Genre
Abstract
In this paper, we utilize the perspective of property testing to consider the testability of relational database queries. A primary motivation is the desire to avoid reading an entire database to decide a property thereof. We focus on conjunctive queries, which are the most basic and heavily studied database queries. Each conjunctive query can be represented as a relational structure A such that deciding if the conjunctive query is satisfied by a relational structure B is equivalent to deciding if there exists a homomorphism from A to B. We phrase our results in terms of homomorphisms. Precisely, we study, for each relational structure A, the testability of homomorphism inadmissibility from A. We consider algorithms that have oracle access to an input relational structure B and that distinguish, with high probability, the case where there is no homomorphism from A to B, from the case where one needs to remove a constant fraction of tuples from B in order to suppress all such homomorphisms. We provide a complete characterization of the structures A from which one can test homomorphism inadmissibility with one-sided error by making a constant number of queries to B. Our characterization shows that homomorphism inadmissibility from A is constant-query testable with one-sided error if and only if the core of A is alpha-acyclic. We also show that the injective version of the problem is constant-query testable with one-sided error if A is alpha-acyclic; this result generalizes existing results for testing subgraph-freeness in the general graph model.
HomomorphismusKundendatenbankMagnetooptischer SpeicherDatenbankStatistischer TestPhysikalische TheorieSignifikanztestAlgorithmusEin-AusgabeFramework <Informatik>AbfrageMultigraphObjekt <Kategorie>Funktion <Mathematik>ComputerphysikRelationale DatenbankLeistungsbewertungDreieckGraphFlächeninhaltMultigraphWeb-SeiteObjekt <Kategorie>Relationale DatenbankAlgorithmusBruchrechnungAbfrageEin-AusgabeFunktionalPunktBeobachtungsstudieSignifikanztestDreieckGraphLeistungsbewertungExistenzaussageRandomisierungQuick-SortDatenbankFormale SpracheDifferenteKontextbezogenes SystemZentralisatorNeuroinformatikWahlfreier ZugriffProgramm/QuellcodeJSON
DatenbankLeistungsbewertungAbfrageEin-AusgabeSignifikanztestLeistungsbewertungAbfrageDatenbankRechter WinkelSignifikanztestComputeranimation
AbfrageLeistungsbewertungDatenbankSignifikanztestEin-AusgabeBeobachtungsstudieHomomorphismusRelationale DatenbankSchnittmengeOrakel <Informatik>DatenmodellDichotomieTheoremBeweistheorieKonstanteTermBaumweitePhysikalische KonstanteBruchrechnungTopologieFehlermeldungParametersystemStatistischer TestAlgorithmusStrebeFunktion <Mathematik>FokalpunktMultiplizität <Mathematik>TupelRelation <Informatik>MultiplikationVollständigkeitSignifikanztestStatistischer TestLeistungsbewertungDatenbankCASE <Informatik>Ein-AusgabeAlgorithmusParametersystemBildschirmmaskeMorphismusHomomorphismusTupelQuick-SortVirtual Home EnvironmentMatchingAbfrageSchnittmengeRechter WinkelBruchrechnungRelationale DatenbankZahlenbereichTopologieKlasse <Mathematik>TypentheorieResultanteApp <Programm>GeradeSpezielle unitäre GruppeKonstanteGrenzwertberechnungSummierbarkeitTermTheoremMapping <Computergraphik>DichotomieFunktionaln-TupelMereologieNatürliche ZahlBitEndliche ModelltheorieMultiplikationsoperatorGrundraumFehlermeldungMultimengeOrakel <Informatik>BeobachtungsstudieGebundener ZustandCliquenweiteDifferenteGibbs-VerteilungComputeranimationJSON
TermEin-AusgabeDatenmodellVollständigkeitOrakel <Informatik>AbfrageSymboltabelleTypentheorien-TupelZeitbereichZufallszahlenExistenzsatzÄquivalenzklasseMultigraphGraphBeobachtungsstudieHomomorphismusFramework <Informatik>TupelStrebeKonstanteFehlermeldungSpeicherabzugTheoremDichotomieVersionsverwaltungSignifikanztestRelation <Informatik>KnotenmengeHypercubeAlgorithmusVariablePartielle DifferentiationTopologieAnalysisBruchrechnungSpezialrechnerStatistischer TestSimplexverfahrenNP-hartes ProblemNotepad-ComputerGebundener ZustandDatenbankPhysikalische TheorieGrenzwertberechnungTupelVersionsverwaltungMomentenproblemBildschirmmaskeGebundener ZustandSpeicherabzugFinitismusSignifikanztestHomomorphismusParametersystemAdditionPhysikalische KonstanteQuick-SortGrundraumAbfrageKonstanteTypentheorieOrakel <Informatik>AnalogieschlussDichotomieCASE <Informatik>BruchrechnungAlgorithmusFehlermeldungTheoremKnotenmengeZahlenbereichMinimumInteraktives FernsehenInjektivitätDreiecksfreier GraphMapping <Computergraphik>MAPDatenbankWahlfreier ZugriffVollständigkeitGraphFramework <Informatik>Bildgebendes Verfahrenn-TupelRelationale DatenbankAnalysisEvolutionärer AlgorithmusEndliche ModelltheoriePhysikalische TheorieEin-AusgabeSimplexverfahrenStatistischer TestMultiplikationsoperatorIsomorphieklasseRechter WinkelMultigraphHypercubePlastikkarteSummengleichungSpezielle unitäre GruppeNP-hartes ProblemFaserbündelOrdnung <Mathematik>KonditionszahlAtomarität <Informatik>Schreib-Lese-KopfHybridrechnerRuhmasseInternetworkingResultanteJensen-MaßRahmenproblemFlächeninhaltWeb SiteComputeranimation
KundendatenbankVorlesung/Konferenz
Transkript: Englisch(automatisch erzeugt)
So a typical setup in this area is the following. The input is read using random access queries. Algorithms can use randomization and are required to distinguish inputs that satisfy a property and inputs that are epsilon far from satisfying a property, the property.
And the exact definition usually depends on the context but this generally means something like one needs to modify at least an epsilon fraction of the input to make it satisfy the property. So this is a sort of promise scenario. And algorithms just have to distinguish between these two cases.
Okay, many different objects have been studied using the lens of property testing, such as functions, graphs, constraint satisfaction problems, and many others. Okay, so of course we all know here that a central computational problem in databases is the evaluation of a database query on a relational database.
And yeah, in this talk we're going to focus on Boolean conjunctive queries. So yeah, they evaluate the true false and they're built from conjunction and existential quantification. Just to make sure we're all on the same page, here's a small example of a conjunctive query in the language of graphs. So it asserts, if we think about it being evaluated
on undirected graphs, it asserts that there are three points such that each pair is joined by an edge. Right, that is to say the graph has a triangle. Okay, so right, in databases a query evaluation can be expensive because databases can be big, right? And property testing offers us an avenue
for coping with big inputs, right? So query evaluation and property testing seem to be a very good match for each other, right? And in this work we study query evaluation using the lens of property testing. So yeah, when I first learned about property testing,
I was surprised to find that there was really limited work in the literature as far as I know. Sort of viewing database query evaluation from the lens of property testing and actually discussing this with my co-author formed the genesis of this work. Okay, so right, a conjunctive query evaluation
can be phrased as the problem of whether or not there exists homomorphism from one relational structure to another. Namely, B satisfies a conjunctive query if and only if there's homomorphism from A of phi to B where A of phi is a structure derived from B, the so-called canonical structure of phi. And okay, we have two relational structures
which are sets and then relations on those sets. A homomorphism from A to B is a mapping from the universe of the first structure to the universe of the second structure such that each time we have a tuple in the first structure, if we map that under a mapping, we get a tuple in the corresponding relation of the second structure.
Okay, so right, in this work, we perform a property testing-based study of database query evaluation. So we present what we believe is a natural oracle model for accessing relational structures. And then under this model, we prove a dichotomy theorem that classifies precisely those conjunctive queries
that are testable using a constant number of queries. I'll define what this means in a moment. Okay, to be a bit more precise, we test the property of psi falsity over various conjunctive queries psi. And actually, for the most part, we phrase our results equivalently
in terms of homomorphisms. So we'll say that a structure is A-free if it does not emit a homomorphism from A. And we test the property of A-freeness over different structures A. Okay, here I want to mention that there is a related work by Adler and Harvat who showed a positive constant-query testability result
for MSO queries on bounded-degree databases of bounded tree width. So the class of queries they consider is more general than ours, but they consider restricted classes of databases, whereas here we don't put any restriction on the type of database. Okay, so here's some more details on our setup.
We'll say that a structure is epsilon far from being A-free if one has to remove at least an epsilon fraction of the tuples from B to ensure that it has no homomorphism from A. So the number of tuples in B is just the sum of the number of tuples over all of the relations of B.
So this says that we have to remove at least an epsilon fraction of all of the tuples to suppress all homomorphisms from A. Okay, and... Yeah, so in our setup we want to test A-freeness for every single fixed A, and a tester algorithm is given an error parameter epsilon
and also access to an input structure B. And the goal of the tester is to distinguish the case that B is A-free from the case that B is epsilon far from being A-free. And we say that a tester is constant query if the number of accesses made to B
is a function only of epsilon. So for each fixed epsilon, we make just a constant number of queries. Okay, and we focus on testers that have one-sided error in the following sense. They have to accept when the input is A-free, and they have to reject with a sufficiently high
a positive probability when the input is epsilon far from being A-free. Okay, and also relations of B are considered to be multisets in this setup. Okay, so as mentioned, we access the input structure B using a sort of random access oracle model.
And so, all right, this is via so-called queries to what we call the completion oracle. So here I have to apologize. In this talk we have two notions of queries. Right, there are database queries and there are also oracle queries, which allow us to access this input structure. So, right, there are two types of queries.
So here we're going to let star be sort of a special wildcard symbol that's not inside the universe of B. And each of the queries takes a partially completed tuple where we allow wildcards in addition to constants inside the tuples.
And the first type of query is a completion query. And this just returns a random completion. Okay, so it takes a relation symbol and one of these tuples, which may have wildcards and also constants. And the completion query just returns a random completion chosen uniformly at random,
which just means that the wildcards are filled out to be constants. Or it returns some bottom element if there's no completion. So here's the formal definition of completion. It just says that wherever we had a constant before, we still have a constant. But then the wildcards are filled out to be actually a tuple inside the relation.
Okay, and then the other type of query is a so-called size query, which just returns the number of completions. Okay, and yeah, I just want to mention that when dealing with graphs, this model is equivalent to an established model in property testing called the general graph model.
Okay, so you might ask, why are we studying homomorphism in inmissibility and query falseness, as opposed to inmissibility and query trueness? Right, like why be so negative? Right, the answer is that the latter problem actually trivializes in the framework that we've set up. So for any structure A, fixed structure A,
we can always add a constant number of tuples to B to make sure that there's a homomorphism from A to B. Right, we just take A, and we take a disjoint union with any structure B to make sure that we have such a homomorphism. Right, so it then turns out that for any fixed epsilon, one can only achieve epsilon farness
on structures of bounded size, and then structures of bounded size are not really interesting inside the present framework. Okay, so our main theorem is the dichotomy theorem, which again, it classifies precisely the finite structures A for which A-freeness is constant-query testable.
So the theorem says that for any finite structure A, A-freeness is constant-query testable inside the framework that we set up, if and only if the core of A is alpha-aseclic. Okay, so what does this mean? Okay, the core of A essentially is a minimized version of A,
which has as few tuples as possible, that gives off homomorphisms to exactly the same structure that A does. And then alpha-aseclicity is an established notion of hypergraph-aseclicity, which I'll review briefly in a moment. I also wanted to mention that
when the property of our main theorem does not hold, we get a lower bound of the form N to the epsilon for some positive epsilon. So we prove that testing A-freeness really requires omega N to the epsilon queries. Again, where epsilon is positive. Right, so yeah, it's also maybe worth mentioning that this lower bound is unconditional, right?
The arguments are, in some sense, information-based. So in this area, the lower bounds are unconditional. Okay, yeah, and also, just via our main theorem, we can obtain as a corollary, actually a dichotomy theorem for not just conjunctive queries,
but also we can classify all unions of conjunctive queries. Okay, so yeah, let me just briefly say what alpha-aseclicity is. So the hypergraph of a structure A has vertex set equal to the universe of A. And the edge set, to obtain the edge set,
we just take every single tuple, and we just turn it into an edge, right? So we just collect together the entries and put that into an edge and make that a hyperedge. So that's the hypergraph of a structure. And then we say that a structure is alpha-acyclic if it's hypergraph is, right? So what does it mean for a hypergraph to be acyclic?
It means that we have some ordering on the hyperedges, such that every single edge in this ordering, SI, is an ear with respect to the previous edges. So what this means is that every single vertex inside the SI that appears previously in the ordering,
they're all contained in some single edge that comes before SI, right? So yeah, in some sense it says that the ear, maybe intuitively speaking, interacts minimally or in some very easy way with the rest of the hypergraph, right? Yeah, another way to say it is that we can build a hypergraph just by adding on ears.
Okay, so right, so here's our positive algorithm. Our positive algorithm for the acyclic case makes a constant number of queries, and it succeeds with some constant positive probability. And okay, from the definition of acyclicity, we get an ordering of the edges of the hypergraph of A,
and thus an ordering of the tuples of A. Okay, so the idea of the algorithm is just to follow the order, and then to constantly make completion queries to try to complete whatever partial mapping we have to create a homomorphism, right?
So precisely, we start with the empty map, and then we just walk along this ordering of the tuples, and just for every single tuple, we make a completion query, and we just try to, we put in wildcards wherever we don't have values yet, and we just try to complete whatever partial map we have into a full homomorphism.
Now, if one of the completion queries returns bottom element, we just sort of give up, so the algorithm accepts, claiming that there's no homomorphism, and if no completion query fails, then actually we got a homomorphism, right? We really have a homomorphism, because now we have a mapping
that respects every single one of these tuples, so homomorphism is found. So, wait, we have one-sided error, right? If the algorithm rejects, we really have a homomorphism, so if there's no homomorphism, the algorithm accepts, so we get one-sided error. Okay, so, right, we just saw that
when our structure is A-free, the algorithm will always accept, and when B is not A-free, okay, by our assumption, it's far from being A-free, and in this case, one can prove that essentially a constant fraction of B consists of disjoint homomorphic images of A,
and so what we prove is that it suffices to analyze the algorithm just on disjoint, these disjoint images, structures that are just disjoint images, and then we perform the analysis on this type of structure. Okay, as a bonus, we get a test here
for the problem of injective A-freeness, which is just a problem of not admitting an injective homomorphism of A, so this is just the same thing as not containing a copy of A as a substructure up to homomorphism, up to isomorphism, and this is a problem that's been studied in some special cases in the literature.
Okay, what about our hardness results? So we make use of a known characterization of non-ascyclicity, which essentially says that if a hypergraph is not alpha-asyclic, it has an induced sub-hypergraph
that is either a so-called simplex or a so-called cycle, where a cycle is some sort of hypergraph analog of a graph cycle. Okay, so essentially what we do in our paper is we reduce to these two cases, and then we explicitly prove lower bounds for these two cases. Okay, so to wrap up, in this work,
we completely classified the conjunctive queries that are constant-query testable in the framework that we presented, and yeah, we hope that the dialogue between property testing and database theory will continue, and yeah, one particular issue that one could raise for future work is to try to understand other types of queries
or other types of data. Thank you very much.