CAvSAT: A System for Query Answering over Inconsistent Databases

Video in TIB AV-Portal: CAvSAT: A System for Query Answering over Inconsistent Databases

Formal Metadata

Title
CAvSAT: A System for Query Answering over Inconsistent Databases

License
CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Managing inconsistencies in databases is an old, but recurring, problem. An inconsistent database is a database that violates one or more integrity constraints. In the real world, inconsistent databases arise in several different contexts, including data warehousing and information integration. The framework of database repairs and consistent query answering (CQA) is a principled way of handling inconsistencies. In this work, we propose a novel approach that has the potential to support a comprehensive and scalable CQA system. We report preliminary experimental results on a prototype CQA system, CAvSAT (Consistent Answering via Satisfiability), implemented using this approach.
Today I will present CAvSAT, which stands for Consistent Answering via Satisfiability; it is a system for query answering over inconsistent databases.

In the relational world, we specify a relational schema and a set of integrity constraints on the schema. A database that violates the integrity constraints of the schema is known as an inconsistent database. In the real world, inconsistencies arise for several reasons, the most common being data integration, data warehousing, and so on. As a quick example, in this Employees table the employee ID is supposed to be a primary key, but we end up having two employees with the same ID; that is an inconsistent database. The problem is how to cope with the inconsistencies.

There are two main approaches. The first is data cleaning: the idea is to somehow remove the inconsistencies from the database, recover a clean version of the data, and then run queries against it. This often involves making arbitrary choices based on heuristics, external domain knowledge, and so on. The other approach is the principled framework given by the notion of database repairs. Our system opts for the second approach, so let me quickly take you through what repairs and consistent answers are. Just as a reference, we had a great talk yesterday, in the Gems of PODS session, about this exact topic of database repairs.

The notion of database repairs was introduced by Arenas, Bertossi, and Chomicki in 1999. A repair of an inconsistent database is a maximal consistent subset of that database. Clearly, there is in general no unique way to repair a database; in fact, there can be exponentially many repairs. This notion is used to give semantic meaning to query answering over inconsistent databases: the consistent answers to a query are the answers that lie in the intersection of the answers to the query over all repairs of the database, and the problem of consistent query answering, or CQA, is about computing these consistent answers. There has been a lot of work on consistent query answering over the last 20 years, and most of it has been theoretical.
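As a toy illustration of these definitions, here is a minimal Python sketch (not CAvSAT's implementation, and with made-up example data) that enumerates the repairs of a small Employees table under a primary-key constraint and intersects the query answers over all repairs:

```python
# Minimal sketch of repairs and consistent answers under a primary key,
# brute-forced over a toy Employees table. Hypothetical example data.
from itertools import product

# Each tuple: (emp_id, name, city); emp_id is the primary key.
employees = [
    (1, "Alice",  "NYC"),
    (1, "Alicia", "NYC"),   # key violation: same emp_id as the tuple above
    (2, "Bob",    "SF"),
]

def repairs(table):
    """Enumerate the repairs: for a key constraint, a maximal consistent
    subset keeps exactly one tuple from each group sharing a key value."""
    groups = {}
    for t in table:
        groups.setdefault(t[0], []).append(t)
    for choice in product(*groups.values()):
        yield set(choice)

def consistent_answers(table, query):
    """Answers that appear in query(r) for every repair r."""
    results = [query(r) for r in repairs(table)]
    return set.intersection(*results)

# Query: project the names of all employees.
names = lambda r: {name for (_, name, _) in r}
print(consistent_answers(employees, names))   # {'Bob'}
```

"Bob" is a consistent answer because it appears in every repair, while "Alice" is not, since the repair that keeps the "Alicia" tuple does not return it. The exponential blow-up in the number of repairs is exactly why this brute-force view does not scale.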
For example, the Koutris–Wijsen trichotomy theorem completely classifies the self-join-free select-project-join queries with respect to primary key constraints: the complexity of computing the consistent answers is either first-order rewritable, or in polynomial time but not first-order rewritable, or coNP-complete. People have attempted to develop CQA systems in the past, but none of them made an impact in industry, for several reasons, and they remained academic prototypes. One of the challenges in building a scalable and comprehensive CQA system is that CQA has high complexity even for very simple conjunctive queries over very simple key constraints, and this causes scalability issues as the data grows. What we propose
is CAvSAT, our vision of a scalable and comprehensive CQA system. It has a modular architecture, and we leverage the theoretical results obtained in the past: the query preprocessing module determines the complexity of computing the consistent answers to the input query, which by the trichotomy theorem is going to be one of three classes, and it forwards the query to the corresponding module. For queries whose complexity we do not know, or that are known to be intractable, the SAT-solving module always works. Meanwhile, the SAT-solving community has made tremendous progress in building very fast solvers, so today we have modern SAT solvers that are used as general-purpose problem-solving tools for many hard problems. Our idea is to use those SAT solvers to compute consistent answers.
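The routing just described can be sketched as follows; this is a hypothetical illustration of the modular architecture, not CAvSAT's actual API, and the class and module names are invented:

```python
# Hypothetical sketch of the query preprocessor's routing logic:
# classify the input query by its CQA complexity and dispatch it
# to the matching module. Names are illustrative only.
from enum import Enum, auto

class Complexity(Enum):
    FO_REWRITABLE = auto()          # consistent answers via a SQL rewriting
    PTIME = auto()                  # polynomial time, but not FO-rewritable
    CONP_HARD_OR_UNKNOWN = auto()   # intractable, or complexity unknown

def route(query_complexity):
    """Forward the query to the module for its complexity class.
    The SAT-based module is the catch-all: it always works."""
    if query_complexity is Complexity.FO_REWRITABLE:
        return "sql-rewriting module"
    if query_complexity is Complexity.PTIME:
        return "polynomial-time algorithm module"
    return "sat-solving module"

print(route(Complexity.CONP_HARD_OR_UNKNOWN))   # sat-solving module
```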
Without getting into detailed notation, let me quickly take you through the basic reduction from the problem of CQA to SAT. For each key-equal group of tuples in the database, meaning a set of tuples that agree on the key attributes, we have a clause containing the variables corresponding to those tuples; intuitively, the clause is satisfied if at least one fact from each key-equal group is in the repair. We also create clauses for the minimal witnesses to the query, where by a minimal witness I mean a minimal set of tuples that satisfies the query. For each of these witnesses we create a clause, and then we simply take the conjunction of all of the clauses to obtain a formula. We can establish that this formula is satisfiable by an assignment in which the variable corresponding to a particular candidate answer is set to 1 if and only if that candidate is not a consistent answer. The formula can then be translated into a maximum satisfiability (MaxSAT) instance, where the goal is to satisfy the maximum number of clauses; this is used to identify all such variables at once, which translates to finding all of the consistent answers to the query.

We started out with SPJ queries under primary key constraints, but keys, while important, are a limited class of integrity constraints. Eventually we succeeded in developing reductions for broader classes of queries and constraints, namely unions of SPJ queries over databases that are inconsistent with respect to arbitrary denial constraints, which forbid any set of tuples that jointly satisfies a specified condition. These reductions are a bit more involved, so I am not going to go through them.
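To make the basic reduction concrete, here is a toy sketch with made-up tuples and witnesses, using brute-force search over assignments in place of a real SAT solver; it is an illustration of the encoding idea, not CAvSAT's implementation:

```python
# Toy sketch of the CQA-to-SAT reduction, checked by brute force.
# Variable x_t means "tuple t survives in the repair". Example data
# is hypothetical.
from itertools import product

tuples = ["t1", "t2", "t3"]           # t1 and t2 share a key; t3 is alone
key_groups = [["t1", "t2"], ["t3"]]
# Minimal witnesses of the query for a candidate answer a: each
# witness is a minimal set of tuples that together satisfy the query.
witnesses = [["t1"], ["t2", "t3"]]

def satisfiable():
    """Is there an assignment (a repair) that keeps at least one tuple
    per key-equal group yet destroys every witness of the candidate?"""
    for bits in product([False, True], repeat=len(tuples)):
        val = dict(zip(tuples, bits))
        # group clauses: at least one tuple from each key-equal group
        if not all(any(val[t] for t in g) for g in key_groups):
            continue
        # negated witness clauses: drop at least one tuple per witness
        if all(any(not val[t] for t in w) for w in witnesses):
            return True   # the candidate is NOT a consistent answer
    return False

# t3 must survive (singleton group), so witness [t2, t3] can only be
# destroyed by dropping t2 -- but then t1 survives and witness [t1]
# holds, so no repair avoids every witness.
print("consistent answer:", not satisfiable())   # consistent answer: True
```

A real encoding would hand these clauses to a SAT/MaxSAT solver instead of enumerating the 2^n assignments, which is what makes the approach scale.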
We ran our experiments in two phases. In the first phase we generated synthetic data with one million tuples that were inconsistent with respect to primary key constraints, and in the second phase we used real-world datasets that were previously used in the literature to evaluate data cleaning systems; in particular, one of them was about food safety inspections of restaurants in Chicago and New York, and that dataset comes with a lot of denial constraints.

These are our results for the first-order rewritable queries. For these queries, we compared our approach against the state of the art, the consistent first-order rewriting of Koutris and Wijsen. It is a very nice theoretical result, but we found that in practice executing the rewriting takes a huge amount of time: for all seven queries it did not finish, so we cut it short at two hours. Another approach is ConQuer, a scalable rewriting for a specific subclass of queries called Cforest; some of the seven queries fall in Cforest, so we could run ConQuer on those, and ConQuer is quite efficient. In our system, the query preprocessor detects the complexity of the query, so for the queries in Cforest we can go with the ConQuer-style rewriting, and for the remaining queries we go with the SAT-based approach. The times are in seconds: our approach answered all of the queries in under ten seconds. We also ran experiments on harder queries, both queries that are in polynomial time but not first-order rewritable and queries that are harder still; let me skip through those results.

To summarize, the framework of database repairs offers a principled approach to dealing with inconsistencies. Even though there has been progress in CQA, there is still no scalable and comprehensive CQA system, and we plan to build one.
In fact, SAT solving seems to be a very promising approach. Right now we are working on SPJ queries that involve GROUP BY clauses and aggregate functions, which actually requires extending the semantics of query answering over inconsistent databases. We are also planning to compare the performance of our system with existing rewriting-based CQA systems. Our paper is about to appear at the SAT 2019 conference next week, and the CAvSAT system is open source, with the code available. Thank you; I am open to questions.