DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models

Cite

ACM SIGMOD

Ma, Qingzhi Triantafillou, Peter

Formal Metadata

Title

DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models

Title of Series

SIGMOD 2019

Number of Parts

155

Author

Ma, Qingzhi

Triantafillou, Peter

License

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/42909 (DOI)

Publisher

ACM SIGMOD

Release Date

2019

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

In the era of big data, computing exact answers to analytical queries becomes prohibitively expensive. This greatly increases the value of approaches that can compute efficiently approximate, but highly-accurate, answers to analytical queries. Alas, the state of the art still suffers from many shortcomings: Errors are still high unless large memory investments are made. Many important analytics tasks are not supported. Query response times are too long and thus approaches rely on parallel execution of queries atop large big data analytics clusters, in-situ or in the cloud, whose acquisition/use costs dearly. Hence, the following questions are crucial: Can we develop AQP engines that reduce response times by orders of magnitude, ensure high accuracy, and support most aggregate functions? With smaller memory footprints and small overheads to build the state upon which they are based? With this paper, we show that the answers to all questions above can be positive. The paper presents DBEst, a system based on Machine Learning models (regression models and probability density estimators). It will discuss its limitations, promises, and how it can complement existing systems. It will substantiate its advantages using queries and data from the TPC-DS benchmark and real-life datasets, compared against state of the art AQP engines.