Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing

CC-Namensnennung 3.0 Deutschland:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.

Identifikatoren

10.5446/43057 (DOI)

Herausgeber

ACM SIGMOD

Erscheinungsjahr

2019

Sprache

Englisch

Inhaltliche Metadaten

Fachgebiet

Informatik

Genre

Konferenz/Talk

Abstract

Real-time data analysis and management are increasingly critical for today’s businesses. SQL is the de facto lingua franca for these endeavors, yet support for robust streaming analysis and management with SQL remains limited. Many approaches restrict semantics to a reduced subset of features and/or require a suite of non-standard constructs. Additionally, use of event timestamps to provide native support for analyzing events according to when they actually occurred is not pervasive, and often comes with important limitations. We present a three-part proposal for integrating robust streaming into SQL, namely: (1) time-varying relations as a foundation for classical tables as well as streaming data, (2) event time semantics, (3) a limited set of optional keyword extensions to control the materialization of time-varying query results. We show how with these minimal additions it is possible to utilize the complete suite of standard SQL semantics to perform robust stream processing. We motivate and illustrate these concepts using examples and describe lessons learned from implementations in Apache Calcite, Apache Flink, and Apache Beam. We conclude with syntax and semantics of a concrete proposal for extensions of the SQL standard and note further areas of exploration.