Flink's SQL Engine: Let's open the engine room!

Plain Schwarz

Walther, Timo

Formal Metadata

Title

Flink's SQL Engine: Let's open the engine room!

Title of Series

Berlin Buzzwords 2024

Number of Parts

Author

Walther, Timo

Contributors

N. N. (Moderation)

License

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/70261 (DOI)

Publisher

Plain Schwarz

Release Date

2024

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Apache Flink aims to make stream processing easy and accessible for everyone. It should come as no surprise that a high level of abstraction puts more load on the core. Flink's SQL engine is the workhorse behind many on-prem and managed SQL platforms. Yet very few users know what is really going on under the hood when they submit a SQL query. In this talk, we take a deep look into the internals of Flink SQL. Let's take the stack apart! We start with some SQL text and go all the way down to Flink's streaming primitives. I will go through the individual optimizer phases. You will learn how event-time operations are tracked when declaring a watermark, how state is managed when using different kinds of joins, and how changelog modes and upsert keys travel through the topology when reading from a Change Data Capture connector. After this talk, you may not be able to write an optimizer rule, but you should at least get a feeling for the power of a simple streaming SQL query.