Faster Spark SQL: Adaptive Query Execution in Spark v3

Formal Metadata

Title

Faster Spark SQL: Adaptive Query Execution in Spark v3

Title of Series

FOSDEM 2021

Number of Parts

637

Author

Poggi, Nicolas

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/53665 (DOI)

Publisher

FOSDEM VZW

Release Date

2021

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Over the years, there have been extensive efforts to improve Apache Spark SQL performance. This talk introduces the new Adaptive Query Execution (AQE) framework and shows how it can automatically improve user query performance. AQE uses runtime statistics to dynamically guide Spark's execution while a query is running. The talk covers the main features of AQE and gives examples of how it improves on the previous static query plans. Examples of the new runtime optimizations include selecting the right join type (broadcast-hash-join vs. sort-merge-join), dealing with data skew, and automatically selecting the number of shuffle (reducer) partitions. Finally, we present the significant improvements we have seen on the TPC-DS benchmark with AQE.
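As a rough illustration of how the features named in the abstract are switched on, the following sketch collects the corresponding Spark 3.x SQL configuration keys and renders them as `spark-submit` arguments. The keys are taken from the Spark 3.x configuration documentation, not from the talk itself, and the helper name `to_submit_args` is hypothetical; defaults and available keys may differ between Spark releases.

```python
# Minimal sketch: AQE-related settings, assuming the Spark 3.x SQL
# configuration keys below. The talk abstract does not list them.
AQE_SETTINGS = {
    # Master switch for Adaptive Query Execution.
    "spark.sql.adaptive.enabled": "true",
    # Lets AQE merge small shuffle partitions after a stage completes.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Lets AQE split skewed partitions in sort-merge joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def to_submit_args(settings):
    """Render a settings dict as a flat list of spark-submit --conf arguments.

    Hypothetical helper for illustration; keys are emitted in sorted order
    so the output is deterministic.
    """
    args = []
    for key, value in sorted(settings.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the arguments as they might be appended to a spark-submit command.
    print(" ".join(to_submit_args(AQE_SETTINGS)))
```

In a PySpark application the same key/value pairs could instead be passed through `SparkSession.builder.config(key, value)`. Runtime join-type selection has no separate switch in this sketch: with AQE enabled, the conversion from sort-merge join to broadcast join is governed by the existing broadcast threshold setting.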