Faster Spark SQL: Adaptive Query Execution in Spark v3

FOSDEM VZW

Poggi, Nicolas

Formal Metadata

Title

Faster Spark SQL: Adaptive Query Execution in Spark v3

Series Title

FOSDEM 2021

Number of Parts

637

Author

Poggi, Nicolas

License

CC Attribution 2.0 Belgium:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.

Identifiers

10.5446/53665 (DOI)

Publisher

FOSDEM VZW

Year of Publication

2021

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Over the years, there have been extensive efforts to improve Apache Spark SQL performance. This talk introduces the new Adaptive Query Execution (AQE) framework and shows how it can automatically improve user query performance. AQE uses runtime statistics to guide Spark's execution dynamically while a query is running. The talk covers the main features of AQE and gives examples of how it improves on the previous static query plans. Examples of the new runtime optimizations include selecting the right join type (broadcast-hash-join vs. sort-merge-join), dealing with data skew, and automatically selecting the number of shuffle (reducer) partitions. Finally, we'll present the significant improvements we have seen on the TPC-DS benchmark with AQE.
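
The three runtime optimizations named in the abstract can be illustrated with a small, self-contained Python sketch. This is a simplified model for intuition only, not Spark's implementation and not code from the talk: the function names (`choose_join`, `coalesce_partitions`, `skewed_partitions`) are hypothetical, and the thresholds are assumptions loosely modeled on Spark 3's documented defaults. The keys in `AQE_CONF` are real Spark SQL property names, shown only to indicate which switches turn the features on.

```python
from statistics import median

MB = 1024 * 1024

# Real Spark SQL property names; the values simply enable each feature.
AQE_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def choose_join(build_side_bytes, broadcast_threshold=10 * MB):
    """Pick a join strategy from the observed runtime size of the smaller side.

    The 10 MB threshold is an assumption for this sketch.
    """
    if build_side_bytes <= broadcast_threshold:
        return "broadcast-hash-join"
    return "sort-merge-join"


def coalesce_partitions(sizes, target_bytes=64 * MB):
    """Greedily merge adjacent shuffle partitions up to roughly target_bytes.

    Returns a list of groups, each a list of original partition indices.
    """
    groups, current, current_size = [], [], 0
    for idx, size in enumerate(sizes):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    if current:
        groups.append(current)
    return groups


def skewed_partitions(sizes, factor=5, min_bytes=256 * MB):
    """Flag partitions that are both far above the median and large in absolute terms."""
    med = median(sizes)
    return [i for i, s in enumerate(sizes) if s > factor * med and s > min_bytes]


if __name__ == "__main__":
    print(choose_join(2 * MB))
    print(coalesce_partitions([10 * MB, 20 * MB, 30 * MB, 50 * MB, 5 * MB]))
    print(skewed_partitions([100 * MB, 120 * MB, 110 * MB, 2000 * MB]))
```

The point of the sketch is that each decision takes measured sizes as input rather than estimates made before the query starts, which is the difference between AQE and a static plan as the abstract describes it.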