Pessimistic Cardinality Estimation
Formal Metadata
Title: Pessimistic Cardinality Estimation
Series: SIGMOD 2019 (Part 76 of 155)
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI: 10.5446/42957
Transcript: English (auto-generated)
00:02
Today we'll be talking about pessimistic query optimization, our project on how to generate tighter upper bounds on intermediate join cardinalities. The problem that we're focusing on today is the classic problem of cardinality estimation in multi-join queries, in particular when some of the relations in a multi-join query are connected by foreign-key/foreign-key joins.
00:23
Because in that case, some of the intermediate products might actually be larger than some of the underlying base tables themselves, and this blow-up is what really threatens long runtimes. So, in particular, the mistakes that optimizers still make are that they assume, for instance,
00:41
uniformity across attribute value distributions in a table. Real-world data shows us that in fact there is skew. They also assume that there is independence between the distributions of joining columns. Again, real-world data shows that in fact there is correlation. These strong assumptions about the underlying data lead to underestimation.
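To see why these assumptions cause underestimation, consider the textbook equijoin estimate used under uniformity and independence (a standard formula, not quoted from the talk; $V(R,A)$ denotes the number of distinct values of attribute $A$ in $R$):

$$|R \bowtie_{A} S| \;\approx\; \frac{|R| \cdot |S|}{\max\big(V(R,A),\, V(S,A)\big)}.$$

With skew (a few heavy join values) or correlation between join columns, the true cardinality can exceed this estimate by orders of magnitude, and multiplying several such estimated selectivities across a multi-join query compounds the error.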
01:01
And, as you add more and more relations, this underestimation gets more and more severe, which means that we end up with very aggressive query plans that turn out to be much slower when you look over the entire workload. So, in this project we asked the question: why not use bounds?
01:22
Now, using bounds is not a new idea. Our main contribution is how to tighten these bounds, or, more specifically, how to tighten the bounding formulas. However, the first thing we need to do is to understand how to generate these bounding formulas in the first place, and that's where we're going to begin. So, here's an example query; it's going to be the running example for this presentation.
01:42
Essentially, we're finding combinations of pseudonyms, or different misspellings, of an individual working in the movie industry, and companies with which that individual worked on a certain film. So, specifically, we have pseudonym, which relates those individuals to those misspellings, or pseudonyms.
02:02
Cast, which relates individuals to movies that they worked on, could be writers, could be actors, et cetera. Movie companies is similar. It relates those movies to the companies that were involved with that movie. And, finally, company name has the rest of the information about that company, because a company can work on several different movies.
02:21
We can view this query as a chain join over four relations. Two of the three joins are foreign-key/foreign-key, and one is foreign-key/key, as marked by the colored edges. For the remainder of the presentation, we're going to refer to this query using the Datalog notation that we see in the upper right.
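The slide with the Datalog notation is not reproduced in the transcript; a plausible reconstruction, using the relation names from the talk and the variables x, y, z, w that appear later (x the pseudonym, y the person, z the movie, w the company), is:

$$Q(x, y, z, w) \;\text{:-}\; \mathit{Pseudonym}(x, y),\ \mathit{Cast}(y, z),\ \mathit{MovieCompanies}(z, w),\ \mathit{CompanyName}(w, \ldots).$$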
02:44
Now, the first step to actually generating join-cardinality upper-bounding formulas is to review entropy. As a review, entropy is a mathematical quantity that describes the amount of randomness in a random variable. The first formula here is the standard formal definition of entropy.
03:06
You can make a similar definition for the joint distribution of multiple random variables. You can also define the entropy of one random variable conditioned on another. What this means is essentially we have the entropy of X given Y. What we're saying
03:20
is: what is the amount of randomness that X exhibits if you hold Y constant? If X and Y are correlated, then the entropy of X given Y should actually be less than the entropy of X on its own. Now, if we assume that X is a discrete random variable over some finite space, there are some very convenient facts we're going to exploit later.
03:43
The first of these is that the entropy of X is always less than or equal to the log of the size of the domain, log n, and this inequality is an equality if and only if the distribution is uniform. In some sense, the amount of randomness is maximized when we're presented with a uniform distribution.
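Written out, the entropy facts referenced here are (reconstructed, since the slide formulas are not in the transcript; base-2 logarithms are assumed so that exponentiating an entropy later means raising 2 to it):

$$H(X) = -\sum_{x} p(x)\,\log_2 p(x), \qquad H(X,Y) = -\sum_{x,y} p(x,y)\,\log_2 p(x,y), \qquad H(X \mid Y) = H(X,Y) - H(Y),$$

$$H(X) \;\le\; \log_2 n \quad \text{for a discrete } X \text{ over a domain of size } n,\ \text{with equality if and only if } X \text{ is uniform.}$$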
04:03
So, going back to our query: for every single attribute that we see in the query, we introduce a random variable. We're never actually going to materialize these variables or touch them; instead, they're just vehicles that are going to deliver us to the bounding formulas
04:22
that we desire. So, in that vein, let's let the joint distribution of capital X, Y, Z, and W be uniformly distributed over the true output of our query. Again, this seems backwards, but it allows us to exploit this very convenient entropic characteristic.
04:42
So, specifically, we have that the joint entropy is equal to the log of the size of the query output, or equivalently, the exponential of the joint entropy is equal to the size of the query output. Looking at this equation, it's clear that if you want to bound the size of the query, it suffices to bound the joint entropy. How do we do that? Well, we can divide the joint entropy up
05:04
into smaller pieces. Here's one example: the joint entropy is less than or equal to the entropy of X given Y, plus the joint entropy of Y and Z, plus the entropy of W given Z. Now, these little terms are very special, because each of them relates to a statistic that we can gather
05:23
from a base relation. So, for instance, look at that middle entropic term: the joint entropy of Y and Z is actually less than or equal to the log of the count of cast. Essentially, it's less than or equal to the log of the number of rows that we see in the cast relation, because the Y and Z
05:42
random variables correspond to the people and movie attributes, and those are exactly the attributes that appear in the cast relation. The conditional entropic terms, in this case the entropy of X given Y and the entropy of W given Z, are a little more complex. They are less than
06:03
or equal to the log of the max degree. So, for instance, the entropy of X given Y relates back to the pseudonym relation. Essentially, what we're saying is that this entropic term is less than or equal to the log of the number of pseudonyms or misspellings of the individual with the most pseudonyms. So, for instance, Snoop Dogg could have different pseudonyms such as Snoop,
06:25
Snoop Doggy Dogg, Snoop, or previously Snoop Lion, and those would all be valid. So, we know that's at least four. Similarly, for entropy of W given Z. And we can also refer to them using this more compact notation.
06:41
So, here's the equation that we have so far. The size of the query output is equal to the exponential of the joint entropy, which is less than or equal to the exponential of the sum of smaller entropic terms, each of which is individually less than or equal to a statistic that we can gather from a base relation. Great. What's more, we just chose this one particular, highlighted-in-red
07:04
sum of entropic terms. We could have chosen any of the other valid decompositions, each of which leads to a valid cardinality bound. So, in fact, our equation looks like this: the size of the query output is less than or equal to the minimum, the best performing, of all these different products.
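Putting the pieces together for the running example, the chain of inequalities described above is (a reconstruction from the spoken description, so the slide's exact notation may differ; here maxdeg with a subscript denotes the maximum number of rows of that relation sharing one value of the subscripted attribute):

$$|Q| \;=\; 2^{H(X,Y,Z,W)} \;\le\; 2^{\,H(X \mid Y)\, +\, H(Y,Z)\, +\, H(W \mid Z)} \;\le\; \mathrm{maxdeg}_{Y}(\mathit{Pseudonym}) \cdot |\mathit{Cast}| \cdot \mathrm{maxdeg}_{Z}(\mathit{MovieCompanies}),$$

and since every valid decomposition $d$ of the joint entropy yields such a product, the bound actually used is the minimum over all of them:

$$|Q| \;\le\; \min_{d} \prod_{\text{term} \in d} \big(\text{the count or max-degree statistic bounding that term}\big).$$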
07:20
This is known information; it's prior work. The question is: is it useful? The problem is, not yet. The cardinality bounds that we're generating this way are still too loose. Our main contribution is how to tighten them. So, let's get to that. Again, let's go back to
07:41
our example query, and let's divide the relations up into different logical partitions in the same way a hash join would: we're subdividing based on the hash value of the join attributes. Now, you can connect up the corresponding pieces from the different
08:02
relations, and you can execute the query on this, and it would generate a subset of the full output, just like a hash join. Now, what this also means is that you can take the disjoint union over all these smaller pieces, and you will, in fact, produce the full output.
08:20
This, therefore, means that if we generate a bound on each of these smaller pieces and sum up those smaller bounds, the value that we end up with will also be a bound on the full output, and this is the main insight. Because we're using more fine-grained information about the underlying data, we can generate a tighter bound
08:43
by using more and more logical partitioning. So, actually, our equation looks like this. The size of the query is less than or equal to the sum over smaller pieces of the minimum of all of these different bounding formulas with respect to those smaller pieces.
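In symbols, the partitioned bound just described is (again a reconstruction; $h$ ranges over the hash partitions and $d$ over the candidate bounding formulas):

$$|Q| \;\le\; \sum_{h} \, \min_{d} \, \mathrm{bound}_d(h),$$

where $\mathrm{bound}_d(h)$ is the bounding formula $d$ evaluated using only the count and max-degree statistics of partition $h$ of each relation.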
09:03
Some characteristics of this data model: we're going to have one bound sketch per table, and each sketch will encapsulate the statistics that we need, specifically the count and degree statistics. However, because this is similar to a hash join, and just like a hash join, it is only applicable to equijoins; we can't handle
09:22
non-equijoins. So, there are some optimizations we had to do in order to push this to practicality. We're not going to go over all of them now; you can review them in the rest of the paper, but I'll touch on them briefly. The bound formulas are each derived from a specific entropic formula.
09:44
As you add more and more relations, the number of entropic bounding formulas can grow massively. We therefore present an algorithm to actually prune this space to only focus on a practical subset of those bounding formulas. In a similar vein, as you increase the hash size, the actual
10:04
runtime of generating the bounds increases exponentially with that hash size. Moreover, there's also a very strange non-monotonic behavior, where, as you increase the hash size, we actually saw at some points that the bound would start becoming looser again. This is an unexpected byproduct of hash collisions, and therefore
10:26
we introduced a budgeting strategy that would combat both of these simultaneously. Finally, if we actually want to handle complex predicates, we have to introduce a kind of filter propagation method, similar to sideways information passing.
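To make the data model concrete, here is a minimal sketch in Python of what a per-table bound sketch and the partitioned bound computation could look like. The names (BoundSketch, partitioned_bound) and the restriction to a single binary foreign-key/foreign-key join are illustrative assumptions for brevity, not the paper's actual implementation.

```python
from collections import defaultdict

class BoundSketch:
    """Per-table sketch: count and max-degree statistics per hash partition.

    `buckets` is the number of logical partitions (the "hash size");
    `join_attr` is the position of the join attribute in each tuple.
    """
    def __init__(self, tuples, join_attr, buckets):
        self.buckets = buckets
        self.count = defaultdict(int)          # partition -> number of rows
        degree = defaultdict(int)              # (partition, join value) -> rows with that value
        for t in tuples:
            h = hash(t[join_attr]) % buckets
            self.count[h] += 1
            degree[(h, t[join_attr])] += 1
        self.max_degree = defaultdict(int)     # partition -> max rows sharing one join value
        for (h, _), d in degree.items():
            self.max_degree[h] = max(self.max_degree[h], d)

def partitioned_bound(sketch_r, sketch_s):
    """Upper bound on |R join S| over a single shared join attribute.

    For each partition we take the tighter of the two symmetric
    count * max-degree products, then sum the per-partition bounds.
    """
    assert sketch_r.buckets == sketch_s.buckets
    total = 0
    for h in range(sketch_r.buckets):
        bound_1 = sketch_r.count[h] * sketch_s.max_degree[h]   # expand each R row by S's max degree
        bound_2 = sketch_s.count[h] * sketch_r.max_degree[h]   # or vice versa
        total += min(bound_1, bound_2)
    return total

# Tiny usage example with made-up data: R(a, b) joins S(b, c) on b.
r = [(1, 'x'), (2, 'x'), (3, 'y')]
s = [('x', 10), ('x', 11), ('y', 12), ('y', 13)]
sk_r = BoundSketch(r, join_attr=1, buckets=4)
sk_s = BoundSketch(s, join_attr=0, buckets=4)
print(partitioned_bound(sk_r, sk_s))   # prints an upper bound on the true join size (which is 6)
```

The per-partition min of the two symmetric count-times-max-degree products mirrors taking the best bounding formula for each partition, and summing over partitions mirrors the disjoint-union argument above.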
10:44
Finally, we can focus on the evaluation. In order to evaluate, we needed a workload that was complex and had real-world data, which means skew, correlation, and complex filter predicates. We therefore focus on the Join Order Benchmark,
11:01
as it's fashionable now. Here we see a histogram of the relative error for our bounds and also for the estimates produced by a default Postgres instance. Notice that the relative error, on the x-axis, is plotted on a logarithmic scale.
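For reading the histogram, the usual convention (the transcript does not spell out the definition, so this is an assumption) is that the relative error of an estimate $\hat{c}$ for true cardinality $c$ is $\hat{c}/c$: underestimates fall below 1, overestimates above 1, and reflecting across 1 amounts to comparing the symmetric q-error:

$$\text{relative error} = \frac{\hat{c}}{c}, \qquad \text{q-error} = \max\!\left(\frac{\hat{c}}{c},\ \frac{c}{\hat{c}}\right).$$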
11:24
So again, we see the classic underestimation that a lot of optimizers show. If we reflect across the relative-error-equals-one axis, though, and just compare based on how far off from the true value we are, we see that the bounds we are producing
11:42
are in fact closer to the true values than the default Postgres estimates. It therefore leads us to believe that we could in fact just take those bounds and inject them directly back into the query optimizer and let the optimizer reason on those and see how those plans do.
12:00
And that's exactly what we do. This next graph shows the actual runtimes of individual Join Order Benchmark queries. This isn't every single one of the benchmark queries; we had to remove some just for visualization purposes. The gray bars each represent the average runtime for a Join Order Benchmark query, and we are sorting on the default Postgres runtime.
12:28
The blue bars represent the runtimes for those plans using bounds. Now you'll see on the left side of the graph that using bounds sometimes leads to slower queries. And this is to be
12:42
expected, because using bounds pushes the optimizer to make more conservative, safer plans, which will at times lead to longer runtimes. However, if you look at the entire workload as a whole, you'll find that using the default Postgres estimates, not the more
13:00
conservative bounds, leads to a runtime of around 3,200 seconds, whereas using bounds comes in at less than 2,000 seconds. So overall we have a gain. Now, that was only the setting where foreign-key indexes were present. Let's assume that the user was not so wise and did not
13:20
populate them ahead of time. In this case, we actually find that Postgres's plans time out very badly. We only show three bars here because, again, we're not showing every query, but five of those queries actually never finished; we had to cut them off after an hour. Even with this cutoff, we find that Postgres's aggregate runtime is over 22,000 seconds,
13:41
whereas using our bounds, those plans take just over 2,000 seconds, an order of magnitude difference. Another important thing to note is that removing the foreign-key indexes only increased the runtime of our plans by about 20 percent. This leads us to suggest that using bounds in fact produces more robust plans, and that's the main takeaway. We have gains on those
14:04
queries that are very, very slow, even disastrous, when using naive estimates, and for those queries that are already relatively fast, we are about on par. Thank you to everyone at UW for the practice talks.