Synthetic data: when, why, and how

Plain Schwarz

Benton, William

Formal Metadata

Title

Series Title

Berlin Buzzwords 2023

Number of Parts

Author

Benton, William

License

CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.

Identifiers

10.5446/66606 (DOI)

Publisher

Plain Schwarz

Release Year

2023

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Data is essential to today's most interesting applications and systems, which learn from data, act autonomously in response to data, and make data digestible via search. Somewhat counterintuitively, as the importance of real data has increased, the importance of synthetic data has increased as well. In this talk, you'll learn when it's appropriate to use synthetic data (and when it isn't likely to help). You'll also learn about several circumstances in which synthetic data is especially useful, including dealing with personally identifying information, load testing, and simulating system response to unlikely scenarios. The talk will conclude by providing brief, actionable introductions to several practical approaches to generating synthetic tabular data, each of which is appropriate for particular kinds of synthetic data use cases: we'll cover a simple way to simulate data-generating processes from first principles, basic and more sophisticated statistical techniques, and approaches based on machine learning models. You'll leave with a better understanding of the role of synthetic data in today's systems and a concrete toolbox of ways to exploit it in your own programs.
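As a rough illustration of the first approach the abstract names (simulating a data-generating process from first principles), the following minimal Python sketch builds a small synthetic table. It is not taken from the talk: the order scenario, the column names, the log-normal price distribution, and the free-shipping threshold are all assumptions chosen only to show the general idea.

```python
import random


def simulate_orders(n_rows, seed=0):
    """Simulate a hypothetical order table from simple hand-written rules."""
    rng = random.Random(seed)  # seeded so the synthetic table is reproducible
    rows = []
    for order_id in range(n_rows):
        # Assumed process: each order contains 1 to 5 items.
        quantity = rng.randint(1, 5)
        # Assumed process: unit prices are positive and right-skewed,
        # so a log-normal distribution is used here.
        unit_price = round(rng.lognormvariate(3.0, 0.5), 2)
        total = round(quantity * unit_price, 2)
        rows.append({
            "order_id": order_id,
            "quantity": quantity,
            "unit_price": unit_price,
            "total": total,
            # Assumed business rule: totals above 100 ship for free.
            "free_shipping": total > 100,
        })
    return rows


orders = simulate_orders(1000, seed=42)
```

Because every value follows from explicit rules and a fixed seed, a table like this contains no real personal data and can be regenerated at any size, which is what makes this style of generation convenient for load testing or for exercising unlikely scenarios by changing the assumed rules.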