Synthetic data: when, why, and how

Formal Metadata

Synthetic data: when, why, and how
Title of Series
Number of Parts
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date: 2023

Content Metadata

Subject Area
Data is essential to today's most interesting applications and systems, which learn from data, act autonomously in response to data, and make data digestible via search. Somewhat counterintuitively, as the importance of real data has increased, the importance of synthetic data has increased as well. In this talk, you'll learn when it's appropriate to use synthetic data (and when it isn't likely to help). You'll also learn about several circumstances in which synthetic data is especially useful, including dealing with personally-identifying information, load testing, and simulating system response to unlikely scenarios. The talk will conclude by providing brief, actionable introductions to several practical approaches to generating synthetic tabular data, each of which is appropriate for particular kinds of synthetic data use cases: we'll cover a simple way to simulate data-generating processes from first principles, basic and more sophisticated statistical techniques, and approaches based on machine learning models. You'll leave with a better understanding of the role of synthetic data in today's systems and a concrete toolbox of ways to exploit it in your own programs.
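
As a rough illustration of the first approach the abstract mentions, simulating a data-generating process from first principles, here is a minimal sketch. It is not taken from the talk: the `simulate_orders` function, the column names, and every distribution parameter are hypothetical choices made only for this example. It uses Python's standard `random` module to produce rows of a made-up orders table.

```python
import random


def simulate_orders(n_rows, seed=0):
    """Generate synthetic order rows from a hand-written process (illustrative only)."""
    rng = random.Random(seed)  # seeded generator, so runs are reproducible
    rows = []
    elapsed = 0.0
    for order_id in range(n_rows):
        # Assumption: orders arrive as a Poisson process, so the gaps between
        # them are exponential, here with a mean of 30 seconds.
        elapsed += rng.expovariate(1 / 30.0)
        rows.append({
            "order_id": order_id,
            "seconds_since_start": round(elapsed, 2),
            # Assumption: 100 hypothetical customers, all equally likely.
            "customer_id": rng.randrange(100),
            # Assumption: order amounts are right-skewed (log-normal).
            "amount": round(rng.lognormvariate(3.0, 0.5), 2),
        })
    return rows


if __name__ == "__main__":
    for row in simulate_orders(5):
        print(row)
```

Because every value comes from an explicit, seeded process rather than from real records, a table like this contains no personal data and can be regenerated at any size, which is what makes this style of generation a plausible fit for the load-testing and privacy scenarios the abstract lists.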