
Distributional Robustness and Regularization in Statistical Learning

Formal Metadata

Title
Distributional Robustness and Regularization in Statistical Learning
Number of Parts
39
License
CC Attribution - NonCommercial - NoDerivatives 4.0 International:
You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
A central problem in statistical learning is to design prediction algorithms that not only perform well on training data, but also perform well on new and unseen, yet similar, data. We approach this problem by formulating a distributionally robust stochastic optimization (DRSO) problem, which seeks a solution minimizing the worst-case expected loss over a family of distributions that are close to the empirical distribution in Wasserstein distance. We establish a connection between such Wasserstein DRSO and regularization. Specifically, we identify a broad class of loss functions for which the Wasserstein DRSO is asymptotically equivalent to a regularization problem with a gradient-norm penalty. This relation provides a new interpretation for approaches that use regularization, including a variety of statistical learning problems and discrete choice models. The connection also suggests a principled way to regularize high-dimensional, non-convex problems, as demonstrated by the training of Wasserstein generative adversarial networks (WGANs) in deep learning.
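
To make the abstract's formulation concrete, here is a sketch in standard notation (the symbols theta, ell(theta; Z), delta, p, and q are assumptions of this summary and are not defined on this page). The DRSO problem and its asymptotic regularization counterpart read:

    % Wasserstein DRSO: worst-case expected loss over an order-p
    % Wasserstein ball of radius \delta around the empirical
    % distribution \hat{P}_n
    \min_{\theta} \; \sup_{Q \,:\, W_p(Q, \hat{P}_n) \le \delta}
        \mathbb{E}_{Z \sim Q}\bigl[ \ell(\theta; Z) \bigr]

    % Asymptotic equivalence (as \delta \to 0) to empirical risk plus a
    % gradient-norm penalty, with conjugate exponent q (1/p + 1/q = 1)
    % and \|\cdot\|_* the norm dual to the transport cost
    \min_{\theta} \; \mathbb{E}_{\hat{P}_n}\bigl[ \ell(\theta; Z) \bigr]
        + \delta \Bigl( \mathbb{E}_{\hat{P}_n}\bigl[
            \| \nabla_z \ell(\theta; Z) \|_*^{q} \bigr] \Bigr)^{1/q}
        + o(\delta)

The following minimal PyTorch sketch shows one way such a gradient-norm penalty can be attached to an empirical loss; the function name, delta, and q are illustrative assumptions, not the authors' implementation, and the dual norm is taken to be the Euclidean norm:

    import torch

    def drso_surrogate_loss(model, loss_fn, x, y, delta=0.1, q=2.0):
        """Empirical risk plus a delta-weighted gradient-norm penalty.

        Heuristic surrogate for the Wasserstein DRSO objective sketched
        above; loss_fn must return per-sample losses, e.g.
        torch.nn.CrossEntropyLoss(reduction="none").
        """
        x = x.clone().requires_grad_(True)
        per_sample = loss_fn(model(x), y)  # shape: (batch,)
        # One backward pass gives each sample's gradient w.r.t. its own
        # input, since samples do not interact inside the batch sum.
        (grads,) = torch.autograd.grad(per_sample.sum(), x, create_graph=True)
        grad_norms = grads.flatten(start_dim=1).norm(dim=1)  # ||grad_z loss||_2
        penalty = grad_norms.pow(q).mean().pow(1.0 / q)      # L^q(P_n) norm
        return per_sample.mean() + delta * penalty

Because the penalty is built with create_graph=True, the returned scalar can be backpropagated through the model parameters as usual; this is the same autograd mechanism commonly used for gradient penalties in WGAN training.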