Policy learning from observational data seeks to extract personalized interventions from passive interaction data in order to maximize causal effects. The aim is to transform electronic health records into personalized treatment regimes, transactional records into personalized pricing strategies, and click-streams into personalized advertising campaigns. The task is made difficult by the observational nature of the data: only the outcomes of the interventions actually performed are observed, and the distributions of units exposed to different interventions differ systematically. In such a purely observational setting, existing methods adapted from experimental settings rely on unstable plug-in estimates and heuristic stopgaps to address the ensuing complications. In this talk I will describe a new approach based on distributionally robust optimization that overcomes these failures, and its application to personalized medicine. The key observation is that the estimation error reduces to the discrepancy in a moment of a particular unknown function; the approach then protects against every possible realization of that function. On the one hand, this leads to unparalleled finite-sample performance, as demonstrated in experiments. On the other hand, theoretical results show that the asymptotic optimality and convergence rates of plug-in approaches are preserved. Time permitting, I will also outline advances in handling continuous treatments and in representation learning for causal inference using deep neural networks.
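To give a concrete feel for the distributionally robust flavor of such an approach, the sketch below is a minimal, illustrative example only, not the estimator developed in the talk: it evaluates a candidate treatment policy by its worst-case reweighted value over a KL-divergence ambiguity ball around the empirical distribution, using the standard dual of that worst-case expectation. The simulated data, the assumed-known propensity scores, the inverse-propensity scoring, and the radius `rho` are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dro_policy_value(scores, rho):
    """Worst-case (pessimistic) mean of per-unit policy scores over all
    distributions within KL-divergence rho of the empirical distribution.
    Standard dual:  sup_{lam>0} -lam*log(mean(exp(-scores/lam))) - lam*rho.
    """
    def neg_dual(lam):
        z = -scores / lam
        m = z.max()
        log_mean_exp = m + np.log(np.mean(np.exp(z - m)))  # numerically stable
        return -(-lam * log_mean_exp - lam * rho)
    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e6), method="bounded")
    return -res.fun

# Illustrative observational data: covariate X, binary treatment T, outcome Y.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=n)
propensity = 1 / (1 + np.exp(-X))          # assumed known here for simplicity
T = rng.binomial(1, propensity)
Y = X * T + rng.normal(size=n)

def policy(x):                              # candidate rule: treat if x > 0
    return (x > 0).astype(int)

# Inverse-propensity-weighted per-unit scores for the candidate policy.
pi = policy(X)
w = np.where(T == 1, 1 / propensity, 1 / (1 - propensity))
scores = w * Y * (T == pi)

print("plug-in value:", scores.mean())
print("DRO value    :", dro_policy_value(scores, rho=0.05))
```

The DRO value is a conservative lower bound on the plug-in value; larger `rho` widens the ambiguity set and makes the evaluation more pessimistic.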