Policy learning from observational data seeks to extract personalized interventions from passive interaction data in order to maximize causal effects. The aim is to transform electronic health records into personalized treatment regimes, transactional records into personalized pricing strategies, and click-streams into personalized advertising campaigns. The task is made difficult by the observational nature of the data: only the outcomes of the interventions actually performed are observed, and the distributions of units exposed to different interventions differ systematically. In such a purely observational setting, existing methods adapted from experimental settings rely on unstable plug-in estimates and heuristic stopgaps to address the ensuing complications. In this talk I will describe a new approach based on distributionally robust optimization that overcomes these failures, and its application to personalized medicine. The key observation is that the estimation error reduces to the discrepancy in a moment of a particular unknown function; the approach then protects against every possible realization of that function. On the one hand, this leads to unparalleled finite-sample performance, as demonstrated in experiments. On the other hand, theoretical results show that the asymptotic optimality and convergence rates of plug-in approaches are preserved. Time permitting, I will also outline advances in handling continuous treatments and in representation learning for causal inference using deep neural networks.
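To give a concrete feel for the distributionally robust flavor of such an approach, the sketch below is a minimal, illustrative example only, not the estimator developed in the talk: it evaluates a candidate treatment policy by its worst-case reweighted value over a KL-divergence ambiguity ball around the empirical distribution, using the standard dual of that worst-case expectation. The simulated data, the assumed-known propensity scores, the inverse-propensity scoring, and the radius `rho` are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dro_policy_value(scores, rho):
    """Worst-case (pessimistic) mean of per-unit policy scores over all
    distributions within KL-divergence rho of the empirical distribution.
    Standard dual:  sup_{lam>0} -lam*log(mean(exp(-scores/lam))) - lam*rho.
    """
    def neg_dual(lam):
        z = -scores / lam
        m = z.max()
        log_mean_exp = m + np.log(np.mean(np.exp(z - m)))  # numerically stable
        return -(-lam * log_mean_exp - lam * rho)
    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e6), method="bounded")
    return -res.fun

# Illustrative observational data: covariate X, binary treatment T, outcome Y.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=n)
propensity = 1 / (1 + np.exp(-X))          # assumed known here for simplicity
T = rng.binomial(1, propensity)
Y = X * T + rng.normal(size=n)

def policy(x):                              # candidate rule: treat if x > 0
    return (x > 0).astype(int)

# Inverse-propensity-weighted per-unit scores for the candidate policy.
pi = policy(X)
w = np.where(T == 1, 1 / propensity, 1 / (1 - propensity))
scores = w * Y * (T == pi)

print("plug-in value:", scores.mean())
print("DRO value    :", dro_policy_value(scores, rho=0.05))
```

The DRO value is a conservative lower bound on the plug-in value; larger `rho` widens the ambiguity set and makes the evaluation more pessimistic.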