Data-Pooling in Stochastic Optimization

Banff International Research Station (BIRS) for Mathematical Innovation and Discovery

Gupta, Vishal

Formale Metadaten

Titel

Serientitel

Models and Algorithms for Sequential Decision Problems Under Uncertainty (19w5231)

Anzahl der Teile

Autor

Gupta, Vishal

Mitwirkende

Kallus, Nathan

Lizenz

CC-Namensnennung - keine kommerzielle Nutzung - keine Bearbeitung 4.0 International:
Sie dürfen das Werk bzw. den Inhalt in unveränderter Form zu jedem legalen und nicht-kommerziellen Zweck nutzen, vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.

Identifikatoren

10.5446/60453 (DOI)

Herausgeber

Banff International Research Station (BIRS) for Mathematical Innovation and Discovery

Erscheinungsjahr

2019

Sprache

Englisch

Inhaltliche Metadaten

Fachgebiet

Informatik Mathematik

Genre

Workshop/Interaktives Format Vorlesung

Abstract

Managing large-scale systems often involves simultaneously solving thousands of potentially unrelated stochastic optimization problems, each with limited data. Optimization intuition suggests decoupling these unrelated problems and solving them separately. Statistical intuition, however, suggests that combining the problems via some form of shrinkage may outperform decoupling, but does not offer a concrete non-parametric approach for doing so when solving complex, constrained optimization problems such as vehicle-routing, economic lot-sizing or facility location. We propose a novel data-pooling algorithm that combines both perspectives. Our approach does not require strong distributional assumptions and applies to constrained, possibly non-convex optimization problems. We show that, unlike the classical statistical setting, the potential benefits of data-pooling, in general, depend strongly on the problem structure, and, in some cases, data-pooling offers no benefit. We prove that as the number of problems grows large, our method learns if pooling is necessary and the optimal amount to pool, even if the expected amount of data per problem is fixed and bounded. We further show that pooling offers significant benefits over decoupling when there are many problems, each of which has a small amount of relevant data. We demonstrate the practical benefits of data-pooling using real data from a chain of retail drug stores in the context of inventory management.