
Analysis of Gradient Descent on Wide Two-Layer Neural Networks

Formal Metadata

Title
Analysis of Gradient Descent on Wide Two-Layer Neural Networks
Series Title
Number of Parts
6
Author
Contributors
License
CC Attribution 3.0 Unported:
You may use, modify, and reproduce the work or its contents, in altered or unaltered form, for any legal purpose, and distribute it and make it publicly available, provided you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Publication Year
Language
Production Year: 2020

Content Metadata

Subject Area
Genre
Abstract
Artificial neural networks are a class of "prediction" functions parameterized by a large number of parameters -- called weights -- that are used in various machine learning tasks (classification, regression, etc.). Given a learning task, the weights are adjusted via a gradient-based algorithm so that the corresponding predictor achieves good performance on a given training set. In this talk, we propose an analysis of gradient descent on wide two-layer ReLU neural networks for supervised machine learning tasks that leads to sharp characterizations of the learned predictor. The main idea is to study the dynamics when the width of the hidden layer goes to infinity, which is a Wasserstein gradient flow. While these dynamics evolve on a non-convex landscape, we show that their limit is a global minimizer if initialized properly. We also study the "implicit bias" of this algorithm when the objective is the unregularized logistic loss: among the many global minimizers, we show that it selects a specific one which is a max-margin classifier in a certain functional space. We finally discuss what these results tell us about the generalization performance and the adaptivity to low-dimensional structures of neural networks. This is based on joint work with Francis Bach.
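
The setting of the abstract can be made concrete with a small sketch. The following Python snippet (illustrative only, not the speaker's code: the toy data, hyperparameters, and the 1/m "mean-field" scaling are assumptions consistent with the abstract) trains a wide two-layer ReLU network on the unregularized logistic loss with plain gradient descent:

import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data; labels y in {-1, +1}.
n, d = 200, 2
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] * X[:, 1])          # XOR-like nonlinear rule

# Wide hidden layer with the mean-field scaling 1/m, so that
# m -> infinity corresponds to the Wasserstein gradient flow regime:
#   f(x) = (1/m) * sum_j a_j * relu(<w_j, x>)
m = 1000
W = rng.normal(size=(m, d))             # hidden-layer weights w_j
a = rng.choice([-1.0, 1.0], size=m)     # output weights a_j

def forward(X, W, a):
    H = np.maximum(X @ W.T, 0.0)        # ReLU activations, shape (n, m)
    return (H @ a) / m

def logistic_loss(f, y):
    return np.mean(np.logaddexp(0.0, -y * f))

lr = 0.5
for step in range(3000):
    H = np.maximum(X @ W.T, 0.0)
    f = (H @ a) / m
    # Gradient of the unregularized logistic loss w.r.t. the outputs f.
    g = -y / (1.0 + np.exp(y * f)) / n              # shape (n,)
    # Backpropagate through both layers.
    grad_a = (H.T @ g) / m                          # dL/da_j
    dH = np.outer(g, a / m) * (H > 0)               # shape (n, m)
    grad_W = dH.T @ X                               # dL/dw_j
    # Scale the step size by m so each unit moves at an
    # m-independent speed (the mean-field time parameterization).
    a -= lr * m * grad_a
    W -= lr * m * grad_W

f = forward(X, W, a)
print(f"loss {logistic_loss(f, y):.4f}, accuracy {np.mean(np.sign(f) == y):.2f}")

With the unregularized logistic loss, the loss keeps decreasing without reaching a minimizer at finite parameter norm; the talk's implicit-bias result concerns which direction the predictor converges to in this regime (a max-margin classifier in a certain functional space).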