
Gradient descent for wide two-layer neural networks

Formal Metadata

Title
Gradient descent for wide two-layer neural networks
Title of Series
Number of Parts
5
Author
License
CC Attribution - NonCommercial - NoDerivatives 2.0 Generic:
You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2020
Language
English

Content Metadata

Subject Area
Genre
Abstract
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. We show that the limit of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions.
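
As a brief sketch of the setting the abstract describes (the notation below is ours, not taken from the talk): a two-layer network of width m with a positively homogeneous activation \sigma, such as the ReLU \sigma(u) = \max(u, 0), computes

\[ f(x;\theta) = \frac{1}{m} \sum_{j=1}^{m} a_j \, \sigma(b_j^\top x), \qquad \theta = (a_j, b_j)_{j=1}^{m}, \]

and is trained on labeled data (x_i, y_i) with y_i \in \{-1, +1\} by gradient flow on the logistic loss

\[ L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log\bigl(1 + e^{-y_i f(x_i;\theta)}\bigr). \]

In the infinite-width limit, the characterization stated in the abstract takes the form of a max-margin problem; one plausible formalization, consistent with the abstract and the related literature, is

\[ \max_{\|f\|_{\mathcal{F}_1} \le 1} \; \min_{1 \le i \le n} y_i f(x_i), \]

where \|\cdot\|_{\mathcal{F}_1} is the variation norm on functions representable as integrals over neurons. This space is a Banach space rather than a Hilbert space, which is what "non-Hilbertian" refers to, in contrast with the reproducing-kernel Hilbert space norms that arise in kernel-regime (NTK) analyses.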
Keywords