
7th HLF – Lecture: Automatic Step-Size Control for Minimization Iterations

Formal Metadata

Title
7th HLF – Lecture: Automatic Step-Size Control for Minimization Iterations
Title of Series
Number of Parts
24
Author
License
No Open Access License:
German copyright law applies. This film may be used for your own use but it may not be distributed via the internet or passed on to external parties.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The "Training" of "Deep Learning" for "Artificial Intelligence" is a process that minimizes a "Loss Function" ƒ(w) subject to memory constraints that allow the computation of ƒ(w) and its Gradients G(w) := dƒ(w)/dw` but not the Hessian d2ƒ(w)/dw2 nor estimates of it from many stored pairs {G(w), w}. Therefore the process is iterative using "Gradient Descent" or an accelerated modification of it like "Gradient Descent Plus Momentum". These iterations require choices of one or two scalar "Hyper-Parameters" which cause divergence if chosen badly. Fastest convergence requires choices derived from the Hessian's two attributes, its "Norm" and "Condition Number", that can almost never be known in advance. This retards Training, severely if the Condition Number is big. A new scheme chooses Gradient Descent's Hyper-Parameter, a step-size called "the Learning Rate", automatically without any prior information about the Hessian; and yet that scheme has been observed always to converge ultimately almost as fast as could any acceleration of Gradient Descent with optimally chosen Hyper-Parameters. Alas, a mathematical proof of that scheme's efficacy has not been found yet. The opinions expressed in this video do not necessarily reflect the views of the Heidelberg Laureate Forum Foundation or any other person or associated institution involved in the making and distribution of the video.