6th HLF – Laureate Lectures: Equilibria, Fixed Points, and Computational Complexity: from von Neumann to Generative Adversarial Networks


Formal Metadata

Title
6th HLF – Laureate Lectures: Equilibria, Fixed Points, and Computational Complexity: from von Neumann to Generative Adversarial Networks
Author
Daskalakis, Constantinos
License
No Open Access License:
German copyright law applies. This film may be used for your own use but it may not be distributed via the internet or passed on to external parties.
Publisher
Heidelberg Laureate Forum Foundation
Release Date
2018
Language
English

Content Metadata

Abstract
Constantinos Daskalakis: "Equilibria, Fixed Points, and Computational Complexity: from von Neumann to Generative Adversarial Networks" The concept of equilibrium, in its various forms, has played a central role in the development of Game Theory and Economics. The mathematical properties and computational complexity of equilibria are also intimately related to mathematical programming, online learning, and fixed point theory. More recently, equilibrium computation has been proposed as a means to learn generative models of high-dimensional distributions. In this talk, we review fundamental results on minimax equilibrium and its relationship to mathematical programming and online learning. We then turn to Nash equilibrium, reviewing some of our work on its computational intractability. We conclude with modern applications of equilibrium computation, presenting recent progress and open problems in the training of Generative Adversarial Networks. The opinions expressed in this video do not necessarily reflect the views of the Heidelberg Laureate Forum Foundation or any other person or associated institution involved in the making and distribution of the video.

[Music]
so we are moving to the next speaker of this session, who is Constantinos Daskalakis. He's from Greece, but he's a professor at MIT, and he's one of the very fresh winners this year: he received the Nevanlinna Prize on August 1st this year in Rio, and we are very happy that he could fit it into his schedule to come here, at the first opportunity for him to be at HLF. He's working in computation theory, with interfaces to economics, game theory, and probability, and we are looking forward to his talk. So, Costas, the floor is yours. [Music]
I hope you enjoy your time here. So, I wanted to talk about the interaction of game theory, computational complexity, and topology, and also present some very modern applications of game theory to machine learning; these are things I hope you will like. So, without further ado, let me talk about game theory. Let's see.
Look at this game. It's a chase game I found online a few years ago. It's played on this six-colored board by two players who alternate moves. There's a token that you place at the start position in the beginning of the game, and the goal of one of the players is to bring it to the bottom-right position, the finish node, while the other guy wants to avoid getting the token there. Now, how is the game played? At the upper-right corner of this picture you see that there was something that was cut off; it was actually a hexagon whose six edges are colored with the six colors. What you do is you cut it out, you put a toothpick through it, and you rotate it to select a random color. So when it's your turn to move, you rotate that hexagon to choose a random color, and then, depending on where the token currently is (say the token is here), if you get a yellow you have to move over here, but if you get a black you can choose to move either here or there, and so on and so forth. Then it's the other player's turn to move. So again, one player wants to get the token from the upper left to the bottom right; the other player wants to avoid getting the token there. That's a two-player game, and it's a constant-sum game: the total probability of getting the token to the finish is one, and that probability is going to be divided between the two players. So now the question is, who wins in this game? As it turns out, by a little argument that I invite you to do later in your head, the evader loses with probability one: there is a strategy for the pursuer, the guy who wants to get the token to the bottom-right position, that guarantees the token will end up there with probability 1.

So this particular game is very easy to solve. Now, not all games are so easy to solve, and that was the motivation for von Neumann, who in the '20s started thinking about game theory, and he proved this very beautiful theorem called the minimax theorem. The minimax theorem states that if you have a function f that is continuous, convex in one of its arguments and concave in the other, and the sets where these arguments lie are convex and compact, then the two optimization problems are actually the same; they attain the same value: min_x max_y f(x, y) = max_y min_x f(x, y). That's a remarkable statement; it usually does not happen. One of the two inequalities is obvious (the max-min is certainly smaller than the min-max), but the fact that they are equal is actually very surprising.

Now, what does this mean for game theory? I would like to interpret the situation as follows. Imagine I have a two-player game between a player that I'm going to call Min, who is controlling x, and another player that I'm going to call Max, who is controlling y. Now, for every pair of choices, f(x, y) is the payment from Min to Max. With this interpretation, what the theorem says is the following. Looking at the right-hand side: if the right-hand side is smaller than some v*, this means that for all y there is an x such that the function value is smaller than v*. Now, the remarkable fact that follows from the equality of the right-hand side with the left-hand side is that, if that happens, there is actually a universal strategy for the Min player such that the payment is bounded by v*. So, if for every strategy of the Max player there is a way for Min to pay at most v*, then there is actually a universal strategy for the Min player such that, no matter what the other guy does, he pays at most v*. That's pretty remarkable. In particular, if x* is the optimum of the left-hand-side optimization problem, y* is the optimum of the right-hand side, and v* is the optimal value of both sides, then the pair of strategies (x*, y*) is an equilibrium of this game: namely, if Min plays x* and Max plays y*, neither of the players can improve their payoff. The Min player cannot pay less, the Max player cannot gain more. Moreover, the equilibrium set is convex, and the payoffs under any equilibrium are the same, namely v*. So that's pretty cool.

Just to be concrete, here's a game that you played as kids, the rock-paper-scissors game; at least this was the game that was played before electronic games. That game has two players with three strategies each, very symmetric. You can think of X, the strategy set of the Min player, as the simplex over rock, paper, and scissors, and similarly the strategy set of the Max player is the same. The payoff function of this game is just the quadratic form f(x, y) = x^T A y, where A is this matrix, characterizing the payments of the player who's choosing the rows of the matrix to the player who's choosing the columns. For instance, if the row player chooses scissors and the column player chooses rock, then rock breaks scissors, so the row player has to pay one. (I see, my signs are unfortunately reversed on the slide, but in any case, it's the payment matrix.) In any event, the equilibrium of this game is the uniform strategy for both of the players, and under the uniform strategy, which I'm hoping you are familiar with from at least the schoolyard, no player wins anything in expectation or loses anything in expectation.

Cool, but there's more to the story. When the minimax theorem was discovered, soon after, people started thinking about mathematical programming. Dantzig and von Neumann were talking to each other, and they realized that the minimax theorem follows from strong linear programming duality. In fact, very recently it was shown that the other direction is also true, so the two problems are actually computationally equivalent: solving zero-sum games is the same as solving linear programs and vice versa. And as we know, this can be done in polynomial time. It's even more beautiful than that: starting again in the '50s, people have studied whether there are distributed dynamics for the players to arrive at equilibrium, and we have discovered several dynamics, a whole family of dynamics, which, if run by the Min and the Max player of the game, will get them to equilibrium. So it's not just that minimax equilibria have these nice properties, being a convex set and being solvable in polynomial time; distributed dynamics also get you there.
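As a concrete illustration of the game/linear-programming equivalence just described, here is a small sketch (my own, not code from the talk; the payoff matrix and the use of scipy are illustrative choices) that computes the Min player's minimax strategy for rock-paper-scissors by solving the corresponding linear program:

```python
# A minimal sketch (editor's illustration, not the speaker's code): solving
# the rock-paper-scissors zero-sum game as a linear program, using the
# equivalence between minimax equilibria and LP duality described above.
import numpy as np
from scipy.optimize import linprog

# A[i, j]: payment from the row (Min) player to the column (Max) player,
# with rows/columns ordered rock, paper, scissors.
A = np.array([[0.0,  1.0, -1.0],
              [-1.0, 0.0,  1.0],
              [1.0, -1.0,  0.0]])

n = A.shape[0]
# Min player's LP: minimize v subject to (A^T x)_j <= v for every column j,
# sum(x) = 1, x >= 0. Decision variables are (x_1, ..., x_n, v).
c = np.concatenate([np.zeros(n), [1.0]])      # objective: minimize v
A_ub = np.hstack([A.T, -np.ones((n, 1))])     # A^T x - v <= 0
b_ub = np.zeros(n)
A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)
b_eq = [1.0]                                  # x is a probability vector
bounds = [(0, None)] * n + [(None, None)]     # x >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("minimax strategy:", res.x[:n])         # approx. [1/3, 1/3, 1/3]
print("value of the game:", res.x[n])         # approx. 0
```

The output recovers the uniform strategy and game value zero, matching the schoolyard equilibrium discussed above.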
Of course, when you go to the real world, life is harder, and that has to do with the fact that games that are interesting usually have many legal positions. For instance, chess has about 10^43 positions; Go has approximately 10^170 legal positions, and as you know, this is very large. It's even worse than that in poker, because you also have incompleteness of information: you don't know what cards were dealt to your opponent. But in any event,
still, the fact that minimax equilibria have this nice structure allows us to actually exploit the structure and beat human players in these very complex games. As you know, in the '90s computers were able to beat Kasparov in chess; more recently, remarkably, computers were able to beat the champion in Go; and even more recently, professional poker players in Texas Hold'em. This did not happen by accident: this progress has been enabled by our understanding of the structure of, and the algorithms for computing, equilibria in these games. Roughly speaking, you assume that your minimax equilibrium has some nice property, some low-complexity form, and then you try, starting from expert games or by self-play, to tune the parameters of that low-complexity strategy. As it turns out, if you do that enough, you actually can beat humans, or compute better strategies than humanity has computed over the centuries (I mean, Go goes back to ancient times, so many centuries). Cool, so that's minimax: some games have a very interesting structure, which allows us to compute good strategies. Yet when
you move away from this nice class, life is harder. We are really bad at making predictions about more complex games. In fact, if you study data from a complex environment, you soon realize that the behavior of players is not consistent with a Nash equilibrium. In this plot here I'm showing you bidding behavior in auctions, from this paper by Nekipelov, Syrgkanis, and Tardos, where you see that players' bids are not consistent with independent sampling from a randomized strategy, as we would naturally predict. So,
what's happening in more complex environments? Well, just to lay the foundations: John Nash in the '50s extended von Neumann's minimax theorem, showing that every finite game also has an equilibrium in randomized strategies. What do these terms mean? A finite game is one where the number of players is finite, every player has a finite set of strategies and a payoff function that maps everybody's actions to a utility, and you allow players to randomize over their strategy sets. A Nash equilibrium is a collection of randomizations over the players' strategy sets such that, for every player, his randomized strategy is a best response to the product measure of the other players' strategies. Nash showed that every finite game has such an equilibrium in randomized strategies. On the other hand, a few things that von Neumann showed did not actually go through: the equilibrium set is not convex anymore; equilibria could be many, but they could be disconnected; and the payoffs of the players may change across different equilibria of the game. So the structure is not at all the same. In fact, there is a parallel between the development of von Neumann's theorem, with the developments in linear programming and learning that followed, and the development of Nash's theorem, with our inability to extend these other properties. Both von Neumann and Nash used non-constructive arguments to prove their theorems. But while von Neumann's theorem can be re-proven using linear programming duality, giving rise to algorithms and distributed dynamics, for more general games we have been unable to get an efficiently constructive proof of the theorem. In particular, after 60-plus years of research, we don't know of a polynomial-time algorithm for this problem. And of course, if you don't have a centralized algorithm, you also won't get efficient distributed dynamics, because efficient distributed dynamics are even harder than having a centralized algorithm: you also have communication constraints and so on and so forth. Also, as I mentioned earlier, our practical ability to solve multiplayer games is much less impressive, which, for somebody like me, points to the question: is there a computational complexity barrier that is preventing us from tackling multiplayer games?
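Before moving on, here is the best-response condition just defined, made concrete in a small sketch (my own addition; the rock-paper-scissors instance is only an example):

```python
# A minimal sketch (editor's illustration): verifying a Nash equilibrium of
# a two-player game via the best-response condition defined above: every
# strategy in a player's support must achieve the maximal payoff against
# the opponent's randomized strategy.
import numpy as np

def is_nash(R, C, x, y, tol=1e-9):
    """R[i, j], C[i, j]: payoffs of row/column player; x, y: mixed strategies."""
    row_payoffs = R @ y     # payoff of each pure row against y
    col_payoffs = x @ C     # payoff of each pure column against x
    row_best = np.all(row_payoffs[x > tol] >= row_payoffs.max() - tol)
    col_best = np.all(col_payoffs[y > tol] >= col_payoffs.max() - tol)
    return row_best and col_best

# Rock-paper-scissors is zero-sum (C = -R); the uniform pair is a Nash
# equilibrium, while a pure strategy pair is not.
R = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
u = np.ones(3) / 3
print(is_nash(R, -R, u, u))                          # True
print(is_nash(R, -R, np.eye(3)[0], np.eye(3)[0]))    # False (rock vs rock)
```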
So, it's an honor to be in this forum with some of the legends of computer science, and in particular computational complexity. As you know, the way we study the complexity of problems is by looking at the P versus NP paradigm; that's the common way to do it. Researchers over the past several decades have classified problems into complexity classes, and the most prominent ones are P, for polynomial time, and NP, for nondeterministic polynomial time. In the picture I'm showing you some of the famous problems that belong to these classes. Importantly, matching, primality testing, and linear programming can be solved in polynomial time, while related problems like tripartite matching, factoring a number into its prime factors, and integer programming are problems for which we don't have polynomial-time algorithms; all we know is that they belong to the class NP. In fact, some of these problems are as hard as any other problem in NP: what Steve Cook showed in '71 is that the Boolean satisfiability problem, where I give you a Boolean formula and ask you to find a satisfying assignment, is NP-complete, so at least as hard as any problem in NP. And Karp, the next year, provided a list of 21 NP-complete problems, including the Traveling Salesman Problem, integer programming, tripartite matching, and so on and so forth. So now the question, the automatic response to our inability to extend the nice structure we know for zero-sum games to general games, is: is it NP-complete to find a Nash equilibrium? That's the automatic response for somebody in my area. And here the thing gets interesting, because of the existence theorem provided by Nash. The existence theorem makes it a little bit strange to study the complexity of Nash equilibrium under the prism of the P versus NP problem. In particular, if you try to prove that Nash is NP-complete, you quickly get complexity class collapses, like NP and a related class called coNP collapsing, which we don't believe is actually the case. So Nash is probably not NP-complete; it has similar behavior to factoring, which is also an unclassified problem, sort of hanging out between P and NP, that we don't know how to classify. So, studying this problem, trying to understand the complexity of Nash equilibrium, ten or
more years ago, with Goldberg and Papadimitriou, what we showed is that the complexity of finding Nash equilibria is the same, computationally, as the complexity of finding Brouwer fixed points. I hope you've seen Brouwer's fixed point theorem; it says that if I give you a continuous function from some convex and compact set to itself, there is a point that stays fixed under the map. So a natural computational problem is: given such a function, compute a fixed point. Now, Nash proves his theorem using Brouwer fixed points. What we showed is that you can do the opposite: given an instance of the Brouwer problem, we can produce a game whose Nash equilibria are related to the fixed points of the function, via a computationally efficient reduction between the two problems, which establishes that these two problems are actually computationally equivalent. Now, going back to the picture that I showed
earlier: it was known before our work that Brouwer is not just a problem that is hanging out between P and NP, but is actually complete for a complexity class called PPAD. So in particular, what our theorem says is that, unlike factoring, whose computational complexity is not clear yet, Nash equilibrium, by being equivalent to Brouwer's fixed point problem, is actually complete for this complexity class PPAD. Unfortunately I'm not going to define it, but I will give you some pointers at the end of the lecture if you want to see what it is. Importantly, if your problem is complete for some class, you can use complexity-theory machinery to prove it equivalent to a bunch of other problems, and as it turns out, following from work that happened before us and after us, Nash equilibrium is not just complete for PPAD, and not just equivalent to Brouwer, but equivalent to a host of other problems in combinatorial topology and economics. For instance (I'm missing some of them here): Sperner's lemma, Scarf's lemma, fractional hypergraph matching, and market equilibrium are all problems equivalent to Nash's. That's the beauty of complexity theory: once you prove your problem is complete for a class, you can try showing it equivalent to other problems; it's a picture we don't have for factoring. But also, showing that the problem is complete for a class identifies precisely the computational barrier to developing polynomial-time algorithms for your problem, and our PPAD-completeness basically identifies a combinatorial barrier that you have to overcome to compute equilibria in polynomial time. So I claim that the fact that Nash is equivalent to Brouwer, and also complete for this class, explains why we have been unable to get a polynomial-time algorithm for the problem, and also why we have not been able to remove the use of Brouwer's fixed point theorem in establishing that Nash equilibria exist. What this theorem says is that Brouwer is inherent in showing that Nash equilibria exist, while it's not inherent in showing that minimax equilibria exist; that's why the minimax theorem could also be shown via linear programming duality. So, after we proved our
result, people asked: is this over for game theory? And I think it's absolutely not; it just makes the story much more interesting. What follows from our result is that, in a computationally bounded world, the universality of the Nash equilibrium breaks down. Part of the reason why Nash equilibrium is so prominent in economics is that it always exists; it is universal, so any game can be analyzed using Nash equilibrium. What we're saying is that complex games may have Nash equilibria, but those are not computable in polynomial time, which breaks down the universality of the concept. What we have been advocating, and over the last years it's becoming more and more prominent, is that if you are about to use Nash equilibria to make a prediction about a game, you have to understand what makes Nash equilibria in your game actually computationally tractable. If your game is two-player zero-sum, I have no problem with you using Nash equilibria. But if you don't know why a Nash equilibrium will arise, then you shouldn't use Nash equilibrium predictions or Nash equilibrium properties to argue about what will happen in that interaction. Now, if your game fails to have good structure, then you have to make it have good structure: sometimes you can design your game, so you can infuse your game with good structure that will allow equilibria to arise. If you're unable to do that, then you should switch to a different solution concept; maybe you should study dynamics of games, or some other type of equilibrium. But it's risky to analyze your game, or design your game, thinking that Nash equilibria will magically happen. So this was a brief overview of the complexity of Nash equilibria, and now I want to turn the page and come to
something very, very recent, as of four years ago, and that has to do with what are called generative adversarial networks, where we have a very nice interaction between zero-sum games and machine learning. The motivation for this development is, as you know, the archetypical question in machine learning, and in probability theory and statistics, which is developing models for real-world distributions. The archetypical question in machine learning is: I give you samples from some distribution of interest, and I want you to come up with a model that I can use to generate samples from that same distribution. In other words, I want to learn a generator that generates samples similar to the real examples. Now, if you want to estimate a multi-dimensional Gaussian, that's usually an easy task, because you can collect enough data to estimate the mean and the variance of the distribution. However, many distributions of interest are nonparametric: if I wanted to learn a distribution over images, that's not an easy task, because you cannot postulate a parametric description of your distribution and then estimate its parameters. So a big open problem in machine learning is: how do you get good models of interesting high-dimensional distributions? One approach that has been suggested over the recent years is this: first of all, instead of maintaining an explicit density for your distribution, give that up, and instead try to only find a generator, that is, a function that will eat boring randomness, namely isotropic Gaussian randomness, and will spit out interesting randomness, namely samples from the distribution that you're interested in. In my picture here, suppose I want to learn the distribution of faces of people. What I want to do is come up with a function f that will eat a Gaussian sample and spit out a photo of a person. Now, following the recent trends in machine learning, what has been proposed as a function to do this job is a deep neural network. Not just anything: of course, you have to pick a function from a rich family to implement interesting stuff, and the functions that people have been playing with recently, deep neural networks, have enough expressive power that they can potentially do this job. So you design an architecture for a deep neural network, you put it in there, and you have to tune the parameters of that function to achieve your goal, which is to generate samples like the ones you see in the given data set. I'm not going to describe what neural networks are; that's not super important for what I will describe. But if you haven't seen one before, think of a neural network as a parameterized function that takes as input some low-dimensional input, say a hundred-dimensional, and outputs a high-dimensional output, say a thousand- or a million-dimensional. It also has a lot of parameters, like a million parameters. What's important is that it's differentiable almost everywhere with respect to both theta, the parameters, and its input z. Now, you design an architecture for a neural network, and it has these free parameters theta. Your goal is to look at samples from the target distribution, the distribution of real faces, and
tune these parameters so that the output distribution of your neural net matches the true distribution.
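To fix ideas, here is a toy sketch of such a parameterized generator (my own illustration; the two-layer architecture and the dimensions are made-up assumptions, not any architecture from the talk):

```python
# A toy generator sketch (editor's illustration: a tiny two-layer MLP).
# It maps low-dimensional Gaussian noise z to a higher-dimensional sample,
# and is differentiable in both the parameters theta and the input z.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 100, 128, 1024        # e.g. 1024 "pixels"
theta = {                                     # the free parameters
    "W1": rng.normal(0.0, 0.05, (d_hidden, d_in)),
    "W2": rng.normal(0.0, 0.05, (d_out, d_hidden)),
}

def generator(z, theta):
    h = np.tanh(theta["W1"] @ z)              # differentiable nonlinearity
    return np.tanh(theta["W2"] @ h)           # a fake "image" in [-1, 1]^d_out

z = rng.standard_normal(d_in)                 # boring isotropic Gaussian noise
sample = generator(z, theta)                  # interesting randomness, once trained
print(sample.shape)                           # (1024,)
```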
If you do that, you actually get pretty impressive results. In this picture, let's focus on the right-hand side, which shows what happens if you train such a model on a database of faces of real humans. After the training is over, you can plug in Gaussian inputs and get random faces. These random faces are not real people; they are people hallucinated by this neural net. But the neural net learned the distribution from which human faces are coming, so it can generate new faces: the guys you see over there are not real humans, they are completely hallucinated people. Now let me show you the best that there is in terms of this technology. Of course, I cannot somehow play this video, because my screen is different from your screen... let's see... oops... I don't know what to do... oops... all right, okay.
Okay, this is the latest and greatest. I'm going to show you what happens if you train such a thing on what is called the CelebA data set. CelebA is a database with photos of celebrities; these are real celebrities, a lot of headshots of real celebrities, and here are some samples from the database. Then what I'm going to do (this is not me; I think it's a paper by Nvidia) is use this database to train a generative adversarial network, and now I'm going to show you what happens if I plug in random inputs and look at the output of that thing. So now these guys are not real humans; these are faces the neural net dreamed up, faces that never existed before. You can see that it did a bad job in some places, like some of the boundaries are a bit fuzzy and stuff, like that guy in the middle, right? But it's doing a pretty good job; that's pretty remarkable. These are some of the samples that it generates; it's pretty cool, huh? In fact, though I'm not going to get into that, it learns a lot about the geometry on which these faces actually lie. These faces have nothing to do with Euclidean geometry: if I take a face and another face and average them, in terms of averaging the pixel values, I'm going to get some garbage in the middle, so you shouldn't average in the Euclidean sense. But I can use these neural nets to actually understand the geodesics of the space of images: I can start with an image and another image and interpolate along the geodesics of the geometry that contains the images. That's really cool, actually.
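As an aside, here is what that latent-space interpolation looks like in code (a sketch of my own, with a random, untrained stand-in for the trained generator; with real trained weights the intermediate decodings would be plausible faces):

```python
# A sketch of latent-space interpolation (editor's illustration): rather
# than averaging pixels, interpolate the latent inputs z and decode each
# point with the generator, walking along the learned manifold of images.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 100, 1024
W = rng.normal(0.0, 0.05, (d_out, d_in))     # stand-in for trained weights
def generator(z):
    return np.tanh(W @ z)

z0 = rng.standard_normal(d_in)               # latent code of "face 0"
z1 = rng.standard_normal(d_in)               # latent code of "face 1"
frames = [generator((1 - t) * z0 + t * z1)   # straight line in latent space,
          for t in np.linspace(0.0, 1.0, 8)] # a curve in image space
print(len(frames), frames[0].shape)          # 8 (1024,)
```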
In any event: more recently, actually this week, there was the first auction run by Christie's where a GAN-created painting was sold to somebody, a French collector I believe; this image was actually generated by a neural network. Let me show you the signature of the painter. The signature is a bit fuzzy, but it says "min max" of a function. Let me tell you why it says min-max of something. The way you actually set up
the training procedure is that you define a two-player zero-sum game between two neural networks. One network is the generator I was showing you earlier, the one that gets garbage noise, processes it, and outputs pictures. The other neural network is the judge: he takes as input images that were hallucinated by the generator, and also takes as input real images, and he has to distinguish: am I looking at a hallucinated image, or am I looking at a real one? So the judge, the discriminator, is trying to distinguish between the outputs of the generator and the real distribution, and the generator is trying to fool the judge. The idea is that at the equilibrium of this two-player zero-sum game, the generator should be doing a good job. In particular, you set up a minimax problem where one of the two players is controlling the parameters of this deep neural net, the other player is controlling the parameters of that neural net, and you write down a function f that basically expresses how well the judge is distinguishing between the hallucinated photos and the real photos. There are many examples of what function to choose there, but let me give you one function that I like, which gives rise to what is called a Wasserstein GAN. That function basically runs a test: it uses the discriminator to compute the expectation of the discriminator on a real image versus a hallucinated image. The hallucinated image is produced by sampling garbage, plugging it into the generator, and running the discriminator on the resulting image; then you compare the two expectations. Now, if the discriminator could implement every Lipschitz-1 function, the sup of that test would be the Wasserstein distance between the two distributions. But because I cannot range the discriminator over all possible Lipschitz-1 functions, I'm restricting myself: I'm giving him some expressive power, but not all expressive power, so he computes a different kind of sup, a different kind of metric, but one closely related to the Wasserstein distance. Now, the parameters theta and w that the two players have to set are very high-dimensional, say a million-dimensional, so you cannot solve this minimax problem centrally; you have to solve it in a fairly naive manner, and what makes sense is to run a first-order method. Usually the way these things are trained is by having the inf player run some version of gradient descent and the sup player run some version of gradient ascent, in tandem. The two players run a first-order method, which only requires computing gradients (you don't have to compute Hessians or anything), and you hope that this will converge to the minimax equilibrium. Now, major issues occur, and I'm going to talk a little bit about one of them: you actually get cycling very commonly. If you run gradient descent-ascent to solve the minimax problem, it is unfortunately not stable. For minimization problems, running gradient descent gets you to a local optimum, but if you run gradient descent-ascent for a minimax problem, you actually cycle around the solution. Related to that, the distributions that we're talking about here are very high-dimensional, a thousand-, ten-thousand-, a million-dimensional; these are distributions over images.
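To illustrate the training setup and the cycling issue in the simplest possible case, here is a self-contained caricature (entirely my own simplification, not the speaker's setup): a one-dimensional "Wasserstein-style" GAN with a linear critic, trained by simultaneous gradient descent-ascent.

```python
# A 1-D caricature of GAN training (editor's simplification): real data is
# N(mu_star, 1), the generator is G_theta(z) = z + theta, and the critic is
# the linear function D_w(x) = w * x, with w clipped to [-1, 1] as a crude
# Lipschitz constraint. In expectation the objective is
#   f(theta, w) = E[D_w(real)] - E[D_w(fake)] = w * (mu_star - theta),
# and simultaneous gradient descent-ascent oscillates around theta = mu_star
# instead of settling there.
import numpy as np

rng = np.random.default_rng(1)
mu_star, theta, w, lr = 2.0, 0.0, 0.5, 0.1
for step in range(201):
    real = rng.normal(mu_star, 1.0, size=256)     # a batch of real samples
    fake = rng.standard_normal(256) + theta       # garbage noise -> generator
    grad_theta = -w                               # d f / d theta
    grad_w = real.mean() - fake.mean()            # stochastic d f / d w
    theta -= lr * grad_theta                      # generator: descent on f
    w = np.clip(w + lr * grad_w, -1.0, 1.0)       # critic: ascent + clipping
    if step % 40 == 0:
        print(step, round(theta, 3), round(w, 3)) # theta overshoots and cycles
```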
So if you use a bad training procedure to set the parameters of your GAN, good luck: you get no guarantees. At the very least you'd like to evaluate the quality of your algorithm, and a big challenge is getting rigorous statistical guarantees about what happened after training is over, or statistical guarantees during training. Recently I've been working on these two issues, and I'm going to briefly talk about one of them, from the optimization point of view: how could you possibly remove cycling? I'll give you some pointers to a video where I also discuss the second, statistical challenge. All right, so first of all, let me show you
what cycling is and how it comes about in training these things. Here, suppose that the training distribution is just a mixture of Gaussians whose means are arranged on a circle; suppose this is my training data. What I'm showing in the bottom row here is the distribution the generator generates at different steps of training, and what you observe is that the generator is basically sampling different modes of the Gaussian mixture, but it never samples from the whole mixture. The discriminator is hunting the generator, who keeps switching modes of the mixture, but in this example it never actually converges to the mixture. In a more
interesting example, here I'm training on a mixture of handwritten digits, so ten distributions over handwritten digits, and I'm training my generator and discriminator on the mixture of those handwritten digits. Again I'm showing you what happens at different steps of training, not focusing on any particular mode: if I stop at different steps of training, my distribution has this mode-collapse property. So these training pathologies are very common here, and again I emphasize that this is unlike minimization problems: gradient descent is the method of choice for training deep neural net classifiers, and if you have a minimization problem and run gradient descent, you may not get to a global optimum, but you will get to a local optimum. On the other hand, when you use gradient descent-ascent in
tandem to solve a minimax problem, all bets are off: you get cycling even if your function f is convex-concave, which is the best-case scenario for minimax problems. We have been thinking about how to remove these oscillations, and we have some results. Interestingly, what we show is that if, instead of doing gradient descent-ascent, you do gradient descent-ascent with negative momentum (and I'll explain what that means), and your function is convex-concave, the best-case scenario, then you will converge to a minimax equilibrium, even if you're solving a constrained minimax problem where your parameters live in two given polytopes. In particular, that is basically as strong as doing linear programming, except now you have to do projected gradient descent, because gradient steps may take you outside of your polytope and you have to project back. But other than that, if you do projected gradient descent-ascent with negative momentum, it's actually good for you. The surprising thing is that, usually, in minimization problems, positive momentum is good for you, because it smooths out a possibly rugged terrain whose gradients may be pushing you in all sorts of different directions; if you maintain momentum, you follow a smoother path to the local optimum. As it turns out, for minimax problems, negative momentum is good for you. That was one of the surprising aspects here. And now,
I'm close to being over, but let me try to offer you a cartoon. I haven't given any proof in this talk, but let me give you at least one cartoon proof. I'm going to look at the most trivial possible case of a minimax problem: you have scalars theta and w, and the bilinear function f(theta, w) = theta * w. (Basically, the only reason the slide centers it is that my pictures are centered at (0.5, 0.5); other than that, it's theta times w.) If you want to do gradient descent and ascent for the Min and Max players of this game, it boils down to these equations. The theta player does gradient descent: at time step t+1 he updates his current theta by subtracting the gradient with respect to theta, theta_{t+1} = theta_t - eta * grad_theta f(theta_t, w_t). And the sup player does gradient ascent, adding the gradient with respect to his own variable to his current value: w_{t+1} = w_t + eta * grad_w f(theta_t, w_t). Now, for this function the unique minimax equilibrium is the point (0.5, 0.5); at any other point, the other player can exploit you, so both players have to play 0.5 to be at equilibrium. Nevertheless, if I do gradient descent-ascent starting, say, from here, I will spiral out to infinity in this unconstrained problem. So gradient descent-ascent has this spiraling-out behavior in discrete time. Let me explain my cartoon proof of why negative momentum could be good for you; you can actually make this into math. First, what is negative momentum? Instead of following the dynamics I'm showing up there, I undo some of yesterday's gradient in my current step. So this would be the normal gradient descent step, but I additionally subtract, with half the step size, yesterday's gradient; and for the ascent player, I likewise undo some of yesterday's gradient. If this would be the trajectory of normal gradient descent-ascent, then undoing a little bit of yesterday's gradient pushes me slightly inward, which, if you do the math, gets you a nice trajectory converging to the minimax equilibrium. That is, roughly speaking, why negative momentum is good for you: if you're going around the star, spiraling out, and I give you some negative momentum, I can push you to converge to the star in the middle. That's kind of the intuition.
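Numerically, the cartoon looks like this (a sketch of my own; the step size and the "undo half of yesterday's gradient" form of the negative-momentum update are illustrative choices, not the exact scheme from the talk):

```python
# Gradient descent-ascent on the bilinear game f(theta, w) = theta * w,
# with equilibrium at (0, 0): plain simultaneous GDA spirals outward, while
# undoing half of yesterday's gradient (negative momentum) spirals inward.
import numpy as np

def run(negative_momentum, steps=500, lr=0.1):
    theta, w = 1.0, 1.0
    g_theta_prev, g_w_prev = 0.0, 0.0
    for _ in range(steps):
        g_theta, g_w = w, theta                 # grad of f wrt theta and w
        d_theta, d_w = lr * g_theta, lr * g_w
        if negative_momentum:                   # undo half of yesterday's step
            d_theta -= 0.5 * lr * g_theta_prev
            d_w -= 0.5 * lr * g_w_prev
        theta -= d_theta                        # Min player: descent
        w += d_w                                # Max player: ascent
        g_theta_prev, g_w_prev = g_theta, g_w
    return np.hypot(theta, w)                   # distance to the equilibrium

print(run(False))  # plain GDA: the distance has grown (spiraled out)
print(run(True))   # negative momentum: the distance has shrunk (converging)
```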
But that leads me to a very interesting open problem: what happens if your function is not convex-concave? We have no idea what's happening there. So that's a big, very interesting open problem: getting a first-order method, some variant of gradient descent, that is guaranteed to get to a local min-max solution of a minimax problem. Again, if it were just a minimization problem, gradient descent would get you to a local optimum even if the function isn't convex; but for a minimax problem, we know of no first-order method, no variant of gradient descent, that can get you to a local minimax solution. That's a very interesting open problem, in my opinion. So, I would like to conclude. I mean, we
can still apply this method despite not having proofs of convergence, and in practice we actually do get improvements in practical settings. But I would like to conclude. What I talked about today: I gave an overview, from von Neumann to now, of the uses of game theory in machine learning. What I told you is that Nash equilibria are PPAD-complete, and in particular equivalent to a bunch of other problems in game theory, economics, and combinatorial topology, and that when Nash equilibria are intractable, you should not trust their predictions: chances are players are not going to be able to get there, and chances are you are also not going to be able to find the equilibrium. So it's important to identify families of games where equilibria are tractable, or to understand how to do game theory using alternative solution concepts. On the complexity side, let me now state a couple of open problems that I find very interesting. There's an emerging literature trying to show average-case hardness of PPAD and other related classes in between P and NP. One very interesting open problem, and there are reasons to believe this should be true, is showing that factoring, this problem that I showed you earlier that was floating around between P and NP, is actually in PPAD. Factoring ends up being included in two cousins of the class PPAD, and the natural class in the intersection of those two classes is PPAD, so it would be really interesting to show that factoring is there; that would be sort of like using cryptographic primitives to argue about the average-case complexity of PPAD. The flip side of that is: assuming average-case hardness of PPAD, try to develop cryptographic methods. Cryptographic methods should be founded on average-case complexity assumptions, and it would be very compelling if we could base cryptography on the average-case hardness of PPAD or one of the related classes. On the game theory and deep learning side, I told you there's a lot of excitement in building games that beat humans, and also in creating these GANs, these generative adversarial networks that learn intricate distributions, for example of pictures, and I stated a bunch of open problems there. One is to understand the optimization landscape of minimax problems: we don't have first-order methods that get to local minimax solutions. Another is to develop statistical guarantees for GANs: as I told you, the distributions that we're after are high-dimensional, so we would like statistical guarantees about the properties of our generators. And, as a sort of challenge if you want: there has been a lot of excitement about beating human players in chess, Go, and two-player Texas Hold'em, but, and this would be consistent with my theory, I think that moving outside the class of nicely behaved two-player zero-sum games is a whole different game; it's a whole different ballpark. As a challenge to the community, I would be super interested if somebody could develop, with deep learning or other methods, good players for three-player Texas Hold'em or other three-
player games that beat humans. And here are some further pointers that I promised: if you want to hear more about both the optimization and the statistical fronts of GANs, you can look at a recent talk that I gave at MIT; on the complexity of equilibria, there is a video from the Simons Institute for the Theory of Computing where I expand on what PPAD is and the developments on the complexity front; and you can also look at my ICM proceedings article for more details, references, and a lot of open problems in this area. Thank you very much.

Thank you very much for a great talk. We are running late, but we have the possibility of questions. [Applause]

Yeah, okay, right, good. So, as you said, positive momentum is considered to be good for you. Why? Because it smooths out a rugged terrain. The classical image is coming down a mountain following gradients: if the terrain is rugged, then maybe one day the gradient pushes you this way, the next day that way, so you'd be very slow going back and forth, while keeping momentum would get you down the slope faster. But from what I showed you, you understand why positive momentum would be bad for you here: if you're already going around a cycle or a spiral, with positive momentum you're going to get into a bigger spiral, and negative momentum is the opposite of that, so intuitively it makes sense. Now, if your function is strongly convex-concave, you can show that you don't need negative momentum: if you're strongly convex, the gradient will take you to the bottom. So you might ask whether negative momentum could ever be good for you in plain optimization. On the optimization front, there have been cases where negative momentum actually helps, and that arises when you're doing not plain gradient descent but stochastic gradient descent. If you're training a big neural network, you cannot take gradients with respect to the whole function; the objective is separable, a sum of a bunch of functions, and usually you take a small subsample of them and compute the gradient only with respect to that subsample of the objective. Then, in expectation you get the right gradient, but there are biases that negative momentum can sometimes correct. So it has been argued, for different reasons, that in SGD negative momentum can actually help you. There was another question over there; now you can try to throw it.
It's good, yes. So, thank you very much for this talk; it was quite nice to see these things connected. I have a simple yes/no question: is it the case that a good metaphor for this random face generation is that you sort of think of it as taking the faces as inputs, reverse-engineering their DNA, mutating the DNA, and then regrowing the person and rendering that face?

Yes and no, I would say; it's a nuanced yes. We're trying to do essentially what imagination does, or what genes do. Genes map a random genome to a face, but the random genome should be a valid genome; as long as it's a valid genome, it will give rise to a face, and when you have sexual reproduction or mutations, you inherit a random face related to your parents', and so forth. This is also what your imagination does: when you want to imagine a random celebrity, or some other face, you close your eyes, you map random bits from your sensory inputs, and somehow you use those to hallucinate a random face. That is what we try to replicate here.

Thank you. Okay, I think we will have to stop here, so let's thank Costas once more for a great talk. [Applause] [Music]