
Hacking Reinforcement Learning

Formal metadata

Title
Hacking Reinforcement Learning
Series title
Number of parts
132
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, modify, reproduce, distribute, and make publicly available the work or content, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license.
Identifiers
Publisher
Year of publication
Language

Content metadata

Subject area
Genre
Abstract
Creating huge datasets of top-performing examples for Reinforcement Learning (RL) has always been tricky, but if we allow ourselves to cheat a bit, it can be done very easily. During this talk, I will present a new family of algorithms that allow us to efficiently generate very high-quality samples for any known RL environment. This new generation of planning algorithms achieves performance several orders of magnitude higher than any existing alternative, while offering linear time complexity and good scalability. This talk will be a practical example of how we can use new tools to hack any reinforcement learning environment and make it generate superhuman-level games.

Like any other hacking process, hacking RL will be divided into four phases:

1. During information gathering, I will briefly explain the main ideas behind Reinforcement Learning. I will also talk about how our theory (FractalAI) came to be and what its fundamental concepts are.
2. Next, we will find an attack vector against the environment API and explain how it can be exploited. I will explain the fundamental concepts needed to build a new generation of exploits that give us complete control over the data the environment produces.
3. Then it is time to test the new exploits and show a proof of concept. We will exploit the attack vector to gain access to the environment. Using only a laptop, I will show how it is possible to sample data that surpasses human performance much faster than real time.
4. Once we have gained control of the environment, we will measure how well the exploits work and how well the techniques presented generalize to other types of environments.

I want the talk to be as simple and fast-paced as possible, with lots of graphical examples, videos, and a Jupyter notebook. The Q&A session is the time to apply some social engineering to get me to talk about the details you find most interesting.
I have prepared additional material covering the most common questions and concerns, but feel free to ask whatever you want; I love challenging questions ;)
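The abstract does not spell out which part of the environment API is the attack vector. A common reading, and the one assumed here, is that the environment lets a planner read out and write back its full internal state, so many cheap "what if" rollouts can branch from any point. The following is a minimal sketch of that idea as a simplified swarm-of-walkers planner. It is not the FractalAI implementation: `ChainEnv`, `get_state`/`set_state`, `swarm_plan`, and the virtual-reward and cloning rule are all hypothetical illustrations. Each step does a constant amount of work per walker, so the cost grows linearly with `n_walkers * n_steps`.

```python
import random
from dataclasses import dataclass


class ChainEnv:
    """Hypothetical toy environment: action 1 moves right (+1 reward), 0 moves left (-1)."""

    def __init__(self):
        self.position = 0

    def get_state(self):
        # Assumed "attack vector": the full internal state can be read out...
        return self.position

    def set_state(self, state):
        # ...and written back, so the planner can revisit any past state.
        self.position = state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.position += move
        return float(move)


@dataclass
class Walker:
    state: int
    score: float
    actions: list


def normalize(values):
    """Min-max scale into [1, 2] so products and ratios stay positive."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [1.0 + (v - lo) / (hi - lo) for v in values]


def swarm_plan(env, n_walkers=16, n_steps=30, n_actions=2, seed=0):
    """Hypothetical planner: random rollouts plus cloning toward promising, diverse walkers."""
    rng = random.Random(seed)
    root = env.get_state()
    walkers = [Walker(root, 0.0, []) for _ in range(n_walkers)]
    for _ in range(n_steps):
        # 1. Perturb: restore each walker's state, take a random action, save the result.
        for w in walkers:
            env.set_state(w.state)
            action = rng.randrange(n_actions)
            w.score += env.step(action)
            w.state = env.get_state()
            w.actions.append(action)
        # 2. Virtual reward: high score AND far from a randomly chosen companion.
        companions = [rng.randrange(n_walkers) for _ in range(n_walkers)]
        distances = [abs(w.state - walkers[c].state) for w, c in zip(walkers, companions)]
        virtual = [r * d for r, d in zip(normalize([w.score for w in walkers]),
                                         normalize(distances))]
        # 3. Clone: walker i copies a random walker j with probability equal to j's
        #    relative advantage in virtual reward (<= 0 never clones, >= 1 always clones).
        new_walkers = []
        for i in range(n_walkers):
            j = rng.randrange(n_walkers)
            clone_prob = (virtual[j] - virtual[i]) / virtual[i]
            src = walkers[j] if rng.random() < clone_prob else walkers[i]
            new_walkers.append(Walker(src.state, src.score, list(src.actions)))
        walkers = new_walkers
    best = max(walkers, key=lambda w: w.score)
    env.set_state(root)  # leave the environment where we found it
    return best.actions, best.score


if __name__ == "__main__":
    actions, score = swarm_plan(ChainEnv())
    # Replaying the plan on a fresh environment reproduces the reported score.
    replay = ChainEnv()
    print(len(actions), score, sum(replay.step(a) for a in actions))
```

Because the toy environment is deterministic, the best action sequence found by the swarm can be replayed from the initial state to regenerate exactly the same high-scoring trajectory, which is what makes this kind of planner usable as a sample generator.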