
Hacking Reinforcement Learning

Formal Metadata

Title
Hacking Reinforcement Learning
Title of Series
Number of Parts
132
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Genre
Abstract
Creating huge datasets of top-performing examples for Reinforcement Learning (RL) has always been tricky, but if we allow ourselves to cheat a bit it can be done very easily. During this talk I will present a new family of algorithms that can efficiently generate very high-quality samples for any known RL environment. This new generation of planning algorithms achieves performance several orders of magnitude higher than any existing alternative, while offering linear time complexity and good scalability. The talk will be a practical example of how we can use new tools to hack any reinforcement learning environment and make it generate superhuman-level games.

Hacking RL, like any other hacking process, will be divided into four phases:

- Information gathering: I will briefly explain the main ideas behind Reinforcement Learning, talk about how our theory (FractalAI) came to be, and present the fundamental concepts behind it. We will find an attack vector against the environment API and explain how it can be exploited.
- Building the exploits: I will explain the fundamental concepts needed to build a new generation of exploits that give us complete control over the data the environment produces.
- Proof of concept: this is the time to test the new exploits. We will exploit the attack vector to gain access to the environment, and, using only a laptop, I will show how it is possible to sample data that surpasses human performance way faster than real time.
- Gaining control: once we have gained control of the environment, we will measure how well the exploits work and how well the presented techniques generalize to other types of environments.

I want the talk to be as simple and fast as possible, with a lot of graphical examples, videos, and a Jupyter notebook. The Q&A session is the time to apply some social engineering to get me to talk about the details you find most interesting. I have prepared additional material covering the most common questions and concerns, but feel free to ask whatever you want, as I love challenging questions ;)
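One way to picture the "attack vector against the environment API" is that many simulated environments let you snapshot and restore their internal state, so a planner can try many futures from the same point before committing to an action. The sketch below is not FractalAI itself, only a minimal, hedged illustration of that idea: it assumes an ALE-based Atari environment from OpenAI Gym (which exposes clone_full_state / restore_full_state), the pre-0.26 four-tuple step API, and it uses plain random rollouts where the talk's algorithm does something far more efficient.

```python
import gym
import numpy as np


def plan_one_step(env, n_rollouts=32, horizon=15):
    """Choose an action by simulating random futures from the current state.

    A deliberately naive planner: it only works because the Atari emulator
    lets us snapshot (clone_full_state) and rewind (restore_full_state) its
    state, which is the API "attack vector" being illustrated here.
    """
    root = env.unwrapped.clone_full_state()          # snapshot the emulator
    returns = np.zeros(env.action_space.n)
    counts = np.zeros(env.action_space.n)
    for _ in range(n_rollouts):
        env.unwrapped.restore_full_state(root)       # rewind to the snapshot
        first_action = env.action_space.sample()
        _, reward, done, _ = env.step(first_action)  # old 4-tuple gym API
        total = reward
        for _ in range(horizon):
            if done:
                break
            _, reward, done, _ = env.step(env.action_space.sample())
            total += reward
        returns[first_action] += total
        counts[first_action] += 1
    env.unwrapped.restore_full_state(root)           # leave the env as we found it
    return int(np.argmax(returns / np.maximum(counts, 1)))


if __name__ == "__main__":
    # MsPacman is only an example; any ALE-based env would do. Unwrapping
    # keeps the planner's extra rollout steps from ticking a TimeLimit wrapper.
    env = gym.make("MsPacman-v0").unwrapped
    env.reset()
    done, score = False, 0.0
    while not done:
        _, reward, done, _ = env.step(plan_one_step(env))
        score += reward
    print("episode return:", score)
```

Replacing the random rollouts above with the fractal sampling described in the talk is what turns this kind of planner into one that generates the superhuman-quality trajectories and near-linear scaling claimed in the abstract.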