We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

How to Build a World-Class Rock Paper Scissors Bot

00:00

Formale Metadaten

Titel
How to Build a World-Class Rock Paper Scissors Bot
Serientitel
Anzahl der Teile
69
Autor
Lizenz
CC-Namensnennung - Weitergabe unter gleichen Bedingungen 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen und nicht-kommerziellen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen und das Werk bzw. diesen Inhalt auch in veränderter Form nur unter den Bedingungen dieser Lizenz weitergeben.
Identifikatoren
Herausgeber
Erscheinungsjahr
Sprache

Inhaltliche Metadaten

Fachgebiet
Genre
Abstract
This talk will teach you how to play rock-paper-scissors (no really, it’s more complicated than you might think) plus a little bit about AI history. It will also make you laugh — at least a little. You'll also learn some game theory, some mind games, and finally, how to build a world champion bot in the universe of Rock Paper Scissors. This talk is aimed at all levels of Ruby programmers - beginner or advanced. All the examples in the talk and are explained step by step, and many of them relatively basic.
Gebäude <Mathematik>Klasse <Mathematik>CASE <Informatik>QuaderXMLComputeranimation
GrundraumBitMathematikMultiplikationsoperatorOffene MengeLokales MinimumData MiningVerzweigendes ProgrammCodeSoftwareEDV-BeratungComputeranimationVorlesung/Konferenz
BitSpieltheorieSchlussregelBenutzerbeteiligungAutomatische HandlungsplanungRichtungArithmetisches MittelSchwebungComputeranimation
SchwebungVersionsverwaltungWasserdampftafelComputeranimationDiagramm
VersionsverwaltungSnake <Bildverarbeitung>SpieltheorieKategorie <Mathematik>DiagrammFlussdiagrammComputeranimation
TopologieSnake <Bildverarbeitung>Lesezeichen <Internet>SpeicherverwaltungEin-AusgabeRuhmasseComputeranimation
Web SiteSpieltheorieVersionsverwaltungKomplex <Algebra>Computeranimation
Web SiteInhalt <Mathematik>Web SiteSchwebungBitGraphfärbungLastRechenschieberKategorie <Mathematik>Endliche ModelltheorieComputeranimation
Elektronischer ProgrammführerUnrundheitComputeranimation
Wort <Informatik>RoboterStrategisches SpielComputeranimationVorlesung/Konferenz
Arbeit <Physik>RoboterBitrateOrtsoperatorComputeranimationVorlesung/Konferenz
Arithmetisches MittelTermVersionsverwaltungMomentenproblemSpieltheoriePuffer <Netzplantechnik>VererbungshierarchieAbstandComputeranimation
InformationSummierbarkeitSpieltheorieInformationStrategisches SpielComputerschachPerfekte GruppeKontrast <Statistik>MultiplikationsoperatorSummierbarkeitAbstandEndliche ModelltheorieComputeranimationVorlesung/Konferenz
RoboterMatchingEvoluteResultanteStrategisches SpielZahlenbereichUnrundheitMusterspracheEinsInterpretiererVersionsverwaltungQuaderGlobale OptimierungMetropolitan area networkComputeranimation
StichprobeZufallszahlenFrequenzZählenZeichenketteFolge <Mathematik>ZeichenvorratFolge <Mathematik>ZeichenketteStrategisches SpielDickeSchnittmengeOrtsoperatorRoboterAggregatzustandSummengleichungZeichenvorratMultiplikationsoperatorStatistikFrequenzSchaltnetzDreiecksfreier GraphElement <Gruppentheorie>Reverse EngineeringZählenDifferenteZahlenbereichRandomisierungTurnier <Mathematik>Ordnung <Mathematik>AnalysisHash-AlgorithmusMusterspracheKartesische KoordinatenSoftwareentwicklerTwitter <Softwareplattform>AbstandSchlüsselverwaltungMereologiePunktZusammengesetzte VerteilungDienst <Informatik>GravitationVerschlingungBenutzerschnittstellenverwaltungssystemVererbungshierarchieTermHinterlegungsverfahren <Kryptologie>Computeranimation
DickeZeichenketteTeilmengeLeistung <Physik>Computeranimation
MathematikZeichenvorratChi-Quadrat-VerteilungZeichenketteDickeCodeWeb-SeiteFrequenzRoboterStrategisches SpielMereologieZählenPunktQuaderTopologieComputeranimation
ZählenMatchingRichtungBildschirmfensterDickeStrategisches SpielMusterspracheComputeranimation
ZählenRegulärer Ausdruck <Textverarbeitung>FrequenzZeichenketteRichtungBildschirmfensterImplementierungDickeMatchingZählenStrategisches SpielTabelleVektorraumLokales MinimumWort <Informatik>SichtenkonzeptStichprobenumfangComputeranimation
Motion CapturingPoisson-KlammerGruppenoperationMatchingLokales MinimumRechenschieberZeichenketteMereologieBildschirmfensterUnrundheitCASE <Informatik>Kategorie <Mathematik>MultiplikationsoperatorWort <Informatik>Gerade ZahlGeradeComputeranimationVorlesung/Konferenz
BildschirmfensterRichtungBildschirmfensterStrategisches SpielMatchingMereologieRegulärer Ausdruck <Textverarbeitung>CodeCASE <Informatik>ZahlenbereichMailing-ListeWasserdampftafelComputeranimation
FrequenzBildschirmfensterZahlenbereichMatchingFrequenzRoboterGruppenoperationComputeranimation
FrequenzZufallszahlenZählenGerichtete MengeMusterspracheSpieltheorieRoboterCodeLoopTrennschärfe <Statistik>Hash-AlgorithmusRegulärer Ausdruck <Textverarbeitung>Strategisches SpielZählenBildschirmfensterMatchingTypentheorieMultiplikationsoperatorRechter WinkelEndliche ModelltheorieMehrschichten-PerzeptronComputeranimation
Strategisches SpielDelisches ProblemKontrollstrukturSystemaufrufWort <Informatik>GraphSoundverarbeitungStützpunkt <Mathematik>PunktDiagramm
FrequenzZufallszahlenGerichtete MengeMusterspracheMeta-TagKreisbewegungStrategisches SpielGrenzschichtablösungZellularer AutomatPunktResultanteGeradeMultiplikationsoperatorEinflussgrößep-BlockSpeicherabzugSpieltheorieComputeranimation
FrequenzGerichtete MengeMusterspracheZufallszahlenMeta-TagWechselsprungStrategisches SpielGeradeÄußere Algebra eines ModulsUnrundheitZahlenbereichMultiplikationsoperatorPunktMathematikVererbungshierarchieComputeranimation
RundungDifferenteStrategisches SpielUnrundheitDickeDatensatzTabelleMeta-TagMultiplikationsoperatorComputeranimation
BildschirmfensterMathematikStrategisches SpielLeistungsbewertungMeta-TagComputeranimation
Meta-TagStrategisches SpielUnrundheitTVD-VerfahrenTabelleComputeranimation
SchlussregelRoboterKomplex <Algebra>SymboltabelleMeta-TagStrategisches SpielStichprobenumfangRechenschieberAdressraumGesetz <Physik>Computeranimation
GammafunktionNormierter RaumCOMRechenschieberVerschlingungGemeinsamer SpeicherTwitter <Softwareplattform>GeradeComputeranimationVorlesung/KonferenzXML
Transkript: Englisch(automatisch erzeugt)
Thanks for joining me in here. Let's begin. So Ruby paper scissors, or how to build a world-class rock
paper scissors bot. So a little bit about me first. My name is Dot, short for Dorothy. I'm based in Brighton in the UK on the south coast. Hence the accent, you may have noticed. I'm a bit of a maths nerd. I studied it in my spare time for a few years with the Open University. I previously helped co-organize the Brighton
branch of Code Bar. If you don't know what Code Bar is, you should definitely check them out. They do some great work for under-representation in tech. And I currently work for Happy Bear Software, a fabulous rails consultancy. They helped get me here, so thanks, Happy Bear. Now to today. Today I'm going to be taking down a bit of a rabbit hole I went down recently, which is completely
unrelated to web dev and possibly even a little unrelated to Ruby. But trust me, it's going to be fascinating. We're going to be looking at rock paper scissors and rock paper scissors is a simple game played between two people. But just as a formality, let's briefly explain the rules. So each player simultaneously chooses one of three moves.
These moves are rock, paper, and scissors. If both players choose the same move, it's a draw. Otherwise we have rock beats scissors, scissors beats paper, and paper beats rock. But I've bloody no idea how paper manages to beat rock. In Malaysia, they have a far more sensible version where they have water instead of paper and bird instead of scissors. Speaking of other versions, here's my personal favorite,
Bear Hunter Ninja. The earliest Japanese version of the game, which was imported directly from China, has instead a frog, a slug, and a snake. Going up a gear, there's also rock, paper, scissors, lizard, Spock, which I'm sure you've seen Sheldon play on Big Bang Theory. But he never mentions this version,
rock, paper, scissors, fire, water, air, sponge. Or rock, paper, scissors, fire, water, air, sponge, gun, human. Or rock, paper, scissors, fire, water, air, sponge, gun, human, devil, wolf. And then you can add dragon, snake, lightning, tree. And then there's 25, but that's all nothing compared to rock, paper, scissors 101.
Pretty ridiculous. But there are some great moves in it. One of my favorite is the monkey move. He beats a lot of other moves by either flinging poop, climbing on them, scaring them, or ripping them up. But of course, monkey doesn't always win. Monkey is enraged by math,
which I'm rather fond of, not just because I like maths, but it makes me look kind of street. And another good one which actually involves movement is money. But you should be careful to remember that these versions of the game are outlawed within the official rock, paper, scissors society due to their unnecessary complexity.
Yes, there's a World Rock, Paper, Scissors Society, and this is their shiny new website. But I'm not sure how seriously you should take them. Even lizards are using rock, paper, scissors, but for them it's personal. This is the male side-blotched lizard. Now these lizards don't actually play rock, paper, scissors.
That would be ridiculous. But rock, paper, scissors does model their mating behavior well. The male lizard comes in one of three colors, orange, blue, and yellow. And these colors map to the move that they play. So the orange lizard is greedy and ultra-dominant and is the largest of the three. And the orange lizard establishes large territories
with loads of females. Whereas the blue male is a little bit smaller, he's quite content, a little bit of territory, one single female. But because of this, the orange is much larger and can beat blue and take over his territory in a single female. Now the yellow is Sly. He's actually the coloring of a sexually mature female.
So they have no territory but sneak into oranges and take their territory and mate with the females because the orange is too much territory to defend at once. So we have orange beats blue, blue beats yellow as blue only has to defend one female, and we have yellow beats orange as it can sneak into the territory. But going back to rock, paper, scissors,
you may be thinking, how much can there be to rock, paper, scissors? There's only one way to play and that way is random. It's the optimal move. Well, I would say there's more to it than that. And there are actually a couple of great techniques and ways to win given to us by the lovely World Rock, Paper, Scissors Society in their lovely World Rock, Paper, Scissors Society guide.
One such move is called the crystal ball. The aim is to confuse your opponent by declaring what move you will play next, ignoring the fact you don't actually know what they'll play. So you hope they get confused and won't play the move. You say that they will. So you say, you're going to play paper, aren't you? Then they won't play paper, but one of rock or scissors,
making rock a safe bet to play. A more difficult strategy for the seasoned rock, paper, scissors player is tattoo play. The aim is to distract with subtle clues to influence your opponent, such as wearing a t-shirt that says, I'm going to play paper or a belt with the word scissors across it.
Some may even go so far as to make body modifications with R, O, C, and K tattooed across their knuckles. Of course, bots are unable to employ these strategies. What techniques could a bot use? Well, there's this bot, which has a 100% winning rate. In one millisecond, it can recognize the position
and move of the opponent's hand and immediately play a counter move. But although impressive, I would say this is clearly cheating. No, we will be looking at techniques to decide what the opponent will play before they play it. So let's think about the problem we're tackling. In terms of gameplay, rock, paper, scissors
can be thought of as a scaled-down version of poker. In poker, you're unaware of your opponent's move the moment you're making yours, meaning you have to model your opponent. Are they frequent bluffers? Do they have a tick when they bluff? Not knowing your opponent's move when you're making yours is known as imperfect information, and this is also a feature of rock, paper, scissors.
And it is this imperfect information that makes rock, paper, scissors so interesting to model. In contrast, games such as chess are known as perfect information games. Each player can see all of the pieces on the board at all times. Furthermore, in rock, paper, scissors, there is only one winner and one loser,
and this is known as zero-sum. Zero-sum games are called such as the outcome of each game is zero. So let's say we have a win is plus one and a lose is minus one and a draw is zero. We have one winner, one loser, one plus minus one is zero, and two draws is zero. So in contrast, where participants can all gain
or all suffer together are known as non-zero-sum. Now, let's start looking at competitions and strategies to crush our opponents. In 1999, and again a year later, a rock, paper, scissors bot competition entitled the International Rochambeau Program Competition was run by Darce Billings.
He invited people to enter a bot to compete in a round-robbing competition. Each bout between two bots consisted of 1,000 rounds of rock, paper, scissors. They had an open round where every match result counted, and they had a best of the best where the greatest number of wins of a bout gave the winner of that bout. To encourage people to create bots
that actively tried to beat each other, they added dummy bots into the mix. Now, not all the players were optimal, and this changed everything. This meant that to beat these dummy bots, you had to and could, indeed, detect patterns of the dummy bot's play and employ an appropriate counter strategy. TLDR, if we have some not-so-good bots,
we can have some really awesome ones to beat them. So iCamePowder was the winner of this competition, and we're going to step through the evolution of that bot. So first, let's look at the most well-known strategy. This is random bot. So FYI, this is what all our examples will look like.
Each bot has a state containing R-pass moves and our opponent's pass moves, and we have a next-turn method which will return our next chosen move. We can see here that we are randomly returning one of R, P, or S. R for rock, P for paper, and S for scissors.
So how does this bot fare against other bots? This is the always-rock bot, and I think any one of us could beat this bot. However, the random bot would still only win around 50 percent of the time. So random is often considered optimum as an optimum strategy as it is impossible to gain advantage
over a truly random opponent. Equally, however, random won't be able to exploit opponents that do use strategies. Quite clearly, random is far from optimal, not losing is not the same as winning. So let's look how we would beat this bot.
This is the frequency-counting bot, and it is a bot that is actively trying to beat other bots. It's using frequency analysis to determine which move to play. So here we increment our counter with their last move. We take the move with the highest count,
and we return our counter move. Against the always-rock bot, only the rock count in our hash would increase. This means this bot would beat it every time. In fact, this bot would beat any opponent that chose rock, paper, and scissors with different statistical frequencies. This bot doesn't choose rock, paper,
and scissors with different statistical frequencies. This is our fixed-string bot, and it cycles through a set string here of moves. This string is a statistically-balanced string, which is a fancy way of saying the same number of Rs, Ps, and Ss. So we have our current position starting at minus one, and we increase our current position,
wrapping at the end of the string, and we get our move by taking the value of our current position in the string. So how might we generate a balanced string of RPs and Ss? If we did it ourselves, it would either take ages, or it would be full of detectable patterns. This is where de Bruijn could come in, which was used in the official tournament. De Bruijn is a cyclic sequence of order N
on a size-k alphabet in which every possible string of length N on A occurs exactly once as a substring. Simply put, it's a sequence consisting of a set number of different elements in which every possible N length combination of these elements occurs. So looking at the string we just used,
we know we have three different elements. We have R, P, and S, and we are saying we want all substrings of length two. We can represent this like so, where A is our set, and B represents both the length of our set and the length of our substring. In this string, there's every possible two-length combination of our set,
R, R, P, S, P, and even SR wrapping around at the end. Now, de Bruijn strings can be made of any set you'd like. So, for example, we could have the set zero, one. So we have the set length two, and let's say we want all substrings of length three, and there are actually only two distinct strings where one is actually the reverse of the other.
Now let's say we have a set one, two, three, four, and we want, say, all substrings of length four as well. Well, that would actually give us this string. The lengths of the string are the length of the subset to the power of the substring, length of the substring. So note, though, these strings are actually optimally short.
They are as short as they can be while still containing all the substrings. So going back to our RPS set, let's say now we want all substrings of length six. We get this so we can see how quickly the length of these strings increases. You can generate the Bruijn strings quite easily, even that super long one.
This would allow you to use several or such as that super long one and make them quite hard to detect. So this is the code I use to generate the Bruijn strings. I translated the Python code taken off the Wikipedia page into Ruby code. Now, using a fixed string is a pretty good starting strategy, and now we have a strategy to beat the frequency counting bot.
But how do we go about beating these balanced bots, such as our fixed string bot? We need a new method. We could employ direct history. Direct history is a pretty solid strategy for beating an opponent. We can store their move history and look for patterns, and let's say here is our opponent's last 30 moves where this is their latest move
and this is their first move. Now, let's say we take a window of length 10. So we take the opponent's last 10 moves, and we search their history for a match. We find it once, giving us a count of one. We also get their following move. This would be the move we'd want to beat, and we can see it's P. We could also look at a window size of nine,
which gives us a single match, with the same following move to beat, and we could keep decreasing our window size and checking for matches. Once we get to four, we actually have two matches, and we can see that we have the possible following move P or the possible following move R, and we get this same situation with a window size of three.
So I've put all of this in one of my favorite things, a table, and this now gives you several strategies. You could either A, take the matches of the longest length that gives you this row, so you'd be trying to count to P. You could take the matches off the highest count, which would give you these three rows, and you'd either count to P or R, or you could take all the window sizes and take the highest frequency following move,
giving this column. Now, there's a bunch of ways you could implement direct history, but let's look at a most basic example first. So here's my pretty basic implementation. It's dirt simple and doesn't work for the first seven moves, but it is a basic working example implementing direct history with a window size of three. We first append the latest move onto our string of moves.
Then here we have some regex to find matches of length three that match the end of the string. It also captures the following character to the match so we know what move we'd like to counter. We use this regex to search our opponent's history. We take their second capture, and we return our counter move.
Let's look close about that regex. So to understand this sequence, it's probably a good idea to start at the end. So the dollar character means the end of the line, and the backslash one represents a previously captured group, in this case, the first capture group. So we're looking from here backwards. And this is our first capture group.
Capture groups are in round brackets. We're capturing any character of R, P, or S exactly three times. And we know from the last slide we also want this to match at the end of the string. So we can see we have the match R, R, R. Next again, we are capturing any one of R, P, or S. And this is the part where capturing our opponent's following move,
the one we want to counter, which we can see is S here. And lastly, we say let there be any one of R, P, or S zero or more times. And now you might be able to see why seven is the maximum this would work with. Well, we've got our window size of three, so we want the last three moves. Plus we want the three to match it, and then we want the following move.
So all that makes seven. So let's look at some examples implementing some of the other direct history strategies. Let's start with the biggest window size, i.e. the longest match. This code implements biggest window size. We look through window sizes from our largest to our smallest. So from here, five down to two.
And we use our regex we had previously to search for a match. If we find one, we break up the loop and return their following move. We're not interested in any smaller window sizes. We also call a fallback method, which could return whatever you'd like, but it's just in case we don't find any matches. There was also the greatest number of matches,
and this is where taking the match occurs the most. It's similar to our previous bot getting the biggest window size, but instead we get the last X number of moves and use this to scan the rest of their history. We count how many matches and take the following move. So once we've looked through all the window sizes, we cancel the move with the greatest number of matches. Again, we call our fallback method if we don't find any.
And then there was the frequency of following move, and this takes every match of all window sizes and groups them by the following move. We then cancel the following move that occurs the most. This code starts by setting up a hash of all the following moves, starting with zero, and the count is increased within the loop where we are again looping from our biggest window size to our smallest,
and it uses regex to find a match and again captures the following move. And it uses the following move to add one to the matching move in the hash. After capturing all matches, it takes the highest-occurring following move and returns the counter move. So we have a selection of strategies now, and we could make our bot play all of them, but with only one move that we can play,
how do we choose which strategy? Well, we can make it play them all and keep a tally of which is doing best over time, but let's not worry about that for now. What we really need to worry about is our opponents guessing what we're doing and countering against it, i.e. opponent modeling, aka mind games.
Let's say we choose one of those strategies. Let's call it strategy A to play, and let's say that strategy A tells us to play rock. Now, let's assume our opponent is quite smart. Suppose our opponent knows the strategy we're using and thinks we're using it, so they will play a counter to the move
our strategy will tell us to play. They will counter with strategy B and play paper. Now, we think they may know our strategy and counter it. In other words, we think that they know that we have this strategy A that is telling us to play rock, so we counter their counter and call bluff with strategy A prime, so we play scissors.
Now, they may also think that we may bluff them. In other words, they think that we know that they know that we have this strategy A, so they counter our bluff A prime with B prime, and they play rock. However, breaking this down to the last two bluff moves,
we think that our opponent has counterbluffed our bluff. In other words, we think that they know that we know that they, though, we have this strategy A, so we double bluff with strategy A double prime and play paper. But surely, our opponent may counter double bluff
our double bluff. In other words, they think that we know that they know that we know that they know that we have this strategy A, so they may use strategy B double prime, causing them to play scissors. Now, some of you might be slightly ahead of me and be thinking, aha, next will be the triple bluff,
but that would be pointless, as that gives us the same strategy as the straight move. For all the talk about mind games, these are just simple rotations. If our straight no-bluff move returns rock, our bluff is scissors, and our double bluff is paper. And each of these bluff strategies can be combined with one of our previous strategies
to make a meta strategy. So meta strategy is a strategy and a rotation of one of our moves. And each one of these cells is a meta strategy that we could apply. Now we have all the strategies we had before multiplied by our bluff strategies. We need another strategy to decide which meta strategy to employ.
What we can do is store all the results of every one of the meta strategies. So for each meta strategy that wins, we add one to its score. Several meta strategies will win at the same time, and when it loses, we subtract one from its score. The meta strategy that is doing the best
at any given point is the strategy we choose. This is our strategy to choose a meta strategy, our meta meta strategy. Now let's jump to much further down the line. We've played hundreds of rounds now, and we've got a winning meta strategy here that we are playing. And it continues to do well.
But what if our opponent suddenly changes strategy? Just removing one from the value would take a long time for the numbers to change and for us to change our meta strategy. An alternative would be to reduce by percentage of the value, e.g. 10%. So the values could decay. Here we can see the values quite quickly reduce and we can adapt to any new strategies
our opponent uses. And we are now at the point where our original meta strategy is much lower than our new winning meta strategy, and it didn't take that many rounds to adapt. We now understand ways in which we can manage our table and select our winning strategy.
We might also want our table to take into account amounts of past moves, so we could use different lengths of move history when applying each strategy, our rows. For example, we can run each of our strategies over the last five rounds, over the last 50 rounds, and over the last 100 rounds all at the same time. Then we can use the general meta strategy
to figure out which one does best. So in our new table here, we have the past 10 rounds for every strategy and the past 100 rounds for every strategy. We then have the same situation we had before, giving scores as they win and lose, but we are applying our strategies twice,
the past 10 and over the past 100. And this could allow us to adapt to new strategies more quickly. For example, if our opponent changes strategies, we could be evaluating their past 100 moves and their past 10 moves, but our evaluation of their past 10 moves is less influenced by their previously used strategy, allowing these meta strategies to do really well. On top of this,
we can apply the same variation history of past moves to each of the meta strategies. So here we have our table duplicated, one table covering the past 1,000 rounds and one covering the past 500 rounds. Then we can use the meta strategy as before to decide which meta strategy to use. This is how I, Cane Powder by Dan Eggnall,
won the 1999 World Rock Paper Scissor International Bot Competition. And I say, I told you it was gonna be fascinating. And as Darce Billings stated, simple rules do not preclude a deep strategic complexity. And as a final note, I've built these strategies, meta strategies and the meta meta strategy into a slack bot.
You can play the slack bot and see how it reacts. So here it starts learning my move and then when I change move, it adapts and starts beating me again. I built this mostly for fun, but also as an example of these strategies working. So you too can have this in your slack bot.
If you go to the above address, you'll find the add to slack button. So you too can have this in your slack. And the second link is the GitHub link here. PR is welcome of course. I'll share these slides later on Twitter as well. Thank you.