
Beating Go thanks to the power of randomness


Formal Metadata

Title: Beating Go thanks to the power of randomness
Title of Series: RubyConf 2015
Number of Parts: 66
Author: Tobias Pfeiffer (PragTob)
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date: 2015
Language: English
Production Place: San Antonio

Content Metadata

Abstract: Go is a board game that is more than 2,500 years old (yes, this is not about the programming language!) and it is fascinating from multiple viewpoints. For instance, Go bots still can't beat professional players, unlike in chess. This talk will show you what is so special about Go that computers still can't beat humans. We will take a look at the most popular underlying algorithm and show you how the Monte Carlo method, basically random simulation, plays a vital role in conquering Go's complexity and creating the strong Go bots of today.
Transcript: English (auto-generated)
Thank you everyone for showing up.
So, the year is 1997. IBM's computer Deep Blue beats Garry Kasparov under tournament conditions in a six-game match. This is the first time ever that a computer has won against the premier human player, and Garry Kasparov is considered one of the best chess players of all time. At around the same time in computer Go, there was a competition running from 1985 to 2000, the Ing Cup, with a total prize of about a million dollars if a program could beat 10-to-14-year-old Chinese Go prodigies in an even game. So it was not for lack of trying. In that same year, 1997, the winner of the tournament played against these young Chinese so-called insei, and the program won, but with a huge handicap: for those of you who know Go, I think 11 handicap stones, which is pretty much the difference between a raw amateur and an intermediate player, so to say. So let's dig a bit more into the recent history of computer Go,
and also into why Go is so hard to master compared to chess. Why are we so far behind? Year 2001: an anime by the name of Hikaru no Go launches in Japan, revitalizing Go and making it popular among the youth again; it had been going downhill in Japan for some time before that. Overall, though, Go is very popular in East Asia: there are professional players who get paid to play, you can watch it on TV, and everything. Year 2009: Google invents a language called Go, as if to say, oh, there's a board game that's thousands of years old? We're just going to steamroll right over that. And of course there was already another programming language by the name of Go!, and they steamrolled right over that one too, because hey, it's Google, we can do whatever we want.
Year 2014: on the right is Rémi Coulom, the programmer of a program called Crazy Stone, one of the two strongest Go programs we have. He plays against a professional called Yoda Norimoto, a 9-dan professional, and Crazy Stone wins with a handicap of four stones, which is a lot less than the other handicap, but it's still a really big difference in playing strength. 31st of October, 2015: in the codecentric Go Challenge, the program Zen, the other of the two strongest programs, played an even match against one of the strongest amateurs in the world, Franz-Josef Dickhut, a German 6-dan player. The human prevailed three to one in a best-of-five series; this is the last move played in that series. 3rd of November, 2015: Facebook publishes a blog post that's like, oh, we're doing all this cool machine learning stuff, and by the way, we wrote a Go engine, it's awesome, and we won't really tell you what exactly we're doing, but pattern recognition. Their CTO did not answer my Facebook message about what they're really doing, much to my surprise, but we'll get a bit into that and into how strong that engine really is, for all that we know. 13th of November, 2015: this site hits Hacker News. It says, play Go against a deep neural network, and we'll also have a look at what's up with that
and what the science behind it is. We're going to look at stuff from a lot of scientific papers, but don't be afraid, it's all cool. And if you don't know Go: also all cool, we're going to go over the basics of Go shortly. So, this talk is Beating Go thanks to the power of randomness. Hi, I'm Toby; you can find me on Twitter, GitHub and everywhere else as PragTob. I'm out of Berlin, all the way here. I organize the Ruby User Group Berlin, I'm a Rails Girls coach, I do open source, I do Shoes (who knows Shoes? like, that's, whee, amazing stuff), and I work at a great company called bitcrowd, which is an agency, so we help startups build things. It's lots of fun to work there, and right now we're actually hiring, so if you want to relocate to Berlin, one of the best cities, just saying. And they paid for me to come here, so, pretty great. Now, some of you might be concerned about my t-shirt, because it's like, oh god, it's Yoda. Would Yoda allow him to wear a Yoda t-shirt? Is that okay for him? I can tell you: you know the saying that Superman wears a Chuck Norris t-shirt? Well, Yoda obviously wears a Toby t-shirt, so we're buddies, we worked that out, everything's cool there.
So, continuing with the basics of Go. Go is played on a board with lines and intersections. The normal size is a 19 by 19 board; there's also 13 by 13 and 9 by 9. Normal professional games are played on the 19 by 19 board, but the 9 by 9 board is still relevant and interesting. So, how does Go go? Black starts, and we play on an intersection: that is now a black move. Then white moves, black, white, and, whoa, too far, sorry. That's the basic idea. Taking it further from there: a stone has so-called liberties. Those are the adjacent intersections that are empty, so this stone has four liberties. If I play another stone next to it, they form a group together, and the group has six liberties. When white plays next to the black stone, both of these stones have three liberties, because they take one liberty away from each other. If white continues to play, now I have two liberties, and in the end one liberty, and now we're in a state which is called atari. And yes, the company Atari was really named after this Go term; it roughly means "about to be captured". If white now plays another move, that removes the last liberty from the black stone, which means the black stone is captured and taken off the board, and it counts as a point for white. Of course, black isn't stupid. Black knows how to play and can stretch out, so now black has three liberties again, and it's all good. So, I just told you that there is a point that white gains from the capture; how do you win in the game of Go? The game of Go is basically about getting more territory than your opponent. You try to encircle territory and claim it for yourself; each empty intersection you enclose is one point, and at the end we count who made more territory. It's basically a game of dividing: I try to get more of the cake than my enemy does. So in this example we would have encircled one point, so that would be one point for black. Going further, white could now encircle black, and black is down to one liberty, so we know what's probably going to happen. White can capture, thereby making eight captures, so eight points, and of course the captured area now also counts as white territory, so that's another eight points of territory: 16 points made here in total, which is pretty much a lot on a 9 by 9 board.
But is it an endless game of capture and recapture? No, because a group can also be in a state where we say it is alive, and this is a typical alive group. It has two eyes, which means two separate points of territory inside it, and white can never occupy both of those points at the same time: if white played inside one of them, the stone would have no liberties itself, the so-called suicide move, which is forbidden under Japanese rules (and, in fact, under most rule sets). So this group cannot be captured; it is alive, and it stays on the board until the end.
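Since captures hinge entirely on counting a group's liberties, here is a minimal Ruby sketch of that idea (my illustration, not code from the talk). It assumes the board is a Hash from [x, y] coordinates to :black, :white or nil for empty points:

```ruby
# Count the liberties of the group that the stone at `point` belongs
# to, via flood fill over same-colored neighbors.
def liberties(board, point, size = 9)
  color    = board[point]
  to_visit = [point]
  seen     = { point => true }
  libs     = {}
  until to_visit.empty?
    x, y = to_visit.pop
    [[x + 1, y], [x - 1, y], [x, y + 1], [x, y - 1]].each do |nx, ny|
      next if nx < 0 || ny < 0 || nx >= size || ny >= size
      neighbor = [nx, ny]
      next if seen[neighbor]
      if board[neighbor].nil?
        libs[neighbor] = true          # empty point: a (deduplicated) liberty
      elsif board[neighbor] == color   # friendly stone: same group, keep going
        seen[neighbor] = true
        to_visit.push(neighbor)
      end
    end
  end
  libs.size # 1 would be atari; 0 would mean the group is captured
end
```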
So what does a game look like at the end? This is the end position of a game played between a professional player and one of the strongest Go engines; I think it was Zen, but I'm not quite sure. How do we count it? Well, we remove the so-called dead stones, the ones that have no chance to live, and then count the territory. This game was a draw, and looking at it now, it doesn't look like a draw: black has way more points on the board. Why was the game a draw? Well, there's a concept called komi, which is a set number of bonus points that white gets as compensation because black makes the first move. And with komi, this game was a draw.
And so this game was a draw. So, but what does a real game look like on the 19 by 19 board? This is the game, this is the first game between Franz Josef Dikut and Zen this year. And you usually start to occupy the corners first, and we can't go through the full game,
so I'll fast forward. And so we can sort of see how the game starts forming here like black claims sort of the lower right corner and the upper right corner. And there's also something very, like for me interesting going on that you cannot see the immediate value of in the lower left corner. Black starts a bit like sort of a wall
that faces upwards, and that is called influence. It is very difficult even for really good players to see like who is ahead here and who is not. And if you see the game commentary, there's lots of discussion about like who is in a better position at this place. So you can imagine how difficult it must be for a computer to assess the current situation.
The game moves on, we see more moves played, and even more moves played, and at this point I wanna focus your attention to the lower right corner. And this looks at the first look, if you wanna go, it looks like okay, black and white are battling it out out there. Unfortunately that is not at all the case.
That black group is practically that. It is still on the board, but it only has two liberties left right down there, and so it can be captured at any moment. A Go engine that would have to play and understand this would have to understand that this is not actually a not played out position, but those stones are actually lost.
If you fast forward to the end of the game, we can see that eventually white did capture that black group, and there are lots more, many much more complicated positions on the board that I would not be able to assess because those programs and the player played a much higher level than I do. Okay, so now we count this game,
and this game was the only game that Zen won, and that was by a margin of one and a half points. So when we talk about Go versus chess, what's the difference? In Go, you build something. You start with an empty board, and you build. The stones don't move. They might get captured, but you can't move them, you don't flip them; you build something. You have to plan strategically ahead: where do I want to be, where do I want to stand in this game? In chess, you basically go out and destroy. Your goal is to destroy the other king, and you get less and less of what you started with. When I first looked into this, back in 2010, I also came across an interesting distinction: the difference between what is complex and what is complicated. Go is very simple at its heart. There is a formalized rule set of just ten rules that basically defines the game of Go, and I've taught you almost all of them by now. And still Go is very complex, because there are many, many possibilities, which means many things can happen. Chess, on the other hand, is complicated as well as complex. Chess has many rules: which piece can move where, then there's castling (the Rochade), and a pawn may move two squares on its first move. So chess is a much more complicated game, and some of its complexity stems from it being so complicated. Here's a quote by Edward Lasker, a chess master who discovered Go a bit later in his life: "While the Baroque rules of chess could only have been created by humans, the rules of Go are so elegant, organic, and rigorously logical that if intelligent life forms exist elsewhere in the universe, they almost certainly play Go."
So when we want to talk about how strong programs are, I need to quickly introduce you to the ranking system used in Go. You're considered a beginner when you're at 30 kyu, which basically means you've learned the rules and played your first game; that covers 30 kyu to 20 kyu. A casual player, from 19 kyu to 10 kyu, is what you can become after maybe a couple of months of play, or within the first weeks if you're really hardcore. The intermediate amateur goes from 9 kyu to 1 kyu, and people say very different things about how long that takes: if you look at a Go wiki, they say you can get there in a month. I don't think so. If you look at computer Go papers, they mostly say it takes a player years to reach that level, but I think they just say that to make their programs look stronger when they reach those kyu levels. And then there are the advanced amateurs and the professionals. Now, a quick show of hands: who has played Go, who at least reached the beginner stage? Okay, at least casual player? Intermediate amateur? Okay, advanced amateur? Oh god, there are people here that are much better at Go than me. I don't play actively that much anymore, but I was around 8 kyu, a weak intermediate amateur. It took me about one and a half years to get there, despite playing tournaments and online and everything. So you might talk to me afterwards about what I said wrong about some of the positions later on; that would be nice, thank you. Why am I showing you this? It also gives you a hint of what those handicap stones are worth, because at the amateur level, the difference between two ranks is considered to be one handicap stone. So when a 9 kyu plays against a 5 kyu, the 9 kyu gets a handicap of four stones, which looks like this: the stones are already placed, so black essentially gets four free moves. This is the handicap at which Crazy Stone won in 2014. This is a handicap of nine stones, which is the highest handicap given in official tournament games, as far as I know. And this is the handicap at which one of the strongest Go programs lost in 1998 to the Chinese insei, those 10-to-15-year-old children: a handicap of 13 stones. I played once at that handicap, at a blitz team tournament. We were around 10 kyu, and we played against a team of dan players. This handicap is just unfair. We beat the dan-level players despite making very bad mistakes, and we were the only team to beat them; they went on to win the tournament. That shows you how unfair that handicap is. And this is a handicap of 29 stones, which is just, I don't know: if you had just picked up Go and played me on the 19 by 19 board at that handicap, you would most certainly win. Most certainly. In 1998, a German 5-dan player beat the strongest Go program at that handicap. And that was one year after Kasparov was defeated.
That's just meant to show you how far behind the programs were. So why is Go so hard? What makes it so much more difficult than chess? First, we have a much, much larger board: 19 by 19, as opposed to just 8 by 8 for chess. Second, almost every move is legal. As I said, I've told you almost all the rules (I left out the ko rule), but basically: you have to play inside the board, you can't play where somebody already is, you can't play ko moves, and you can't play the so-called suicide moves. That's it. And that leads to a very high average branching factor. What is the average branching factor? It's basically the average number of legal moves at any point in the game. For Go it's about 250; for chess, just 35. We also have a very high state space complexity, which is the number of valid board positions that exist. Mathematicians have calculated those: for Go it's about 10 to the power of 171, and for chess about 10 to the power of 47. Those are huge numbers. What does that even mean? The number of atoms in the observable universe is estimated to be around 10 to the power of 80. That means we could take each of those atoms and pair it with every other atom, and the list of all those combinations, 10 to the power of 160 entries if my math doesn't fail me, would still be smaller than the number of valid Go positions. At this point you might get a sense of the scale of the problem.
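To spell out that back-of-the-envelope calculation as a worked equation:

$$10^{80} \times 10^{80} = 10^{160}, \qquad 10^{160} < 10^{171},$$

so even the list of all ordered pairs of atoms in the observable universe falls short of the number of valid Go positions, by a factor of around $10^{11}$.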
Moves also have a very global impact. You can't just say, okay, I'm only going to look locally at where my opponent played last. That doesn't work, for multiple reasons. One is that oftentimes the best move is not to answer your opponent's move at all but to play somewhere else, because the move somewhere else is much, much bigger. The other is that a move I play down here in the bottom right can have severe influence on a position in the top left. It might break things there and actually decide the fate of a group. So, finally, after our intro to Go and why it is so hard, let's talk about artificial intelligence for a second. This is an image from Wikipedia of alpha-beta pruning,
or alpha-beta search, which is basically the algorithm for playing chess: Deep Blue used it, and I think modern chess programs still use it. It's a depth-first search through the game tree where you go down to a leaf and then evaluate the state of the board at that point, because you can't play all the way to the end; even for chess, that would be far too much. And then you always assume perfect play; that's the minimax part. On the first move, our move, we try to maximize the value of the evaluation function, saying, okay, we want the best position for us. On the next level we minimize, because it's our opponent's move, and our opponent is clever and will try to minimize the value of the evaluation function, because our opponent wants to play good moves too. Then we maximize again, and so on.
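As a minimal Ruby sketch of that idea (my illustration, not Deep Blue's actual code), here is minimax with alpha-beta pruning in its compact negamax form; legal_moves, play, undo and evaluate are hypothetical game-specific helpers, with evaluate being exactly the static evaluation function discussed here:

```ruby
# Negamax with alpha-beta pruning: the maximizing and minimizing levels
# are folded into one by negating the score at every turn change.
def alphabeta(state, depth, alpha = -Float::INFINITY, beta = Float::INFINITY)
  return evaluate(state) if depth.zero? # can't search to the end: estimate
  legal_moves(state).each do |move|
    play(state, move)
    # The opponent's best reply, seen from our side, is the negation.
    score = -alphabeta(state, depth - 1, -beta, -alpha)
    undo(state, move)
    alpha = score if score > alpha
    break if alpha >= beta # prune: the opponent will never allow this line
  end
  alpha
end
```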
So what's the problem here? As I tried to show you before, it is very hard to write an evaluation function for the game of Go. If you remember that earlier position, for instance: it's not at all clear whose territory will be where in the end, or how big it will be. And there are other concepts involved, like moyo and influence. "Okay, black has a lot of influence there, but white has more territory" is something you commonly hear in game commentary, and it's not easy to decide what is worth more, especially not for a computer. So far, we have been unable to find a good evaluation function for Go. There is one Go program called GNU Go that uses alpha-beta search, alpha-beta pruning, but it plays around the level of 5 kyu, while the current strong Go programs play at about the strength of a 5-dan or 6-dan amateur player. So instead we came up with this great thing called the Monte Carlo method. To tell you what the Monte Carlo method is, I want to ask you a question first.
What is pi? Like, what is the value of pi? 3.14 something, cool. So, how do you know that? How do you determine the value of pi? Come on, people. No one? Right, there's the circumference, blah blah blah, math things. But there is another way to do this. You can draw a unit square on a piece of paper, draw a circle inside it, and then throw random pins at it, counting how many pins land inside the circle and how many land outside. Divide the number that landed inside by the total number thrown, multiply that ratio by four, and you get an estimation of pi. That is an example of the Monte Carlo method applied, and you can see it here: at the top (I think it's cut off) you can see that with a growing number of simulations, with more and more random dots thrown in, our estimation of pi gets better and better. So this is the Monte Carlo method. Pi is not its best use case; it's usually used in spaces where it's very hard to do a full, exact computation of something. Instead we run random simulations, assess their outcomes, and from many such simulations we get an estimate, a guess, of the final value. And in Go, that is basically our evaluation function.
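Here is a quick Ruby sketch of that pi estimation (my illustration; it uses a quarter circle in the unit square, which gives the same inside-to-total ratio of pi over four as an inscribed circle):

```ruby
# Monte Carlo estimation of pi: sample random points in the unit
# square and count how many fall inside the quarter circle of radius 1.
def estimate_pi(samples)
  inside = samples.times.count do
    x, y = rand, rand
    x * x + y * y <= 1.0 # does the point land inside the quarter circle?
  end
  4.0 * inside / samples
end

estimate_pi(1_000)     # => around 3.1, still quite noisy
estimate_pi(1_000_000) # => close to 3.14159; more samples, better estimate
```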
The Monte Carlo method had already been proposed for Go in a paper back in 1993, but it couldn't be made to work then. It was not until 2006 that Monte Carlo tree search was developed, for Go actually, and successfully applied to it; that same year, the program that implemented Monte Carlo tree search also won the computer Go tournament. Ever since, all successful Go programs use Monte Carlo tree search. So that was sort of a revolution, and a rather recent one; alpha-beta search has been around since 1957 or so. These are the basics of Monte Carlo tree search: we go through a selection phase first to select the node we want to play at, then we do an expansion, then we run a random simulation, and afterwards we back-propagate our results. What does that look like in detail? In this graph, each node is labeled first with the wins recorded at that node and then with the visits, how many times that node has been visited. The nodes represent board states, and following an edge means playing a move: the next node is the board state that results from that move. First, the selection phase: we select the node we want to play at, as in, okay, we want to see what happens here, we want to gain more information here. Then we do an expansion, which says: this is the new move we're going to try at that node. Then we do a random playout, and right now you might be thinking: what do you mean, random playout? That seems like a not very smart thing to do. But the hypothesis is this: if I'm standing better on the board in general, if I have the better position, then when I finish the game with random moves I'm more likely to win in the end. And if we do that lots and lots of times, we get a pretty clear view of whether our position is ultimately better or worse. In the end, we do a backpropagation that says: okay, we won here, so we propagate that result back up to the top. And here you have to be careful a bit, if you remember minimax: it's a matter of perspective. Up here, the darkish nodes are black moves, and down here are white moves, and of course when white has won, black has not won, black has lost. You either have to take that into account during selection, or, the way my Go bot does it and the way I know other Go bots do it, you alternate the result as you report it upwards: white won, so black lost, white won, black lost, white won, black lost. You always alternate the result you report, so that the selection works the same way at every level of the tree.
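Put together, one iteration of Monte Carlo tree search looks roughly like this in Ruby (a condensed sketch of the four phases just described, not code from my engine; select, expand, random_playout and the Node fields are assumed helpers):

```ruby
# One iteration of Monte Carlo tree search: selection, expansion,
# random simulation, backpropagation. Repeat for as long as time allows.
def mcts_iteration(root)
  node = select(root)         # 1. descend to a promising node
  node = expand(node)         # 2. add one new move to try from there
  won  = random_playout(node) # 3. finish the game with random moves
  backpropagate(node, won)    # 4. report the result back up the tree
end

def backpropagate(node, won)
  while node
    node.visits += 1
    node.wins   += 1 if won
    won = !won # alternate the result: a win here is a loss one level up
    node = node.parent
  end
end
```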
So let's talk a bit more about the selection, because it's a crucial part. We could just pick every move equally often, all of the time, but that's not very smart, and since we're already doing the random playout stuff, we at least want to be smart about the selection. This is a so-called multi-armed bandit problem, one of the most interesting problems I've come across. It's this scenario: you're sitting in a casino playing at those one-armed bandits, your machine is doing well, and you're like, okay, I'm making a lot of money, that's cool. But then you start to wonder: maybe one of these other machines yields better results? Maybe they'd give me more money? That is exactly the thing you want to balance in the tree as well. You can't always play at the node where you think your winning percentage is best; there might be another node that would yield better results where you were just unlucky so far. This is called exploitation versus exploration. Exploitation is: okay, I keep playing at the node with the best winning percentage. Exploration is: is there something else out here I should try? To balance the two, the standard solution is the upper confidence bound, or in Go it's UCT, upper confidence bounds applied to trees, and this is the general formula. This value is calculated for every node to select which node we're going to visit. We take the win percentage, and we add to it an exploration term with a factor you can set to your liking (there are multiple papers about the best exploration factor, of course), times the square root of the natural logarithm of the total visits divided by this node's visits. That balances things: if I haven't visited a node a lot, let's try visiting it again.
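That description corresponds to the standard UCB1 formula; reconstructed here since the slide itself is not in the transcript, the value of a child node $i$ with $w_i$ wins, $n_i$ visits, parent visit total $N$ and a tunable exploration factor $c$ is:

$$\mathrm{UCT}(i) = \frac{w_i}{n_i} + c \sqrt{\frac{\ln N}{n_i}}$$

In Ruby, a minimal version might look like this (sqrt(2) as the default exploration factor is a common textbook choice, not necessarily what my engine uses):

```ruby
# UCB1 value used during selection: win rate plus an exploration bonus
# that is large for rarely visited nodes and shrinks as they get visited.
def uct_value(wins, visits, total_visits, c = Math.sqrt(2))
  wins.to_f / visits + c * Math.sqrt(Math.log(total_visits) / visits)
end
```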
So how does that work? This is a real example that I took from my Go engine. At the top here the number is just the number of visits, because there wasn't enough space, and I cut out lots of other nodes to make it fit on the slide. First selection: we select this node because it has the best winning percentage of all the nodes we have right now. But what was interesting for me when I looked at this again is the second selection. At least for the exploration factor that I use, we do not pick the three-out-of-three node, which would be the pure exploitation: that node is doing well, play there again. Instead it's like: okay, this other node has a pretty good win percentage, 50 percent, and we haven't visited it a lot, so let's try playing there again, let's see what happens there. At this point you might ask: what are we even doing here? This is not at all human-like. Humans don't play like that. Humans don't start a game thinking, I'm just going to play random moves in my head, count the final position perfectly, and decide my move based on that. Humans look for patterns: how does the board look, what would I normally play here? There's a lot of intuition involved in human play. And I chose this picture because I want to remind you of how humans tried to fly in, I don't know, the 16th century. We tried to imitate birds. We tried to have feathers and wings and flap, and that didn't work out so well. What we do now is we have these huge metal cages, much bigger than birds, with some wings that sort of resemble bird wings but not quite (and helicopters are even more different), and that works out much better for us. Why does it work out better? Because we use the strengths of humans to build these mechanical things, we have our engines and everything; we use that, and not what birds are good at. Similarly, in Go we don't do what we are good at; we do what machines are good at. Machines are good at playing random moves, counting the final result perfectly, and doing that lots and lots of times per second. Strong Go engines do around 50,000 of those playouts per second. Not mine, because it's in Ruby, but the others, you know.
So what are the characteristics of this algorithm, the Monte Carlo tree search? First, it is aheuristic: as I told you, we're just playing random moves, so we don't have to put any expert knowledge into it. All it needs to know is how to generate a valid random move, which is basically what I teach any beginner: okay, this is valid, you can do this but not that. And we have to teach it scoring, to figure out who has won; but who has won is all we're interested in. For me that sort of blew my mind, because it means I can be a bad Go player, I don't have to know a lot about Go, and I don't have to encode Go knowledge into the machine. The machine really plays by itself. It figures out the best move by itself, given only what is valid and, at the end, who won. And that is really amazing, because that way we're also not limited by human knowledge. The machine might at some point surpass us and find moves that we could never think of; otherwise we might limit it with our human knowledge, our human assumptions encoded into the game. This also means that Monte Carlo tree search is the go-to algorithm for general game playing, a discipline I didn't know much about before: there are tournaments where engines get a rule set, encoded in a file, right before the match, and then have to play games they have never seen before. Because of that characteristic of Monte Carlo tree search, that it needs no game-specific knowledge, it's the go-to algorithm there as well. The algorithm is also anytime. Alpha-beta search is a depth-first search, so I always have to search the whole tree down to my chosen depth before I can decide on the best move. Monte Carlo tree search I can stop at any time and say: okay, this is a good enough move, I can play this move. It might be really bad, but I can do that,
which is good when you're doing time management. And Monte Carlo tree search is lazy. This is from my favorite paper about Monte Carlo, so I have to tell you about it. Imagine a game called Double Step. It's stupidly simple: on each turn you make one move or two moves, and you win when you reach the end of the track. So as a human you say: two moves is always the best move, obviously. What does Monte Carlo tree search do here? Across the top we have the handicap: a handicap of minus two means I start two steps behind; a handicap of two means I start two steps ahead. Down the left side we have the number of Monte Carlo simulations run: 8 random playouts, 16, 32, 64, 100, et cetera. And the cells in the middle show the percentage of runs in which black selected the two-step move as its first move; I ran this 10,000 times for each cell. At the bottom we can see that with 100 playouts we're doing pretty well: 99.8 percent, or even 100 percent in an even game. And we can see several characteristics here. With the lower playout counts, look at the first line, the eight playouts: when we're behind by two, we pick the optimal move only 86 percent of the time. That's because Monte Carlo engines get sort of desperate. When they're behind by a lot, they often play nonsense moves, in Go too: moves where the opponent would have to ignore them for two or three turns in a row for them to yield any value, because the engine gets really desperate. But what's even more interesting is the number in the bottom right: as we get more handicap, as we're further ahead, our moves get worse, at least at 100 playouts. We pick the best move only 98 percent of the time.
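Out of curiosity, the experiment is easy to sketch in Ruby (my reconstruction of the game as described, not the paper's code; the track length and the handicap encoding are assumptions):

```ruby
GOAL = 10 # assumed length of the track

# One random playout after my first step has been made; the opponent
# moves next. Each side then randomly walks 1 or 2 steps per turn.
def playout(me, opponent)
  loop do
    opponent += rand(2) + 1
    return false if opponent >= GOAL
    me += rand(2) + 1
    return true if me >= GOAL
  end
end

# Pick the first move by pure Monte Carlo: whichever first step wins
# more random playouts. handicap > 0 means we start that far ahead.
def first_move(simulations, handicap = 0)
  wins = [1, 2].map do |step|
    simulations.times.count { playout(handicap + step, 0) }
  end
  wins[1] >= wins[0] ? 2 : 1
end

# Repeating first_move(8, -2) many times should pick the two-step move
# noticeably less often than first_move(100, 0), echoing the table.
```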
The reason is that most Monte Carlo engines only care about whether they're winning or losing. They don't care about by how many points they win, because optimizing the margin yields worse results, as that paper shows. You can often observe this in play: when a Monte Carlo bot you're playing against is ahead by a lot, it will just play very, very safe moves, according to the motto better safe than sorry. By the end, the bot wins by maybe two points, and you think, oh, I almost beat that Go bot! Not at all. The bot is saying: I've won either way, so I'm just going to play these really safe moves, because I don't care, I win either way. A move that showcases this is the last move of the game that Zen won against Franz-Josef Dickhut. In the commentary, that last move, marked with the X in the middle and highlighted green here, was considered unnecessary by most human players, because it's pretty clear that the middle is all black territory: there are some cutting stones, but not much danger coming from them. Nevertheless, Zen still played there. Why? Zen knew it was two and a half points ahead, so even after wasting a move inside its own territory it was still one and a half points ahead and would still win. So it played better safe than sorry.
So what are some of the enhancements we can make so Monte Carlo tree search plays better? My favorite enhancement is "all moves as first", AMAF, which also has a variant called RAVE, rapid action value estimation. The basic idea is this: if we're playing random moves all the way to the end anyway, does it really matter in which order we play them? No, not really, as long as the moves are purely random. So in the backpropagation phase, when we propagate the result of the game back up, we also look at all the sibling nodes, check whether any of them is a move that we actually made at some point during the random game, and if so, we update the win statistics for that move as well. What the RAVE variant then adds is this: okay, it's nice that we do that, but it's not the most accurate signal in the end. So it keeps those extra updates, the so-called AMAF wins and AMAF visits, separate, and gives them a diminishing weight. At first they count for a lot during the selection phase, but once we reach something like 2,000 AMAF visits, a threshold we set, they get ignored, and we only look at the real winning percentage of the moves that were actually played at that specific node.
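A rough Ruby sketch of the AMAF bookkeeping during backpropagation (my illustration; amaf_wins, amaf_visits and moves_played are assumed fields, and real engines refine this, for instance by matching only moves played by the right color):

```ruby
# After a playout, besides the normal win/visit update along the path,
# credit every sibling whose move occurred anywhere in the random game.
def amaf_update(node, moves_played, won)
  while node.parent
    node.parent.children.each do |sibling|
      next unless moves_played.include?(sibling.move)
      sibling.amaf_visits += 1
      sibling.amaf_wins   += 1 if won
    end
    won = !won # same perspective flip as in the normal backpropagation
    node = node.parent
  end
end
```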
Of course, we can also encode expert knowledge into Monte Carlo tree search, and there are two phases where we can do that. The first is the selection phase, where we choose which node to try out: there we can use patterns, we can use atari solvers, and we can use josekis, opening libraries. The second is the playout policy, our random simulation: we can use patterns there too, to make the playouts not purely random but more realistic, to get a better value out of them. This is quite dangerous, especially combined with all-moves-as-first, which almost every engine uses (every good engine, at least; mine doesn't yet, but I'm getting there), because the playouts aren't purely random anymore, so you can get false positives and false negatives when you update the tree. You have to be very cautious with it, and it has been shown multiple times that a playout policy or a selection that biases the tree too much can actually make your program worse, even though you added knowledge. Nevertheless, all strong programs these days do these so-called heavy playouts, and consequently they run only around 5,000 playouts per second, because heavy playouts of course take more time to compute.
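As a flavor of what a "heavy" playout policy might look like in Ruby (purely my illustration; capturing_moves, atari_escapes, fills_own_eye? and legal_moves are hypothetical helpers):

```ruby
# A mostly random playout move, with a few cheap Go heuristics on top.
def heavy_playout_move(state)
  captures = capturing_moves(state)
  return captures.sample unless captures.empty? # prefer capturing stones
  escapes = atari_escapes(state)
  return escapes.sample unless escapes.empty?   # then saving our own stones
  # Otherwise play uniformly at random, but never fill our own eyes:
  # that would kill living groups and wreck the playout's signal.
  legal_moves(state).reject { |move| fills_own_eye?(state, move) }.sample
end
```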
Oh, by the way, I forgot to mention: in the match between Franz-Josef Dickhut and Zen, Zen ran, I think, on a cluster with 80 cores in total and about 60 gigabytes of RAM, so there's also quite some computing power involved. So this brings us to the thing I mentioned earlier that some of you probably saw on Hacker News: playing Go against a deep neural network. That site was made by the authors of this paper, who took a deep convolutional neural network (I honestly don't know much about what that really is), trained it on professional games, and achieved a prediction rate for the professional moves on their training set of, I think, 44 percent in total. So it's very good at predicting professional moves. That paper also references this other paper from 2014, where they did about the same thing with a 12-layer convolutional neural network and managed a prediction rate of 51 percent, which is great. This approach had been tried before, I think around 2007 and 2009, and was always deemed not useful at the time; these papers are, to my knowledge, the first to make it work, and it's a great success, one of the newest things in computer Go. But these engines have a big problem: all they know is what professional moves look like and how professionals react. So if you play trick moves, or very bad moves, they sometimes react really badly, because they're not used to what you're doing. You can also drag them into complicated, unprofessional fights, and they will play very, very badly. If you saw the keynote on the first day by Carina, that's also where all those machine learning fuck-ups came from: the algorithms fail badly on inputs they haven't seen, and that's reflected here too, so their strength varies wildly. But what's really great about this is that we can fold it into Monte Carlo. These networks are pretty strong on their own, but they still lose against strong Monte Carlo bots.
Against Fuego, which plays at about 1-dan strength, the bot from one of those convolutional network papers lost 86 percent of the time. So they're not better than Monte Carlo bots, but we can incorporate the neural networks into our Monte Carlo bots, and we do that in the selection phase. We don't have much computing power left, because we're already using all our cores for the simulations and the tree updates and everything, but we have one more nice thing in our computers: a graphics card. So what we can do is asynchronously push the computation of the network's recommendations to the graphics card, let it do those computations, and then bring the results back and fold them into our selection, so that we try the moves the network proposes first. For me that's kind of crazy: you use all your cores, you use the graphics card, you use tons of memory, and I hope you have a good cooler in there. This really bumps the strength: I think Aya implemented it, and just by adding convolutional neural networks its rating on KGS went up by one dan, from about 3 dan to 4 dan. So it's pretty cool, and it's a cool area of research.
As for that Facebook thing I made a bit of a buzz about: we think we found their bot playing on KGS, and it plays around the level of 4 kyu or something. And in the video where they supposedly crush another engine, they just beat GNU Go at a very low level. So it's not much to be excited about yet, but it will be interesting to see what they come up with, and if at some point they share their findings with the computer Go community and the computer-go mailing list, that would be nice of them. Oh yeah, if you're interested in this topic, there's a very cool computer-go mailing list where the authors of all the popular programs hang out and share things, and you can ask questions; you'll sometimes see me asking silly questions or good questions, depending on my mood.
So of course I wrote a Go engine in Ruby. It's called Rubykon; you can find it at github.com/PragTob/rubykon, and you can also do gem install rubykon and then invoke rubykon to play a little game in the CLI. It only implements plain Monte Carlo tree search so far, and there's still lots of performance optimization potential, in the move generation and the scoring first of all; I'm on it. I also have another project, WebGo, which of course looks beautiful, where we implemented the whole thing in CoffeeScript, using WebWorkers to offload the work, but I still have to do some work on that. Another thing you might want to look at, a much better engine, is Michi. It's a Go engine in about 500 lines of Python, very readable, all in one file: a minimalistic Go engine, and I take lots of inspiration from it.
And for those of you who are into Rust, because Rust is a cool thing to be into, there's also a nice Rust Go engine, whose author is also German. So now, at the end, let's see how much time I have left. Ooh, damn it. Okay, what have I learned? Real quickly: there's a huge difference between making X faster and doing less of X. All the time we just try to make something really fast instead of going down the road of doing less of it. In Ruby, doing less of something costs nothing, so that's the route I went down lots of times. Modularize small components: Monte Carlo tree search is completely independent of the game of Go, and so is mine. You can plug any game into my Monte Carlo tree search library and it will play it, as long as you provide a facade class to hook it in. Benchmark everything. And there's a huge difference between solving problems the human way and solving them the computer way; the computer way is often what we should actually be doing, as I tried to show you with Monte Carlo tree search. And of course I recommend to everyone the joy of creation: seeing an engine that you built yourself and going, oh, it made that move, that's cute; or, it made that move, that's actually pretty good, what fighting spirit. So, thank you very much for listening to me ramble for so long. This talk was Beating Go thanks to the power of randomness. I'm PragTob. Thanks.