The Joy of Simulation: for Fun and Profit
Formal Metadata
Title 
The Joy of Simulation: for Fun and Profit

Title of Series  
Part Number 
40

Number of Parts 
169

Author 

License 
CC Attribution-NonCommercial-ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 
Identifiers 

Publisher 

Release Date 
2016

Language 
English

Content Metadata
Subject Area  
Abstract 
Vincent Warmerdam: The Joy of Simulation: for Fun and Profit. This talk discusses some joyful exercises in simulation. I'll demonstrate its usefulness, but moreover I'll discuss the sheer joy you can experience. I'll discuss how to generate song lyrics, how to get better at casino games, how to avoid math, how to play Monopoly or even how to invest in Lego minifigures. No maths required; just a random number generator. I'll go over the following points (the short list): I'll show how you can avoid math by simulating; I'll calculate the probability that two people in the live room have the same birthday. I'll show how simulation can help you get better at many games. I'll start with simple card games and with the game of roulette. Most prominently, I'll discuss how to determine the value of buying an asset in the game of Monopoly. I'll demonstrate how you can simulate Red Hot Chili Peppers lyrics. Or any other band. Or legalese. I'll demonstrate the results of a scraping exercise which helped me to determine the value of investing in Lego minifigures. Depending on the level of the audience I might also discuss how biased simulation can help you solve optimisation problems or even introduce Bayesian statistics via sampling. I'll gladly leave this decision to the EuroPython committee.

00:00
check by Vincent when so everyone
00:10
Thanks for having me over.
00:12
This talk is about the joy of simulation. My name is Vincent and I come from Amsterdam. Before we begin, I need you to get out your laptops and open a website. While you're doing
00:26
that, I'm going to discuss what randomness is and what it isn't. Then I'm going to explain to you how sampling can actually be used to do a bit of inference, which is nice. I will then demonstrate a couple of experiments with sampling, and explain how I derived some better tactics for Monopoly using simulation. I will explain how I found out that you can sell Lego minifigures at a profit. Then I'll go on and talk about how sampling can be used as an optimization tactic, and I'll conclude by talking about how we can outsource creativity by using sampling. And then I'll talk about some related subjects,
01:02
which somehow blends everything together. So, randomness. Before we get to
01:08
sampling, we should be sure that we understand what randomness is and what it isn't, because, you know, we're humans, and
01:15
computers nowadays tend to have a better understanding of randomness than we do. So please go to the website that I
01:21
just told you about. We're going to go ahead and do a bit of an inverse Turing test. So
01:28
this is the website. This is my blog, and this is one of the blog posts, called Human Entropy. Please go there right now; you should see a website that's somewhat similar to this,
01:39
and you could read it, but the idea is that we're going to go ahead and try to generate random numbers. So put one index finger on 1 and the other index finger on 0, or use these 2 buttons. You will notice that if you click, the number will increase. Just go ahead and do this: generate a bunch of random numbers, and try to generate them as randomly as you can. Let's generate about 100. I'm going to go ahead and generate a few more, and I'll give you a couple more seconds while I do the experiment myself. So
02:21
I've got about 200 numbers now, and you can see the JavaScript slowing down. So I've generated these numbers, and you could say:
02:32
OK, this is 0 0 1 1 1 0 0 0 1 1 1 0. If I scroll down, what I then see are all these histograms of how often I picked a 1 and how often I picked a 0, but also how often I picked a 0 after a 0, a 0 after a 1, etc. What you'll notice is that even when I'm trying to be kind of random, trying to make as many ones as zeros, I usually fall into this pattern where I do a 1 after a 0, then 1 0 1 0. It's very normal for a human being to do that: it feels random even though it totally isn't. And this small widget, you can read up on the math if you want, tries to do a real-time prediction of what you're going to type next, and you can also track its accuracy just below on the page. That would be more of a real-time
03:21
thing. So I type 0 0 1 and so on, and then at the bottom you can see that the probability of me being a human is high and the probability of me being a robot is quite low. I like this idea: in a way this is an inverse Turing test. By checking whether you can actually generate random numbers, I can establish that you're definitely not a computer. And so this is a
03:49
useful way, hopefully, to quickly explain to you guys how randomness works and how it doesn't work.
03:54
It is useful to have some form of randomness available to you, but you as a human are simply incapable of generating it. Therefore we're going to use a computer instead for the whole thing. If I keep
04:06
playing with this, 0 0 1, really one after the other, you see
04:12
that this estimator tries to predict what I'm going to generate next. If I only type ones, you will see that at some point the predicted probabilities of a 1 and a 0 switch. If I now switch my algorithm and generate zeros, you will notice that there's a bit of confusion, but after a while it has picked up my new pattern and the accuracy goes up again. If I go back to only ones, you will see the same pattern. The cool thing about this is that because I have the laws of probability at my disposal, I can use a little bit of math for all these predictions, but it's still useful to have a random sample.
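To make the predictor idea concrete, here is a minimal sketch in Python. It is not the code behind the blog post; it only assumes, as the histograms described above suggest, that the guess is based on counts of which bit followed the previous few bits. The function name `predict_bits` and the `order` parameter are made up for this illustration.

```python
from collections import defaultdict

def predict_bits(bits, order=2):
    """Guess every bit from what followed the previous `order` bits so far.

    Returns the fraction of correct guesses. Hypothetical helper, not the
    predictor used on the blog.
    """
    # context (tuple of previous bits) -> [times a 0 followed, times a 1 followed]
    counts = defaultdict(lambda: [0, 0])
    hits = 0
    for i, bit in enumerate(bits):
        context = tuple(bits[max(0, i - order):i])
        zeros, ones = counts[context]
        guess = 1 if ones > zeros else 0  # ties default to 0
        hits += guess == bit
        counts[context][bit] += 1  # learn only after guessing
    return hits / len(bits)
```

A human-style alternating sequence such as `[0, 1] * 100` is predicted almost perfectly, while bits from a proper random number generator should stay close to 50% accuracy.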
04:52
So that was a small demonstration. Hopefully it's obvious that human entropy is generally quite terrible, which is why we prefer to use computers to help us think about probability. And because we have a computer available to us, we can sample quite fast and we can kind of avoid doing a bit of math. And, you know, math is
05:09
hard, and even though it is very useful, we'd like to just get the job done. So the goal of this talk is to convince you that you can do a lot of these tasks just by getting a sample. And I guess the
05:23
easiest way to explain this is that, from a modelling perspective, samples are useful. Sometimes we know the characteristics of a system and we want to know the likelihood that a certain event happens, and then it might be easier to use sampling instead of maths to do the inference for us. The simplest example I could come up with is this: suppose you have a number of dice, you roll the dice, and there is a probability that a certain number of eyes comes up. I could calculate that, I mean, the laws of probability
05:49
are applicable, but what I can also just do is draw histograms. This is the histogram for 1 die, this is the histogram for 2 dice, and so on. So I get this nice probability distribution and I don't have to do any math for it, I just sample. The nice thing about this is that I can ask it two questions. Suppose I have four dice: what is the probability of getting a certain number of eyes? But I can also ask a different question: given this number of eyes, how likely is it that I have been rolling a certain number of dice? I know the rules of the system, I can describe them, and from
06:27
there I can sample. That means that I don't have to do math but I can still do inference. So usually I look at it from this direction, but I can also look at it from the other direction. This is a powerful thing, and it is something that Bayesians like a lot.
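The two questions about the dice can be sketched as follows. This is an illustration rather than the speaker's code; the function names are invented, and the reverse question assumes that every number of dice from 1 to `max_dice` is equally likely beforehand.

```python
import random
from collections import Counter

def sample_totals(n_dice, n_samples=20_000, rng=random):
    """Histogram of the total number of eyes when rolling n_dice dice."""
    return Counter(
        sum(rng.randint(1, 6) for _ in range(n_dice)) for _ in range(n_samples)
    )

def dice_given_total(total, max_dice=6, n_samples=20_000, rng=random):
    """Estimate P(number of dice | observed total) purely from samples.

    Assumption: each dice count from 1 to max_dice is equally likely a priori.
    """
    likelihood = {
        n: sample_totals(n, n_samples, rng)[total] / n_samples
        for n in range(1, max_dice + 1)
    }
    norm = sum(likelihood.values())
    if norm == 0:
        raise ValueError("total never occurred in the samples")
    return {n: p / norm for n, p in likelihood.items()}
```

`sample_totals(2)` answers the forward question (how likely is each total with two dice), and `dice_given_total(12)` answers the reverse one (given 12 eyes, how many dice were probably rolled).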
06:42
Having introduced this kind of sampling: by the way, please consider looking at a library called PyMC3, or this other library called emcee. Sampling methods for inference are very powerful, but too theoretical for this talk. If you're interested, there's a very nice tutorial on my blog which explains how you can do some sort of time series analysis with these sampling techniques as well,
07:01
as a way of doing inference. But anyway,
07:06
let's consider a fun example of how I actually got better at doing something because I had this computer available. Does everyone know this game?
07:16
Yes? We always play this game during Christmas. My dad always makes me play this game during Christmas and I absolutely despise the game; I don't see any joy in it whatsoever. So I figured it might be fun: if I can't enjoy playing the game, I could at least enjoy beating my dad. So the idea would be: can I use sampling a little bit to get better
07:37
at this game? Because if I think about it, every tile on this board is worth something, and I could calculate the expected value if only I knew the probability of actually landing on such a tile. Math-wise this would be a little bit hard, so instead of deriving formulas, what I can just do is say: OK, let's just play 10 thousand times. I know the rules of the system, I got the rules of the game from the web, so I start here, just start rolling dice, use the rules of the game and see how often I land on certain tiles. There is a very interesting characteristic of this game, because the likelihood of being around this corner of the board is quite high, because this is "go to jail". So that is
08:18
in a simple way, then including the cards, and what I get is a
08:21
histogram that looks like this. On the x-axis you'll see the number of the tile: this is tile
08:28
number 0, this is tile number 39, and you can see the likelihood of you
08:33
landing somewhere. What you'll notice is this tall spike, which coincides with the jail. One of the
08:40
most interesting things that you'll notice is that after jail there seems to be a slightly higher likelihood of being 2 steps away from jail, 4 steps away from jail, 6, 8, 10 and 12 steps away from jail. The reason for that is that when you're in jail and you want to get out, you have to roll the same number of eyes on both dice, which is why it's more likely to land on one of these tiles. And if you know that beforehand, you can change your tactics, for example.
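A stripped-down version of such a board simulation could look like the sketch below. It is not the speaker's model: it assumes the standard layout with 40 tiles, jail on tile 10 and "go to jail" on tile 30, ignores the cards and the extra turn after doubles, and lets a jailed player leave on doubles or on the third attempt.

```python
import random
from collections import Counter

# Assumed standard layout; the talk only confirms that tiles run from 0 to 39.
BOARD_SIZE, JAIL, GO_TO_JAIL = 40, 10, 30

def simulate_landings(n_turns=200_000, rng=random):
    """Fraction of turns that a lone player ends on each tile."""
    visits = Counter()
    position, in_jail, attempts = 0, False, 0
    for _ in range(n_turns):
        a, b = rng.randint(1, 6), rng.randint(1, 6)
        if in_jail and a != b and attempts < 2:
            attempts += 1  # no doubles yet: stay in jail this turn
        else:
            in_jail, attempts = False, 0
            position = (position + a + b) % BOARD_SIZE
            if position == GO_TO_JAIL:
                position, in_jail = JAIL, True
        visits[position] += 1
    return {tile: visits[tile] / n_turns for tile in range(BOARD_SIZE)}
```

Plotting the returned dictionary as a bar chart gives a histogram over tile numbers; in this simplified model the jail tile stands out clearly, and tile 30 is never occupied at the end of a turn.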
09:08
I know there's a lot of randomness in the game, but if I were to choose a station on this board, it seems more sensible that I would take this station and that station if I were given the choice. And
09:17
now I can calculate how likely it is to actually land there. This is the 1st station, this is the 2nd one, this is the 3rd one and this is the last one, and you can see that it's not quite twice as likely, but there's a bit of a difference that I could potentially use. And again, this
09:38
is something where every space or station generates a certain amount of revenue. You can scrape a Monopoly website and actually get the amount of money that you can get if someone lands on one of these places. This is a nice table, which is sort of cool, but obviously you just go and plot this. So for every tile you can buy, you'll see
09:57
a point listed here. This is the probability that you will land on said tile, and this is the rent you can charge if someone lands on the tile. The size of the point is the expected value, so if the point is very big, that means that on average over the game that tile will generate more revenue. One thing that you may notice is that there seems to be a sort of efficient iso-curve: everything on this imaginary line seems to be worthwhile. Down at the bottom there are a couple of properties that are not really performing. Can anyone guess which part
10:32
of the board these two are? Those would be these two: the probability of landing there is quite small, but if you actually land there, boy, are you in trouble. So it's a risky kind of thing, with a lower probability of landing there because the go-to-jail thing happens before it. And you see this pattern also if you've bought a house there, or a second one. So I'm not
11:08
saying that I actually got much better at playing this game, but I do understand the game a whole lot better. I didn't really measure how often I won, but I like the fact that just by using a load of sampling I actually understand the game a lot better. I know the mechanics of the system, I can then collect data by sampling, and suddenly I understand the game a lot better. This is also on the blog if you're interested. The blog post circulated for a while, and the kind people on the internet pointed out all the weaknesses in it. That's sort of the flaw in your modelling, always: it obviously doesn't encompass everything in the game. One person in particular was very adamant and said: look, if you want to win, you should buy these places, these are the ones you
11:55
want to have. The reason is that there's this mechanic in the game that if you buy all the houses first, no one else can buy houses. If you're the person who actually owns all the houses, no one can invest in houses on any of their lots, and then suddenly these places do become more valuable. We could go a little further and go in depth on how that would work, and a colleague of
12:16
mine actually built a bit of software around
12:19
this, so you can send in your own bot to play this game with a
12:22
certain strategy. Talk to me about this topic if you're interested. This was definitely something that I thought was fun: I understood the game better, it turned out to be a nice blog post, and it was a nice example of when sampling is something that's useful. But
12:36
this is sort of a crazy example. There are also some places where, in practice, I want to make a living, and I might actually make more money if
12:45
I just do the inference by sampling. The best example that I have of this is Lego minifigures.
12:50
Who here is familiar with Lego minifigures? A show of hands? OK, great. So it is a smart company: Lego at some point realised that if they combine Star Wars and Lego, they've got two collector's items in one, and more people will be willing to buy. With collector's items you can make a lot of money, and this is the original Lego
13:11
minifigures series. A Lego minifigure is kind of like a Kinder egg: you open up the packet but you don't know which little minifigure is in there. There are 16 in a set, and after 2 months or so they're never going to produce that set again. I thought: OK, that sounds interesting, would there be a market for this? So we go to this second-hand website,
13:29
scrape a little data and make a small histogram. You want to see whether, supposing we invest in minifigures at the moment, it will be profitable. This is the histogram you get out, which isn't that pretty, so I don't really have an impression of what the
13:45
average price might be. There seem to be a lot around here, but there's a thick tail over there as well. So to get a good impression of what the average might be, I do bootstrapping: say that I have 60 of these prices; I'm going to grab 30 of them at random and calculate the average, and repeat that over and over
14:00
and over again. I get smoother curves that I can interpret visually in a plot. It turns out, this is for the Simpsons minifigures that I was looking at, and these are for the other ones. At the time the Simpsons minifigure series was the most recent one, and I can imagine that as a series gets older it'll be worth more, so this might be a good one to look at. You shouldn't always look only at averages, but this seems reliable enough. So these are the figures.
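The bootstrapping step can be sketched like this. The prices below are invented for illustration, since the scraped data is not part of the transcript, and the sketch uses the textbook variant (resampling as many prices as there are, with replacement) rather than the exact resample size mentioned in the talk.

```python
import random
import statistics

def bootstrap_means(prices, n_resamples=5_000, rng=random):
    """Resample the prices with replacement and return the mean of each resample."""
    n = len(prices)
    return [statistics.fmean(rng.choices(prices, k=n)) for _ in range(n_resamples)]

# Made-up prices with a thick right tail, purely for illustration.
prices = [2.5, 3.0, 3.0, 3.5, 4.0, 4.0, 4.5, 5.0, 9.0, 15.0]
means = sorted(bootstrap_means(prices, rng=random.Random(42)))
low, high = means[int(0.025 * len(means))], means[int(0.975 * len(means))]
print(f"mean {statistics.fmean(prices):.2f}, "
      f"95% of resampled means lie in [{low:.2f}, {high:.2f}]")
```

A histogram of `means` is much smoother than a histogram of the raw prices, which is what makes the average easier to judge by eye.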
14:28
I can buy a little minifigure for 3 euros apiece and I can sell a full set for 100 euros later. How likely is it to get the full set? Ignorantly, I figured I'd do this with math, and when I say I am going to use math, I mean the programmer's way of doing math: you go to
14:44
Math Overflow and ask the question, and you get an answer back which is even more complicated. It turns out that there's this thing called Stirling set numbers, which is no doubt interesting in theory and solves the problem. So instead of looking at the math, the for loop comes to the rescue: I should probably just simulate it. So
15:03
here you see the number of packets that I would buy. One line shows the probability of getting a full set, and the other line shows the expected number of total sets that I would have. Obviously if I buy 100 packets the odds of getting at least 1 full set are quite great, but I might actually get 2 sets, which is rather likely. And again, whenever you're doing these sorts of things it's always nice to visualise once in a while, because if you visualise something you can get surprised. When I was looking at this it sort of made me wonder: gee, if I have 1 set and I keep collecting, then from the first set I collected I probably have some spare Lego minifigures, which I could use to make sure that the 2nd set is actually easier to collect. So I had that thought, and I simulated this over and over and over again. The blue line you see here is again the number of packets against the number of sets that I would have; this shows the average number of packets that you need to get 1 set. Theoretically you need
16:09
about 50 of them, all right, but the time it takes here is much more than the time it takes
16:15
here, which in turn is more than the time it takes here, which in turn is more than the time it takes here. So the more I buy, the higher the likelihood of getting a lot more sets. Again, this seems intuitive, but it's because I've done the inference that I was able to figure this out. So again, sampling is very useful. In Amsterdam I often
16:39
give this course in probability theory, so this seemed like a nice example. I gave this course to a bunch of bankers, and in the latter part of the afternoon we opened one of these boxes to see if the inference was correct. It turns out that if you open the box up, you will always
16:54
have 3 sets: the figures are not even randomly distributed.
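The simulation behind the minifigure question can be sketched as follows. The set size of 16 comes from the talk; the assumption that every packet contains a uniformly random figure is the modelling assumption which, as the box anecdote shows, does not hold for a real sealed box. Function names are invented for this sketch.

```python
import random

SET_SIZE = 16  # figures per series, as stated in the talk

def packets_until_full_set(rng=random):
    """Buy blind packets until every figure has been seen at least once."""
    seen, bought = set(), 0
    while len(seen) < SET_SIZE:
        seen.add(rng.randrange(SET_SIZE))  # assumes each figure is equally likely
        bought += 1
    return bought

def full_sets_from(n_packets, rng=random):
    """Number of complete sets contained in n_packets random packets."""
    counts = [0] * SET_SIZE
    for _ in range(n_packets):
        counts[rng.randrange(SET_SIZE)] += 1
    return min(counts)  # spares from the first set count towards the next one
```

Averaging `packets_until_full_set()` over many runs gives the expected number of packets for one set, and averaging `full_sets_from(n)` for increasing `n` shows how later sets become cheaper because of the spares.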
17:04
The obvious use case when we think of simulation is probability theory, but there are actually other fields than just probability theory that can benefit nicely from
17:13
doing a good simulation exercise.
17:15
So let's talk about more general use cases. Let's talk about optimization in general; I'll give you the idea of how it works, and I'm going to give a slightly
17:21
silly example. This is an example where we know the correct answer beforehand. Suppose we have a 1 by 1 square and we want to find the largest triangle in this 1 by 1 square. Again, this is a very silly example: we know what the largest triangle is, you just draw the diagonal and you're done. But let's say this is a system that we want to optimise, and the computer has no notion of what the best parameters are. We're interested in finding 3 points such that the area between these 3 points is the biggest, and that's the only thing that's known to the computer; it doesn't know any math. So what's
17:55
a person to do? Well, how about we just generate a whole bunch of random triangles, pick the best one and see how good that is?
18:05
In code, that sort of means you just generate a whole bunch of random values in 6 dimensions, x1, x2, x3, y1, y2, y3, those are the 3 points, and then there's a function which calculates the area between these points. You do that for a whole bunch of triangles. You know the largest triangle has area 0.5, and the probability of actually sampling a
18:32
triangle from this region is rather low, and this doesn't surprise us. So what can we possibly do?
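The naive approach just described (draw random triangles, keep the best) fits in a few lines. This is a sketch, not the speaker's code; the area is computed with the shoelace formula, and the parameter order follows the six dimensions named in the talk.

```python
import random

def triangle_area(p):
    """Area of the triangle with corners (x1, y1), (x2, y2), (x3, y3)."""
    x1, x2, x3, y1, y2, y3 = p
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

def random_search(n=10_000, rng=random):
    """Draw n random triangles in the unit square and keep the largest one."""
    candidates = ([rng.random() for _ in range(6)] for _ in range(n))
    return max(candidates, key=triangle_area)
```

Even with thousands of samples, the best triangle found this way usually falls short of the true maximum of 0.5, which is what motivates a smarter sampling scheme.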
18:39
Well, what I could do is say: gee, I have an area and I have all of these x coordinates; how about I throw away all the bad triangles? I can take the average of all the triangle areas, throw away what's at the bottom and keep the ones that are above it. And instead of looking at the area, how about I look at the distribution of the points? There
19:05
is a distribution there, and as luck would have it, if your sample size is rather big, what you can do is learn a nice density estimator. That's what you see visualised here: this is the density plot for
19:19
the points that are performing well, and I'm just sort of assuming that these
19:24
are this X 1 and X 2 and X 3 and it seems that if and when it gets to its triangle the idea that x 1 has to be alone number or a high number and if X 1 is the low number the next to has to be a alone number as well or a high number
19:38
This coincides with my belief of what a big triangle should look like, which is sort of nice. If I do this, not only can I maybe get a sampling technique that gives me better triangles, but I can understand the problem a bit better as well, just like in the Monopoly example and just like in the Lego example. This is what the x and y
19:57
distributions look like. Again, these are density
19:59
plots; this is just the x's and the y's
20:02
together, and what I see sort of makes sense to me. So I
20:14
have a distribution and I can sample from it. So the idea is:
20:18
I've selected the triangles whose area is big, so I sample from that distribution instead. This is what
20:24
we had before, and now I draw a new sample from the distribution that I just learned.
20:29
And what you will notice is that I'm going to sample larger triangles. So how about I just repeat this: a sample of a sample of a sample? The nice
20:42
thing is that if we keep repeating the same idea, some sort of convergence pops up. Note that, mathematically,
20:47
I've said that we're going to select the areas that are larger than some threshold M, and you
20:52
could take M to be the average, or you could pick whatever other metric you like. The point is that I use inference on simulated data to learn more about the nature of an optimization problem. If you're familiar with genetic algorithms, you may notice that they work in quite a similar fashion, except that here I'm trying to look at the distribution of the parameters. Genetic algorithms use a similar tactic to do a proper search over the entire grid. If you're interested in learning more about this, there is this colleague of mine, he's sitting there, who will talk about this sort of thing in more detail later this week. Specifically, he'll talk about how you can win the board game Risk, or how to conquer the world. It's Thursday at 12 o'clock, and he'll talk more about genetic algorithms. In general, this applies to many, many things, as you've probably noticed
21:40
as well. This is triangles, but I can sample anything, whether it's continuous or discrete; it's just a sampling exercise, which means I have a very flexible way of optimizing any system. I'm not saying that I'll always find
21:52
the best solution, but it is a proper way to do a form of search.
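The loop described above (sample, keep what scores above a threshold M, learn a density from the survivors, sample again) can be sketched as follows. This is a hedged illustration rather than the code from the talk. The threshold is the mean area, and the "density estimator" is a simple Gaussian kernel density estimate, sampled by picking a surviving triangle and adding a little noise. The bandwidth, population size, number of rounds and the clipping to the unit square are all assumptions.

```python
import random


def triangle_area(x1, x2, x3, y1, y2, y3):
    """Shoelace formula for the area of a triangle."""
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2


def clip(value):
    """Keep a coordinate inside the unit square."""
    return min(1.0, max(0.0, value))


def refine(n_samples=2000, n_rounds=10, bandwidth=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(6)] for _ in range(n_samples)]
    mean_areas = []
    for _ in range(n_rounds):
        areas = [triangle_area(*t) for t in population]
        threshold = sum(areas) / len(areas)  # M: here simply the average area
        mean_areas.append(threshold)
        survivors = [t for t, a in zip(population, areas) if a >= threshold]
        # Sampling from a Gaussian kernel density estimate of the survivors:
        # pick one survivor at random and jitter each of its coordinates.
        population = [
            [clip(rng.gauss(v, bandwidth)) for v in rng.choice(survivors)]
            for _ in range(n_samples)
        ]
    best = max(population, key=lambda t: triangle_area(*t))
    return best, mean_areas


best, mean_areas = refine()
print("mean area per round:", [round(m, 3) for m in mean_areas])
print("best triangle:", [round(v, 2) for v in best], "area:", triangle_area(*best))
```

Because each new sample is a perturbed copy of a survivor, the learned distribution keeps the structure mentioned in the talk (for instance, x1 being either low or high) instead of averaging it away.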
21:59
Hopefully by now I've convinced you that sampling is indeed useful, and that it can be a little bit surprising in its use cases. What I'm going to talk about now is sort of my hobby project for the next
22:08
year, I think, and it's the thing I'm most interested in at the moment: generative methods. The thing I like about them is that they should allow you to outsource creativity, and entropy has a small role to play in this. So for this
22:22
reason, the next bit introduces a more Markovian way to think about randomness. So far I've said: here is a distribution, give me a sample, and another sample. It wasn't really the case that the sample I just got determines, a little bit, the next step.
22:39
And this is literally what is listed on my LinkedIn profile. If you spend a little time actually reading it, some of you may understand the joke; it's surprisingly relevant. I see a
23:06
couple of people understand the joke. So on your profile you want to write down the stuff you do, so I've got Python and JavaScript and so on, the keywords in the skills section. That's nice, but you get these nasty recruiters on your profile, which I tend not to like. So I
23:23
figured it'd be fun to just add a couple of Pokémon in there as well, so some of the technologies listed there are actually Pokémon. The main reason I like to do this is that there's nothing more
23:39
fun than when a recruiter says, hey, would you come to work for my corporate bank, and you can say, well, do you guys use this Pokémon from my profile? And the recruiter says yes, obviously. The signal to the recruiter in question would be entirely wrong. It turns out recruiters are terrible classifiers for Pokémon,
23:57
and recruiters can't really distinguish a Pokémon name from the name of a technology. So I figured making a Python library that can generate Pokémon names might actually be fun. The library is called gravel. It's totally not done yet, but the idea is to have techniques for this. And before you start thinking, gee, that's ridiculous, Vincent,
24:13
speaking of Pokémon, take a look at GitHub repos.
24:17
It turns out a lot of people have used Pokémon names for their own GitHub projects. There's a link in the presentation if you want to look it up. One of them is actually both a Pokémon and a technology: it's a template-based robot dynamics library. And there's this little web page that, for each of the 750 Pokémon, gives you a link to a GitHub project with that name.
24:41
The reason I want to make gravel is that I've always been a user of libraries and never really written one myself, and the problem seemed interesting enough that I'd learn a lot from doing this. But so the idea is to generate names that sound like Pokémon, and to put it bluntly, the whole point of gravel
25:04
was to come up with names. And then I started
25:09
thinking about it: OK, so generating Pokémon names is an interesting problem, but what it involves is generating a believable sequence of tokens. If you think about it, I could do this for Pokémon names, but there are many other things I could do it for as well, like Red Hot Chili Peppers lyrics, or Ikea furniture names, or notes on the piano. So the simplest model that you could possibly think of says: OK, suppose I have some token. This is independent of whether it's a letter in a word, or a word in a sentence, or a note in an entire arrangement, or if it's
25:44
a sound in an Ikea furniture name. The idea is that once I know the previous token, I might have some other probability distribution for the next one. Markovian thinking is all about this: depending on the state I'm in now, I will sample from a different distribution for my next token.
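As a minimal sketch of that idea, the code below fits a first-order Markov chain on letters and samples new names from it. It is an illustration under assumptions, not the gravel library: the start and stop markers, the function names and the tiny list of training names are all made up for the example.

```python
import random
from collections import Counter, defaultdict

START, STOP = "^", "$"  # assumed markers for the beginning and end of a name


def fit_markov_chain(names):
    """Count, for every token, which tokens follow it in the training names."""
    transitions = defaultdict(Counter)
    for name in names:
        tokens = [START, *name, STOP]
        for previous, current in zip(tokens, tokens[1:]):
            transitions[previous][current] += 1
    return transitions


def generate(transitions, rng, max_length=20):
    """Sample one token at a time, conditioning only on the previous token."""
    name, previous = [], START
    while len(name) < max_length:
        followers = transitions[previous]
        current = rng.choices(list(followers), weights=list(followers.values()))[0]
        if current == STOP:
            break
        name.append(current)
        previous = current
    return "".join(name)


names = ["pikachu", "bulbasaur", "charmander", "squirtle",
         "jigglypuff", "snorlax", "psyduck", "gengar"]
model = fit_markov_chain(names)
rng = random.Random(1)
samples = [generate(model, rng) for _ in range(5)]
print(samples)
```

With such a small training list the output stays close to the originals; a longer list of names gives the chain more transitions to recombine.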
26:03
And this is the simplest model. If I see the letter A beforehand, the chance of seeing another vowel is probably somewhat smaller, and the chance of a consonant might be somewhat larger. But you don't have to do it just
26:15
for the last token that came before. The use of a Markov chain for generating is: first you're going to learn, then you're going to
26:20
sample. The basic way of doing this is a simple Markov chain, but maybe I can look further back as well. If I'm generating the 3rd token, I could look at the 2nd token and the 1st one, so it's a Markov chain of length 2 in a sense. This is just a way of thinking about it: if I know t1 and t2, I have a distribution for t3; that's the sampling rule, if you will. But who says it has to be ordered in one direction? I can also model this from two directions, if you think about it. It seems fine to start with the first letter of the Pokémon name, because there is a prior belief that a name starts with a certain letter, but there is probably a different distribution for the last letter. It doesn't seem entirely insensible that Pokémon probably don't end with the letter I, for example. So maybe we should have a Markov chain that goes one way and a Markov chain that goes the other way; it seems very sensible, and it's sort of the same probabilistic model. So I've been playing around with this. This is what it looks like if you sample Pokémon names, and I came up with results that I find hilarious. For lyrics: "can you believe hold me please by the way I wonder what the wave man YTD screening and the there's been when I was fortunate I know you must've in fact this year in the sun and the bottom dollar foxhole of applying your house not the spin further like". But you can do this, and the lovely joy of making these sorts of models isn't that the result is accurate, but that it is somehow believable. I somehow do
28:05
believe that this came from the Chili Peppers and, let's say, not from some other band. What I did sort of notice is that the combinations in the corpus are somewhat limited. So if I say "by", that's obviously always going to be together with "the way", which is when I think it's switching from one song to another. So the corpus does have an influence on how well this sort of thing works. I did the same thing for Ikea furniture, and I don't know about you guys, but I would definitely like to have a couch with a name like these.
28:39
The reason why this happened was that I was talking to my girlfriend about this problem, and she was obviously not super impressed: you're really spending time on this? Instead of making Pokémon names, how about we make Ikea furniture names? That's actually an idea. Anyway, the thing is, this is nice, but a lot of these are actually quite long, and there were a lot of samples that didn't really make sense. I really think "a" is a terrible name for furniture or for a Pokémon. So there is still work to be done, and I've been thinking about ways of tackling the problem. In machine learning it's very normal to make ensembles of models, so instead of making 1 model, I make 10 of them and combine them in some way. It seems sensible that if this model works somewhat and this one works somewhat as well, we should be able to combine them, and the laws of probability certainly allow for this. So that seems like a thing I can do to make this better. Another thing I could do: my library has this notion of a lexicon, which is sort of like a data frame. Just as modeling usually means having a data frame, I need some sort of a bag for sequences of tokens. What I could also do is say: here's a lexicon of all the Pokémon names, here's a lexicon of the English language, train a model on both, and then combine the two. Then these are not just 2 different models, but models trained on different data, and they can still be glued together to generate sequences. It's a very fun experiment in my mind to generate Pokémon names that sound French, or Pokémon names that sound German. What I could also do is apply a transformation to the Pokémon names. It seems very obvious that instead of just focusing on "this letter came after that letter", I might actually add some domain knowledge and say: this letter happens to be a consonant, and the odds of getting several consonants after each other are rather low. So then I would translate, that's a mapping, the lexicon into something else and fit another model on that. This seems like a viable tactic: you can sort of inject domain knowledge.
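One way to read "combine them, and the laws of probability certainly allow for this" is as a mixture: the next-token distribution of the combined model is a weighted average of the distributions of the individual models. The sketch below mixes two first-order chains trained on different lexicons. It is an assumption about how such an ensemble could work, not the gravel implementation; the word lists, the function names and the weight are illustrative only.

```python
import random
from collections import Counter, defaultdict

START, STOP = "^", "$"


def fit(lexicon):
    """First-order Markov chain as {previous token: {next token: probability}}."""
    counts = defaultdict(Counter)
    for word in lexicon:
        tokens = [START, *word, STOP]
        for previous, current in zip(tokens, tokens[1:]):
            counts[previous][current] += 1
    return {
        previous: {tok: n / sum(c.values()) for tok, n in c.items()}
        for previous, c in counts.items()
    }


def mix(model_a, model_b, weight_a=0.5):
    """Mixture model: p(next | prev) = w * p_a(next | prev) + (1 - w) * p_b(next | prev)."""
    mixed = {}
    for previous in set(model_a) | set(model_b):
        dist_a, dist_b = model_a.get(previous), model_b.get(previous)
        if dist_a is None or dist_b is None:
            # Only one model has seen this state, so use its distribution as is.
            mixed[previous] = dict(dist_a or dist_b)
            continue
        mixed[previous] = {
            tok: weight_a * dist_a.get(tok, 0.0) + (1 - weight_a) * dist_b.get(tok, 0.0)
            for tok in set(dist_a) | set(dist_b)
        }
    return mixed


def generate(model, rng, max_length=15):
    word, previous = [], START
    while len(word) < max_length:
        followers = model[previous]
        current = rng.choices(list(followers), weights=list(followers.values()))[0]
        if current == STOP:
            break
        word.append(current)
        previous = current
    return "".join(word)


pokemon = ["pikachu", "bulbasaur", "charmander", "squirtle", "snorlax", "gengar"]
english = ["table", "window", "garden", "simple", "random", "number"]
mixed = mix(fit(pokemon), fit(english), weight_a=0.7)
rng = random.Random(3)
samples = [generate(mixed, rng) for _ in range(5)]
print(samples)
```

Changing `weight_a` slides the output between the two lexicons, which is the knob one would turn to make names sound more like one corpus or the other.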
30:52
What I can also do is try judges. How about I generate 100 samples based on this one model, and then there is another model that judges them: out of the hundred samples, I sort them all and take the top 10. That also seems like an appropriate way of thinking about it. This next bit is sort of hardcore, but I can also think about it not in a Markovian way, and generate from a factor graph instead; this has some nice benefits. We can even take a Levenshtein-ish approach, where I start with a Pokémon name and, one by one, randomly change a letter; that's also a way to generate something that sounds plausible. The nice thing about doing it this way is that I'm not necessarily limited to just probability theory; I can use heuristics here as well. And then you can sort of wonder: a word in a sentence has some context, and so does a letter in a name. One interesting way to deal with that is these deep models, which are fashionable nowadays. The way that works is that you take a start token and some prior beliefs to generate from, then sample all the way down until you see a stop token. This is nice, but there are some features I would like to have which this way of thinking doesn't necessarily support. Suppose I want to generate a Pokémon name with 6 letters and I know the 2nd one and the 6th one; how do I
32:13
generate the rest? This is what makes the problem hard; not many approaches give support for this. If you are a deep learning specialist and you think you know the solution to this, come talk to me; I'd like to hear what you think about it. So I'll focus on the following domains:
32:28
graphical models seem all right, a heuristic approach seems all right, and deep learning seems right. It's very interesting to see how people are converging: nowadays you can use a neural net as a generative algorithm, which takes some sort of Gaussian noise and generates some sort of distribution out of it. This is from OpenAI; they have research on projects like this. And that's roughly the
32:49
plan. The main lesson I had in designing this: instead of just writing
32:52
code in a notebook, it seems like a better idea, before you write any actual code, to wonder what the API looks like. I'm going to be the user, and I'm going to be using a lot of different lexicons and a lot of models that I train. What is the best way for me as a user to play around with this? So I just copied scikit-learn. There's some notion of a data frame, and it's called a lexicon. There are these different models, say a factor graph, a Markov chain, etc.; this is what defines the properties of the model, and later I fit that on a separate lexicon. Once those models are trained I can generate from them, and the hopeful idea is that I can somehow make an ensemble based on 2 different models, maybe even give them some weighting, and then score the outcome with some sort of judging model. Simply writing this down for myself before working on it made it a lot
33:43
clearer for me to know what I should be building and how it should be designed.
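To make the design described here concrete, below is a hypothetical sketch of a scikit-learn-flavoured interface: a model is configured first, fitted on a lexicon, and can then generate or score sequences. None of these class or method names come from gravel; they are assumptions for illustration. The "judge" is simply a second model, fitted on a different lexicon, that ranks 100 generated candidates so that the top 10 can be kept.

```python
import math
import random
from collections import Counter, defaultdict

START, STOP = "^", "$"


class MarkovChain:
    """Hypothetical model with a fit / generate / score interface."""

    def __init__(self, smoothing=0.01):
        self.smoothing = smoothing
        self.counts = defaultdict(Counter)
        self.vocabulary = {STOP}

    def fit(self, lexicon):
        for word in lexicon:
            tokens = [START, *word, STOP]
            self.vocabulary.update(word)
            for previous, current in zip(tokens, tokens[1:]):
                self.counts[previous][current] += 1
        return self

    def generate(self, rng, max_length=12):
        word, previous = [], START
        while len(word) < max_length:
            followers = self.counts[previous]
            current = rng.choices(list(followers), weights=list(followers.values()))[0]
            if current == STOP:
                break
            word.append(current)
            previous = current
        return "".join(word)

    def score(self, word):
        """Average log-probability per transition, with additive smoothing."""
        tokens = [START, *word, STOP]
        total = 0.0
        for previous, current in zip(tokens, tokens[1:]):
            followers = self.counts.get(previous, Counter())
            numerator = followers[current] + self.smoothing
            # One extra slot in the denominator for tokens the judge has never seen.
            denominator = sum(followers.values()) + self.smoothing * (len(self.vocabulary) + 1)
            total += math.log(numerator / denominator)
        return total / (len(tokens) - 1)


pokemon = ["pikachu", "bulbasaur", "charmander", "squirtle", "snorlax", "gengar"]
furniture = ["billy", "malm", "kallax", "lack", "hemnes", "ektorp"]

generator = MarkovChain().fit(pokemon)
judge = MarkovChain().fit(furniture)

rng = random.Random(7)
candidates = [generator.generate(rng) for _ in range(100)]
top_ten = sorted(candidates, key=judge.score, reverse=True)[:10]
print(top_ten)
```

Returning `self` from `fit` mirrors the scikit-learn convention and allows the one-line `MarkovChain().fit(lexicon)` style used above.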
33:49
Hopefully, maybe by the end of next year, I can come back and talk about a model that can make Pokémon names where you say: hey, I know this token and that one, now fill in the other tokens one by one. That might actually be very useful. And another thing, in a sense,
34:07
this is sort of the deep neural network
34:09
dream of building something that works with tokens for art as well.
34:13
This is actually something that doesn't use any entropy, but it is something that you can generate fairly easily in just a couple of lines of JavaScript. If you're interested in that, come on Friday to my other talk; we'll make pretty images sort of like this, but in 3D. The goal of that talk is just simple fun, if you're interested in building pretty stuff that looks like this. So, concluding:
34:38
sampling can be a lot of fun and sometimes actually profitable, as in the Lego example. Getting started is super easy, and it may be surprising how often it can help you out. People don't necessarily always appreciate it, because it's a little less straightforward, but it's very flexible: if you can describe the system and sample from it, you can actually solve a lot of problems. Python is a great language for this. It's surprisingly easy to be very flexible in just a couple of lines of code, and it's also quite fast, which is great. And do consider thinking about the API, which is something I picked up from our community: try to optimize for joy. You're going to be using the library, maybe, and it's
35:16
better to spend an entire day thinking about the API you would like to have, so that you want to start using it, instead of having a mountain to climb in order to get started
35:25
from the user's perspective. That was the idea. I'll take any
35:41
questions but anything poking unrelated that's not about the game promoters talking environments and really since no
35:57
question born president that thing the
36:03
Also, if you've done something similar to this, come talk to me. This is sort of a hobby project, but I'd be happy to hear from others who are thinking along similar lines. We talked about this a little bit yesterday, and my question is not so much a programming question, but if you had to simulate a network of queues, how would you do that? So there are queues
36:37
that affect each other in a network. Yeah, so that use case is definitely a problem, right? The main gist of what I'm trying to get to at some point is music, and music would be more difficult, because then, let's say, you have a bass that's playing, drums that are playing, and then you have the melody, for example. Then you have 3 sequences that actually have to correlate somehow, and in that case they don't really fit into this model. What you could do then, and there's a couple of libraries that have a bit of support for this, is go into probabilistic graphical modeling in a way. One trick is to say: these 3 tokens I can see as 1 single token, that goes into this one list, and I'll try to generate that. You might need a lot of data before you actually get the pattern right, so it's a little expensive. The other way is not having the complexity in the data but putting it into the model, where you say: these are 3 time series and they are somehow correlated, learn that. You could do that with a sampling approach; there are some examples of that. Then you get into sort of correlated time series land. If I had a whiteboard I could explain it more easily, so come to me afterwards. But it is sort of a hard part.
38:01
Anyone else? Yes.
38:27
Correct me if I'm wrong, but I believe the question is: hey, this is a nice talk, but obviously when you have more dimensions it's going to be harder, right? The triangle example is reasonably easy, it's only 6 dimensions. Right, so the dimensionality is always an issue. So suppose we had
38:50
this, but in 12 thousand dimensions; obviously it's a whole lot harder. What I would say here is: consider this as an approach. It actually has a couple of use cases where you can actually solve something, especially in the optimization field, where you have a lot of hills and many, many
39:05
dimensions. The reason why we use genetic algorithms is because it's the best thing we have, not necessarily because it's the best thing we could have. When you go into random algorithms, you do that because there's no
39:18
alternative besides greedy methods. Yeah, so what I didn't
39:25
quite get was why you couldn't use, like, an LSTM? So the thing with an LSTM is that usually you can say: hey, generate lots and lots of sequences. The issue I sort of have is with this one use case. Suppose that you say: I want to have a Pokémon name that starts with an H,
39:42
and then after 3 tokens comes the the ensuing place yes no and I don't want to finish
39:50
sequences but did you like just like you would want to portable so this of the Wilkinson problem with like judges just like just like a bunch of people who will nuisances welcomes of now and and and I have been the thing is um suppose I extremal momentum right so
40:09
when a model for Pokémon names has learned its internal states, the way I would usually generate a new sequence would be to say: start, go, and then at some point I sample a stop token and it's done, and I have a Pokémon name. I could say: generate all of them
40:24
until at some point you have a Pokémon that starts with an H, then has 3 free tokens, and whatever comes after. But that feels like oversampling a bit. So this is more of an open problem; it's very fun to go to academic conferences and ask professors about it. There should be a more generic, sort of neural way to approach this problem, and the generative work from OpenAI might be a worthwhile avenue. I've never really heard anyone say: here's an obvious solution to this problem. It's not the most important problem in the world, by the way, so I'm not going to get angry about it.
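The "generate all of them until one matches" approach mentioned in this answer is plain rejection sampling. The sketch below shows why it feels like oversampling: every candidate that violates the constraint is thrown away. It is an illustration under assumptions, not code from the talk; the constraint format (a position-to-letter mapping), the training names and the retry limit are made up for the example.

```python
import random
from collections import Counter, defaultdict

START, STOP = "^", "$"


def fit(names):
    transitions = defaultdict(Counter)
    for name in names:
        tokens = [START, *name, STOP]
        for previous, current in zip(tokens, tokens[1:]):
            transitions[previous][current] += 1
    return transitions


def generate(transitions, rng, max_length=15):
    name, previous = [], START
    while len(name) < max_length:
        followers = transitions[previous]
        current = rng.choices(list(followers), weights=list(followers.values()))[0]
        if current == STOP:
            break
        name.append(current)
        previous = current
    return "".join(name)


def matches(name, constraints):
    """True if the name has the required letter at every constrained position."""
    return all(len(name) > pos and name[pos] == letter for pos, letter in constraints.items())


def constrained_samples(transitions, constraints, rng, wanted=3, max_tries=10_000):
    """Rejection sampling: keep generating and discard whatever breaks the constraints."""
    accepted, tries = [], 0
    while len(accepted) < wanted and tries < max_tries:
        tries += 1
        candidate = generate(transitions, rng)
        if matches(candidate, constraints):
            accepted.append(candidate)
    return accepted, tries


names = ["charmander", "charizard", "chansey", "pikachu", "squirtle",
         "snorlax", "gengar", "machamp", "psyduck"]
model = fit(names)
constraints = {0: "c", 2: "a"}  # first letter "c", third letter "a", the rest is free
accepted, tries = constrained_samples(model, constraints, random.Random(5))
print(accepted, "after", tries, "tries")
```

The ratio of tries to accepted names is the cost of the constraint; the more positions are fixed, the more samples are wasted, which is the open problem the speaker describes.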
41:12
also the level of of events of