Probabilistic Programming in Python

Video in TIB AV-Portal: Probabilistic Programming in Python

Formal Metadata

Probabilistic Programming in Python
Title of Series
Part Number
Number of Parts
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Place

Content Metadata

Subject Area
Thomas Wiecki - Probabilistic Programming in Python. Probabilistic Programming allows flexible specification of statistical models to gain insight from data. Estimation of best fitting parameter values, as well as uncertainty in these estimations, can be automated by sampling algorithms like Markov chain Monte Carlo (MCMC). The high interpretability and flexibility of this approach has led to a huge paradigm shift in scientific fields ranging from Cognitive Science to Data Science and Quantitative Finance. PyMC3 is a new Python module that features next generation sampling algorithms and an intuitive model specification syntax. The whole code base is written in pure Python and Just-in-time compiled via Theano for speed. In this talk I will provide an intuitive introduction to Bayesian statistics and how probabilistic models can be specified and estimated using PyMC3.
Keywords EuroPython Conference EP 2014 EuroPython 2014
Thank you so much for coming out.
This is a talk about my favorite topic, probabilistic programming. Let me introduce myself real quick. I recently relocated back to Germany after studying at Brown, where I did my PhD on Bayesian modeling of decision making. For a couple of years I have also been working with Quantopian, a Boston-based start-up, as a quantitative researcher. There we're building the world's first algorithmic trading platform in the web browser. The talk will be only tangentially related to that, so I'm just going to show you a screenshot of what it looks like. This is essentially what you see when you go on the website: it's a web-based IDE where you can code Python to code up your trading strategy, and we provide historical data so that you can test how your algorithm would have done if it were 2012. On the right you see how well the algorithm did, which matters if you are interested in whether you beat the market or not. I should also say that it's completely free and everyone can use it.
OK, so I think that at every talk the main question should be: why should you care about probabilistic programming?
That's not really easy to answer, because probabilistic programming requires at least a basic understanding of some concepts of probability theory and statistics. So for the first 20 minutes I will just give a very quick primer, focusing on an intuitive level of understanding. Can I get a quick show of hands: who understands on an intuitive level how Bayes' formula works? OK, so most of you, so maybe you won't need that primer, but hopefully it is still interesting. Then we'll have a simple example and a more advanced example that should be interesting even if you already know quite a bit about Bayesian statistics.
To motivate this further, I really like the contrast that Olivier drew in his talk about machine learning, and that is this.
Chances are you are a data scientist and maybe you use scikit-learn to train your machine learning classifiers. What this looks like is this: on the left you have data that you use to train your algorithm, and then that algorithm makes predictions. If those predictions are all you care about, that might be fine, but one central problem that most of these algorithms have is that they are very bad at conveying what they have learned. It is very difficult to inquire what goes on in this black box right here. Probabilistic programming, on the other hand, is inherently open box. I think the best way to think about it is as a statistical toolbox for you to create very rich models that are really tailored to the specific data you're working with. Then you can inquire that model and really see what's going on and what was learned, so that you can learn something about your data rather than just making predictions. The other big benefit, I think, and we'll see that later, is that these types of models work with black box inference engines, namely sampling algorithms, that work across a huge variety of models. So you don't really have to worry about the inference step; all you have to do is basically build the model and then hit the inference button, and in most cases you just get the answers that you are looking for. So there's not really much in terms of solving equations, which is always nice.
Throughout this talk I will use a very simple example that most of you will be familiar with, and that is A/B testing. As you know, that is when you have two websites and you want to know which one works better by some measure that you choose, maybe the conversion rate or how many users click on an ad. To test that, you split the users into two groups, give group one website A and group two website B, and then you look at which one had the higher measure. That problem is of course much more general, and since I'm coming from a finance background, I will switch back and forth between the two ways of speaking. In the finance version of the problem we have two trading algorithms and you want to know which one has a higher chance of beating the market on each day.
So here I'm just going to generate some data, to really see what the simplest possible answers that you might come up with would yield and how we can improve upon that. You might be surprised that I'm not using real data, but I think that is actually a critical step: before you apply a model to real data, you should always use simulated data, where you really know what's going on and know the parameters that you want to recover, so that you know the model works correctly. Only then can you be sure that you get correct answers by applying it to real data. The data that I work with will be binary events, and that type of statistical process is called Bernoulli. It is essentially just a coin flip with a given probability of coming up heads. I can use bernoulli from scipy.stats, and the parameter p that I pass in is the probability of that coin coming up heads, or of that algorithm beating the market on a particular day, or of that website converting the user. Here I'm sampling 10 trials, so this will be the result: just a bunch of binary zeros and ones. I'm generating data for two algorithms, one with 50% and one with 60%, and you want to know which one is better.
The easiest thing that you might come up with is: well, let's just take the mean. Statistically speaking that's not a terrible idea; it's called the maximum likelihood estimate. If you ask an applied mathematician what to do, that might be the answer, because in applied math the proofs always work in a very similar way: you have this problem, and then you say, well, OK, let's have the data go to infinity, and then you solve it and you get that the estimator works correctly in that case. That's great, but what do you do if you don't have an infinite amount of data? That's the much more likely case that you'll be in, and that, I think, is where Bayes really works well. So what happens in our case, where I just take the mean of the data I just generated? You can see that in this case we estimated that the chance of algorithm A beating the market is 10%, and 40% for the other one. Obviously that's completely off from the 50% and 60% that I used to generate the data. The obvious answer for why this goes wrong is that I was just unlucky. The observant members of the audience will have noticed that I used a particular random seed here; I picked that random seed to produce this very weird sequence of events. But certainly that can happen with real data: you can be unlucky, and the first 10 visitors of the website just don't click. The central thing that I think is missing here is the uncertainty in that estimate. The 10% is just a number; we're missing the whole confidence behind the number. So a recurring topic for the remainder of the talk will be really trying to quantify that uncertainty.
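The simulation described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library rather than the scipy.stats call mentioned in the talk; the helper name `bernoulli_trials` and the seed are assumptions made for this example, so the printed estimates will not match the 10% and 40% from the talk.

```python
import random
from statistics import mean

def bernoulli_trials(p, n, rng):
    """Draw n independent 0/1 outcomes, each equal to 1 with probability p."""
    return [int(rng.random() < p) for _ in range(n)]

rng = random.Random(42)  # arbitrary seed, not the one used in the talk

data_a = bernoulli_trials(0.5, 10, rng)  # algorithm A: true chance 50%
data_b = bernoulli_trials(0.6, 10, rng)  # algorithm B: true chance 60%

# The sample mean is the maximum likelihood estimate of p.
print("A:", data_a, "estimate:", mean(data_a))
print("B:", data_b, "estimate:", mean(data_b))
```

With only 10 trials per algorithm the estimates can land far from the true values, which is the point the talk makes about a single number carrying no measure of uncertainty.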
Then you might say: well, there is this huge branch of statistics, frequentist statistics, which has designed statistical tests to decide which one of those two is better or whether there is a significant difference. You might run a t-test, and that returns a probability value that indicates how likely it is to observe the data if it was generated by chance. That's certainly the correct thing to do, but one of the central problems of frequentist statistics is that it's incredibly easy to misuse. For example, you might collect some data and the test doesn't show anything, and then on the next day you get more data. So what do you do? You just run another test with all the data you have now, right? You have more data, so the test should be more accurate. Unfortunately that's not the case, and to show that I just created a very simple example where I do that procedure. I generate 50 random binaries with 50% probability for both, so there is no difference between them. I start with just 2 events and do a t-test, then I add a third event and do a t-test, and so on: just that process of continuously adding data and testing whether there's a difference. If the p-value is smaller than 0.05 I return true, otherwise I return false. I run that a thousand times and look at the probability: given that there is no difference at all between those two, both were 0.5, what is the chance of this test giving you the answer that there is a significant difference? It's 36.6% in this case, which is absurdly high. So this procedure really fails if you use it in that way. Granted, I am misusing the test, it's not designed to work in that specific way, but it's extremely common for people to do that. For me, one of the central problems is that frequentist tests really depend on your intentions of collecting the data.
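The repeated-testing procedure described above can be sketched like this. To stay within the standard library, the example swaps the t-test from the talk for a pooled two-proportion z-test, so the resulting rate will differ from the 36.6% quoted in the talk; the function names and the seed are assumptions made for illustration.

```python
import random
from statistics import NormalDist

def two_proportion_p_value(successes_a, successes_b, n):
    """Two-sided p-value of a pooled two-proportion z-test, n trials per group."""
    pooled = (successes_a + successes_b) / (2 * n)
    variance = pooled * (1 - pooled) * 2 / n
    if variance == 0:  # all outcomes identical: no evidence of a difference
        return 1.0
    z = (successes_a - successes_b) / n / variance ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def peeking_finds_difference(rng, n_max=50, alpha=0.05):
    """Add one observation per group at a time and test after every addition."""
    successes_a = successes_b = 0
    for n in range(1, n_max + 1):
        successes_a += rng.random() < 0.5  # both groups share the same true rate
        successes_b += rng.random() < 0.5
        if n >= 2 and two_proportion_p_value(successes_a, successes_b, n) < alpha:
            return True  # stop as soon as any look is "significant"
    return False

def single_test_finds_difference(rng, n=50, alpha=0.05):
    """Collect all n observations per group first, then test exactly once."""
    successes_a = sum(rng.random() < 0.5 for _ in range(n))
    successes_b = sum(rng.random() < 0.5 for _ in range(n))
    return two_proportion_p_value(successes_a, successes_b, n) < alpha

rng = random.Random(0)
runs = 1000
peeking_rate = sum(peeking_finds_difference(rng) for _ in range(runs)) / runs
single_rate = sum(single_test_finds_difference(rng) for _ in range(runs)) / runs
print(f"false positives when testing after every new data point: {peeking_rate:.1%}")
print(f"false positives when testing once at the end: {single_rate:.1%}")
```

Because both groups are generated with the same probability, every "significant" result is a false positive. Testing after each new observation gives the test many chances to cross the 0.05 threshold, which is why its false positive rate ends up well above that of the single test.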
So if you use a different procedure for collecting the data, for example what I just did, where I just add data every day, then you need a different statistical test. If you think about this more, it's actually pretty crazy. If you're a data scientist and you just get data from a database, you have no idea what the intentions were when gathering it. And you want to be very free in exploring the dataset and running all kinds of statistical tests to see what's going on. So while the frequentist approach is certainly not wrong, it's often very constricting in what it allows you to do, and if you don't do things correctly you might shoot yourself in the foot. I think that's really not a good setup for statistics. So let's get to Bayesian statistics very quickly.
At the core we have Bayes' formula. If you don't know what that is: essentially it's just a formula that tells us how to update our beliefs when we observe data. That implies that we have prior beliefs about the world, which we have to formalize. Then we see data, and we apply Bayes' formula to update our beliefs in light of the new data, which gives us our posterior. In general these beliefs are represented as random variables, so let me very quickly talk about an intuitive way of thinking about them. Statisticians like to call their parameters, the random variables, theta, so this is what I'm using here. Let's define a prior for a random variable theta, and theta will be the random variable for the algorithm beating the market on a single day, or the website converting a user: what's the chance that that happens? The best way to think about a random variable is in contrast to the variable that you might know from Python programming, which just has a single value, say i = 5. Here we don't know the value; we want to reason about the value, and we have some rough idea of what that value is. So rather than just allowing one value, we allow multiple values and assign each possible value a probability. On the x axis we have the possible states that the system can be in. For example, the algorithm can have a chance of 50% of beating the market, and I assume that that is the most likely case; that's just my personal prior belief without having seen anything. I assume that on average 50% is probably a good estimate, but I wouldn't be terribly surprised to see something with 60%, it's just less likely. 80% is considerably less likely but still possible. 100%, an algorithm that beats the market on every single day, I think would be next to impossible, so I assign a very low probability to that. So that's a very intuitive way of thinking about it.
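For reference, the formula referred to above can be written as follows. The notation is an assumption of this write-up rather than taken from the slides: theta stands for the unknown success probability and x for the observed data.

```latex
P(\theta \mid x) = \frac{P(x \mid \theta)\, P(\theta)}{P(x)},
\qquad
P(x) = \int P(x \mid \theta)\, P(\theta)\, d\theta
```

Here P(theta) is the prior belief, P(x | theta) is the likelihood of the data under a given value of theta, and P(theta | x) is the posterior, the updated belief after seeing the data.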
Now let's see what happens if I observe data. For that I created this widget, where I can add data with this slider, and then it will update the probability distribution underneath, so that will be our posterior. Currently there is no data available, so our posterior will just be our prior: that is just the belief we have without having seen anything. Now I add a single data point, a success, so we ran the algorithm for a single day and it beat the market. As you might have seen, the distribution shifted a little bit to the right side, and that represents our updated belief that it's now a little bit more likely that the algorithm is generating positive returns. Now let's reproduce the example from before, where we had 1 success and 9 failures. There we estimated that the algorithm has a 10% chance of beating the market, and that was ridiculous: with that amount of data there is no way we could say that, and also from prior knowledge we would assume that 10% is actually pretty unlikely. Now it is updating the probability distribution here, which is our updated belief. Certainly with 9 failures I assume that there is a lower chance of success for that algorithm, which is represented by this distribution moving to the left. But note that 10% is still extremely unlikely under this posterior, and that is the influence of the prior: we said 10% is unlikely, so it pulls our estimate away from these very low values. The other thing to note is that the distribution is still pretty wide. So here we now have our uncertainty measure, in the width of the distribution: the wider it is, the less certain I am about that particular value. Now I want you to just imagine what the distribution will look like if I move this up to 90 failures and the successes to 10. So basically we are now observing data that is in line with the hypothesis that it has a 90% failure probability.
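The updates described above can be reproduced in closed form, because a beta prior combined with Bernoulli data again gives a beta posterior. The sketch below assumes a Beta(5, 5) prior purely for illustration; the talk only says that the prior is centered around 50%, and the function names are made up for this example.

```python
from math import sqrt

def update(prior_a, prior_b, successes, failures):
    """Beta prior plus Bernoulli data gives a Beta posterior (conjugacy)."""
    return prior_a + successes, prior_b + failures

def mean_and_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

PRIOR = (5, 5)  # hypothetical prior centered on 50%

for successes, failures in [(0, 0), (1, 0), (1, 9), (10, 90)]:
    a, b = update(*PRIOR, successes, failures)
    mean, sd = mean_and_sd(a, b)
    print(f"{successes:>2} successes, {failures:>2} failures: "
          f"posterior mean={mean:.3f}, sd={sd:.3f}")
```

Under this assumed prior, 1 success and 9 failures give a posterior mean of 0.30 rather than the raw 0.10, and the posterior standard deviation shrinks from 0.10 to about 0.03 once 10 successes and 90 failures have been observed.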
The main thing that happens is that it moves to the left, but it also gets much narrower, and that represents our increased confidence: having seen more data, we have more confidence in the estimate. And that's exactly what we wanted. By the way, it's pretty cool that I can use these widgets in a live notebook. But what's the catch with all of that? This sounds too good to be true: you just create a model, you update your beliefs, and you're done. Unfortunately it's not always that easy, and one of the main difficulties is that this formula in the middle here cannot be solved in most cases. The case that I just showed you is extremely simple: you just apply the formula, things cancel out, and then you can compute the posterior analytically. But even with just slightly more complex models you get multidimensional integrals over infinity that will make your eyes bleed, and that no sane human would be able to solve. I think historically that's one of the main reasons why Bayes, which has been around since the 18th century, has not been used much up until recently, and now it's kind of having a renaissance: people just weren't able to solve for it. The central idea of probabilistic programming is that if we can't solve something, then we approximate it. Luckily for us there is this class of algorithms, the most commonly used of which is called Markov chain Monte Carlo. Instead of computing the posterior analytically, that curve that we've seen, it draws samples from it, and that's about the next best thing we can do. Just due to time constraints I won't go into the details of MCMC, so we're just going to assume that it's pure black magic and it works. The algorithm itself is actually very simple, but the fact that it works in such general cases is still mind-blowing to me. The big benefit is that it can be applied very widely, so often you just specify your model, you say go, and then it will give you your posterior. So what does MCMC sampling look like? As we've seen before, this is the posterior that we want, but this needs a closed-form solution, which we can't get in reality. So instead we're going to draw samples from that distribution, and if we have enough samples we can do a histogram and it will start resembling the posterior.
OK, so let's get to PyMC3. It is a probabilistic programming framework written in Python and for Python, and it allows
for the construction of probabilistic models using an intuitive syntax. One of the reasons for developing PyMC3 rather than continuing with PyMC 2, which maybe some of you use actively: PyMC3 is actually a complete rewrite and uses no code from PyMC 2. There were a couple of reasons. One is just technical debt: the code base of PyMC 2 is pretty complex and requires compiling Fortran code, which always causes huge headaches for users trying to get it working. PyMC3 is actually very simple code, and the reason for that is that we're using Theano for the whole compute engine. So we are basically just creating the compute graph and then shifting everything off to Theano. The other benefit we get from Theano is that it can give us the gradient information of the model. There is this new class of algorithms called Hamiltonian Monte Carlo, which are advanced samplers that work even better in very complex models. So they're much more powerful, but they require that extra gradient information, and that's not easy to get. Luckily for us, Theano provides that out of the box, so we don't really have to do anything. The other point I want to stress is that PyMC3 is very extensible, and it also allows you to interact with the model much more freely. Maybe you have used JAGS or WinBUGS or Stan, which are all probabilistic programming frameworks, and while those are really cool, one problem I personally have with them is that they require you to write your model in a specific language and then compile it. You have some wrapper code to get the data into Stan, and then some wrapper code to get the results out of Stan, and for me that was always very cumbersome, because you can't really see what's going on. In PyMC3 you can write the model in Python code and then interact with it freely, so you essentially never have to leave Python, and that for me is very, very powerful. You can think of it much more as a library, and we'll see that in a second. Just the authors: John Salvatier is the main guy who came up with it,
Chris Fonnesbeck also programmed quite a bit, and then there's myself. It's still in alpha, but it works quite well already. The main reason it is still alpha is that we're missing good documentation, and we're currently writing it, but if you are motivated and would like to help out with that, that would be more than appreciated.
OK, so let's look at the model from the earlier example and see how we can solve it now in PyMC3. For that, I first write down the model in statistical terms. We have these two random variables that we want to reason about, theta A and theta B, and they represent the chance of each algorithm beating the market. Here this tilde means "is distributed as", so we're not working with numbers but with distributions. This is a beta distribution, and that is the distribution we have been looking at in the beginning: it just goes from 0 to 1, so if you are working with probabilities, the beta distribution is the one to use. So far this is the thing that we want to reason about and learn about given data. And how do we learn about it? Well, we observe data, and the data that I simulated was binary, so it came from a Bernoulli distribution. So we have to assume that the data, the zeros and ones, is distributed according to a Bernoulli distribution. The probability of the Bernoulli distribution I fixed before to 0.5; here we actually want to infer it. Since we don't know that value, we replace it with a random variable, and that is the random variable theta A that we had above. That is how these models commonly look. The other point I want to make here is that you really see how you are creating a generative model. You might wonder how you can construct your own model, and a good path to that is to just think of how the data would have been generated. Here I know: there is this probability, and it generates binary data. With that same idea you can get arbitrarily complex, so that you have all these hidden causes that somehow relate in complex ways to the data, and then you can invert that model using Bayes' formula to infer these hidden causes.
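Written out, the model described above looks roughly like this. The index i over individual observations and the symbols alpha and beta for the prior parameters are notation assumed here; the talk does not state the parameter values at this point.

```latex
\begin{aligned}
\theta_a &\sim \mathrm{Beta}(\alpha, \beta) & x_{a,i} &\sim \mathrm{Bernoulli}(\theta_a) \\
\theta_b &\sim \mathrm{Beta}(\alpha, \beta) & x_{b,i} &\sim \mathrm{Bernoulli}(\theta_b)
\end{aligned}
```

Read from left to right this is the generative story: first a success probability is drawn for each algorithm, then each observed 0 or 1 is drawn with that probability.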
Here I again generate data, a little bit more now: again 50% and 60% probability of beating the market, or conversion rate, and 300 trials. And this is what the model looks like in PyMC3. First we just import PyMC3 as pm, and we instantiate the model object, which will hold all the random variables. One of the other improvements over PyMC 2 is that everything you specify about your model you do under this with context. What that does is that everything you do underneath it will be included in that model object, so you don't have to pass it around all the time. Underneath, this should look pretty familiar from before. We just had these random variables, theta A distributed as a beta distribution, and here I write the same thing in Python code. So theta A is a beta distribution; we give it a name, and we give it the two parameters, alpha and beta, the two parameters this distribution takes: the number of successes and failures. This is the prior that I showed before, which was centered around 50%. I do the same thing for B. Then I relate those random variables to my data. As I said before, that's a Bernoulli, which I instantiate with a name, and instead of a fixed p I now give it the random variable that we want to link together. Since this is an observed node, we give it the array of 300 binary numbers that I generated just before. So this links the data to the random variables, and I do the same for B. Up until here nothing happens; we just basically plugged together a few probability distributions that make up how I think my data is structured. Now, it's often a good idea to start the sampler from a good position, and for that we can just optimize the log probability of the model using find_MAP, for finding the maximum a posteriori value. Then I instantiate the sampler I want to use. There are various ones you can choose from; here I'm using a slice sampler, which works quite well for these simple models. And now I actually want to draw the samples from the posterior, and for that I call the sample function and tell it how many samples I want, 10,000. I provide it with the step method and I give it the starting value.
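The steps just described can be put together roughly as follows. This is a sketch under several assumptions: it targets a released PyMC3 3.x installation (imported as `pymc3`, which may differ from the alpha version shown in the talk), the prior parameters alpha=5 and beta=5 are an illustrative choice for a prior centered on 50%, and the helper names `simulate` and `fit_ab_model` are made up for this example. The data is simulated with the standard library so that only the model function needs PyMC3.

```python
import random

def simulate(p, n, rng):
    """Return n simulated Bernoulli outcomes (0 or 1) with success probability p."""
    return [int(rng.random() < p) for _ in range(n)]

def fit_ab_model(data_a, data_b, draws=10000):
    """Sample the posterior of both success probabilities with PyMC3."""
    import pymc3 as pm  # assumption: a PyMC3 3.x installation

    with pm.Model():
        # Priors centered on 50%; alpha=beta=5 is an illustrative choice.
        theta_a = pm.Beta("theta_a", alpha=5, beta=5)
        theta_b = pm.Beta("theta_b", alpha=5, beta=5)
        # Likelihood: observed 0/1 outcomes linked to the unknown probabilities.
        pm.Bernoulli("x_a", p=theta_a, observed=data_a)
        pm.Bernoulli("x_b", p=theta_b, observed=data_b)
        start = pm.find_MAP()  # start sampling from the posterior mode
        step = pm.Slice()      # slice sampler, as mentioned in the talk
        trace = pm.sample(draws, step=step, start=start)
    return trace

rng = random.Random(0)           # arbitrary seed
data_a = simulate(0.5, 300, rng)  # algorithm A: true chance 50%
data_b = simulate(0.6, 300, rng)  # algorithm B: true chance 60%

# Uncomment where PyMC3 is installed:
# trace = fit_ab_model(data_a, data_b)
# trace["theta_a"] then holds the posterior samples for algorithm A.
```

Keeping the PyMC3 import inside the function means the data generation part runs anywhere, while the model itself only runs where the library is available.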
this call it'll take a couple of seconds to run the sampling algorithms and then it really
would return on the structure to which I call trace here and that is essentially dictionaries for each random variable that have assigned I will get the samples that are drawn and now that around that I can inquire about my posterior right so the amusing seaborne which just as the size is also employed in
library on top of that of and you should definitely check out creates very nice statistical plots for example the nice this plot function that is the discrete histogram but 1 that looks much nicer and has for example this once shape climb and I give it the samples that I drew that my MCMC sampling I enjoy off the main thing to be and then it will plot the posterior now that I created and that it is again the combination of prior belief updated by the data that I've seen and now I can reason about that and the 1st thing to see is an the CMB B the probabilities all of the chance of that what I was beating the market is 60 per cent and that's what we used to generate the data so that's good that we get that back and again there's where you selected data to know actually that we're doing the right thing and the other 1 is long 50 % or 49 % the other thing and to note is that she and now instead of just having that single number that simply fell from the sky that we would get if you just take the mean we have our confidence passes right we know how wide the distribution is we can answer many questions about the like how likely is it that the the probability that the chance of successful that I really missed 65 per cent and then we we get a specific number out that that represents a level of certainty and we can do other interesting things like hypothesis testing to answer initial question which of the 2 actually does better and for this we can just compare the number of samples that would for today to the samples of the data being so we just can well how many of those of larger than the other ones and that will tell us well with probability of 99 . 
11 % Arabism B is better than a and that is exactly what 1 right so by consistently having confidence estimate carries through from the beginning to the end gives us the benefit of everything you said who has that confidence and probably estimate associated with it OK so that was boring up until now hopefully gets a little more interesting and so consider the case where instead of just 2 urban might have 20 analysis were but we haven't token many users have use the rhythms and maybe we wanna know and only each individual items the chance of successful also the algorithms overall the the group average but they also doing the other also consistently beating the market or not so the easiest modeling can probably build is just the 1 we did before but instead of to set a and B we have 20 right and 1 that's fair and this is called non-pooled model it's somewhat unsatisfying right because we probably assume that then these are not completely separate if there's a lot of work in the same market environment some of them will have similar properties some similar algorithms the using so they will be related somehow right there will be there were differences but they will also have similarities and this
model does not incorporate that, right? There is no way that what I learn about theta 1 I would apply to theta 2. The other extreme alternative would be to have a fully pooled model, where instead of assuming each one has
its own random variable, I just assume one random variable for
all of them. That is also unsatisfying, because we know that there is that structure in our data, which we are not exploiting. And even though we might get group estimates, we could not say anything about how well a particular algorithm does, right? So the solution, which I think is really elegant, is called a partially pooled or hierarchical model, and for that
we add another layer on top of the individual random variables, right? Up until here we only have the model we had before, with all these thetas independent. But what we can do is, instead of placing a fixed prior on them, we can actually learn that prior and have a group distribution that applies to all of them. Those models are really powerful and have many nice properties. One of them is: what I learn about theta 1 from the data will shape my group distribution, and that in turn will shape the estimate of theta 2. So everything I learn about the individuals I learn about the group, and what I learn about the group I can apply to constrain the individuals.
Another example where we do this quite frequently is from my research in psychology. We have a behavioral task that we test 20 subjects on, and often we don't have enough time to collect a lot of data. So for each subject by itself, the estimates we would get if we fit a model to that one subject would be very noisy. A hierarchical model is a way to basically learn from the group and apply that back to the individuals, so we get much more accurate estimates for each individual. That's a very, very nice property of these hierarchical models.
So here I generate the data again. Essentially, the data will be just an array of 20 times
200, 20 subjects, 100 trials, and it will just be the binary outcomes of each individual, right? Then for convenience I also create this indexing mask that we'll use in a second. That might not make sense right now, but just keep it in the back of your mind. It is basically an index: the first row will be just an index for the first subject, for indexing into that random variable. But this is the data that we are going to work with.
OK, so for that model in PyMC3: here I'm going to first create my group variables, the group mean and scale. So, what is the average chance of beating the market across all algorithms, and how variable are they, which is the scale parameter. Which priors to use is a choice you make in modeling. Here I use a Beta distribution for the group mean and a Gamma distribution for the variance. I use a Gamma distribution because the variance can only be positive, but the details of that are not that critical.
Then, unfortunately, the Beta distribution is parameterized in terms of alpha and beta parameters and not in terms of the mean and variance. Fortunately, there is this very simple transformation we can apply to these mean and variance parameters to convert them to alpha and beta values, which I'm doing here. While the specifics of that are not important, I just wanted to show how easy it is. In some other languages it is not a given that you can just very freely combine these random variables and transform them and still have it work out. The reason is that these are just Theano expression graphs, so when I multiply them, it will actually take the probability distributions, the formulas, and combine them, and actually do the math of combining them in the background.
Then I need to hook that up with my random variables for each algorithm. Instead of having a for loop instantiating 20 of them, I can pass in the shape argument, and that will generate a
vector of 20 random variables, and that will be theta. So this is not a single one but actually 20. Before, you will note that I had just my hard-coded prior of 5 and 5 here, right? But now I'm replacing that with the group estimates that I'm also going to learn about. Now again my data is going to be Bernoulli distributed, and for the probability I'm now going to use that index that I showed you before. Essentially, that will index into the vector in a way that turns it into a two-dimensional array of the same shape as my data, so that it matches up one-to-one and it just does the right thing. Then I pass in the data, the array with the rows of binary variables for each algorithm.
Again I'm finding a good starting point, and note here that I'm now using this so-called NUTS sampler, which is a state-of-the-art sampler that uses gradient information and works much better in complex models. Specifically, these hierarchical models are difficult to estimate, but this type of sampler does a much better job, and that was one of the reasons, actually, to develop PyMC3.
OK, and then with the traceplot command we can just create the plot. So don't mind what is on the right side.
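The mean-and-variance reparameterization of the Beta distribution mentioned in this part of the talk can be sketched in plain Python. The slide code is not part of the transcript, so this is an assumption about the general idea rather than the speaker's exact formula: it uses the standard moment-matching relations, and the function names are made up for illustration. No PyMC3 or Theano is involved; in the talk the same arithmetic would be applied to random variables instead of floats.

```python
def beta_params_from_mean_var(mean, var):
    """Convert a mean and variance into the (alpha, beta) of a Beta distribution.

    Illustrative helper (not a PyMC3 function). Uses the moment-matching
    identities: nu = mean * (1 - mean) / var - 1, alpha = mean * nu,
    beta = (1 - mean) * nu.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    max_var = mean * (1.0 - mean)
    if not 0.0 < var < max_var:
        raise ValueError("var must lie strictly between 0 and mean * (1 - mean)")
    nu = max_var / var - 1.0  # equals alpha + beta
    return mean * nu, (1.0 - mean) * nu


def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, var


# Round trip through the hard-coded Beta(5, 5) prior used earlier in the talk.
mean, var = beta_mean_var(5.0, 5.0)
alpha, beta = beta_params_from_mean_var(mean, var)
print("mean:", mean, "variance:", var)
print("recovered alpha, beta:", alpha, beta)
```

The validity check on the variance matters: a Beta distribution with a given mean cannot have a variance of mean * (1 - mean) or more, which is one reason a hierarchical model might parameterize the group level by a mean and a positive scale instead.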
You know, we get our estimates of the group mean, and again we have not a single value but rather the confidence. So on average we think it's about 46%. Then we have the scale parameter, and we have 20 individual thetas, right, so that
would be theta 1 to theta 20, all of them constraining each other in the model. So that's pretty cool. So, in conclusion: probabilistic programming is
pretty cool in that it allows you to tell a generative story about your data, right? If you listen to any tutorial on how to be a good data scientist, it is about telling stories about your data. So how can you tell stories if all you have is black-box inference? I think that's where probabilistic programming
is really quite an improvement. You don't have to worry about inference: there are black-box algorithms that work pretty well. You do have to know what it looks like if they fail, and it can be tricky then to get them going, so it's not entirely trivial, but still, they often work out of the box. And lastly, PyMC3 gives you these advanced samplers.
For further reading: check out Quantopian to design algorithms that hopefully have a higher chance than 50% of beating the market. For PyMC3, I have actually written a couple of blog posts, and those are currently probably the best resource for getting started, mainly just because there is not that much else written about PyMC3 in terms of documentation. Down here there are also some really good resources that I recommend for learning about this. So, things like [unclear].
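The hypothesis test from the two-algorithm example earlier in the talk (counting how often a posterior sample for B exceeds the paired sample for A) can be tried without PyMC3. This is a minimal sketch under stated assumptions, not the speaker's code: because a Beta prior with Bernoulli data is conjugate, the posterior is Beta(5 + successes, 5 + failures), so the samples are drawn directly with the standard library instead of by MCMC. The true rates of 0.5 and 0.6, the seed, and all function names are illustrative.

```python
import random


def simulate_outcomes(p, n, rng):
    """Simulate n binary outcomes (1 = beat the market) with success chance p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def posterior_samples(outcomes, rng, prior_alpha=5.0, prior_beta=5.0, draws=10_000):
    """Draw from the exact Beta posterior of a Beta prior with Bernoulli data.

    This swaps in conjugacy for the MCMC sampling described in the talk.
    """
    successes = sum(outcomes)
    failures = len(outcomes) - successes
    return [
        rng.betavariate(prior_alpha + successes, prior_beta + failures)
        for _ in range(draws)
    ]


def prob_b_better(samples_a, samples_b):
    """Fraction of paired posterior draws in which B's rate exceeds A's."""
    wins = sum(b > a for a, b in zip(samples_a, samples_b))
    return wins / len(samples_a)


rng = random.Random(0)  # fixed seed so the run is repeatable
data_a = simulate_outcomes(0.5, 300, rng)
data_b = simulate_outcomes(0.6, 300, rng)
samples_a = posterior_samples(data_a, rng)
samples_b = posterior_samples(data_b, rng)
print("P(theta_b > theta_a) =", prob_b_better(samples_a, samples_b))
```

The printed value depends on the simulated data, so it will not reproduce the 99.11% quoted in the talk; the point is only that the comparison is a one-line computation once posterior samples exist.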
Yes. So the question is: Stan provides a lot of tools for assessing convergence and many diagnostics, but also the very nice feature of transforming variables and placing bounds on them. So, PyMC3 has the
most common summary statistics that you want to look at, the Gelman-Rubin R-hat statistic and all of that, and you can sample in parallel and then compare the chains. And we do have support for transformed variables. It's not as polished as Stan, just because it's still alpha, but it's there, and you can have bounded parameters. So that works, but it's not quite as streamlined. More questions?
So the question was: in the real world, I can't use the sampler that you provide, the Hamiltonian one, because it's too expensive to use, so how difficult would it be to use my own sampler? And that, I think, is a big benefit of PyMC3: you just inherit from the sampler class and then you override the step method, and then you have it. You can do your own proposals and acceptance and rejection, so that's very easy. If you look at Stan, for example, I haven't done it, but I imagine that it's quite difficult. Just when I look at the code, it's really hardcore C++, and all the templates make my head hurt. The other question, essentially, was what to do if you can't evaluate the likelihood [unclear].
So the question is how the speed compares to [unclear], or to writing your own sampler in Python, and so on. I think most of the time is actually not spent in the sampler but rather in evaluating the log likelihood of the model, and also in the gradient computations. It's true that Stan is fast once it gets started, but it takes quite long to compile the model, actually. I haven't really done this comparison. We have noticed some areas where PyMC3 is not fast, and we need to fix those and speed it up. The Stan guys have done a lot to really make it fast; that's the benefit of C++. On the other hand, one benefit, I think, of Theano is that it does all these simplifications to the compute graph, and
there's caching, and you can run it on the GPU. We haven't really explored that to the fullest extent yet, but I think there are lots of potential speedups that just Theano could give us. Another answer to the question is: if, for example, you really spend that much time in the sampler, in just proposing jumps, you could also use Cython, for example, and code it in that.
The question was about parallel sampling, and yes, that is possible. There is a psample function instead of the sample function, and that will distribute the model. It doesn't quite work in every instance yet. It uses multiprocessing, so you get true parallelization. And just as an aside, there is this really cool project that someone on the mailing list posted [unclear] that could be applied to PyMC3. He uses Spark to basically do the sampling in parallel on big data. If you have data that doesn't fit on a single machine, you can run individual samplers on subsets of the data in parallel and then aggregate them, and Spark lets you do that very nicely. He basically hooked up PyMC and Spark, so that's really, really exciting.
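The answer about custom samplers (inherit from a sampler class, override the step method, do your own proposal and accept/reject) can be illustrated with a small pure-Python sketch. This is not PyMC3's class hierarchy: the names Sampler, MetropolisSampler and make_beta_bernoulli_logp are hypothetical and only mirror the pattern described. The target is the Beta-Bernoulli posterior from the talk's first example, assuming a Beta(5, 5) prior and made-up counts of 180 successes and 120 failures.

```python
import math
import random


class Sampler:
    """Hypothetical base class: builds a chain by calling step() repeatedly."""

    def __init__(self, logp, rng=None):
        self.logp = logp  # function returning the unnormalized log posterior
        self.rng = rng or random.Random()

    def step(self, current):
        raise NotImplementedError("subclasses implement one transition here")

    def sample(self, draws, start):
        chain = []
        current = start
        for _ in range(draws):
            current = self.step(current)
            chain.append(current)
        return chain


class MetropolisSampler(Sampler):
    """Random-walk Metropolis: the subclass only overrides step()."""

    def __init__(self, logp, proposal_sd=0.05, rng=None):
        super().__init__(logp, rng)
        self.proposal_sd = proposal_sd

    def step(self, current):
        # Symmetric Gaussian proposal around the current value.
        proposal = current + self.rng.gauss(0.0, self.proposal_sd)
        log_ratio = self.logp(proposal) - self.logp(current)
        # Accept with probability min(1, exp(log_ratio)); otherwise stay put.
        if log_ratio >= 0 or self.rng.random() < math.exp(log_ratio):
            return proposal
        return current


def make_beta_bernoulli_logp(successes, failures, prior_alpha=5.0, prior_beta=5.0):
    """Unnormalized log posterior of theta for a Beta prior and Bernoulli data."""
    a = prior_alpha + successes
    b = prior_beta + failures

    def logp(theta):
        if not 0.0 < theta < 1.0:
            return -math.inf  # outside the support: always rejected
        return (a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta)

    return logp


logp = make_beta_bernoulli_logp(successes=180, failures=120)
sampler = MetropolisSampler(logp, proposal_sd=0.05, rng=random.Random(0))
chain = sampler.sample(draws=5000, start=0.5)
kept = chain[1000:]  # discard the first draws as burn-in
print("posterior mean estimate:", sum(kept) / len(kept))
```

Because the posterior here is known in closed form, Beta(185, 125) with mean 185/310, the estimate can be checked against it; a real model would not offer that luxury, which is why the convergence diagnostics mentioned in the answers above matter.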

