
# Probabilistic Programming in Python



## Transcript

00:15

Thank you so much for coming out to hear me

00:17

talk about my favorite topic, probabilistic programming. Let me introduce myself real quick: I recently relocated back to

00:26

Germany after studying at Brown, where I did my PhD on Bayesian modeling of decision making. For a couple of years I have also been working with Quantopian, a Boston-based start-up, as a quantitative researcher, and there we are building the world's first algorithmic trading platform that runs in the web browser. The talk will only be tangentially related to that, but I wanted to show you a screenshot of what it looks like. This is essentially what you see when you go to the website: a browser-based IDE where you can code up a trading strategy in Python. We provide historical data so that you can test how your algorithm would have done in, say, 2012, and on the right you see how well it did. I will refer back to this later: we are interested in whether an algorithm beats the market or loses against it. I should also say it is completely free and everyone can use it. OK, so I think what we should start with

01:27

is the main question: why should you care about probabilistic programming?

01:33

It is not really easy to answer, because to appreciate probabilistic programming you need at least a basic understanding of some concepts

01:44

of probability theory and statistics. So for the first 20 minutes I will just give a very quick primer. To gauge the level of understanding, can I get a quick show of hands: who understands, on an intuitive level, how Bayes' formula works? OK, most of you, so maybe you don't need that primer, but hopefully it is still interesting. After that we will have a simple example and a more advanced example that should be interesting even if you already know quite a bit about Bayesian statistics. To motivate this further, I really

02:20

like a contrast with an earlier talk about machine learning, and it is this:

02:29

chances are you are a data scientist, and maybe you use scikit-learn to train your machine learning classifiers. What this looks like: on the left you have data that you use to train, say, an SVM, and that SVM will then make predictions. If those predictions are all you care about, that might be fine. But one central problem most of these algorithms have is that they are very bad at conveying what they have learned; it is very difficult to inquire what goes on inside this black box. Probabilistic programming, on the other hand, is inherently open box. The best way to think about it is as a statistical toolbox for creating very rich models that are really tailored to the specific data you are working with. You can then query that model and see what is going on and what was learned, so you learn something about the data rather than just making predictions. The other big benefit, and we will see this, is that for all these types of models there is a black-box inference engine, our sampling algorithms, that works across a huge variety of models. You don't really have to worry about the inference at all: you basically build the model, the inference part is handled for you, and in most cases you just get the answers you are looking for, without much in the way of solving equations.

Throughout this talk I will use a very simple example that most of you will be familiar with, and that is A/B testing: you have two websites and you want to know which one works better on some measure of your users, maybe conversion rate, or how many users click on an ad. To test that, you split the users into two groups, give group one website A and group two website B, and then you look at which one had the higher measure. The problem is of course quite general, and since I am coming from a finance background I will switch back
and forth. Statistically speaking it is the same problem if we have two trading

04:52

algorithms and we want to know which one has a higher chance of beating the market on each day. So here I am going to

05:05

generate some data, so we can really see what the simple first answers you might come up with yield,

05:13

and how we can improve upon them. You might be surprised that I am not using real data, but I think that is actually a critical step: before you apply a model to real data, you should always use simulated data where you really know the parameters you want to recover, so you know the model works correctly; only then can you be confident that you get correct answers when you apply it to real data. The data I work with will be binary, a sequence of events, and that type of statistical process is called Bernoulli: it is essentially just a coin flip with some probability of coming up heads. I can use the Bernoulli distribution from scipy.stats and set that probability, the chance of the algorithm beating the market on a particular day, or of the website converting a user, and here I am sampling 10 trials. The result is just a bunch of binary zeros and ones. I am generating data for two algorithms, one with a 50 per cent and one with a 60 per cent chance, and we want to know which one is better. The easiest thing you might come up with is: well, let's just take the mean. Statistically speaking that is actually not a terrible idea; it is called the maximum likelihood estimate. If you ask an applied mathematician what to do, that might well be the answer, because proofs in applied math often work in a very similar way: you have this problem, and then you say, well, OK,
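The data-generating step just described can be sketched in a few lines of NumPy (a sketch, not the talk's actual code; names like `theta_a` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(123)

# True (unknown-in-practice) success probabilities for the two algorithms.
theta_a, theta_b = 0.5, 0.6
n = 10  # only ten observed days per algorithm

# Bernoulli trials: 1 = beat the market that day, 0 = did not.
data_a = rng.binomial(1, theta_a, size=n)
data_b = rng.binomial(1, theta_b, size=n)

# The maximum likelihood estimate is simply the sample mean.
mle_a = data_a.mean()
mle_b = data_b.mean()
print(mle_a, mle_b)
```

With only ten trials per algorithm these sample means routinely land far from the true 0.5 and 0.6, which is exactly the failure mode discussed next.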

07:00

let the amount of data go to infinity; then you solve it, and you can show that the estimator works correctly in that case. That is great, but what do you do if you don't have an infinite amount of data? That is the much more likely case you will be in, and that, I think, is where this breaks down. So what happens in our case when I just take the mean of the data I generated? You can see that in this case we

07:31

estimated that the chance of one algorithm beating the market is 10 per cent, and 40 per cent for the other one. Obviously that is completely off: 50 and 60 per cent were used to generate the data. The obvious answer for why this goes wrong is that I was simply unlucky, and the observant members of the audience will have noticed that I used a particular random seed here; I fiddled with that seed to produce this weird sequence of events. But this can certainly happen with real data too: you can be unlucky, and the first 10 visitors to your website just don't convert. The central thing missing here is the uncertainty attached to the estimate: 10 per cent is just a number, and we are missing the whole confidence behind it. So for the remainder of the talk a recurring topic will be trying to quantify that uncertainty. Then you

08:32

might say: well, there is this huge body of work from

08:37

frequentist statistics, which has designed statistical tests to decide which of the two is better, that is, whether there is a significant difference. You might run a t-test, which returns a p-value indicating how likely it is to observe this data if it were generated by chance alone. That is certainly a practical thing to do, but one of the central problems of frequentist statistics is that it is incredibly easy to misuse. For example, you might collect some data, run the test, and it doesn't find anything; then the next day you collect more data. So what do you do? You just run another test with all the data you have now: you have more data, so the test should be more accurate. Unfortunately, that is not the case, and you can see it if you just create a very simple

09:30

example of that procedure. I generate 50 random binaries with 50 per cent probability for both groups, so there is no difference between them. Then I start with just the first 2 events and run a t-test; if that does not find a difference, I add 3 more and run the t-test again, and so on: a process of continuously adding data and testing whether there is a difference. If the p-value is smaller than 0.05, I return true, otherwise false. I run that a thousand times and look at the resulting probability: given that there is no difference at all between the two (both were 0.5), what is the chance that this test tells you there is a significant difference? It is 36.6 per cent in this case, which is absurdly high. So this procedure really fails if you use it that way. Granted, the t-test is not designed to work in that specific way, but this usage is extremely common. For me, one of the central problems is that frequentist tests really depend on the intentions you had when collecting the data.
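The repeated-testing procedure described here can be reproduced with SciPy (a sketch under the stated setup: two identical 50 per cent groups, a t-test rerun after every few new observations; the exact batch sizes are an assumption):

```python
import numpy as np
from scipy import stats

def peeking_test(rng, n_max=50, batch=3, alpha=0.05):
    """Run a t-test each time a few new observations arrive;
    declare 'significant' as soon as any test crosses alpha."""
    a = rng.binomial(1, 0.5, size=n_max)  # both groups identical:
    b = rng.binomial(1, 0.5, size=n_max)  # the true difference is zero
    n = 2
    while n <= n_max:
        p = stats.ttest_ind(a[:n], b[:n]).pvalue
        if p < alpha:          # note: nan p-values (zero variance) never trigger this
            return True        # early stop: declared a difference
        n += batch
    return False

rng = np.random.default_rng(0)
false_positives = sum(peeking_test(rng) for _ in range(1000)) / 1000
print(false_positives)  # far above the nominal 5 per cent error rate
```

The inflated false-positive rate is the point: peeking at the p-value after every batch invalidates the test's nominal error guarantee.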

10:46

So if you use a different procedure for collecting the data, for example adding data every day as I just did, then you need a different statistical test. If you think about it, that is actually pretty crazy: often

11:04

you just get data from a database and have no idea what the intentions of whoever collected it were. You want to be very free in exploring the dataset and running all kinds of statistical tests to see what is going on. So while this approach is certainly not wrong, it is often very constraining in what it allows you to do, and if you don't do things exactly correctly, you can easily fool yourself. I think that is really not a good way to do statistics. So, very quickly, at the core

11:40

we have Bayes' formula. If you don't know what that is: it is essentially just the formula that tells us how to update our beliefs when we observe data. It implies that we have prior beliefs about the world, which we have to formalize; then we see data, and we apply Bayes' formula to update those beliefs in light of the new data, giving us the posterior. In general, these beliefs are represented as random variables. Statisticians like to call their parameters theta, so that is what I will use here. Let's define a prior for a random variable theta, where theta represents the chance of our trading algorithm beating the market, or of the website converting a user. The best way to think about a random variable, as opposed to a variable in Python programming that holds a single value, say i = 5, is that here we don't know the value: we want to reason about it, and we only have a rough idea of what it is. So rather than holding one value, we allow multiple possible values and assign each of them a probability. For example, the algorithm might have a 50 per cent chance of beating the market, and I will assume that is the most likely case; that is my personal prior belief without having seen anything: on average, 50 per cent is probably a good estimate. I wouldn't be terribly surprised by 60 per cent, though it is less likely; 80 per cent is considerably less likely, but still possible; and 100 per cent, an algorithm that beats the market every single day, would be next to impossible, so I assign it a very low probability. That is a very intuitive way of thinking about it.

Now let's see what happens when I observe data. For that I created this interactive widget where I can add data with a slider, and it updates the probability distribution. Currently there is no data, so our posterior is just our prior: the belief we hold without having seen anything. Now I add a single data point, a success; we ran the algorithm for a single day and it beat the market. As you can see, the distribution has shifted a little to the right, which represents our updated belief that it is now a little more likely that the algorithm generates positive returns. Next, let's reproduce the example from before: 1 success and 9 failures, for which the maximum likelihood estimate gave a 10 per cent chance of beating the market. That was ridiculous: with that little data there is no way we could say that, and our prior knowledge says 10 per cent is actually quite improbable. Updating the probability distribution here gives our new belief: certainly, with 9 failures, we now assume a lower chance of success, which shows up as the distribution moving left. But it still reflects that 10 per cent is extremely unlikely. That is the influence of the prior: we said 10 per cent is unlikely, and that pulls our estimate away from these very low values. The other thing to note is that the distribution is still pretty wide, and the width is our uncertainty measure: the wider the distribution, the less certain I am about any particular value. Now imagine what the distribution looks like if I move the failures up to 90 and the successes to 10, so that we observe data in line with the hypothesis of a 90 per cent failure probability. As you can see,

the main thing that happens is that the distribution not only shifts but also gets much narrower, which represents our increased confidence: having seen more data, we are more confident in the estimate. That, by the way, is exactly what IPython Notebook widgets allow; I can use these interactive sliders in a live notebook. So what is the catch with all of this? It sounds too good to be true: you just create a model, update your beliefs, and you are done. Well, it is not always that easy, and one of the main difficulties is that this formula, specifically the integral in the denominator, in most cases cannot be solved. The case I just showed you is extremely simple: you can apply a few tricks, terms cancel, and you can compute the posterior analytically. But with even slightly more complex models you get multidimensional integrals to infinity that will make your eyes bleed; no sane human would be able to solve them. I think historically that is one of the main reasons why Bayesian statistics, which has been around since the 18th century, was not widely used until recently. Now it is having a renaissance, simply because before, people were not able to solve for the posterior. The central idea of probabilistic programming is: if we cannot solve something, we approximate it. Luckily for us there is a class of algorithms for this, the most commonly used being Markov chain Monte Carlo (MCMC): instead of computing the posterior analytically, that curve we have seen, we draw samples from it, which is about the next best thing we can do. Due to time constraints I won't go into the details of MCMC, so we will just assume it is pure black magic and it works. It is indeed a very
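For this Beta-Bernoulli case the widget's update has a closed form: a Beta(alpha, beta) prior combined with k successes and n - k failures yields a Beta(alpha + k, beta + n - k) posterior. A sketch, assuming a Beta(5, 5) prior as the roughly-50-per-cent belief described above:

```python
from scipy import stats

alpha_prior, beta_prior = 5, 5   # prior centred on 0.5; extreme values nearly ruled out
successes, failures = 1, 9       # the unlucky 1-out-of-10 sequence from the talk

# Conjugate update: just add the observed counts to the Beta parameters.
posterior = stats.beta(alpha_prior + successes, beta_prior + failures)

print(posterior.mean())    # (5 + 1) / (5 + 5 + 10) = 0.3, well above the 10% MLE
print(posterior.cdf(0.10)) # posterior probability that theta <= 10%: small
```

The prior pulls the estimate away from the implausible 10 per cent, and the full distribution, not just its mean, carries the uncertainty.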

18:22

simple idea, but the fact that it works in so many cases is still mind-blowing to me, and the big benefit is that it can be applied very widely: often you just specify your model, say go, and it gives you the answer. So what does MCMC sampling look like? As we have seen before, this is the posterior we want, and it needs a closed-form solution

18:46

which we cannot get in reality. So instead we draw samples from the distribution, and once we have enough samples we can make a histogram, and it starts resembling the posterior. OK, so let's get to PyMC3. It is a probabilistic programming framework written in Python, for Python, and it allows
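Before moving on to PyMC3, the sampling idea itself fits in a few lines: below is a minimal random-walk Metropolis sketch for the same Beta-Bernoulli posterior (illustrative only; it is not the sampler PyMC3 uses by default):

```python
import numpy as np

def log_posterior(theta, k=1, n=10, a=5, b=5):
    """Unnormalised log posterior: Beta(a, b) prior times Bernoulli likelihood."""
    if not 0.0 < theta < 1.0:
        return -np.inf
    return (a - 1 + k) * np.log(theta) + (b - 1 + n - k) * np.log(1.0 - theta)

rng = np.random.default_rng(42)
theta, samples = 0.5, []
for _ in range(20000):
    proposal = theta + rng.normal(0, 0.1)   # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal                    # accept; otherwise keep the old value
    samples.append(theta)

samples = np.array(samples[1000:])          # drop burn-in
print(samples.mean())  # close to the analytic posterior mean (5+1)/(5+5+10) = 0.3
```

A histogram of `samples` approximates the posterior curve, which is exactly the "draw samples instead of solving the integral" idea from the talk.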

19:12

you to construct these models using intuitive syntax. One of the reasons for doing PyMC3 rather than PyMC2, which maybe some of you use

19:23

actively, is that it is actually a complete rewrite; it uses no code from PyMC2. There were a couple of reasons. One is just technological debt: the core of PyMC2 is pretty complex and requires compiled Fortran code, which always causes huge headaches for users to get working. PyMC3 is actually very simple code, and one reason for that is that we rely on Theano for all computation, the whole compute engine: basically we are just creating the compute graph and then shipping everything off to Theano. The other benefit we get from Theano is that, since it compiles the model, it can give us gradient information, and there is a new class of algorithms, Hamiltonian Monte Carlo, that uses the gradient. These advanced samplers work much better in very complex models; they are much more powerful, but they require that extra gradient step, which is not easy to get. Luckily for us, Theano provides it out of the box, so we don't really have to do anything. The other point I want to stress is that PyMC3 is very extensible, and it also lets you interact with the model much more freely. Maybe you have used JAGS or WinBUGS, or Stan, which is
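For the one-parameter model above, the gradient that Theano would derive automatically can be written by hand. The sketch below checks the analytic derivative of the Beta-Bernoulli log posterior against finite differences; it is an illustration of the gradient information that HMC-style samplers consume, not Theano code:

```python
import numpy as np

a, b, k, n = 5, 5, 1, 10  # Beta(5, 5) prior, 1 success in 10 trials

def log_post(theta):
    # Unnormalised log posterior of the Beta-Bernoulli model.
    return (a - 1 + k) * np.log(theta) + (b - 1 + n - k) * np.log(1 - theta)

def grad_log_post(theta):
    # Analytic d/d(theta) of the expression above.
    return (a - 1 + k) / theta - (b - 1 + n - k) / (1 - theta)

theta = 0.3
eps = 1e-6
numeric = (log_post(theta + eps) - log_post(theta - eps)) / (2 * eps)
print(grad_log_post(theta), numeric)  # the two should agree closely
```

For arbitrary multi-parameter models this derivative quickly becomes unmanageable by hand, which is why having Theano compute it automatically matters.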

20:43

part of the very interesting recent progress in probabilistic programming frameworks, and they are all really cool. One problem I personally have with them is that they require you to write your probabilistic program in a specific language, which then gets compiled; you have some wrapper code to get the data into Stan, and some more wrapper code to get the results out of Stan. For me that is always very cumbersome, and you can't really see what is going on inside. In PyMC3 you can write your model in Python code and then interact with it freely, so you essentially never have to leave Python, and for me that is very, very powerful. So you can think of PyMC3 much more as a library, and we will see that in a second. As for the authors: John Salvatier is the main person who came up with

21:36

the design, and several others, including myself, contribute quite a bit as well. It is still alpha, mainly because we are missing good documentation; we are currently writing it, but if you are interested and would like to help, contributions are more than welcome. OK, so let's look at the model from the earlier example and see how we can solve it now in PyMC3. For that, let's first write down the model in statistical terms. We have these two random variables, theta_A and theta_B, that we want to reason about, representing the chance of each algorithm beating the market. The tilde here means "is distributed as": we are not working with numbers but with distributions. This is the Beta distribution, the distribution we were looking at at the beginning, defined from 0 to 1; if you need a distribution over a probability, the Beta distribution is the one to use. So far, this is the thing we want to reason about given data. And how do we learn about it? We observe data, and the data I simulated was binary; it came from the Bernoulli distribution, so we assume the data is distributed according to a Bernoulli distribution over zeros and ones. The probability parameter of the Bernoulli distribution, which before I had simply fixed at 0.5, is now what we want to infer, so since we don't know that value, we replace it with the random variable theta from above. That is how these models commonly look. The other point I want to make here is that you can really see how you are creating a generative model. You might wonder how to construct a model at all, and a good path is to think of how the data would have been generated: here I know that this probability generated random binary data, so the model mimics that. You can
get arbitrarily complex: you have all these hidden causes that relate in

24:03

complex ways to the data, and then you can invert that model using Bayes' formula to infer those hidden causes. So here I am

24:17

again generating a little more data now, again with 50 and 60 per cent probability of beating the market (or conversion rate),

24:24

and 300 trials each. This is what the model looks like in PyMC3: first we import the library, and we

24:34

instantiate the model object, which will hold all the random variables. One of the improvements over PyMC2 is that everything you specify about the model you do inside this `with` context: everything you do underneath here is included in that model object, so you don't have to pass it around all the time. Underneath, this should look pretty familiar from before: theta_A distributed as a Beta distribution, and here I write the same thing in Python code. theta_a is a Beta; we give it a name and its two parameters, alpha and beta, which for this prior take the number of prior successes and failures. This is the prior from before, centered around 50 per cent. I do the same for theta_b, and then I relate those random variables to my data: as before it is a Bernoulli, which I instantiate with a name, and instead of a fixed p I now give it the random variable we want to link to it. Since this is an observed node, I pass it, via the observed keyword, the 300 binary numbers generated just before. This links the data to the random variables, and the same for B. Up until here nothing has happened; we have just plugged together a few probability distributions that make up the story of how my data was generated. Now, it is often a good idea to start sampling from a good position, and for that we can optimize the log probability of the model using find_MAP, which finds the maximum a posteriori value. Then I instantiate the sampler I want to use; there are various to choose from, and here I use a Slice sampler, which works quite well for these simple models. Now I actually want to draw the samples from the posterior, and for that there is the sample function: I tell it how many samples I want, 10,000, provide it with the step method, and give it the starting value. When I make

27:12

this call, it takes a couple of seconds to run the sampling algorithm, and then it

27:18

returns a data structure which I call trace here. It is essentially a dictionary: for each random variable I defined, I get the samples that were drawn. Now that I have that, I can inquire about my posterior. Here I am using seaborn, which is a

27:45

visualization library built on top of matplotlib that you should definitely check out; it creates very nice statistical plots. For example, it has this nice distplot function, which is essentially a histogram but looks much nicer and adds, for example, this smooth density line. I give it the samples I drew with my MCMC sampling, and it plots the posterior, which again is the combination of my prior belief updated by the data I have seen. Now I can reason about it. The first thing to see is that the mode of theta_B, the chance of that algorithm beating the market, is around 60 per cent, which is what we used to generate the data; it is good that we get that back, and again, this is why we use simulated data, to check that we are doing the right thing. The other one is around 50 per cent, or 49 per cent. The other thing to note is that instead of a single number that simply fell from the sky, which is what we would get by just taking the mean, we have our confidence expressed in the posterior: we know how wide the distribution is, and we can answer many questions, like how likely it is that the chance of success of this algorithm is less than 65 per cent, and get out a specific number that represents a level of certainty. We can also do other interesting things, like hypothesis testing, to answer our initial question of which of the two actually does better. For this we can simply compare the samples for theta_A with the samples for theta_B: we count how many of one are larger than the other, and that tells us that with probability 99.11 per cent algorithm B is better than A, which is exactly what we want. By consistently carrying a confidence estimate through from beginning to end, everything we compute has a probability estimate associated with it.

OK, so that was the boring part; now hopefully it gets a little more interesting. Consider the case where instead of just 2 algorithms we have 20, but we haven't run them long, or not many users have used them, and we want to know not only each individual algorithm's chance of success but also how the algorithms do overall, the group average: is this group consistently beating the market or not? The easiest model we can build is just the one from before, but with 20 thetas instead of theta_A and theta_B. That is fair, and it is called an unpooled model, but it is somewhat unsatisfying, because we can probably assume these algorithms are not completely separate: they all work in the same market environment, some will have similar properties, some use similar techniques, so they will be related somehow. There will be differences, but also similarities. And this
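The sample-counting comparison at the end can be sketched without MCMC at all, because this model has conjugate Beta posteriors to draw from directly (counts and seed here are illustrative, so the result will not be exactly the 99.11 per cent quoted in the talk):

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated outcomes: 300 days per algorithm, true rates 0.5 and 0.6.
data_a = rng.binomial(1, 0.5, size=300)
data_b = rng.binomial(1, 0.6, size=300)

# Beta(5, 5) prior -> conjugate Beta posteriors; sample them directly.
post_a = rng.beta(5 + data_a.sum(), 5 + 300 - data_a.sum(), size=10000)
post_b = rng.beta(5 + data_b.sum(), 5 + 300 - data_b.sum(), size=10000)

# Probability that B's success rate exceeds A's: count pairwise wins.
p_b_better = (post_b > post_a).mean()
print(p_b_better)
```

With an MCMC trace the computation is identical: compare the two arrays of posterior samples element by element and take the mean.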

31:21

model does not incorporate that: there is no way for what I learn about theta_1 to apply to theta_2. The other extreme alternative would be a fully pooled model, where instead of assuming each algorithm has

31:34

its own random variable, I assume a single random variable for

31:37

all of them. That is also unsatisfying, because we know there is structure in our data which we would then not exploit, and although we might get group estimates, we could not say anything about how well a particular algorithm does. The solution, which I think is really elegant, is called a partially pooled, or hierarchical, model. For that

32:07

we add another layer on top of the individual random variables. Up to here we have only the model from before, with all the thetas independent. What we can do, instead of placing a fixed prior on them, is actually learn that prior: a group distribution that applies to all of them. These models are really powerful and have many nice properties. One of them is that what I learn about theta_1 from the data will shape my group distribution, and that in turn will shape the estimate of theta_2; everything I learn about individuals informs the group, and what I learn about the group I can apply to constrain the individuals. Another example, where we do this quite frequently in my psychology research: we have a behavioral task on which we test, say, 20 subjects, and we don't have enough time to collect much data per subject. If we fit a model to each subject by itself, the estimates we get will be very noisy; the hierarchical model lets us learn from the group and apply that back to each individual, so we get much more accurate estimates for each one. That is a very nice property of these hierarchical models. So here I am generating data again; essentially the data will just be an array of 20 by

33:43

100: 20 algorithms, 100 trials, where each row holds the binary outcomes of one algorithm. For convenience I also create an indexing mask that I will use in a second; it might not make sense right now, but keep it in the back of your mind: the first row is just an index for the first algorithm, used for indexing into the vector of random variables. This is the data we will work with.

OK, so now the model in PyMC3. First I create my group variables, a group mean and a group scale: what is the average chance of beating the market, and how spread out are the individual algorithms around it? Which distributions to place on these is a modeling choice: here I use a Beta distribution for the group mean, and a Gamma distribution for the scale parameter, because a variance can only be positive and the Gamma distribution has the right support; the details of that choice are not critical. Unfortunately, the Beta distribution is parameterized in terms of alpha and beta parameters rather than in terms of mean and variance. Fortunately, there is a very simple transformation from mean and variance to alpha and beta values, which I am doing here. The specifics are not important; I just want to show how easy this is. In some other languages it is not a given that you can freely combine and transform random variables and still have everything work out; the reason it works here is that these are just Theano expression graphs, and when I add and multiply them, Theano combines the pieces and does the math in the background. Then I hook that up to my per-algorithm random variables: instead of writing out all 20 of them, I pass in the shape argument, which generates a
vector of 20 random variables; this is not a single variable but actually 20. Before, I had hard-coded my prior of 5 and 5 here, but now I replace it with the group estimates that I am also going to learn. Again my data is Bernoulli-distributed, and for the probability I use the index I showed you before: essentially it indexes into the theta vector so that the result becomes a two-dimensional array of the same shape as my data, matching one-to-one and doing the right thing, and as observed data I pass in the rows of binary outcomes for each algorithm. Again I find a good starting point, and note that I am now using the NUTS sampler, a state-of-the-art sampler that uses the gradient information. It works much better on complex models; these hierarchical models specifically are difficult to estimate, but this type of sampler does a much better job, and that was one of the reasons to develop PyMC3 in the first place. Then, with the traceplot command, we can create a plot. Don't mind the right side, but
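The mean/variance to alpha/beta conversion mentioned here is a small transformation that follows from the standard Beta moments (a sketch; the talk's exact slide code is not shown):

```python
def mean_var_to_alpha_beta(mu, var):
    """Convert the mean and variance of a Beta distribution to (alpha, beta).

    Requires 0 < mu < 1 and var < mu * (1 - mu).
    """
    kappa = mu * (1 - mu) / var - 1  # "concentration" implied by the variance
    return mu * kappa, (1 - mu) * kappa

alpha, beta = mean_var_to_alpha_beta(0.5, 0.02)
print(alpha, beta)  # 5.75 5.75
```

In PyMC3 the same arithmetic is written directly on the group-mean and group-variance random variables, and Theano turns it into an expression graph rather than evaluating it immediately.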

37:49

you know, we get our estimates of the group mean — and again not a single value but the full posterior with its uncertainty, so on average we think it's about 46% — and we have the scale parameter, and we have the 20 individual estimates, one per trading algorithm. So that

38:08

would be theta 1 through theta 20, and they are all mutually constraining each other in the model. So that's pretty cool.
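The per-subject indexing used here — expanding the vector of thetas to the shape of the data — can be illustrated with plain Python lists. The variable names below are illustrative, not the speaker's; NumPy and PyMC3 do the same thing in vectorized form via fancy indexing (`theta[subject_idx]`):

```python
# One theta per subject (four subjects here instead of twenty, for brevity).
theta = [0.46, 0.52, 0.40, 0.61]

# subject_idx has one entry per trial, naming which subject it belongs to.
subject_idx = [0, 0, 1, 1, 2, 3, 3, 3]

# Expanding theta to one probability per trial is just an index lookup,
# so each observed trial lines up one-to-one with its subject's theta.
p_per_trial = [theta[i] for i in subject_idx]
```

With a two-dimensional data array, the same index array reshapes the theta vector to match the data, which is exactly the "it just does the right thing" step in the model.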

38:20

So what's the conclusion? Probabilistic programming is pretty cool, and it allows you to tell a generative story about your data. If you listened to the tutorial on how to be a good data scientist — it is all about telling stories about your data. But how can you tell stories when all you have is black-box inference? So I think that's where probabilistic programming

38:39

is really quite an improvement. You don't have to worry much about inference — the inference algorithms are black boxes that work pretty well. You do have to know what it looks like when they fail, and that can be tricky, so it's not completely trivial, but still, they often work out of the box. And lastly, PyMC3 gives you these advanced samplers. That brings me to further reading: if you want to learn more about quantitative trading — designing trading algorithms that hopefully have a higher than 50% chance of beating the market — and about PyMC3, I have actually written a couple of blog posts on that, and those are probably the best resource for getting started, mainly because there is not that much else written about PyMC3 in terms of documentation. There are also some really good resources on Bayesian statistics in general that I would recommend.
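One standard answer to "what does it look like when sampling fails" is the Gelman–Rubin R-hat statistic, which also comes up in the Q&A: run several chains and compare between-chain to within-chain variance. A minimal, split-free sketch on hand-made toy chains — illustrative code, not PyMC3's implementation:

```python
def rhat(chains):
    """Potential scale reduction factor across equal-length chains.

    Values near 1 suggest the chains agree; values well above 1
    indicate they have not mixed.
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # B/n: variance of the per-chain means; W: average within-chain variance.
    b_over_n = sum((mu - grand) ** 2 for mu in means) / (m - 1)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * w + b_over_n
    return (var_plus / w) ** 0.5

# Two chains exploring the same region vs. two stuck in different regions.
agree = rhat([[0.1, 0.5, 0.3, 0.7], [0.2, 0.6, 0.4, 0.5]])
disagree = rhat([[0.1, 0.2, 0.1, 0.2], [9.9, 9.8, 10.0, 9.9]])
```

When chains disagree, the between-chain term dominates and R-hat blows up, which is the failure signature worth knowing.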

40:11

Yes — so the question is that Stan provides a lot of tools for assessing convergence and many diagnostics, and also the very nice feature of transforming variables and placing bounds on them. PyMC3 has the

40:43

most common summary statistics that you want to look at — the Geweke and R-hat statistics and so on — and you can sample in parallel and then compare the chains. And we do have support for transformed variables; it's not as polished as in Stan, just because PyMC3 is still in alpha, but you can bound parameters, so that works — it's just not quite as streamlined. More questions? [Audience question: for real-world regression problems the samplers you provide are too expensive to use — how difficult would it be to plug in my own sampler?] That, I think, is a big benefit of PyMC3: you just inherit from the sampler class, override the step method, and then you have it — you can do your own proposals and acceptance/rejection, so that's very easy. If you look at Stan, for example — I haven't done it, but I imagine it's quite difficult; when I look at the code, it's really hardcore C++, and all the templates make my head spin. [Audience question: how does this compare in speed to Stan, or to writing your own sampler in Python?] So I think most of the time is actually spent not in the sampler, but rather in evaluating the log-likelihood of the model — and the gradient computations are the difficult part. It's true that Stan is fast once it gets started, but it takes quite a while to compile the model. In that sense — I haven't really done the comparison, and we have noticed some areas where PyMC3 is not fast, which we need to fix and speed up. The Stan developers have done a lot to really make it fast; that's the benefit of compiled code. On the other hand, one benefit of Theano is that it does all these simplifications to the compute graph, and
there is caching, and you can run it on the GPU. We haven't really explored that to the fullest extent yet, but I think there are lots of potential speedups that Theano could give us. And another answer to the question: if you really spend that much time in the sampler just making proposals, you could also use Cython, for example, and code it in that. [Audience question about parallel sampling] Yes, that is possible — there is a psample function next to the sample function that will distribute the model. It doesn't quite work in every instance yet, but it uses multiprocessing, so you get parallelization that way. And just as an aside, there is a really cool project: someone on the mailing list looked at what kinds of distributed computing could be applied to PyMC3, and he used Spark to basically do the sampling in parallel on big data. If you have data that doesn't fit on a single machine, you can run individual samplers on subsets of the data in parallel and then aggregate them, and Spark lets you do that very nicely. He basically hooked up PyMC3 and Spark, so that's really exciting.
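The "inherit from the sampler class and override the step method" pattern from the Q&A can be sketched in a few lines of plain Python. This is a stand-alone illustration of the idea, not PyMC3's actual step-method API:

```python
import math
import random

class Sampler:
    """Base class: subclasses supply their own proposal logic in step()."""

    def __init__(self, logp, init, seed=0):
        self.logp = logp          # log-density of the target distribution
        self.x = init             # current state of the chain
        self.rng = random.Random(seed)

    def step(self):
        raise NotImplementedError

    def sample(self, n):
        return [self.step() for _ in range(n)]

class Metropolis(Sampler):
    """Random-walk Metropolis: propose, then accept or reject."""

    def step(self):
        proposal = self.x + self.rng.gauss(0.0, 1.0)
        # Accept with probability min(1, p(proposal) / p(current)).
        if math.log(self.rng.random()) < self.logp(proposal) - self.logp(self.x):
            self.x = proposal
        return self.x

# Target: standard normal, via its unnormalized log-density.
chain = Metropolis(lambda x: -0.5 * x * x, init=0.0, seed=42).sample(5000)
```

A custom proposal scheme is then just another subclass overriding step(), which is the extension point the answer describes.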

### Metadata

#### Formal Metadata

Title | Probabilistic Programming in Python |

Series Title | EuroPython 2014 |

Part | 51 |

Number of Parts | 120 |

Author | Wiecki, Thomas |

License |
CC Attribution 3.0 Unported: You may use, adapt, copy, distribute, and publicly share the work or its content, in unchanged or adapted form, for any legal purpose, provided you credit the author/rights holder in the manner they specify. |

DOI | 10.5446/20040 |

Publisher | EuroPython |

Publication Year | 2014 |

Language | English |

Production Place | Berlin |

#### Content Metadata

Subject Area | Computer Science |

Abstract | Thomas Wiecki - Probabilistic Programming in Python Probabilistic Programming allows flexible specification of statistical models to gain insight from data. Estimation of best fitting parameter values, as well as uncertainty in these estimations, can be automated by sampling algorithms like Markov chain Monte Carlo (MCMC). The high interpretability and flexibility of this approach has led to a huge paradigm shift in scientific fields ranging from Cognitive Science to Data Science and Quantitative Finance. PyMC3 is a new Python module that features next generation sampling algorithms and an intuitive model specification syntax. The whole code base is written in pure Python and Just-in-time compiled via Theano for speed. In this talk I will provide an intuitive introduction to Bayesian statistics and how probabilistic models can be specified and estimated using PyMC3. |

Keywords |
EuroPython Conference EP 2014 EuroPython 2014 |