Introduction to Nonparametric Bayesian Models
Video in TIB AVPortal:
Introduction to Nonparametric Bayesian Models
Formal Metadata
Title 
Introduction to Nonparametric Bayesian Models

License 
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 

Release Date 
2017

Language 
English

Content Metadata
Abstract 
Introduction to Nonparametric Bayesian Models [EuroPython 2017 - Talk - 2017-07-13 - Anfiteatro 1] [Rimini, Italy] When we use supervised machine learning techniques, we need to specify the number of parameters that our model will need to represent the data (number of clusters, number of Gaussians, etc.). In a way, this makes our model inflexible. In this talk we will study nonparametric models, specifically Bayesian nonparametric (BNP) models, whose main purpose is to give more flexible models, since in BNP the parameters can be inferred automatically by the model. The outline is as follows: parametric vs. nonparametric models; a review of probability distributions; nonparametric Bayesian methods; the Dirichlet process; Python (and maybe R) libraries for NPB; conclusion.

00:06
Hi, I am [unclear], and I want to
00:10
talk about nonparametric Bayesian models. Most of the things I will say are based on tutorials from these researchers [unclear]. We will recap the definition of a model, and I will show the simplest example for nonparametric Bayes, which is clustering. I will show you how we usually do clustering and then the other approach, the nonparametric model. [unclear] I hope you understand.

So, OK, [unclear] the main idea is to modify the values of some parameters in some training process, and that is almost all of machine learning. It is not common to hear the term parametric models, but almost all the models that we are using right now are parametric. There is another group of models that are nonparametric; for me they are very interesting, and we need to discuss the idea.

Just to remember: in linear regression the parameters are the two values, so we can move a line in the plane, or a hyperplane. In neural networks [unclear] all of this stuff, the perceptron and so on, can be confusing, but it is the same: we modify the weights [unclear]. In hidden Markov models we have some kind of weights [unclear], and we modify them. [unclear] So I want to ask you something.
03:20
Do you know which one is the best model? Think of the blue line as one model [unclear] and the red line as the other model. How many of you think that model A is the best one? And how many of you think that model B is the best one? Well, I tricked you, because I did not show you the complete data. In the last step we can see that the line will drop down. So in this case the model is the red line [unclear] and not the other one. So it is not so easy as just modifying the parameters in a training process; data science is sometimes more [unclear] than science.

So let's think. I don't know if there is someone from [unclear] here; those guys are working with the stock market. Imagine that we are buying stocks, but at some moment you need to stop buying because it would be risky. So this model is a good model in that kind of specific scenario.

But maybe you are asking where [unclear] comes from. The idea is to repeat, you know, the story. It is very funny. Imagine a turkey, a smart turkey [unclear], that makes conclusions. Each morning it receives food from its owner, and it starts thinking: OK, I am receiving food at nine on [unclear] days and on rainy days, on weekdays and on weekends. So it starts to conclude that it will receive food for a long time. But that is not the case at Christmas, because the next day they will cut the throat of this turkey. So this model is very good in that scenario. This story is from Bertrand Russell.

So I am sure that you know this formula, of course, Bayes' rule. I would like to repeat what it means. We start with this term
06:47
that is our prior knowledge. Imagine the
06:53
conclusions that our turkey is making every day. But this reality will change because of the observations, let's say the dataset: the data, or the observations, given our hypothesis or our parameters. And at the end we will have a new reasoning that is called the posterior, the one in [unclear] color.

So we have already defined what a model is, and now we are talking about Bayesian reasoning, which seems very old but is still very useful and very popular. [unclear] this was before machine learning was known or data science was known. The idea is that just by manipulating probabilities we can make inference.

So let's see this formula. We want to get the argument that maximizes the posterior; this is called maximum a posteriori. For example, from this step to this step, do you know why [unclear] is here but not here? [unclear] Well, we can see that this term is not affecting the other one, so we delete it. Also, if we have no prior knowledge at all, we can say that the probability of theta is the same for all values of theta, so we can also delete that, and we finish with maximum likelihood estimation. That is a very important formula, and it is not so hard. We will see maximum likelihood estimation in almost every algorithm; if you want to prove that your algorithm is correct, just try to match it with maximum likelihood estimation. [unclear] I want to mention that there is a very nice paper that studies the history of maximum likelihood estimation [unclear]. I don't remember the title, but you can google it if you are interested.

So now we know about models and Bayesian reasoning, and we know that it is present in many algorithms.
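
The reduction described here, from Bayes' rule to maximum a posteriori and then to maximum likelihood, can be written out as follows. The notation is an assumption made for this sketch, since the slide itself is not part of the transcript: \theta stands for the hypothesis or parameters and D for the observed data.

```latex
\begin{align}
P(\theta \mid D) &= \frac{P(D \mid \theta)\, P(\theta)}{P(D)} \\
\hat{\theta}_{\mathrm{MAP}} &= \arg\max_{\theta} P(\theta \mid D)
  = \arg\max_{\theta} P(D \mid \theta)\, P(\theta) \\
\hat{\theta}_{\mathrm{MLE}} &= \arg\max_{\theta} P(D \mid \theta)
\end{align}
```

The second line drops P(D) because it does not depend on \theta, and the third line follows when the prior P(\theta) is the same for every \theta.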
[unclear] But there are some problems with data, because it is always evolving. For example, imagine Wikipedia. In the first years, I don't know how many articles or topics there were, but let's say that there were just biology and chemistry, and then in the next years Wikipedia started to have articles on sports or biographies of artists. So the data evolves. It is the same with the species on the planet: every week or every few days biologists are discovering new species, so they need to modify the taxonomy somewhat. And we see this evolving data in the social networks; they are evolving every second, for example the hashtags on Twitter.

So how do we usually address a problem like, for example, clustering? There is a common way to do it, one classic approach. Let's say that we want to use Gaussian mixture models. A Gaussian mixture model is something that is fitted with maximum likelihood [unclear]; this is just a claim without proof [unclear]. So the Gaussian mixture model uses Bayesian reasoning. So we can do some clustering, maybe with Gaussian mixture models, but then some questions arise: how many clusters do we need? What if my data is evolving? What we usually do is create many clusterings and then make a comparison,
13:38
I don't know, maybe with variation of information or silhouette or other metrics. In this case the best clustering was with five Gaussians. I don't know if you can see it. So this one, yeah, it seems a good clustering.
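
The parametric workflow described here (fit several clusterings with different numbers of clusters, then compare them) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code used in the talk: it fits one-dimensional Gaussian mixtures with a hand-written EM loop and compares them with the Bayesian information criterion (BIC), which is swapped in here for the variation of information or silhouette mentioned by the speaker. The synthetic data, the helper names `fit_gmm` and `bic`, and the variance floor are all hypothetical choices for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm(data, k, iters=100):
    """Fit a 1-D Gaussian mixture with k components by EM (hypothetical helper)."""
    lo, hi = min(data), max(data)
    # Deterministic start: spread the means evenly over the data range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    variances = [((hi - lo) / k) ** 2] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-3)  # assumed floor to avoid collapse
    log_lik = sum(
        math.log(max(sum(w * normal_pdf(x, m, v)
                         for w, m, v in zip(weights, means, variances)), 1e-300))
        for x in data
    )
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """BIC for a 1-D mixture: k means, k variances, k - 1 free weights."""
    return (3 * k - 1) * math.log(n) - 2 * log_lik


rng = random.Random(0)
# Synthetic data (an assumption): two well-separated groups.
data = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(10, 1) for _ in range(100)]

scores = {}
for k in range(1, 5):
    *_, log_lik = fit_gmm(data, k)
    scores[k] = bic(log_lik, k, len(data))

best_k = min(scores, key=scores.get)  # lower BIC is better
print("BIC per k:", scores)
print("selected k:", best_k)
```

The point of the sketch is the outer loop: the number of clusters has to be enumerated and scored by hand, which is the step the nonparametric approach tries to avoid.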
14:05
[unclear] So that
14:10
is what we usually do. But let's think about another approach. We have seen the parametric approach; now let's think about a nonparametric approach. Nonparametric can be confusing: it sounds like there are no parameters at all, but actually the number of parameters is infinite [unclear]. The idea is that we have empty clusters, infinitely many empty clusters, and we start to fill them as we see data. In this way we can solve that problem: our models can be adapted to the complexity of the data itself.

Let's see these models. There are a number of nonparametric Bayesian models, and their names are funny, you know, like Chinese restaurant and Indian buffet. [unclear] You know, I am from Mexico, and I am thinking that if I study [unclear] process I can come up with another similar model and call it the Mexican taqueria process, maybe. These models are known as the Dirichlet process, and I will explain a bit of the Chinese restaurant process, because for me it is very intuitive.

Imagine that we are in a Chinese restaurant. Usually the Chinese restaurants in California seem to be huge, so one scientist discovered this and said that all Chinese restaurants are huge. So you can go there, and if you are the first customer and the restaurant is empty, you can choose any table. Then the second customer comes and chooses another table or the same one, with some probability. At the end, if we analyze what is happening with how the customers are going to the tables, it is a kind of clustering, and each table is one cluster. [unclear] So that is the idea of the Chinese restaurant process, a model that is evolving.

So this is the clustering for the previous data, done with an infinite Gaussian mixture model, which is a kind of Dirichlet process.
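
The seating scheme described above can be simulated in a few lines of standard-library Python. This is a minimal sketch, not code from the talk. It uses the usual Chinese restaurant process probabilities: a new customer joins an existing table with probability proportional to the number of people already sitting there, and opens a new table with probability proportional to a concentration parameter. The name `alpha` for that parameter, the function name, and the seed are assumptions introduced for this illustration.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers one by one; return each customer's table and the table sizes.

    alpha (assumed name) is the concentration parameter: larger values make
    new tables, that is new clusters, more likely.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[t] = customers currently at table t
    assignments = []   # assignments[i] = table chosen by customer i
    for _ in range(n_customers):
        # Existing table t has weight table_sizes[t]; a new table has weight alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table (a new cluster)
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(100, alpha=1.0)
print(len(table_sizes), "tables (clusters) used by 100 customers")
print("table sizes:", table_sizes)
```

The number of tables is not fixed in advance; it grows as more customers arrive, which is the sense in which the model adapts to the data.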
17:44
And we can do whatever you can do with a parametric model, for example digit recognition or topic modeling. Well, actually I
18:00
am in the conclusion. So let's recap. In the traditional approach [unclear] the number of parameters is fixed, and we have some distribution over those parameters. In the other case, the nonparametric models, we assume that we have an infinite number of clusters, and our model can be adapted to the data.

There are some libraries in Python [unclear], but they only have one or two algorithms. The best one in my opinion, because it is more from the research side, is this one, datamicroscopes, but I don't like it so much because it is only available in conda and not in the official [unclear]. Actually, in R there are diverse libraries [unclear].

If we want to know more about this, we can study the beta distribution; in short, the beta distribution is just a probability of probabilities. Then we have the Dirichlet distribution, which is a generalization of the beta distribution; that means it is a distribution of distributions. And finally the Dirichlet process. If you also want to read more, there is [unclear] written by Tom Mitchell, and this tutorial [unclear], and the library I mentioned also has very nice tutorials that you can check. And well, that's all.

[unclear] Please use the microphone and speak directly into it.

You mentioned the Gaussian mixture model. Is that considered a Bayesian model?

The Gaussian mixture model is not strictly a Bayesian model, but the algorithm that we use to do the clustering is expectation-maximization, and this one uses the marginal likelihood, so the maximum likelihood is derived from
Bayes. So we can say that it borrows Bayesian reasoning [unclear] Bayesian model. Actually there is some discussion that most of what we know as Bayesian models are not exactly Bayesian models [unclear]. It is like when we use calculus and we say that everything we are doing comes from Newton, and it is not exactly like that: Newton was important [unclear], but there were other mathematicians who also contributed things. It is almost the same with Bayesian things: we say Bayesian, and sometimes it is not exactly Bayesian, but the power behind it is Bayes' rule.

OK, more questions? We have plenty of time. [unclear]

I am trying to understand Bayesian models. Is it like a category of models, or is it a way of using different other models? I mean, I know the name Gaussian mixture model, and I am trying to understand Bayesian models: is it a category of other models, or is it a way of using a model?

Yeah, there are some models that are strictly Bayesian, for example Bayesian networks, where we have strictly some probabilities that are not independent; they depend on each other. So we can say that Bayesian networks are strictly Bayesian. And there are other ones [unclear],
24:56
for example hidden Markov models, because we have some nodes that have dependencies, and these dependencies are probabilities [unclear]. And then [unclear] the Gaussian mixture model as a Bayesian model, but it still has Bayesian things inside.

Right, [unclear], that was a good answer.

Well, maybe to finish: I was a software developer, but I am not doing software development anymore because it is kind of boring. So I decided to do a master's in computer science, and I am doing a lot of statistics now, but I am [unclear] a statistician. So Bertrand Russell [unclear] not a philosopher, but not a mathematician either [unclear].

[unclear] last question [unclear]