Introduction to Nonparametric Bayesian Models

Video in TIB AV-Portal: Introduction to Nonparametric Bayesian Models

Formal Metadata

Title
Introduction to Nonparametric Bayesian Models
Title of Series
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
2017
Language
English

Content Metadata

Subject Area
Abstract
Introduction to Nonparametric Bayesian Models [EuroPython 2017 - Talk - 2017-07-13 - Anfiteatro 1] [Rimini, Italy] When we use supervised machine learning techniques, we need to specify the number of parameters that our model will need to represent the data (number of clusters, number of Gaussians, etc.). In a sense, this makes our model inflexible. In this talk we will study nonparametric models, specifically Bayesian nonparametric (BNP) models, whose main purpose is to give us more flexible models, since in BNP the parameters can be inferred automatically by the model. The outline is as follows: parametric vs nonparametric models; a review of probability distributions; nonparametric Bayesian methods; the Dirichlet process; Python (and maybe R) libraries for BNP; conclusion.
Hi. I want to talk about some things about nonparametric Bayesian models. Most of the things I will say are based on tutorials from these researchers [names unintelligible]. We will see a recap of the definition of a model, and I will show the simplest example for nonparametric Bayes, which is clustering. I will show you how we usually do the clustering, and then the other approach, which is the nonparametric model. I hope you understand.

OK, so what is a model in machine learning? The main idea is to modify the values of some parameters in some training process, and that covers almost all of machine learning. It is not common to hear the term "parametric models", but almost all the models that we are using right now are parametric. There is another group of machine learning models that are nonparametric, and for me they are very interesting, and I want to discuss the idea.

Just to remember, in linear regression the parameters are two values, so we can move a line in the plane, or a hyperplane. In neural networks [partly unintelligible] it can look confusing, but it is the same: we modify the weights. In hidden Markov models we also have some kind of weights [unintelligible], and we modify them. So they are all similar in that sense. So I want to ask you something.
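The linear regression case mentioned above can be sketched in a few lines. This is only an illustration: the data points and the function name `fit_line` are made up here and do not come from the talk. The point is that the model is fully described by two numbers, the slope and the intercept, no matter how much data arrives.

```python
# Minimal sketch: a parametric model with exactly two parameters.
# The data below is invented for illustration (roughly y = 2x + 1).

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Training here means nothing more than choosing the two values that move the line closest to the points.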
Do you know which one is the best model? Think of the blue line as model A and the red line as model B. How many of you think that model A is the best one? And how many of you think that model B is the best one? Well, I cheated, because I didn't show you the complete data. In the last step we can see that the line drops down, so in this case model B, the red line, is the better one. So it is not as easy as just modifying the parameters in a training process; data science is sometimes more of an art than a science.

Let's think about [unintelligible] the stock market. Imagine that we are buying stocks, but at some moment we need to stop buying because it would be risky. Model B is a good model in that kind of scenario.

But maybe you are asking where this example comes from. It is Bertrand Russell's turkey, and I will repeat the story because it is very funny. Imagine a smart turkey; it is smart because it has a philosophy degree, so it draws conclusions. Each morning it receives food from its owner, and it starts thinking: OK, I am receiving food on sunny days and on rainy days, on weekdays and on weekends. So it concludes that it will keep receiving food for a long time. But that is not the case: on Christmas Day [unintelligible] there will be a dead turkey. So this model is very good in that scenario, and I like that example from Bertrand Russell.

So I think that you know this formula; of course, it is Bayes' rule. I would like to repeat what it says. We start with the term
that is our prior knowledge; think of the conclusions that our turkey is reaching every day. But this belief changes because of the observations, let's say the dataset, as we get more observations given our hypothesis or parameters. At the end we have a new belief that is called the posterior, the one in red.

So we have already defined a model, and now we are talking about Bayesian reasoning, which may seem very old but is still very useful and very popular. It was there before machine learning and data science appeared [partly unintelligible]. The idea is that by manipulating probabilities we can make inferences.

So let's see this in the formula. We want to get the argument that maximizes the posterior; this is called maximum a posteriori [partly unintelligible]. We can see that this term does not affect the other one, so we delete it. Also, if we have no prior knowledge at all, we can treat the prior as equal for all hypotheses [partly unintelligible], so we can delete that as well. We finish with maximum likelihood estimation, which is a very important formula, and it is not so hard. We will see maximum likelihood estimation in almost every algorithm; if you want to prove that your algorithm is correct, just try to match it with maximum likelihood estimation. I also want to mention that there is a very nice paper that studies the history of maximum likelihood estimation. I don't remember the title, but you can find it on Google if you are interested.

So now we know a bit more about Bayesian reasoning, and we know that it is present in many algorithms.
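The chain of simplifications described above, from Bayes' rule to maximum a posteriori to maximum likelihood, can be written out as follows. The symbols are notation chosen here and are not taken from the slides: \theta stands for the hypothesis or parameters and D for the observed data.

```latex
\begin{align}
P(\theta \mid D) &= \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
  && \text{(Bayes' rule: posterior from likelihood, prior, evidence)} \\
\hat{\theta}_{\mathrm{MAP}} &= \arg\max_{\theta} \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
  = \arg\max_{\theta} P(D \mid \theta)\, P(\theta)
  && \text{($P(D)$ does not depend on $\theta$)} \\
\hat{\theta}_{\mathrm{MLE}} &= \arg\max_{\theta} P(D \mid \theta)
  && \text{(if the prior $P(\theta)$ is constant)}
\end{align}
```

Read this way, maximum likelihood is the special case of maximum a posteriori in which every hypothesis is considered equally plausible before seeing the data.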
There are, however, some problems with data, because data is always evolving. For example, imagine Wikipedia in its first years. I don't know how many articles or topics it had, but let's say they were just biology and chemistry, and then in the following years Wikipedia started to have articles on sports or biographies of artists. So the data is evolving. The same happens with species on the planet: every week or every few days, biologists discover new species, so sometimes they need to modify the taxonomy. We also see this evolving data in social networks; they evolve every second, for example the hashtags on Twitter.

So how do we usually address a problem such as clustering? There is a common way to do it, one classic approach. Let's say that we want to use Gaussian mixture models. The Gaussian mixture model is something that fits with maximum likelihood [partly unintelligible]; this is a claim without proof. So in the Gaussian mixture model there is a use of Bayesian reasoning. We can do some clustering, maybe with a Gaussian mixture, but then a question arises: how many clusters do we need if my data is evolving? Usually we create several models and then make a comparison, maybe with variation of information, or silhouette, or other metrics. In this case the best clustering was the one with five Gaussians. I don't know if you can see this one, but it seems to be a good clustering [unintelligible].
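The classic workflow just described, fitting one model per candidate number of clusters and comparing them with a metric such as silhouette, can be sketched as below. To keep the example dependency-free it swaps in plain k-means on one-dimensional data instead of a Gaussian mixture model; the data, the function names, and the quantile-based initialisation are assumptions made for this illustration.

```python
import random

def kmeans_1d(points, k, iters=50):
    """Plain k-means on 1-D data with deterministic quantile initialisation."""
    ordered = sorted(points)
    n = len(ordered)
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        # Update step: each center moves to the mean of its points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient (higher means better-separated clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other points of the same cluster.
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: mean distance to the closest other cluster.
        b = min(
            sum(abs(p - q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Invented data: three well-separated groups.
random.seed(0)
points = [random.gauss(mu, 1.0) for mu in (0.0, 10.0, 20.0) for _ in range(30)]

# Fit one model per candidate k and keep the best-scoring one.
scores = {k: silhouette(points, kmeans_1d(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print("silhouette per k:", {k: round(s, 3) for k, s in scores.items()})
print("best k:", best_k)
```

The weakness the talk points at is visible in the last lines: the range of candidate values has to be chosen by hand, and the whole comparison has to be repeated whenever the data changes.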
So this is the way we usually do it, but let's think of another approach. We have seen the parametric approach; now let's think of a nonparametric approach. The name "nonparametric" can be confusing, as if there were no parameters at all, but actually there are parameters [partly unintelligible]. The idea is that we have an infinite number of empty clusters, and we start to fill them as we see more data. In this way we can solve that problem, because our models can adapt to the complexity of the data itself.

Let's see some of these. There are some well-known models, and their names are funny: the Chinese restaurant process and the Indian buffet process. I am from Mexico, and I was thinking that if I studied one of these processes I could propose another similar model and call it the Mexican [unintelligible] process, maybe. Another model is known as the beta process.

I will explain the Chinese restaurant process a bit, because for me it is very intuitive. Imagine that we are in a Chinese restaurant. Usually the Chinese restaurants in California are huge, so the scientist who came up with this assumed that all Chinese restaurants are huge. You go there, and if you are the first customer and the restaurant is empty, you can choose any table. Then the second customer arrives and chooses another table or the same one, with some probability. At the end, if we analyze how the customers are distributed over the tables, it is a kind of clustering, and each table is one cluster. That is the idea of the Chinese restaurant process: a model that is evolving.

This is the clustering for the previous data, done with an infinite Gaussian mixture model, which is [unintelligible] a Dirichlet process.
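The seating story above can be simulated directly. In this sketch the function name and the concentration parameter `alpha` are assumptions introduced for illustration (the talk only says "with some probability"); the rule used is the standard one, where a new customer joins an occupied table with probability proportional to the number of people already sitting there, and opens a new table with probability proportional to `alpha`.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers one by one; returns each customer's table and the table sizes."""
    rng = random.Random(seed)
    table_sizes = []  # table_sizes[t] = number of customers at table t
    seating = []      # seating[i] = table chosen by customer i
    for _ in range(n_customers):
        # Existing tables are weighted by their size, a new table by alpha.
        weights = table_sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)      # open a new table (a new cluster)
        else:
            table_sizes[choice] += 1   # join an existing table
        seating.append(choice)
    return seating, table_sizes

seating, table_sizes = chinese_restaurant_process(n_customers=100, alpha=1.0)
print("tables used:", len(table_sizes))
print("table sizes:", table_sizes)
```

Nothing in the code fixes the number of tables in advance: it grows with the number of customers, which is the sense in which the number of clusters adapts to the data.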
And we can do whatever you can do with a parametric model, for example digit recognition or topic modeling.
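An infinite mixture needs infinitely many mixture weights that still sum to one. One standard way to generate such weights, which the talk does not describe, is the stick-breaking construction associated with the Dirichlet process: repeatedly break off a Beta-distributed fraction of whatever is left of a unit-length stick. The sketch below truncates the process after a fixed number of pieces; the function name, `alpha`, and the truncation level are assumptions for illustration.

```python
import random

def stick_breaking_weights(alpha, n_components, seed=0):
    """Truncated stick-breaking: returns the first weights and the unbroken remainder."""
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of stick not yet assigned to any component
    for _ in range(n_components):
        fraction = rng.betavariate(1.0, alpha)  # share of the remainder to break off
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining

weights, leftover = stick_breaking_weights(alpha=2.0, n_components=20)
print("first five weights:", [round(w, 3) for w in weights[:5]])
print("mass left for all further components:", round(leftover, 6))
```

A small `alpha` tends to put most of the mass on the first few components, while a larger `alpha` spreads it over many, so this single value plays the role that the fixed number of clusters plays in the parametric approach.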
In conclusion, let's recap. In the traditional approach, that is, the parametric one [partly unintelligible], the number of parameters is fixed and we have some distribution over those parameters. In the other case, the nonparametric models, we assume that we have an infinite number of clusters, so our model can adapt to the data.

There are some libraries in Python. One is scikit-learn, but it only has one or two of these algorithms. The other one, which in my opinion is more research-oriented, is datamicroscopes, but I don't like it so much because it is only available through conda and not in the official [unintelligible]. Actually, in R there are more libraries [partly unintelligible].

If we want to know more about this, we can study the beta distribution; the beta distribution is just a probability of probabilities. Then we have the Dirichlet distribution, which is a generalization of the beta distribution, which means that it is a distribution over distributions. And finally there is the Dirichlet process. If you want to read more, there is [unintelligible] "Machine Learning", written by Tom Mitchell, and this tutorial [unintelligible]. The library I mentioned also has very nice tutorials that you can look at. And that's all.

[unintelligible] Please use the microphone.

Question: You mentioned the Gaussian mixture model. Is that considered a Bayesian model?

Answer: The Gaussian mixture model is not strictly a Bayesian model, but the algorithm that we use for the clustering takes the perspective of maximum likelihood [partly unintelligible], and maximum likelihood is derived from Bayes' rule.
So, again, what we can say is that it borrows from Bayesian reasoning. Actually, there is some discussion that most of what we know as Bayesian models are not exactly Bayesian models [partly unintelligible]. It is like when we use calculus and say that everything we are doing comes from Newton; that is not exactly true, because there were other mathematicians who also contributed [partly unintelligible]. It is almost the same with Bayesian things: sometimes it is not exactly Bayes, but the power behind it is Bayes' rule.

OK, more questions? We have plenty of time. [unintelligible]

Question: I am a data scientist trying to understand Bayesian models. Is it a category of models, or is it a way of using other models? I know the name Gaussian mixture model, and I am trying to understand whether "Bayesian models" is a category of models or a way of using a model.

Answer: There are some models that are strictly Bayesian, for example Bayesian networks, where we have probabilities that are not independent; they depend on each other. So we can say that those are strictly Bayesian. And there are other ones [unintelligible], for example hidden Markov models, because there we have some nodes that have dependencies, and these dependencies are probabilities [unintelligible]. And the Gaussian mixture model [partly unintelligible] still has some Bayesian things inside.

[unintelligible] I was a software developer, but I am not doing software development anymore because it is kind of boring, so I decided to go into computer science [partly unintelligible]. Bertrand Russell [unintelligible] was not only a philosopher but also a mathematician [unintelligible].