Yak shaving a good place to eat using non negative matrix factorization


Formal Metadata

Yak shaving a good place to eat using non negative matrix factorization
Title of Series
Part Number
Number of Parts
Petrich, Adriano
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date
Production Place
Bilbao, Euskadi, Spain

Content Metadata

Subject Area
Adriano Petrich - Yak shaving a good place to eat using non negative matrix factorization. Trying to find a good place to eat has become much easier and more democratic with online reviews, but on the other hand, that creates new problems. Can you trust that 5 star review of a fast food chain as much as the 1 star review of a fancy restaurant because "Toast arrived far too early, and too thin"? We all enjoy things differently. Starting off on the assumption that the "best pizza" is not the same for everyone: can we group users into people that have similar tastes? Can we identify reviews and restaurants to make sense of it? Can that lead us to a better way to find restaurants that you like? Using some data handling techniques I walk you through my process and the results that I've got from that idea. There are no prerequisites for this talk except basic Python and math knowledge (matrices exist).
EuroPython Conference
EP 2015
EuroPython 2015
let me give you make from was beginning 1st of it's my opinion not that that and really it creates like this that look OK from this and I
think it's all it was website was created by having move people that beating at meetings places that had like reviews that saves the best feature that place whatever so that's what started Brazilian Python is that most for the weather demands and principal at time million people city so moved from the user into the 140 thousand people the force large
lost negative made a lot of in Scotland hold the cake and but little the so again from this not 2 minutes away from my place so just because that define what is a good place to eat but not we don't know what it is such a lot about that this talk is about let you can go through to the end of the 1st place and factors you it's going to be nice but I'm worried that
on the notes about what that person in 19 voted 314 such as and that's what this site about how to the first idea would be like stars and the ratings and if you get something like this what this about what there was nothing from this process and the McDonald's or very good beaches places so so we really don't have to suggest that also if it is a McDonald's in the get like 5 stars for a lot of people might be and the associated with the best 1 so what does that affect all of these just that scare likes to alleviate Morrison would be done but the sound like and be it is just not sane and all the rating sites found that really early on as anybody just from kindergarten was say that break is not gonna work. You have to use the lower bound of the Wilson score confidence interval for a Bernoulli parameter, of course. So another good metric that might try to move the of ratings so you can get it and put forth residue had led much for the reviews the demand will be good but it's not nearly enough to know that for the rest 2nd reference you got what's called loss is going what because the
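The Wilson lower bound mentioned above can be sketched as follows. This is a minimal illustration, not code from the talk: the function name `wilson_lower_bound`, the default `z=1.96` (roughly 95% confidence), and the idea of first reducing star ratings to "positive" versus "not positive" votes are all assumptions made for the example.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a Bernoulli parameter.

    positive: number of positive votes (assumption: e.g. reviews of 4+ stars)
    total:    total number of votes
    z:        normal quantile, 1.96 is roughly a 95% confidence level
    """
    if total == 0:
        return 0.0
    phat = positive / total
    denominator = 1 + z * z / total
    centre = phat + z * z / (2 * total)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
    return (centre - margin) / denominator


# A place with a single perfect review ranks below a place with
# 90 positive reviews out of 100, unlike a plain average would suggest.
single_review = wilson_lower_bound(1, 1)
many_reviews = wilson_lower_bound(90, 100)
```

Sorting by this value instead of by the average keeps places with very few votes from jumping to the top of the list.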
entire when it has a lot of terrible will review of X and reviews the thing is that people tend to fall to the floor devoted and very bad and average just you will feel that it was right so that helps but doesn't solve the problem. So now for something completely different and you when is that you yeah yes that's good so trying and unless you have difficulty about community that is it for Wikipedians they may be right at the death metacommunication also between you after home so as a funny thing about methods in the case of is that the abroad attitude you is that most of the matter that continued for this task or you get different results with different dimensions and that was by the bond order that is. So the trick here is that, to multiply two matrices, the middle values, that is the columns of the first matrix and the rows of the second, have to be the same number and the user learning with the Department what all use epistemology and that's going to the p so that's something really nice but also called for that a lot of immigrants of independent of the 19th century it was the population of the city 35 per cent would be balanced 11 % were reportedly that's almost 50 per cent of the students in the 7th sentence when it got close to 1940 it was almost 700 thousand readout and all other models at that that it is for the state the 2nd from the component parts it but all but in the summertime and quick and that's of the world Englishes solid that that was less than 1 million people so the of that and a lot of images that they create some coats were also to be called promote close to what you have written as well with the you know the is that the new New York New York style pizza we do have a so-called pizza and that that's important because it comes to my hypothesis that people that have the same background check the things just use things that if you have a culture of something you going to all and all of you have a single treatment the same is ball has advantage at different 
intensity try to prove it that is the first one it but it's not even the top places and never been there but call by the galson people agree quite well that's a good pizza if it was a bit it's going to be lower but still be as tight as well the 2nd 1 is the the best pizza place in the and it's also but look holds 1st egophony and then the has been % of of people from from abroad and each 1 has a different idea what it would be if and only have a so that the find but indeed the same thing happens you well the force on the system is not even again they are going to be the 1 the 1st thing that on the unit ball and it's a solid but again a very tight of loss and a lot of people know that so for the default before but they want to remember it's officership places and people that have a group of researchers digits and also very tight and the gossip that had from like people that will not those so many boxes from well the model we have talked about that size is that just a exchange and monitoring the goal the rotation of analysis repeated so this is just the sum of current and is that what makes it a good place to eat is based on individual background and that's where the blue based on the individual links that if you're going from multiple reviews from someone you better you the chance that you know like another rest and that the light will be much more likely and then we go back to linear house no food suppose that I have to do with a gun somewhere for a friend a huge list also hopes using that the amount of stars for and have this of those the evidence for the 500 top restaurants in Dubai also what do that course loading matrix, a matrix that has restaurants for its rows, users for its columns, and the value in the middle for each position is just the rating that they gave you and colleagues and doesn't have initial-stressed amusing so the same way that you could remove the measures that can also create them so if I have a matrix and the 1 created to 
new matrix that if it can make that's question
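The ratings matrix described above (restaurants as rows, users as columns, the star rating in each cell) and the shape rule for multiplying two matrices can be sketched like this. The review tuples, names, and the choice of two categories are made up for illustration and are not data from the talk; NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical (restaurant, user, stars) reviews, not data from the talk.
reviews = [
    ("pizzeria", "ana", 5),
    ("pizzeria", "bob", 4),
    ("burger", "bob", 2),
    ("sushi", "carla", 5),
]

restaurants = sorted({restaurant for restaurant, _, _ in reviews})
users = sorted({user for _, user, _ in reviews})
row = {name: i for i, name in enumerate(restaurants)}
col = {name: j for j, name in enumerate(users)}

# Restaurants are rows, users are columns, 0 means "no review".
ratings = np.zeros((len(restaurants), len(users)))
for restaurant, user, stars in reviews:
    ratings[row[restaurant], col[user]] = stars

# Shape rule: (restaurants x categories) @ (categories x users)
# gives a (restaurants x users) matrix, the same shape as `ratings`.
n_categories = 2  # assumption, chosen only for the example
rng = np.random.default_rng(0)
restaurant_weights = rng.random((len(restaurants), n_categories))
user_weights = rng.random((n_categories, len(users)))
approximation = restaurant_weights @ user_weights
```

Here the two factors are random, so `approximation` is meaningless; the point is only that the inner dimension (the number of categories) has to match for the product to exist.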
I can generate insights that what with this extra-dimensional created and that's quite useful because suppose that I create a matrix that's really an approximation rate exceeds the separate approximation that matrix uses that got from real data but this is the article the radical 1 at created that and that was is the result of a matrix of restaurants with some number of categories and a matrix of users with the same number of categories. If I can multiply them and get C, that again it's but through the position of to some extent we much classified the restaurants and the users in these categories, and that's pretty much what non-negative matrix factorization is. This is only to treat weights for automatically generated that that there is so you don't know what it's it's kind various forms beforehand sometimes it they don't even make sense but sometimes you can get there from which the world's worst mission what the young no and then through the spring which on the tips estimates that can tell me all reference has discussed everything in different ways and they can try to match those to all the restaurants and try to find restaurants that they might like this and that will that allow OK social so you want the non-negative part of it and that's the thing you should have faith it's just that if you keep it all positive, all of it greater than or equal to 0, it was as if a lot of new ways of generating C. 
because one of the most usual ways of doing that is least squares, and that's gonna be easier if you do just with with positive forms the and orchestrated because stars that received to fight. So this is taken from the book Programming Collective Intelligence. It's a slightly older book I get it doesn't care and the problem of but it's one of those great books that when you return to them in french spanish find different that's of it's all done by it's probably one of the most valuable books to have on your bookshelf. So this is just let is always for that and then just left the comment was that majority of 0 you. So first you start with just random values and then inserting the ratings and for this reason couple called different C and are and if they are the same you just exit, but that mostly doesn't happen with real life data. But then you fix one of the matrices and iterate the weights for the other one benefits that they all belong to try to predict the index for the weights for the long go click again and try to prove it and see what get and as much as how about the we have this is hot
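The loop described above (start from random non-negative values, hold one matrix fixed while updating the weights of the other, then swap and repeat) can be sketched with the standard multiplicative update rules for least-squares NMF. This is an illustration under assumptions, not the speaker's code: the function name `factorize`, its parameters, and the small `eps` guard against division by zero are choices made for this example, and NumPy is assumed to be available.

```python
import numpy as np


def factorize(v, n_features=10, n_iter=50, seed=0, eps=1e-9):
    """Approximate a non-negative matrix v as w @ h with w, h >= 0.

    v:          (rows x cols) non-negative matrix, e.g. restaurants x users
    n_features: number of automatically generated categories
    Returns (w, h) with shapes (rows x n_features) and (n_features x cols).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = v.shape
    # Start from random non-negative weights.
    w = rng.random((n_rows, n_features))
    h = rng.random((n_features, n_cols))
    for _ in range(n_iter):
        # Hold w fixed and rescale h towards a better fit.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        # Hold h fixed and rescale w towards a better fit.
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h


# Tiny made-up example: two groups of users with opposite tastes.
v = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)
w, h = factorize(v, n_features=2, n_iter=200)
reconstruction = w @ h
```

Because every update multiplies by a ratio of non-negative quantities, `w` and `h` never become negative, which is the reason for keeping everything greater than or equal to 0.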
water because can go from the new from you so very much like different concepts of what I've been distinguished just loads a bunch of import and natural indirectly so as long as the data that I got OK so if someone around the nominated and that's the loses just run on here but I have just run it forgot the name matrix that's we which so it is 11 thousand of users and
500 a D from that they got the the rest of the matrix
so that is at the bottom of the with
the and weights and they have and also vary the weights for the users so if you look at here this that use 0 voltage for for forests and 0 and so forth as a 5 year that the sparse matrix and the more connected to that has the better results than defense then makes us like to try to find people that holds will likely likewise for all the rest of the you have elects suggesting to support against the mechanism the bigger that much here too a bit of a predefined of that so OK old so this is the rest so we have reference here is the fact is that the role the you're out and that was for 500 so for the 1st and last you have to use that just transpose the matrix so both makes make it more likely the coloring but mess I and
so and so has so again
uses the radius of a lot of them is just like people political to class so they don't really really influence all someone affinity means that if all the just will and here I
do have some of our some kind of magic to get people and sold them all of them can be delivered back this but they
as they get the rest that like and this is how they look at come on I should have cultural OK so here's the problem so it has a huge vector for which we don't know what it is a large effect on the effect of the and that want to so we can make a lot of clients in the restaurants so this is just the basic find similar that I found it's virtually but it's
not the best diversity is probably use some linear programming to find the along with the the
different but here have In the which is a solid for because in the becomes 0 here but you can't see it because the
4th like 14 for which huge for and from that you can get an idea of what that there might be because the same compare that apples good beaches and of cooking because public schools so that might be in the kitchen properties in public schools in a market gospel also hit for the last as 0 well that that means it is method to see don't have
that much of the the city residents
netted among all and love
is still there and that's all Europe because of I was looking at didn't seem that interesting but that's something that has a huge for has a at 0 0 of from to so that's something that these programs the way of thinking just suggest a new place to do looking
it is the will of the people that they didn't but this is just
setting the similarity between the current all the facts of the same so all on the same all the rest of the day you can also do that so the users that you have the factors here that are correlated some factors so if a friend who voted for all multiple lot for of factor for might be people that like this kind of like the fluorescent and they might be interested to find what the loss so the patterns this next the use of these extreme peaks to both a lot at 0 at 3 AM that might be interesting was the what is the voted for what is the nature of this kind of stuff and then from then you can try to find from residents that people like and again it is in the
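Comparing restaurants (or users) by their factor weights, as described above, could look like the sketch below. The talk does not say which similarity measure was used, so cosine similarity is an assumption here, and `find_similar` with its parameters is a hypothetical helper written for this example; NumPy is assumed to be available.

```python
import numpy as np


def find_similar(weights, index, top=3):
    """Return indices of the rows most similar to row `index`.

    weights: (items x categories) matrix of non-negative factor weights,
             e.g. one row per restaurant.
    Similarity is the cosine between rows (an assumption for this sketch).
    """
    weights = np.asarray(weights, dtype=float)
    norms = np.linalg.norm(weights, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    unit = weights / norms[:, None]
    scores = unit @ unit[index]
    order = np.argsort(-scores)  # best match first
    return [int(i) for i in order if i != index][:top]


# Made-up weights: rows 0 and 1 load on the same category, row 2 does not.
example_weights = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
])
closest = find_similar(example_weights, 0, top=1)
```

The same function works on the user matrix if it is transposed so that each user is a row.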
middle of figuring out this a
different 1 it's local barrier
gospel again he was and that's it there are OK focused
but I suppose if you think that for changing the sub following the the military the right place
sorry and that's it thank you but to hi I and you ended up with 9 categorization in your example was suspected of 10 so what what was if you ended up with 9 categories it's always it's absolutely 9 votes than could use a little bit than what we just a random little value all I tried this plenty it got the results of the Monday there is you had the closer they get to through the more more similar the natives are but also make it harder for certain makes less sense of the it's that theory I got a balance about this that satisfy an iterative 20 and we must then was like this it's bottom that's as much water is also when you run a demo inflation so that was about to or rules the you have
have to here welcome on board the ship
so far the beautiful and not don't so yes it can get in going called yeah so this is stored interactions of .
a total of that they're not put so they are
calculating it should build putting something like so it's a sort of a medical claim that show you how to present thank you more useful so you have having internal that's is it true they can't sources tell how much going like this this worse just by looking at the shape of the units just go on on you will review or something like that and an optimal reunited CT and in this area is the sum of 2 just too speculative masses you get if you were on yeah yeah the the the justice said that interface of all of but I can tell you refer from the user it's a huge in from looking at the shape you can clearly see always these again to take shape their and depend on a whole of the terrible 1 it could be an indication that a lot of averaging and and poor ratings that are not seen so that's something that sort of thing as close as you can see it is to
to to to talk about gossip gossip it's usually a good sign but even though it allows 1 might mean just that the rest is not really a stable really doesn't always do the stuff to do things so what the do not being into some not pleased to see but it does have some of the thank you Over here
can thus so so we can use the public good results come all this is the difference and the other goals it do something given reduce anymore and that was the goal: the amount of difference that I have between the matrix that is created and the actual one, so we find that
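The "amount of difference" between the reconstructed matrix and the actual ratings can be measured, for instance, as a sum of squared differences. The name `difcost` and the squared-error choice are assumptions for this sketch, not something confirmed by the recording; NumPy is assumed to be available.

```python
import numpy as np


def difcost(actual, approximation):
    """Sum of squared element-wise differences between two matrices."""
    actual = np.asarray(actual, dtype=float)
    approximation = np.asarray(approximation, dtype=float)
    return float(np.sum((actual - approximation) ** 2))


# Made-up matrices that differ in a single cell by 1.
example_cost = difcost([[1, 2], [3, 4]], [[1, 2], [3, 5]])
```

Tracking this value between iterations gives a simple stopping rule: stop when it is zero or when it no longer decreases by a meaningful amount.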
thing doesn't go that it's too obvious when extracting it doesn't do that but not that much and doesn't improve the results that much more so again not because is just a guy who couldn't trenchantly listed that this is
just what

