Yak shaving a good place to eat using non negative matrix factorization
Formal Metadata
Title 
Yak shaving a good place to eat using non negative matrix factorization

Title of Series  
Part Number 
90

Number of Parts 
173

Author 
Adriano Petrich

License 
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 
DOI  
Publisher 
EuroPython

Release Date 
2015

Language 
English

Production Place 
Bilbao, Euskadi, Spain

Content Metadata
Subject Area  
Abstract 
Adriano Petrich - Yak shaving a good place to eat using non negative matrix factorization. Trying to find a good place to eat has become much easier and more democratic with online reviews, but on the other hand, that creates new problems. Can you trust that 5 star review of a fast food chain as much as the 1 star review of a fancy restaurant because "Toast arrived far too early, and too thin"? We all enjoy things differently. Starting off on the assumption that the "best pizza" is not the same for everyone: can we group users into people that have similar tastes? Can we identify reviews and restaurants to make sense of it? Can that lead us to a better way to find restaurants that you like? Using some data handling techniques, I walk you through my process and the results that I got from that idea. There are no prerequisites for this talk except basic Python and math knowledge (matrices exist).

Keywords 
EuroPython Conference
EP 2015
EuroPython 2015

00:00
let me give you make from was beginning 1st of it's my opinion not that that and really it creates like this that look OK from this and I
00:11
think it's all it was website was created by having move people that beating at meetings places that had like reviews that saves the best feature that place whatever so that's what started Brazilian Python is that most for the weather demands and principal at time million people city so moved from the user into the 140 thousand people the force large
00:45
lost negative made a lot of in Scotland hold the cake and but little the so again from this not 2 minutes away from my place so just because that define what is a good place to eat but not we don't know what it is such a lot about that this talk is about let you can go through to the end of the 1st place and factors you it's going to be nice but I'm worried that
01:18
on the notes about what that person in 19 voted 314 such as, and that's what this site is about. So the first idea would be to use the stars, the ratings, and if you get something like this what this about what there was nothing from this process and the McDonald's or very good pizza places, so we really don't have to suggest that. Also, if it is a McDonald's and they get like 5 stars for a lot of people might be and the associated with the best one. So what does that affect all of these just that scare likes to alleviate Morrison would be done but the sound like and be it is just not sane, and all the rating sites found that really early on. As anybody just from kindergarten would say, that break is not going to work: you have to use the lower bound of the Wilson score confidence interval for a Bernoulli parameter, of course. So another good metric that you might try is the number of ratings, so you can get it and put forth residue had led much for the reviews the demand will be good, but it's not nearly enough to know that for the rest 2nd reference you got what's called loss is going what because the
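This passage appears to refer to the lower bound of the Wilson score confidence interval for a Bernoulli parameter. A minimal sketch of that bound follows, assuming each rating has first been reduced to a positive or negative vote (the talk does not say how stars map to votes). The function name `wilson_lower_bound` and the default `z` of 1.96 (roughly 95% confidence) are illustrative choices, not taken from the talk.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a Bernoulli parameter.

    positive: number of positive votes (assumed mapping from star ratings)
    total:    number of votes overall
    z:        normal quantile for the chosen confidence level (assumption)
    """
    if total == 0:
        return 0.0
    phat = positive / total
    denominator = 1 + z * z / total
    centre = phat + z * z / (2 * total)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
    return (centre - margin) / denominator


# A place with few perfect votes versus one with many mostly positive votes.
few_votes = wilson_lower_bound(5, 5)
many_votes = wilson_lower_bound(500, 520)
```

With a bound like this, a place with a handful of perfect votes ranks below a place with hundreds of mostly positive votes, which a plain average would get the other way round.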
02:57
entire when it has a lot of terrible will review of X and reviews. The thing is that people tend to fall to the floor devoted and very bad and average just you will feel that it was right. So that helps, but it doesn't solve the problem.
So now for something completely different and you when is that you yeah yes that's good so trying and unless you have difficulty about community that is it for Wikipedians they may be right at the death metacommunication also between you after home. So as a funny thing about methods in the case of is that the abroad attitude you is that most of the matter that continued for this task or you get different results with different dimensions and that was by the bond order that is. So the trick here is that, to multiply 2 matrices, the middle values, that is the columns of the first matrix and the rows of the second, have to be the same number, and the user learning with the Department what all use epistemology and that's going to the p.
So that's something really nice but also called for that a lot of immigrants of independent of the 19th century it was the population of the city 35 percent would be balanced, 11 percent were reportedly, that's almost 50 percent of the students in the 7th sentence. When it got close to 1940 it was almost 700 thousand readout and all other models at that that it is for the state the 2nd from the component parts it but all but in the summertime and quick and that's of the world Englishes solid that that was less than 1 million people. So the of that and a lot of images that they create some coats were also to be called promote close to what you have written as well with the you know the is that the new New York New York style pizza we do have a socalled pizza, and that's important because it comes to my hypothesis that people that have the same background check the things just use things that if you have a culture of something you going to all and all of you have a single treatment the same is ball has advantage at different intensity.
Try to prove it: that is the first one it but it's not even the top places and never been there but call by the galson people agree quite well that's a good pizza; if it was a bit it's going to be lower but still be as tight as well. The 2nd one is the best pizza place in the and it's also but look holds 1st egophony and then the has been % of of people from from abroad and each one has a different idea what it would be if and only have a so that the find but indeed the same thing happens you well the force on the system is not even again they are going to be the 1 the 1st thing that on the unit ball and it's a solid but again a very tight of loss and a lot of people know that so for the default before but they want to remember it's officership places and people that have a group of researchers digits and also very tight and the gossip that had from like people that will not those so many boxes from well the model we have talked about that size is that just a exchange and monitoring the goal the rotation of analysis repeated. So this is just the sum of current and is that that is what makes it a good place to eat is based on individual background and that's where the blue based on the individual links that if you're going from multiple reviews from someone you better you the chance that you know like another rest and that the light will be much more likely.
And then we go back to linear algebra no food suppose that I have to do with a gun somewhere for a friend a huge list also hopes using that the amount of stars for and have this of those the evidence for the 500 top restaurants in Dubai. Also what do that course loading matrix: a matrix that has restaurants for its rows, users for its columns, and the value in the middle, for each position, is just the stars that they gave you and colleagues and doesn't have initialstressed amusing. So the same way that you could remove the measures that can also create them, so if I have a matrix and the 1 created to new matrix that if it can make that's question
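The matrix described in this passage appears to put restaurants on the rows and users on the columns, with each cell holding the stars that user gave. A minimal sketch of building it, assuming reviews arrive as `(user, restaurant, stars)` tuples and that a missing review is stored as 0. The sample data and the helper names `build_ratings`, `shape` and `can_multiply` are hypothetical.

```python
# Hypothetical sample reviews: (user, restaurant, stars).
reviews = [
    ("ana", "pizzeria", 5),
    ("ana", "sushi_bar", 2),
    ("bruno", "pizzeria", 4),
    ("bruno", "burger_joint", 5),
    ("carla", "sushi_bar", 5),
]


def build_ratings(reviews):
    """Return (restaurants, users, matrix) with restaurants as rows."""
    restaurants = sorted({restaurant for _, restaurant, _ in reviews})
    users = sorted({user for user, _, _ in reviews})
    row = {name: i for i, name in enumerate(restaurants)}
    col = {name: j for j, name in enumerate(users)}
    # 0 stands for "no review" (an assumption of this sketch).
    matrix = [[0] * len(users) for _ in restaurants]
    for user, restaurant, stars in reviews:
        matrix[row[restaurant]][col[user]] = stars
    return restaurants, users, matrix


def shape(m):
    """(rows, columns) of a matrix stored as a list of rows."""
    return len(m), len(m[0])


def can_multiply(a, b):
    """Two matrices multiply only if columns of a equal rows of b."""
    return shape(a)[1] == shape(b)[0]


restaurants, users, matrix = build_ratings(reviews)
```

The `can_multiply` check is the "middle values" rule from the talk: a restaurants-by-categories matrix times a categories-by-users matrix gives back a restaurants-by-users matrix.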
08:49
I can generate insights that what with this extradimensional created, and that's quite useful, because suppose that I create a matrix that's really an approximation rate exceeds the separate approximation that matrix uses that got from real data, but this is the article the radical 1 at created that and that was is the result of a matrix of restaurants with some categories and a matrix of users with the same number of categories. If I can multiply them and treatment of C. that again it's but through the position of to some extent we much classified the restaurants and the users in these categories, and that's pretty much what non-negative matrix factorization is. This is only to treat weights for automatically generated that that there is, so you don't know what the categories are beforehand; sometimes they don't even make sense, but sometimes you can get there from which the world's worst mission what the young no and then through the spring which on the tips estimates that can tell me all reference has discussed everything in different ways and they can try to match those to all the restaurants and try to find restaurants that they might like this and that will that allow OK social. So why the non-negative part of it, and that's the thing you should have faith it's just that if you keep it all positive, all of it greater than or equal to 0, it was as if a lot of new ways of generating C.
because one of the most usual ways of doing that is least squares, and that's going to be easier if you do it just with positive values the and orchestrated because stars that received to fight. So achieves this is still there is taken from the book Programming Collective Intelligence. It's a slightly older book I get it doesn't care and the problem of, but it's one of those great books that when you return to them in french spanish find different that's of it's all done by; it's probably one of the most valuable to going to heaven your bookshelf. So this is just let is always for that and then just left the comment was that majority of 0 you. So first start are use just random value and then inserting the ratings and for this reason couple called different C and are and then there is the same you just access, but that's more mood didn't doesn't happen to like real life data. But then you fixate one of the matrices and integrated the weights for the other one benefits that they all belong to try to predict the index for the weights for the long go click again and try to prove it and see what get and as much as how about the we have this is hot
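The procedure outlined here (start from random values, hold one matrix fixed, update the weights of the other, repeat) can be sketched with the standard multiplicative update rules for non-negative matrix factorization. This is an illustration in plain Python lists, not the code shown in the talk or in the book. The names `factorize`, `cost`, `matmul` and `transpose`, the iteration count, the seed and the sample ratings are all assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def cost(a, b):
    """Sum of squared differences between two same-shaped matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def factorize(v, k, iterations=200, seed=0, eps=1e-9):
    """Approximate v (rows x cols) as w (rows x k) times h (k x cols)."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    # Start from random non-negative weights.
    w = [[rng.random() for _ in range(k)] for _ in range(rows)]
    h = [[rng.random() for _ in range(cols)] for _ in range(k)]
    for _ in range(iterations):
        # Hold w fixed and update h: h *= (w^T v) / (w^T w h).
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(cols)]
             for i in range(k)]
        # Hold h fixed and update w: w *= (v h^T) / (w h h^T).
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(matmul(w, h), ht)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(rows)]
    return w, h


# Hypothetical ratings: restaurants as rows, users as columns.
ratings = [
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
]
w, h = factorize(ratings, k=2)
error = cost(ratings, matmul(w, h))
```

Because every update multiplies a non-negative value by a ratio of non-negative values, the weights never turn negative, which is the property the non-negative constraint is there to preserve. The small `eps` only guards against division by zero.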
12:37
water because can go from the new from you so very much like different concepts of what I've been distinguished just loads a bunch of import and natural indirectly so as long as the data that I got OK so if someone around the nominated and that's the loses just run on here but I have just run it forgot the name matrix that's we which so it is 11 thousand of users and
13:28
500 a D from that they got the the rest of the matrix
13:36
so that is at the bottom of the with
13:39
the and weights and they have and also vary the weights for the users so if you look at here this that use 0 voltage for for forests and 0 and so forth as a 5 year that the sparse matrix and the more connected to that has the better results than defense then makes us like to try to find people that holds will likely likewise for all the rest of the you have elects suggesting to support against the mechanism the bigger that much here too a bit of a predefined of that so OK old so this is the rest so we have reference here is the fact is that the role the you're out and that was for 500 so for the 1st and last you have to use that just transpose the matrix so both makes make it more likely the coloring but mess I and
14:58
so and so has so again
15:01
uses the radius of a lot of them is just like people political to class so they don't really really influence all someone affinity means that if all the just will and here I
15:15
do have some of our some kind of magic to get people and sold them all of them can be delivered back this but they
15:26
as they get the rest that like and this is how they look at come on I should have cultural OK so here's the problem so it has a huge vector for which we don't know what it is a large effect on the effect of the and that want to so we can make a lot of clients in the restaurants so this is just the basic find similar that I found it's virtually but it's
16:09
not the best diversity is probably use some linear programming to find the along with the the
16:17
different but here have In the which is a solid for because in the becomes 0 here but you can't see it because the
16:26
4th like 14 for which huge for and from that you can get an idea of what that there might be because the same compare that apples good beaches and of cooking because public schools so that might be in the kitchen properties in public schools in a market gospel also hit for the last as 0 well that that means it is method to see don't have
16:56
that much of the the city residents
16:59
netted among all and love
17:02
is still there and that's all Europe because of I was looking at didn't seem that interesting but that's something that has a huge for has a at 0 0 of from to so that's something that these programs the way of thinking just suggest a new place to do looking
17:26
it is the will of the people that they didn't but this is just
17:32
setting the similarity between the current all the facts of the same so all on the same all the rest of the day you can also do that so the users that you have the factors here that are correlated some factors so if a friend who voted for all multiple lot for of factor for might be people that like this kind of like the fluorescent and they might be interested to find what the loss so the patterns this next the use of these extreme peaks to both a lot at 0 at 3 AM that might be interesting was the what is the voted for what is the nature of this kind of stuff and then from then you can try to find from residents that people like and again it is in the
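The "find similar" step mentioned here compares the factor weights of one restaurant (or user) with those of the others. The talk does not say which similarity measure was used, so the sketch below assumes cosine similarity. The names `cosine` and `most_similar` and the sample weights are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two factor-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(name, weights, top=3):
    """Rank the other entries by similarity to the one called `name`."""
    target = weights[name]
    scored = [(cosine(target, vec), other)
              for other, vec in weights.items() if other != name]
    return sorted(scored, reverse=True)[:top]


# Hypothetical factor weights, one row of the factorized matrix each.
weights = {
    "pizzeria_a": [0.9, 0.1, 0.0],
    "pizzeria_b": [0.8, 0.2, 0.1],
    "sushi_bar": [0.0, 0.1, 0.9],
}
neighbours = most_similar("pizzeria_a", weights, top=2)
```

The same function works for users if `weights` maps user names to their columns of the other factor matrix.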
18:33
middle of figuring out this a
18:36
different 1 it's local barrier
18:41
gospel again he was and that's it there are OK focused
18:59
but I suppose if you think that for changing the sub following the the military the right place
19:28
sorry and that's it thank you but to hi I and you ended up with 9 categorization in your example was suspected of 10 so what what was if you ended up with 9 categories it's always it's absolutely 9 votes than could use a little bit than what we just a random little value all I tried this plenty it got the results of the Monday there is you had the closer they get to through the more more similar the natives are but also make it harder for certain makes less sense of the it's that theory I got a balance about this that satisfy an iterative 20 and we must then was like this it's bottom that's as much water is also when you run a demo inflation so that was about to or rules the you have
20:54
have to here welcome on board the ship
21:07
so far the beautiful and not don't so yes it can get in going called yeah so this is stored interactions of .
21:34
a total of that they're not put so they are
21:51
calculating it should build putting something like so it's a sort of a medical claim that show you how to present thank you more useful so you have having internal that's is it true they can't sources tell how much going like this this worse just by looking at the shape of the units just go on on you will review or something like that and an optimal reunited CT and in this area is the sum of 2 just too speculative masses you get if you were on yeah yeah the the the justice said that interface of all of but I can tell you refer from the user it's a huge in from looking at the shape you can clearly see always these again to take shape their and depend on a whole of the terrible 1 it could be an indication that a lot of averaging and and poor ratings that are not seen so that's something that sort of thing as close as you can see it is to
23:15
to to to talk about gossip gossip it's usually a good sign but even though it allows 1 might mean just that the rest is not really a stable really doesn't always do the stuff to do things so what the do not being into some not pleased to see but it does have some of the thank you Over here
23:52
can thus so so we can use the public good results come all this is the difference and the other goals it do something given reduce anymore and that was the goal of the the the amount of difference that I have between the matrix that is correct 1 to the to the to the actual so we find that
24:21
thing doesn't go that it's too obvious when extracting it doesn't do that but not that much and doesn't improve the results that much more so again not because is just a guy who couldn't trenchantly listed that this is
24:43
just what