
What about recommendation engines?

Video in TIB AV-Portal: What about recommendation engines?

Formal Metadata

What about recommendation engines?
Is your Netflix recommendation accurate?
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
How do recommendation engines take part in our daily routine, and how do companies such as Netflix and Amazon implement them? This talk aims to show the elements that make up a recommendation engine to people who have never been in touch with the subject or who want to know a bit more. At the end of this session, you might be able to build your own recommendation system and also know where to find out more about it. Talk structure: 1. What is a recommendation engine and why use one? 2. Importance of recommendation engines 3. Steps of a recommendation 4. Recommendation algorithms 5. Basic statistics for distance and correlation 6. Example
Keywords Algorithms Big Data Business Data Science Python 3
Hello everyone, it's super nice to have you here today. This is the first time I am talking to this many people, so I'm kind of nervous and anxious. Thank you for being here for me; this is an amazing experience, thank you very much. So today I'm going to talk about recommendation engines and how to build
one simple recommendation algorithm. But
first, if you don't mind, I would like to give you some context about me and also break the ice. So I come from
Brazil, with an S. Actually, I know that you know Brazil with a Z, but we write it with an S. We have more than 8 million square kilometers of territory, and it borders about ten countries. It is huge, but I come from the very south of Brazil. I will show you: this is Rio Grande do Sul. So, a little bit about Brazil: our language is Portuguese. I don't know if I have other Brazilians here. Oh, nice, nice to see you. The Portuguese landed a bit more than five hundred years ago. We have thirty years of a weak democracy, and I don't know if you have heard, but we are kind of having coups d'état all the time; we are not doing so well. And "how are you?" is "oi, tudo bem?". So, from the very
south of Brazil, it's Rio Grande do Sul. We like churrasco and chimarrão. We don't have much samba, nor the warm weather. We border Argentina and Uruguay. We are full of plain fields, so we use them very much for agriculture and cattle breeding. Our temperature can vary from 28 to 10 degrees in the same day. And the most traditional thing we have in Rio Grande do Sul is erva-mate tea; it's called chimarrão. So this thing in the lady's hand is the chimarrão, and you can see a very large piece of meat being cooked by a fire that is in the ground. I don't know if you have heard about chimarrão, but, I don't know why, maybe because it has caffeine, some footballers have been drinking it, like the England footballers and Messi. He's Argentinian, right? You know that Brazilians and Argentinians have some rivalry. And this is Ronaldinho, who also drinks chimarrão; he's from Porto Alegre as well. The curious thing is that it's a hot beverage and we drink it on the beach in hot weather. It doesn't matter, we like it. So
this is me. I'm a developer, I'm an economist, and I'm doing a master's degree in statistics, so I'm a data enthusiast. I'm a cat person and I am addicted to travel. This was me at my first talk, talking about how to deal with the frustration of not having your hypothesis proven as a data scientist. I don't know if I have data scientists here, but sometimes we spend days trying to prove a hypothesis and it doesn't turn out the way we want. The thing is that even if the hypothesis is not accepted, we can still gather information from it. So yeah, this was at The Human in Data Science. Okay, let's go. What about
recommendation systems? The keywords for recommendation systems are revenue and customer engagement. What happens is that we as customers are overloaded with information. We have a lot of items out there, a lot of movies to watch, a lot of videos, and sometimes we don't know what to watch or what to buy, what is best for us. So this is a very new, personalized way of selling, buying, watching and getting to know things. McNee said that it helps users, or groups of users, to select items from a crowded item or information space. Amazon, YouTube, Netflix, and there are a lot of others, like Udemy and Google; oh my god, almost every website and every e-commerce uses it. Just to bring some examples: Amazon had almost 13 billion dollars of revenue in a quarter, and this was the first quarter after they implemented the recommendation engine. They had 13 billion, and it was 30 percent more than in the same quarter of the previous year. For YouTube, more than 70% of user consumption comes from recommendations. So when you are watching a video and another video comes up next, or you go through the feed and there are some videos recommended for you, almost 70% of the consumption on YouTube is about this. And for Netflix, 75% of the consumption comes from recommendations. So we can see that recommendation is a very big matter. So when
we need to recommend something, we might be looking for answers to two problems: prediction and rating. There is an approach to recommendation in use that has no artificial intelligence: the most sold or most clicked items are the ones that appear. It can easily be done with a query in the database, showing the user which items have sold more than everything else. But what we want to have is things that the customers are likely to love, likely to buy, likely to watch. So we don't just need to predict the rating for an item, but also whether they might like it or not. Okay, a little bug there. But how do we get this data, right? Data is the key, and we have two kinds of data. I'm saying two kinds because I'm separating it that way; maybe there are even more that we haven't talked about, but we can put everything in these two categories. Implicit data is about tracking. It is the data that the customer didn't give to us, unlike their name or address: where they clicked, what kind of movie they like, and things like this. We can collect this data, right, big data. And we have explicit data. This is more difficult to get because it depends on an action of the user, such as answering a survey or rating items. I think I have never rated an item. I don't know about you, but it's not common. I mean, thanks to the people who rate, because I'm always looking at the ratings, but yeah, I have never rated. So these are the basic models
of recommendation systems with AI. Okay, so we have the content-based, the collaborative and the hybrid solutions. Companies like Amazon, Netflix and YouTube are mostly using hybrid solutions, which are very much based on content, but also very much based on collaboration, and which have some secret ingredients in the recipe that we don't know. So,
content-based: the recommendations are based on the description of the item, or on the synopsis, or on the genre, or even on the author. There is an article on the internet about an author who had launched a book in 2010, and the book did not sell very well. That's okay. In 2015 another author launched another book, but its content was very similar to the one that had been launched in 2010. What happened was that the book launched in 2010 started to have a lot of people buying it, and the author was like, "Oh, what's happening?" Tracing it back, they saw that because of this new book, which was very similar to the 2010 book, the old one started to sell more. So content-based recommender systems are born from the idea of using the content of each item for recommendation purposes. It avoids the cold start recommendation problem; wait for it, I'm going to talk about it. And content representations open up options to use different approaches, like text processing techniques and semantic information; there is a whole world of tools to analyze this data.
Another one is the collaborative model, with memory-based recommendations. In this case the recommendations are based on users' social interactions and on rankings provided by other users. This collaborative model is called collaborative filtering, and it is divided into user-based and item-based. So let's see this problem: it's Saturday night, I am at home, I open my favorite streaming app and I don't know what to watch. But I know my friend Marta; she likes thriller and drama, and so do I. Maybe it's a good idea to ask her for a movie recommendation. And this is the truth, okay? Marta is my friend, I didn't know what to watch, and I was like, oh my god. I'm always like this. I don't know if you believe in this kind of thing, but I'm a Gemini and I can't make decisions. So yeah, I was like, oh my god, what am I going to
watch? So I'm showing you user-based filtering. This one on top is my friend Henry: he likes orange, grapes, raspberry and banana. Marta likes grapes, and Danny likes grapes and banana. Maybe Marta could like orange or banana or raspberry and has just never tasted them, so maybe I can recommend them to her. But looking at this picture, I can see that Danny is more similar to Henry, because she likes more of the same fruits. Maybe Marta just likes grapes. And since Danny likes more fruits and is more similar to Henry, maybe I can recommend her oranges and raspberries.
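The fruit example can be sketched in a few lines of Python. This is only an illustration: the talk does not say which similarity measure the slide used, so the Jaccard index (shared items divided by all items) and the helper names `jaccard` and `recommend_from_neighbour` are assumptions.

```python
# Toy data from the fruit example in the talk.
likes = {
    "Henry": {"orange", "grapes", "raspberry", "banana"},
    "Marta": {"grapes"},
    "Danny": {"grapes", "banana"},
}


def jaccard(a, b):
    """Shared items divided by all items liked by either user (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_neighbour(user, neighbour, likes):
    """Items the neighbour likes that the user has not tried yet."""
    return sorted(likes[neighbour] - likes[user])


# How similar are Marta and Danny to Henry?
sim_to_henry = {u: jaccard(likes[u], likes["Henry"]) for u in ("Marta", "Danny")}
print(sim_to_henry)

# Danny is the closer match, so Henry's remaining fruits are candidates for her.
print(recommend_from_neighbour("Danny", "Henry", likes))
```

With this measure Danny scores higher than Marta against Henry, which matches the reading of the picture given in the talk.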
Well, but what about item-based filtering? Again, Henry likes the same fruits, Marta as well, and Danny as well. But if I go and check the items, I see that more people like banana, so maybe I can recommend Marta a banana, and maybe she likes it.
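One possible way to express the item-based idea in Python is to count which fruits are liked together with the fruits a person already likes. The co-occurrence count and the function name `co_liked_items` are assumptions made for this sketch; production systems usually use item-to-item similarity scores instead of raw counts.

```python
from collections import Counter

# Same toy data as in the fruit example.
likes = {
    "Henry": {"orange", "grapes", "raspberry", "banana"},
    "Marta": {"grapes"},
    "Danny": {"grapes", "banana"},
}


def co_liked_items(user, likes):
    """Count how often each unseen item is liked together with the user's items."""
    own = likes[user]
    counts = Counter()
    for other, items in likes.items():
        if other == user or not (own & items):
            continue  # skip the user and people with nothing in common
        counts.update(items - own)
    return counts


counts = co_liked_items("Marta", likes)
best_item, _ = counts.most_common(1)[0]
print(counts)
print(best_item)
```

For Marta, banana appears alongside grapes for two people, while orange and raspberry appear only once, so banana comes out on top.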
So it has been shown in the market that collaborative filtering is more accurate than content-based filtering. It's easy to implement, but sometimes it's not a good idea to implement it, and we have to keep that in mind. When is it not a good idea? When we don't have enough knowledge about the item and the user. If we don't have enough ratings and I don't know anything about the user, I can't do the math to see which user or item this user is most similar to, so how can I recommend something? And when the item has not been rated enough: it's a new item and I don't have any information about it. So one problem is the cold start. Cold start, as I said before and will now explain, is the expression we use when we don't have much information about the user and would like to recommend them something. One thing that helps with the cold start: I don't know if you have Netflix, or other sites like this, like Pinterest. You go there for the first time, you register, and the website asks you to select the items or genres that you like most. This helps the algorithm to recommend you things and avoid the cold start. Then there is sparsity: that is the problem when we don't have many ratings, and new items, because you don't have information about them. So it's not a good idea when we don't have any information, because collaborative filtering is very much based on similarity calculations.
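To make the cold start and sparsity problems concrete, here is a small Python sketch that measures how empty a ratings table is and lists the users and items that have no ratings at all. The data and every name in it (`user_a`, `item_3`, `sparsity`, and so on) are hypothetical and only serve the illustration.

```python
# Hypothetical ratings table: user -> {item: rating}.
ratings = {
    "user_a": {"item_1": 4.0, "item_2": 3.0},
    "user_b": {"item_1": 5.0},
    "user_c": {},  # brand-new user: the cold start case
}
items = ["item_1", "item_2", "item_3"]  # item_3 has no ratings yet


def sparsity(ratings, items):
    """Fraction of user/item pairs that have no rating."""
    total = len(ratings) * len(items)
    filled = sum(1 for user in ratings.values() for item in items if item in user)
    return 1 - filled / total


def cold_start_users(ratings):
    """Users with no ratings, for whom no similarity can be computed."""
    return [user for user, rated in ratings.items() if not rated]


def unrated_items(ratings, items):
    """Items that nobody has rated yet."""
    return [i for i in items if not any(i in rated for rated in ratings.values())]


print(round(sparsity(ratings, items), 2))
print(cold_start_users(ratings))
print(unrated_items(ratings, items))
```

A check like this could run before choosing collaborative filtering: if most pairs are empty, a popularity list or an onboarding questionnaire may be the safer starting point.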
So let's build our own recommendation system algorithm. These are the
basic steps to build a recommendation system. First, we must choose how to do the math to find a similarity coefficient between users. Then we have to predict: we have to find the predicted score for the movies that the user didn't watch. And then tell: let the user know what you have predicted for them, so recommend it. So let's go back to my problem. Saturday night, I want to watch a movie, I don't know what to watch, so maybe I ask Marta, but maybe I try another friend. The thing is, I have this database, and
as I said, it's true: my friends gave me these ratings. It's good, because now we can find out how similar we are and how to get recommendations. So this is our database. Let me see if I can have a pointer here. Okay, so The Wolf of Wall Street: Danny hasn't watched it. Cool Runnings: I didn't watch it, and Baby Driver neither. Danny also didn't
watch Cool Runnings, Henry didn't watch Baby Driver, and Marta didn't watch The Lord of the Rings. So let's do the calculation to
see the similarity between us. What I have done here is plot a two-dimensional graph of The Wolf of Wall Street and The Devil Wears Prada, to see how the data is dispersed. Here I can see that Marta gave a 3 for The Devil Wears Prada and 3 for The Wolf of Wall Street, I gave 4 and 4, and Philip gave 3 for The Devil Wears Prada and 4.5 for The Wolf of Wall Street. When I see this graph, I try to understand who is closer to me, because I'm trying to get the similarity here. When we talk about data, similarity is very much about where we are together, what data points we share, and things like this. When I want a recommendation, I really want it from someone who is very similar to me, so I know that I am going to like it. Thinking about it, I can try to measure the distance. From Philip I am 0.5 distant. And from Marta, well, this is a triangle, so let's find the hypotenuse using Pythagoras. Do you remember that from school? We take the length of this cathetus and this one, square them, sum them up, and the square root gives us the hypotenuse. Doing this math, I know that I'm 0.71 distant from Marta. So I'm more distant from Marta than from Philip. This is the Euclidean distance. This math is also used for k-nearest neighbors, a widely used algorithm in machine learning. I could have just shown this formula, but I thought that if I showed you the triangle and Pythagoras it would be easier, because I don't think people like symbols, characters, numbers and exponents all together. So this is just Pythagoras, summing up over all the dimensions we have, because I have shown you the two-dimensional graph, but if you have a lot of movies, you have a lot of dimensions. This is how we are going to make the similarity calculation. I wrote a function named similarity... sorry, I did this, I
will show you the code. It shows that, considering all our movies, I'm more similar to Marta than to everyone else.
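A minimal Python sketch of the distance step follows. The individual ratings on the slide are hard to recover from the recording, so the values below are assumptions picked to reproduce the 0.5 and 0.71 distances quoted in the talk. The conversion `1 / (1 + distance)` is one common way to turn a distance into a similarity score and is also an assumption, as are the function names; the speaker's own code may differ.

```python
import math

# Assumed ratings (not read off the slide), chosen so the distances
# come out at the 0.5 and 0.71 mentioned in the talk.
ratings = {
    "me":     {"The Wolf of Wall Street": 4.0, "The Devil Wears Prada": 4.0},
    "Philip": {"The Wolf of Wall Street": 4.5, "The Devil Wears Prada": 4.0},
    "Marta":  {"The Wolf of Wall Street": 3.5, "The Devil Wears Prada": 3.5},
}


def euclidean_distance(a, b):
    """Pythagoras over every movie that both users have rated."""
    shared = a.keys() & b.keys()
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in shared))


def similarity(a, b):
    """Map a distance to the range (0, 1]; closer users score higher."""
    if not (a.keys() & b.keys()):
        return 0.0  # no movie in common, nothing to compare
    return 1 / (1 + euclidean_distance(a, b))


for friend in ("Philip", "Marta"):
    d = euclidean_distance(ratings["me"], ratings[friend])
    s = similarity(ratings["me"], ratings[friend])
    print(friend, round(d, 2), round(s, 2))
```

Because the distance only sums over movies that both people rated, the same function works unchanged when the table grows from two movies to five or fifty.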
So let's predict. I have to predict what my rating is going to be for Cool Runnings and for Baby Driver, because maybe that's the movie I'm going to see tonight, actually on Saturday. Here we can see that we don't have many ratings: four of theirs plus mine are missing. So what if Marta, who is the most similar to me, as we'll see here, hadn't rated the movies? Let's suppose that we have 50 movies, not just five. And what if one person loves a movie that everyone else has rated low? A solution to this problem is to use the weighted average. So let's go back to Saturday night: which movie should I watch? Pro tip: it's up to us to choose the recommendation threshold. We have this heuristic method, which is to take my average rating and only recommend a movie if its prediction is higher than my average rating.
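The threshold heuristic can be sketched in Python as follows. The two predictions (3.6 and 3.93) and the average rating of 3 are the figures quoted later in the talk, and the two ratings of 4 come from the distance example. The rating for The Lord of the Rings is a made-up value chosen only so that the average comes out at 3, and the function name `recommend` is hypothetical.

```python
from statistics import mean

# My own ratings. The two 4s are from the talk; the third value is
# an assumption so that the average matches the quoted 3.
my_ratings = {
    "The Wolf of Wall Street": 4,
    "The Devil Wears Prada": 4,
    "The Lord of the Rings": 1,
}

# Predicted ratings for the movies I have not watched (figures from the talk).
predictions = {"Cool Runnings": 3.6, "Baby Driver": 3.93}


def recommend(predictions, own_ratings):
    """Keep predictions above the user's own average, best first."""
    threshold = mean(list(own_ratings.values()))
    ranked = sorted(predictions.items(), key=lambda pair: pair[1], reverse=True)
    return [title for title, score in ranked if score > threshold]


print(recommend(predictions, my_ratings))
```

Using the personal average as the cut-off adapts to each user: someone who rates generously needs a higher prediction before a title is shown.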
Recommendations for me: here is the code I have written for the recommendation. It measures the Euclidean distance and multiplies each rating by the similarity to get the weighted average. So what happens here is that I have put the similarity here and, sorry, the ratings here. Danny hasn't rated Cool Runnings, so it's blank. I have multiplied these two to get the weighted rating, and in the end we get a prediction. For Cool Runnings, for example, I would rate it 3.6, or at least the algorithm says so, and likewise for Baby Driver I would rate it 3.93.
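The weighted-average prediction described here can be written compactly in Python. This is a sketch under stated assumptions: the similarity scores and ratings below are invented for the illustration (the slide values are not recoverable from the recording), and `predict_rating` is a hypothetical name, not the function from the speaker's repository.

```python
# Invented similarity of each friend to me (higher means more similar).
similarities = {"Marta": 0.5, "Philip": 0.25, "Henry": 0.25, "Danny": 0.2}

# Invented ratings; Danny has not rated Cool Runnings, so the cell is blank.
ratings = {
    "Marta": {"Cool Runnings": 4.0},
    "Philip": {"Cool Runnings": 3.0},
    "Henry": {"Cool Runnings": 3.0},
    "Danny": {},
}


def predict_rating(similarities, ratings, movie):
    """Similarity-weighted average of the ratings given by other users."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for user, sim in similarities.items():
        score = ratings.get(user, {}).get(movie)
        if score is None:
            continue  # blank cell: this user does not contribute
        weighted_sum += sim * score
        similarity_sum += sim
    if similarity_sum == 0:
        return None  # nobody similar has rated this movie
    return weighted_sum / similarity_sum


print(predict_rating(similarities, ratings, "Cool Runnings"))
```

Dividing by the sum of the similarities of only those users who rated the movie keeps a blank cell from dragging the prediction towards zero.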
Here's the... let me see... yeah, this is the recommendation function, and this is the output. As we have seen, it would recommend Baby Driver more than Cool Runnings. What does that mean? That I may be recommended Baby Driver by the algorithm. So I should tell my users what has been predicted. Here, doing the
prediction for everyone, I have seen that I would be recommended Cool Runnings like this and Baby Driver like this, and for Danny these are the predictions. And on Saturday night I'm going to see Baby Driver. So, problem solved. As I have said here, my average rating is 3, so these two can be recommended. Maybe in my Netflix or my streaming app these two are going to appear as "Watch it now": Cool Runnings and Baby Driver. They are likely to be among the movies I enjoy. So the
code is here, if you want to check it and see how to make this algorithm: tiny dot cc, EuroPython 2019. And thank you very much. I would
like to hear feedback from you, and I'm here if you have any questions. Thank you. [Applause]
Maybe one question here. Thanks for the talk. I'm curious: at your company, what products or items are you recommending? Nice. So, I'm a data enthusiast. My company is ThoughtWorks, and we don't have much work on recommendation. We have been doing some inferences: for example, for a big media company we built gender prediction, so when a user enters the website, is it a female or a male, what age, and things like this. But recommendation we haven't done yet in a project. Thanks. You're welcome. [Music] [Applause]