
What about recommendation engines?


Formal metadata

Title
What about recommendation engines?
Subtitle
Is your Netflix recommendation accurate?
Series title
Number of parts
118
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use and modify the work or its content for any legal and non-commercial purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license.
Identifiers
Publisher
Year of publication
Language

Content metadata

Subject area
Genre
Abstract
How are recommendation engines taking part in our daily routine, and how do companies such as Netflix and Amazon implement them? This talk aims to show the elements that make up a recommendation engine to people who have never been in touch with the subject or want to know a bit more. At the end of this session, you should be able to build your own recommendation system and also know where to find more about it. Talk structure: 1. What is a recommendation engine, and why use one? 2. Recommendation engine importance 3. Steps of a recommendation 4. Recommendation algorithms 5. Basic statistics for distance and correlation 6. Example
Transcript: English (automatically generated)
Hello, everyone. It's super nice to have you here today. This is the first time I have talked to this many people, so I'm kind of nervous and anxious. Thank you for being here for me. This is an amazing experience. Thank you very much. So, today I'm going
to talk about recommendation engines and how to build a simple recommendation algorithm. But first, if you don't mind, I would like to give you some context about me and also
break the ice. So, I come from Brasil with an S, actually. I know that you know Brazil with a Z, but we write it with an S. We have more than 8 million square kilometers of
territory, and it borders like 10 countries. This is huge. But I come from the very south of Brazil. I will show you. This is Rio Grande do Sul. So, a little bit
about Brazil. I don't know if I have another Brazilian here. Oh, nice. Nice to see you. The Portuguese landed a bit more than 500 years ago. We have 30 years of a weak
democracy, and I don't know if you heard, but we are kind of having coups d'état all the time. We have unlimited natural wealth. And how are you?
Oi, tudo bem. Well, so from the very south of Brazil, it's Rio Grande do Sul. We like churrasco and chimarrão. We don't have much samba, nor the warm weather, actually. We border Argentina and Uruguay. We are full of plains, so we use
them very much for agriculture and cattle breeding. Our daily temperature can vary from 28 to 10 degrees in the same day. And the most traditional thing we have in Rio Grande do Sul is erva-mate tea. It's called chimarrão. So, this thing in the
lady's hand is the chimarrão. You can see a very huge piece of meat that is being cooked by the fire that is under the ground. So, I don't know. I
don't know why. Maybe it's because it has caffeine. Some footballers have been drinking it, like the England footballers. And Messi, he's Argentinian,
right? You know that Brazilians and Argentinians have some struggles. And this is Ronaldinho, also drinking chimarrão. He's from Porto Alegre as well. And the curious thing is, it's a hot beverage, and we drink it on the beach in the hot
weather. It doesn't matter. We like it. So, this is me. I'm a developer. I'm an economist. I'm doing a master's degree in statistics. So, I'm a data enthusiast. I'm a cat person. And I am addicted to travel. This was me, on my first
trip, talking about how to deal with the frustration of not having your hypothesis proven as a data scientist. I don't know if I have data
scientists here, but sometimes we spend like days trying to prove a hypothesis, and it doesn't happen like we want. But the thing is that even if it's not a good hypothesis, I mean, it's not accepted, we can gather information from
this. So, yeah, this was in the human data science. Okay, let's go. What about recommendation systems? So, the key words for recommendation systems are revenue and customer engagement. What happens is that we as customers are overloaded
with information. So, we have a lot of items out there, a lot of movies to watch, a lot of, I don't know, videos, and we don't know what to watch sometimes or we don't know what to buy sometimes, what is the best for us.
So, we have a lot of information. This is a very new personalized way of selling, buying, watching, and getting to know things. And, well, McNee said that it helps groups of users to select items from a crowded item or
information space. Well, Amazon, YouTube, Netflix. I mean, there are a lot of
others like Udemy, Google, oh my God, almost every website, every e-commerce uses it. But, just to bring some examples: Amazon, in one quarter, had almost $13 billion of revenue. And this was the first quarter they
implemented the recommendation engine. They had $13 billion, and it was 30% more than the same quarter of the previous year. So, for YouTube users and for
YouTube, more than 70% of the user consumption comes from recommendations. So, when you are watching a video and there's another video coming up, or you go
through the feed and there are some videos recommended for you, almost 70% of the consumption on YouTube is about this. And, for Netflix, 75% of the consumption comes from recommendations. So, we can see it's a very big matter,
recommendations. So, when we need to recommend something, we might be looking to answer two problems, prediction and rating. There is an approach to recommendation in use that has no artificial intelligence, that is
like, oh, the most sold, the most clicked items appear, and it can easily be made with a query in the database, showing the user what items
have been sold the most and everything. But, what we want to have is things that the customers are likely to love, are likely to buy, are likely to watch.
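The non-personalized "most sold items" approach mentioned above can be sketched in a few lines of Python. This is only an illustration: the purchase log, the user names, and the function name `most_sold` are made up here and do not come from the talk, and an in-memory list stands in for the database query.

```python
from collections import Counter

# Hypothetical purchase log as (user, item) pairs; invented for illustration.
purchases = [
    ("ana", "book"), ("bob", "book"), ("cy", "book"),
    ("ana", "lamp"), ("bob", "lamp"),
    ("cy", "mug"),
]

def most_sold(purchases, n=2):
    """Return the n items that appear most often in the purchase log."""
    counts = Counter(item for _, item in purchases)
    return [item for item, _ in counts.most_common(n)]

top_items = most_sold(purchases)
print(top_items)  # ['book', 'lamp']
```

Every user gets the same list here, which is exactly why this baseline cannot tell what one particular customer is likely to love.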
So, we don't need just a predicted rating for an item, but also whether they might like it or not. Okay, a little bug there. But, how do you get this data, right? So, data is the key. And, we have two kinds of data. I'm saying two
kinds because I'm separating it that way. I don't know, maybe we have more, even more that we haven't talked about, but we can put everything in these two categories. So, implicit data. It's about tracking. It's about the data that the
customer didn't give to us the way they give their name or address. But where did they click? What kind of movie do they like? And things like this. We can collect
this data, right? Big data. And, we have the explicit data. This is more difficult to get because it depends on an action of the user, such as answering a survey or rating items. I don't know. I think I have never rated an item. I don't know about you. But, it's not common. I mean, thanks to
the people who rate, because I'm always going to the ratings. But, yeah, I never have. So, these are the basic models of recommendation systems. We
have AI, okay? So, we have the content-based, the collaborative, and the hybrid solutions. I mean, companies like Amazon, Netflix, YouTube, they are mostly using hybrid solutions that are very much based on the
content, but also very much based on the collaborative, and have some secrets in the recipe that we don't know. So, content-based. The recommendations are based on the description of the item or on the synopsis or on the
genre or even on the author. There is an article on the internet showing that an author had launched a book, like, in 2010, and the book did not
sell very well. That's okay. In 2015, another author launched another book, but the content was very similar to the one that had been launched in 2010. And what happened was that the book that was launched
in 2010 started to have a lot of people buying it, and the author was like, oh, what's happening? Then, tracing it back, they saw that
because of this book that was very similar to the 2010 book, it started to sell more. So, it happens. So, content-based recommender systems are born from the idea of using the content of each item for recommendation
purposes. It avoids the cold start recommendation problem. Wait for it. I'm going to talk about it. And content representations open up the option of using different approaches, like text
processing techniques, semantic information. There is a whole world of tools to analyze this data. So, another one is the collaborative,
and it's memory-based. Recommendations, in this case, are based on user social interaction and rankings provided by other users. This collaborative model
is called collaborative filtering, and it's divided into user-based and item-based. So, let's see this problem. It's Saturday night. I am at home. I open my favorite streaming app, and I don't know what to watch. I
don't know what to watch. Maybe it's a good idea to ask her for a movie recommendation. And this is the truth, okay? Marta is my friend. I
didn't know what to watch. And I was, oh, my God, I'm always like this. I don't know if you believe in this kind of thing, but I'm a Gemini, and I can't make decisions. So, yeah, I was like, oh, my God, what am I going to watch? So, I'm showing you the user-based
filtering. This on top is my friend Henry. He likes orange, grapes, raspberry, and banana. Marta likes grapes, and Demini likes grapes and banana. Maybe Marta could like orange, or banana, or raspberry,
and she has never tasted them. Maybe I can recommend them to her. But looking at this picture, I can see that Demini is more similar to Henry,
because she likes more fruits. Maybe Marta just likes grapes. And if she likes more fruits, and she's more similar to Henry, maybe I can recommend her oranges and raspberry. Well, but what about the item-based filtering?
The thing is, again, Henry likes the same fruits, Marta as well, and Demini as well. But if I go and check the items, I see that more people like banana. Maybe I can recommend Marta a banana, and maybe
she likes it. So, it has been shown in the market that collaborative filtering is more accurate than content-based. It's easy to
implement, but sometimes it's not a good idea to implement it. We have to keep this in mind. When is it not a good idea? When we don't have enough knowledge about the item and the user. If we don't have enough ratings and I don't know anything about the user, I can't do
the math to see which user or item this user is more similar to. So, how can I recommend something? And when the item has not been ranked enough. So, it's a new item, and I don't have any
information about it. So, a problem: cold start. Cold start, as I said before, and now I will explain, is the expression we use when we don't have much information about the user and
would like to recommend them something. One thing that helps with the cold start is, I don't know if you have Netflix, or there are other sites like this, like Pinterest: you go there for the first time, you do your registration, and the website asks you to select the items that you like most, or
the genres that you like most. So, this helps the algorithm to recommend you things and avoid the cold start. Then there is sparsity, which is the problem when we don't have a large
number of ratings. And new items, because we don't have information. So, it's not a good idea when we don't have any information, because collaborative filtering is very much based on similarity calculations. So, let's build
our own recommender system algorithm. These are the steps, the basic steps to build the recommendation system. So, first, we need to calc: we must
choose how to do the math to find the similarity coefficient between users. Then, we have to predict. So, we have to find the predicted score for the movies that the user didn't watch. And tell: let the user know what you have predicted for them.
So, recommend it. So, let's go back to my problem. It's Saturday night. I want to watch a movie. I don't know what to do. So, maybe I will ask Marta, but may I try another friend?
The thing is, I have this database, and as I said, it's true. My friends gave me these ratings. It's good, because now we can find out how similar we are, and how to get recommendations. So, this is our database. Let me see if I can have a pointer here.
Okay, so, The Wolf of Wall Street: Danny hasn't watched it. Cool Runnings, I didn't. Baby Driver neither. Danny, as well, didn't watch Cool Runnings, and we didn't watch
Baby Driver, and Ulta didn't watch The Lord of the Rings. So, let's do the calc to see the similarity between us. What I have done here is plot a two-dimensional graph
of The Wolf of Wall Street and The Devil Wears Prada, and I'm seeing how the data is dispersed. So, here, I can see that Marta gave a three rating for The Devil Wears Prada
and three for The Wolf of Wall Street. I gave four and four, and Philip gave three for The Devil Wears Prada and four point five for The Wolf of Wall Street.
When I see this graph, I try to understand who is closer to me, because I'm trying to get the similarity here. So, when we talk about data similarity, it's very much about where we are together, what data points we share, and things like this.
So, when I want to recommend something to anyone, I really want to recommend to someone that is very similar to me, so I know that this person is going to like it.
So, thinking about it, I can try to measure the distance. So, from Philip, I'm at a distance of 0.5, and from Ulta, well, this is a triangle. So, let's find the hypotenuse,
using Pythagoras. So, do you remember that from school? So, yeah, we get the length of this cathetus, and of this one, then we square those, sum them up,
and take the square root, and we get the hypotenuse. So, doing this math, I know that I'm at a distance of 0.721 from Ulta. So, I'm more distant from Ulta than from Philip.
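The triangle calculation can be checked with a short Python sketch. The exact ratings on the slide are not recoverable from the transcript, so the leg lengths below (0.6 and 0.4 for Ulta, 0 and 0.5 for Philip) are assumptions chosen only because they reproduce the two distances quoted in the talk; `distance_2d` is a name invented for this illustration.

```python
import math

def distance_2d(dx, dy):
    """Hypotenuse of a right triangle with legs dx and dy (Pythagoras)."""
    return math.sqrt(dx ** 2 + dy ** 2)

# Assumed rating differences on the two movie axes (not taken from the slide).
to_philip = distance_2d(0.0, 0.5)
to_ulta = distance_2d(0.6, 0.4)

print(round(to_philip, 3))  # 0.5
print(round(to_ulta, 3))    # 0.721
```

The smaller the distance, the more similar the two people are on these two movies.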
This is the Euclidean distance. This is also the math that is used for k-nearest neighbors, a widely used algorithm in machine learning. I could have just shown this formula, but I thought that if I showed you the triangle and Pythagoras, it would be easier, because
I don't think people like symbols, characters, numbers, and exponents, everything together. So, this is just Pythagoras, summing up everything over all the dimensions we have,
because I have shown you the two-dimensional graph, but if I have a lot of movies, I will have a lot of dimensions. So, this is how we are going to make the similarity calculation.
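The formula on the slide is not captured in the transcript. The standard Euclidean distance between two users, which matches the description of Pythagoras summed over all dimensions, can be written as follows. The symbols are introduced here for illustration: $r_{u,i}$ is the rating user $u$ gave movie $i$, and $I_{uv}$ is the set of movies both users rated (restricting the sum to co-rated movies is an assumption).

```latex
d(u, v) = \sqrt{\sum_{i \in I_{uv}} \left( r_{u,i} - r_{v,i} \right)^{2}}
```

With only two movies in $I_{uv}$, this reduces to the hypotenuse of the triangle from the two-dimensional graph.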
So I wrote a function named similarity, sorry, get similar. I will show you the code, and it shows that regarding all movies, I'm more similar to Ulta than to everyone else.
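The speaker's own function is not shown in the transcript, so the following is only a sketch of what a similarity ranking along these lines might look like. The ratings, the user and movie names, and the choice to turn a distance `d` into a similarity with `1 / (1 + d)` are all assumptions made for illustration.

```python
import math

# Hypothetical ratings; users and movies are placeholders, not the talk's data.
ratings = {
    "me": {"Movie A": 4.0, "Movie B": 4.0, "Movie C": 3.0},
    "friend_1": {"Movie A": 4.0, "Movie B": 4.5, "Movie C": 3.0},
    "friend_2": {"Movie A": 2.0, "Movie B": 5.0},
}

def euclidean_distance(a, b):
    """Distance over the movies both users rated, or None if they share none."""
    shared = set(a) & set(b)
    if not shared:
        return None
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in shared))

def get_similar(ratings, user):
    """Rank the other users by similarity to `user`, most similar first."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        d = euclidean_distance(ratings[user], other_ratings)
        if d is not None:
            # Assumed mapping: distance 0 gives similarity 1, larger distances approach 0.
            scores[other] = 1.0 / (1.0 + d)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

ranked = get_similar(ratings, "me")
print(ranked[0][0])  # friend_1
```

Other similarity measures, such as Pearson correlation, could be swapped in at the same place.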
So, let's predict. I have to predict what is going to be my rating for Cool Runnings and for Baby Driver, because maybe
that's the movie I'm going to see tonight, actually on Saturday. So, here we can see that we don't have many ratings. I mean, like, four of those,
plus mine, are missing. So, what if, like, Octavio, who is the most similar to me, like we see here, hadn't rated movies? Let's suppose that we have, like, 50 movies, not just
five. What if a person appreciates a movie that everyone else has rated low? A solution to this problem is to use the weighted average. So, let's go to the Saturday night. What movie should I watch? Pro tip: it's up to us to choose the recommendation threshold.
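A similarity-weighted average can be sketched as below: each friend's rating counts in proportion to how similar that friend is to me, and friends who have not rated the movie are left out. The function name `predict_rating` and all numbers are hypothetical and only illustrate the idea.

```python
def predict_rating(similarities, item_ratings):
    """Similarity-weighted average of the ratings other users gave one item.

    similarities: {user: similarity to the target user}
    item_ratings: {user: rating}; users who did not rate the item are absent
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for user, rating in item_ratings.items():
        sim = similarities.get(user, 0.0)
        weighted_sum += sim * rating
        weight_total += sim
    if weight_total == 0:
        return None  # nobody comparable rated this item
    return weighted_sum / weight_total

# Hypothetical similarities and ratings; friend_3 did not rate the movie.
similarities = {"friend_1": 0.5, "friend_2": 0.25, "friend_3": 0.25}
item_ratings = {"friend_1": 4.0, "friend_2": 2.0}

prediction = predict_rating(similarities, item_ratings)
print(round(prediction, 2))  # 3.33
```

Because friend_1 is twice as similar as friend_2, the prediction lands closer to 4.0 than the plain mean of 3.0 would.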
We have this method: get my average rating and just recommend if the prediction is higher than my average rating. So, recommendations for me. Here is the
code I have written for the recommendation. This measures the Euclidean distance
and multiplies by the similarity for the weighted average. So, here, what happened is I have put the similarity here. Sorry. The predictions we have: Danny hasn't rated Cool
Runnings, so it's blank. And I have multiplied these two to get the weighted rank. And in the end, we get a prediction for Cool Runnings. For example, for me,
I would rate Cool Runnings as 3.6. At least the algorithm says that. And the same thing for Baby Driver, I would rate it as 3.93. So,
here's the, let me see here. Yeah. This is the recommendation function. And this is the output. So, as we have seen, I would rate Baby Driver higher than Cool Runnings. What does it mean? That I may be recommended Baby Driver by the algorithm.
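The threshold rule, recommend only when the prediction is above the user's own average rating, can be sketched as follows. The two predicted ratings are the ones quoted in the talk, and the average of 3 is the figure the speaker gives for herself; the function name `recommend` is an assumption, and this is not the speaker's actual code.

```python
def recommend(predictions, average_rating):
    """Keep items predicted above the user's average rating, best first."""
    picks = [
        (title, score)
        for title, score in predictions.items()
        if score > average_rating
    ]
    return sorted(picks, key=lambda pair: pair[1], reverse=True)

# Predicted ratings quoted in the talk.
predictions = {"Cool Runnings": 3.6, "Baby Driver": 3.93}

recommended = recommend(predictions, average_rating=3.0)
print(recommended)  # [('Baby Driver', 3.93), ('Cool Runnings', 3.6)]
```

Raising the threshold, for example to 3.8, would leave only Baby Driver in the list.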
So, I should tell my users what is predicted. Here, doing the prediction for everyone, I have seen that I would be recommended Cool Runnings like this, Baby Driver like this,
and for Danny, these are the predictions. And on Saturday night, I'm going to see Baby Driver.
So, problem solved. As I have seen here, my average rating is 3. So, these two can be recommended. Maybe in my Netflix or my streaming app, these two are going to appear
like watch it now. Cool Runnings and Baby Driver, they are likely to be among my good movies. So, the code is here, if you want to check it and see how to
make this algorithm. So, tiny.cc, 2019. And thank you very much. I would love to hear feedback from you. And I'm here if you have any questions. Thank you. Maybe one question here. Thanks for the talk. I'm curious, at your company,
what products or items are you recommending? Nice. So, I'm a data enthusiast. My company is ThoughtWorks. We don't have much work on recommendation. We have been doing some inferences,
like, we have made gender prediction for a big media company. So, when a user enters the website,
are they male or female, what age? And things like this. But recommendation, we haven't done it yet in a project. Thanks. Welcome.