Developing a match-making algorithm between customers and Go-Jek products!
Formal Metadata

Title: Developing a match-making algorithm between customers and Go-Jek products!
Title of Series: EuroPython 2020
Number of Parts: 130
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/49931 (DOI)
Language: English
EuroPython 2020 · Part 72 of 130
Transcript: English (auto-generated)
00:07
So I'll start by talking a little bit about myself. First of all, good afternoon to all of you. And I hope that you're all staying safe and you're all healthy wherever you are in the world right now.
00:21
My name is Gunjan. And I work as a data scientist at Gojek. I am a mathematician at heart. I have a bachelor's degree in mathematics and a master's degree in data science. I've been working at Gojek for the past two years. And I've been working on some really exciting problems.
00:42
And I'm here to share one of those with you. So let me start by talking a little bit about Gojek and setting a bit of context about the problem that I'm trying to showcase here. Gojek is an app that offers many services to its users.
01:01
Services like ordering food, commuting, digital payments, shopping, hyperlocal delivery, getting a massage, and two dozen other services are all offered in one app by Gojek. It is Indonesia's first and fastest growing decacorn, building an on-demand empire.
01:22
We have our headquarters in Jakarta, Indonesia. And Gojek operates in 207 cities across five Southeast Asian countries: Indonesia, Singapore, Thailand, Vietnam, and the Philippines. Now, when you have so many services rolled into one app
01:42
across so many locations, a very natural question comes up. How many people who are on our platform actually use more than one of these services? Let's assume a person, Mike, who uses our two-wheeler taxi service, GoRide, to commute to and from work every day
02:02
but has never used our car taxi service, GoCar. Hi, we have one more person. Welcome. So I was just talking a little bit about Gojek, the place I work for right now. And then I'll go on to talk about the problem that I'm trying to solve with the poster
02:21
that you see on your screen. So at Gojek, we have many, many services rolled into one app. So this is like ordering food, commuting, digital payments, shopping, hyperlocal delivery, getting a massage, and like two dozen other services all rolled into one app.
02:42
So when you have so many services in one app, a very natural question comes up: how many people actually use multiple services that we offer? So let's assume a person, Mike, who uses our two-wheeler taxi service, GoRide, to commute to and from work every day but has never used our car taxi service, GoCar.
03:04
Now, does this mean that he will never use it? Or let's take the case of Indra, who religiously orders food from our food delivery service, GoFood, and uses our digital payments platform, GoPay, to pay for them but has never taken a ride
03:20
using GoRide or GoCar. So does it make sense for us to send her a voucher for these services to see if she uses it? Now, with millions of monthly active customers, the permutations are endless. But the key point is that for us as a business, it makes sense if more and more customers use more
03:42
and more services that we offer. But at the same time, we don't want to spam our customers with vouchers or services that are not relevant to them. Imagine constantly getting notifications of a voucher from a particular app which you have no interest in whatsoever. It can surely get very annoying.
04:02
So we decided to build an algorithm that helps us figure out which customer is most likely to use which service based on their transaction history. This meant generating a targeted campaign for our customers. By targeted campaign, I mean only sending a voucher
04:22
or a cash back to a customer if they are highly probable to actually use it. This way, we get higher conversion rates at much lower cost. And we don't end up spamming our customers. Any questions till now? You can just feel free to stop me at any point if you have any questions.
04:48
OK, so let's try to zoom in to the left side of the poster and see what's going on here. So here, you can see a Venn diagram
05:00
that explains how a targeted campaign is designed. So let's say we have a universe of all Gojek users. And out of this, there will always be a base pool for the campaign that you want to run. So let's say we want to run a campaign to convert more and more people to use our food delivery
05:23
system, GoFood. Then our base pool will consist of people who are eligible for that campaign, which means people who have used other services but have not used GoFood till now. And out of this base pool, our aim is to find a target group or a target audience, which
05:43
are users who are most likely to be interested in the campaign. In this case, users who are most likely to convert to becoming GoFood users. So for any targeted cross-sell, our aim is to find this targeted audience.
06:01
So this is the basic problem statement that we're trying to solve. We're trying to find a small group of people to target for a particular campaign. Any questions till now about customer targeting or if the diagram is not clear or anything? Please feel free to ask at any point, and I'll move forward.
06:29
So now to solve this problem, our first hunch was to build a classification model. Now, a little bit about what a classification model is. A classification model is an algorithm
06:43
that is designed to divide your data set into two or more classes if such classes exist. So in our case, we had two very clear classes that existed in our data set. So for our base pool, what we're trying to do is classify our users into two classes,
07:02
customers who will cross-sell and customers who will not cross-sell. Then we can look only at the customers who are likely to cross-sell and send the voucher to just those people, leaving out a huge group of people for whom that voucher isn't relevant. Now, the classification model, when we built it,
07:20
it was working great. We were getting an average uplift of around 5x on natural conversion rates on one of our key services. But at the same time, the classification model had some pitfalls. To actually build a classification algorithm for a targeted campaign, we needed to train the model individually for each campaign
07:43
because the base pool itself for each campaign would change. If I want to run a campaign to target new GoFood users, then my base pool would consist of people who have used other services except GoFood. Whereas if I want to run a campaign to target new
08:03
GoCar users, then my base pool would be people who have used other services except GoCar. So the base pool is itself changing for each model, for each cross-sell campaign. Now, when that happens, you literally need to rebuild your classification model
08:22
for each and every campaign that you want to run. So clearly, this was not a scalable approach. So we needed to rethink the way we were approaching this problem and think of it as matchmaking, literally making a match between users and products,
08:42
such that we have an algorithm that is generic enough that we don't need a lot of effort to go from one targeted cross-sell model to another.
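(To make that scalability pain concrete, here is a minimal, hypothetical sketch of the abandoned per-campaign classifier. The talk names no library or features, so scikit-learn and all column names below are assumptions for illustration only.)

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def train_campaign_classifier(base_pool: pd.DataFrame) -> RandomForestClassifier:
    """One model per campaign: `base_pool` holds only the users eligible
    for that campaign, with usage features and a historical 0/1 label
    `did_cross_sell` (all names here are hypothetical)."""
    features = ["goride_txns", "gopay_txns", "gofood_txns"]
    return RandomForestClassifier(n_estimators=100).fit(
        base_pool[features], base_pool["did_cross_sell"])

# The scalability problem in one line: every campaign has a different
# base pool, so every campaign needs its own freshly trained model.
# gofood_model = train_campaign_classifier(pool_without_gofood_users)
# gocar_model  = train_campaign_classifier(pool_without_gocar_users)
```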
09:02
That's when we thought of using recommendation systems to solve this problem. Now, a recommendation system is an algorithm that uses users' past behavior or users' history to recommend items to users that they're most likely to purchase. In today's world, recommendation systems
09:22
are literally all around us. You see relevant ads, relevant products on Amazon, relevant shows and movies on Netflix. All of these have recommendation systems sitting behind them. We are using recommendation systems based on collaborative filtering as a matchmaking mechanism
09:42
between users and products. Since we have many different products, we can actually use these products as different items in a recommendation system that we want to recommend to our users. I'm going to take a pause here for two minutes
10:02
to see if there are any more questions.
10:24
I don't see any questions. I'll move on. So I was talking about recommendation systems. So like I said, we are using recommendation systems to recommend products; we're treating each of our services as one item in a recommendation system.
10:41
Now, every recommendation system that is based on collaborative filtering has a utility matrix associated with it. A utility matrix is a matrix that captures interactions between users and items. By interactions, I mean these could be
11:02
the rating that a user has given to an item, or this could be the purchase history of that user and that item. This could be the search history. This could be how many times a user has clicked an item, et cetera. This is what our utility matrix looks like. So each column of the matrix is one service.
11:22
This is like a sample of our utility matrix, of course. Each column of the utility matrix is one service and each row is a customer. Now, these columns can go as granular as having a merchant as something that we're recommending.
11:42
So the cross-sell model can actually work to the granularity level of a merchant, not just a service or a payment method. So the matrix is filled like this: we're looking at the transaction history of a user for that particular product.
12:02
So let's say user one has used GoFood in the last one month three times. So the cell between user one and GoFood will be filled with three. Similarly, user two has used GoFood two times, and the cell between these two will be filled with two. However, user three has not used GoFood
12:21
in the past one month, so this cell will be empty. Now imagine this concept being applied across millions of customers and more than 20 or 30 products. So we would have a pretty huge utility matrix on our hands, and the idea is that this matrix
12:43
is going to be very, very sparse. By sparse, I mean most of the values of this matrix will be missing because, as I said, we have a lot of users, but not everybody uses all the services. So wherever a person is not using a service, that particular cell will be empty.
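(A minimal Pandas sketch of that matrix-filling step, using made-up data matching the example just given; the real ETL schema isn't shown in the talk.)

```python
import pandas as pd

# Hypothetical one-month transaction log: one row per transaction.
txns = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "service": ["GoFood", "GoFood", "GoFood", "GoFood", "GoFood", "GoRide"],
})

# Utility matrix: one row per customer, one column per service,
# each cell the transaction count; services never used stay NaN (empty).
utility = txns.groupby(["user_id", "service"]).size().unstack()
print(utility)
# service  GoFood  GoRide
# user_id
# 1           3.0     NaN
# 2           2.0     NaN
# 3           NaN     1.0
```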
13:03
So the problem eventually boils down to finding these missing values in the matrix. Now, if we are able to find these missing values in this matrix, we will be able to figure out whether this particular user will be interested
13:21
in this particular product in the future or not. So let's say we have a value for user one against GoCar. Let's say we're able to predict what this value is. If this value is very, very small, we might not want to recommend GoCar to this user. However, if the predicted value is very high,
13:40
we might want to recommend GoCar to this user and include that person in the campaign. Now, for the purposes of this poster, I'm gonna ask you to consider recommendation systems as a black box. I'm not going to go into too much detail about how the algorithm works, et cetera.
14:02
But just to give you an intuitive idea, we can try to understand the image here. So as you can see, user one and user three have very similar behaviors. Both of them have used GoRide and GoPay Offline, and both of them have used GoPay Offline once.
14:21
And user one has used GoRide thrice, and user three has used GoRide four times. Now, this is pretty similar behavior in the past one month. However, user one seems to also have used GoFood thrice. Now, what we can do is recommend GoFood to user three as well, because we have seen that in the past,
14:40
user one and user three have actually behaved similarly. So if there is a service that user one is using, we can actually use that information to recommend it to user three. So this is the basic idea behind how recommendation systems work. They work on finding similarities between users
15:00
and similarities between products based on their past behavior. Any questions till now? So like I said,
15:21
let's treat recommendation systems as a black box. And moving forward, this is what our final workflow looks like. We get our data, our one month transaction history from BigQuery. We feed this data into Pandas and do some ETL on it.
15:43
Now, doing the ETL in Pandas to get a utility matrix out of the raw data is not a very expensive operation. So Pandas on its own was working really well for it. So we decided to go ahead with that. However, building a recommendation system
16:00
on the utility matrix is quite an expensive operation. We're not only storing the entire utility matrix in memory, but we're also trying to perform optimizations on top of it. So we started off by trying out a library called Surprise in Python,
16:21
which was being used to build our recommendation system. And it was taking around six to seven hours to just build one recommendation system. So after exploring more, we found that there is a Spark ML library as well, which deals with recommendation systems. And when we tried using that with a Python wrapper on it,
16:43
we found that our training time had reduced from six hours to one hour. So we finally ended up using the Spark ML ALS recommendation engine with a Python wrapper on top of it. And that library basically spits out your final filled-in utility matrix,
17:02
from which we can infer the list of customers to target. Now, once we had this workflow in place, we decided to do some field tests on the model to see how it actually performs out there in the field. And we got an uplift of around 5x to 7x
17:25
on natural conversion rates across service types. And currently this model is being used to target a lot of people, like you, to send out a lot of vouchers for cross-selling across products.
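(For reference, a rough sketch of what that Spark stage might look like. The team's actual wrapper, column names, hyperparameters, and data path are not public, so everything beyond the `pyspark.ml` ALS API itself is an assumption; `implicitPrefs=True` is what triggers the implicit-feedback handling discussed in the Q&A below.)

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("cross-sell-recsys").getOrCreate()

# Long-form utility matrix: (user_id, service_id, txn_count) rows,
# e.g. produced by the Pandas ETL step. Path is hypothetical.
interactions = spark.read.parquet("path/to/utility_matrix.parquet")

als = ALS(
    userCol="user_id", itemCol="service_id", ratingCol="txn_count",
    implicitPrefs=True,                 # counts are implicit, not ratings
    rank=10, regParam=0.1, alpha=40.0,  # hypothetical hyperparameters
    coldStartStrategy="drop",
)
model = als.fit(interactions)

# Top-3 candidate services per user; pairs with low predicted scores
# are dropped, the rest become the campaign's target list.
recs = model.recommendForAllUsers(3)
```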
17:41
That's about it from me. That's a brief intro to the project that I've done. There is a relevant blog post about this that I will be posting on the Discord channel in a bit. So please feel free to go through it and I'd be happy to answer any questions that you might have.
18:01
Hi, I see someone new has joined. How are you doing? Hi, I'm doing fine, thanks. Okay, I actually just finished talking about the poster right before you joined. So, please ask any questions that you might have;
18:22
if there's any section that's not clear, I'll be happy to talk about it again. Yeah, amazing. I was in between talks basically and I missed the beginning. I'm really sorry about that. Let me just turn on. No problem at all. Let me turn on the video.
18:42
Hi. I was interested in the poster. I reviewed it before the call because I knew I was not gonna have much time. And I saw that you were using the Spark ALS model, basically. And I was wondering if in your setup
19:04
with the recommendation you were working on, if your products were changing quite often or if you had a kind of a stable set of products because I was thinking of using also ALS for, I tried ALS on my own problem,
19:22
on a couple of them actually, like some to do, to find similar users, some to find similar products, et cetera. And they have one use case in mind, but for that use case, like from like basically one week to another, the products are changing.
19:41
So I was wondering if I could make that work or and how was your situation basically? Okay, so currently this model is not a real-time model. It is trained on a weekly basis. So the products are not changing on a weekly basis, but customers are being added on a weekly basis.
20:00
That happens. As our customer base expands, we get more and more customers being added to the model. So, and this is not a model that is even predicting results real-time because this is currently used to generate campaigns. Now those campaigns are usually pre-planned.
20:20
So how it works is whenever a product comes up to us and asks us to give them a list of customers to target for a particular campaign, we just sort of use a pre-trained model for that particular week and give them the list of customers to target for that model, for that campaign. So it is not really real-time,
20:41
but if a weekly sort of retraining works for you, then you can definitely do something like this. Because, so in our case, like I was just talking before, we started using the surprise package when we actually started building the recommendation system.
21:02
There is a Surprise package in Python which we started with. But then the problem was that the training time was around six to seven hours, which was pretty huge. And it was just not optimal enough for us. So then we moved on, we explored Spark a little bit, and we found out that this has an ALS package,
21:21
and the six-to-seven-hour training time actually reduced to around one hour after we started using Spark. So that was pretty useful for us that way. But I'm not sure how this would work real-time if that's your use case. Yeah, I mean, I've used ALS
21:41
like successfully for use case similar to yours, where I have like my users where, like basically I want to match make users and entities that they follow basically. So I have user profiles with all the entities that they follow and that's pretty stable. And thanks to ALS, I can use that,
22:01
I forgot the name of them in the API, but like you can get like the relationships or like the, I forgot the name, but you can basically, it's like with this market basket model, right? So you can get, you can basically get the next entities that the user is- Yeah, yeah, yeah. Could follow, so that-
22:22
Probability scored for the users who was the next entity. Yeah, I get that, yeah. So that was like working and I was quite happy with that. But now I'm looking at the other one, which is more like kind of like e-commerce recommended use case, where you have these products that changes.
22:41
And for that one, I'm a bit, yeah, I'm a bit stuck, but I will, I will kind of keep on thinking about this. I was thinking of maybe in order to be able to use ALS, kind of find like the similarities between the products or something like this, and then do the recommendations based on that
23:01
or something like that, but I'm not sure how well that will work, but, but yeah. Because then you could say like, you know, if the products are more or less similar, which in my case, I mean, to some extent they are, so then I could maybe, instead of matchmaking a user with a product directly,
23:21
I could matchmake the user with the, let's say the archetype of the product or something like this. But then I would need to define these things. Yeah, so I think like we also face a similar problem, but the thing with us, so the upside that you have is that
23:41
if you're working in an e-commerce space, your product similarities actually make sense. Like if a person is selling a type of clothing, then there would be other people selling similar type of clothing. So you would have something which could be used called a similar product. But for us, what we had was services
24:01
that are offered by Gojek. Now it's very absurd to say that GoFood is similar to GoCar, or that a food delivery platform is similar to a car-hailing service. So that was kind of one of the challenges that we faced. So we ended up using a matrix factorization technique for this very reason.
24:21
We started by using KNN approaches. We tried those, but they're also like, it just doesn't make sense to say that, okay, one product is similar to the other in our case. Users can be similar based on their usage history, et cetera, but it's, there's no such thing as a product profile that we can use to create similar products as such.
24:44
So that's sort of where ALS has helped us because using matrix factorization, using ALS has sort of helped us cross that kind of blocker that we had
25:02
because it uses both user and item similarities. So it's like a combination of both. Yeah, yeah, definitely. Sounds great, yeah, yeah. Thanks, thanks. Actually, I had on my list to check, to check more like the product similarity techniques for that thing, like, and for the KNN, so this Annoy library that I have not used yet.
25:23
So I might, I'm just going to make a note; actually what you said makes sense, like ALS is more for when you can train on that relationship between the two, and less on the, yeah. Cool, thanks. Thanks a lot. Yeah, I have to run, sorry, to attend another talk,
25:40
but I'm happy I caught you and could ask my question. Thanks a lot. Thank you. Bye-bye. Bye-bye. Hello, hi, can I ask you a question? Hey, yes, please go ahead.
26:02
Yes, Mark, hi. So you spoke about the differences in learning time between the two packages. The first one you're saying was six to seven hours, and the second one was much shorter, I think it's about an hour. But apart from the speed of concluding that process, did you determine any particular differences
26:24
in the types of outcome of those learnings, or was it purely just a speed benefit that drove you to the second package? That's a very good question. So before I answer that, I'll talk about something that was essentially different
26:42
between the two algorithms, between Surprise and the Spark ML library. So recommendation engines as an entity, they work on explicit data. By explicit data, I mean data where you have a rating given by a user to a particular item.
27:03
So for example, in Netflix, you have the rating given by a user for a particular movie or a TV show, and you use that explicit feedback to train your model. However, in our case, we didn't have that. So what our Spark ML ALS library does
27:22
is that it converts the explicit data into implicit data. Sorry, it converts the implicit data into explicit data first. What we had was implicit data. It was user transaction history. It doesn't necessarily indicate how a user likes or dislikes an item.
27:40
Surprise did not have that conversion. It was assuming that the data we're giving it is actually explicit data. So it was working on this underlying assumption, which was not really true for us. Now, when we moved to Spark, this is also one of the benefits that we had, which was that it was converting our implicit data
28:02
to explicit data. It was converting it into a probability score with a confidence, saying that, okay, if this person is using so-and-so service X number of times, I can say that this is the probability that he or she likes this service. So the explicitness of liking or disliking a service
28:22
was coming using that library. So overall, it kind of enriched the process. And yes, our results were also better: our accuracy numbers were much better using the Spark ML library as compared to the Surprise package, because of this one enrichment that the model was doing.
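(For the curious: that implicit-to-explicit conversion matches the Hu-Koren-Volinsky implicit-feedback formulation that Spark's implicit-mode ALS implements. A tiny sketch of the reweighting, with `alpha` as the tunable confidence scaling; the values are hypothetical.)

```python
import numpy as np

# Raw usage counts r_ui (e.g. GoFood orders in the last month) become
# a binary preference p_ui plus a confidence weight c_ui:
#   p_ui = 1 if r_ui > 0 else 0   (does the user like the service at all?)
#   c_ui = 1 + alpha * r_ui       (more usage -> more confidence)
alpha = 40.0                         # hypothetical scaling factor
r = np.array([0.0, 1.0, 3.0, 10.0])  # usage counts for one service
p = (r > 0).astype(float)            # -> [0., 1., 1., 1.]
c = 1.0 + alpha * r                  # -> [1., 41., 121., 401.]
```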
28:44
Okay, that's really interesting. Thanks, yeah, thank you. All right, guys, it's time. I am going to drop off. And I'll be hanging out on Discord for a while. If you guys have any questions, please feel free to reach out. Thank you. Thank you for joining.
29:01
Bye-bye.