Contextual Personality-Aware Recommender System Versus Big Data Recommender System
Formal Metadata
Number of Parts | 30
Author | 0000-0003-4392-0205 (ORCID)
License | CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/53690 (DOI)
Transcript: English(auto-generated)
00:00
So today I'm going to present my recent work on the contextual personality-aware recommender system versus the big data recommender system. I work at Poznań University of Economics and Business in Poland. Okay, so the agenda for this presentation is rather standard. I'll start with the motivation for this research.
00:22
I'll present a little of the literature review, discuss the experiment design, and present the implementation of the personality prediction engine and the product recommender engine, along with the evaluation results. Then we will discuss the future work.
00:43
Okay, so starting with the motivation. When we're doing online shopping, we can be overwhelmed by the number and variety of products. So it is sometimes difficult to find the right product that we are looking for. And that's why we need product recommender systems.
01:04
And especially in commerce, this is a very profitable domain, because it increases user satisfaction and it increases sales. So there's a high business interest in improving recommender system algorithms. On the other hand,
01:20
we've got personality researchers who claim that a person's personality can significantly influence customer behavior. So this is a very important consideration for product recommendations.
01:41
And moreover, those personality traits can even be inferred from digital footprints, such as social media or text reviews. So the personality of a user can be inferred from those sources as well. So I decided to do some research
02:04
based on those two topics. I prepared a literature review on customer personality trait identification and personality-based recommender systems. For this literature review, I used tools such as Google Scholar, the SpringerLink database, the IEEE database,
02:23
and Mendeley for forward and backward citation search. The keywords were related to personality trait identification and to personality-based recommender systems in the state of the art.
02:41
So, starting with customer personality trait identification. Summarizing this literature review, I found that the most common model for personality differentiation is the five-factor model, sometimes also known as the Big Five model.
03:01
This model consists of five main traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism. And each person, each customer, can have different levels of those dimensions.
03:24
And this model is frequently used because it is highly applicable in business. And customers in a homogeneous group frequently behave similarly. That's why I have chosen this model
03:40
for personality-aware recommender systems, to merge the personality aspects with the recommender systems. And those traits can be identified using either explicit or implicit techniques. If we are talking about explicit techniques, we're talking about psychological questionnaires.
04:02
And this is usually the most reliable source. However, filling in such a questionnaire takes time, around 15 to 20 minutes depending on the questionnaire. Therefore, it is not convenient for users who, for example, are doing online shopping.
04:21
Therefore, there are also implicit techniques, such as identifying personality traits from digital footprints: social media, user-written text reviews, or even speech or video. The other area of the literature review
04:42
was related to personality-based recommender systems. And here, in fact, there have been a couple of attempts to incorporate personality traits into recommender systems. And the two most frequently used techniques
05:00
to incorporate personality traits into a recommender system cover pre-filtering or incorporating the personality information into the algorithm itself. So those are the two techniques. Pre-filtering means that we are dividing the customers into homogeneous groups.
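To make the pre-filtering idea concrete, here is a minimal sketch, not the speaker's code. The user IDs, trait scores, and function names are hypothetical; the 0.5 cut-off mirrors the threshold the talk reports for its result tables.

```python
from collections import defaultdict

# Hypothetical per-user trait scores in [0, 1]; values are illustrative only.
user_traits = {
    "u1": {"extraversion": 0.81, "neuroticism": 0.22},
    "u2": {"extraversion": 0.35, "neuroticism": 0.64},
    "u3": {"extraversion": 0.58, "neuroticism": 0.71},
}

def binarize(score, threshold=0.5):
    """Map a continuous trait score to 1 (high) or 0 (low)."""
    return 1 if score > threshold else 0

def prefilter(traits_by_user, trait, threshold=0.5):
    """Split users into homogeneous groups by one binarized trait."""
    groups = defaultdict(list)
    for user, traits in traits_by_user.items():
        groups[binarize(traits[trait], threshold)].append(user)
    return dict(groups)

# Each resulting group would then get its own recommender model.
groups = prefilter(user_traits, "extraversion")
```

With this toy input, users `u1` and `u3` land in the high-extraversion group and `u2` in the low one.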
05:23
That is, for example, customers with high extraversion or low extraversion. And for each group, a different personality-aware recommender system is created. And the second approach covers just implementing
05:41
this information into the algorithm itself. Most researchers agree that incorporating the personality information can improve the quality of the recommendations. And the number of papers is growing significantly. So it is a hot topic right now
06:01
in recommender systems. However, there is a research gap here that I identified. Most of the studies on personality-based recommender systems use small data, often only up to 100 users. So it is not big data, and those studies are not leveraging big data techniques.
06:25
And often the algorithms presented in the previous work are not easily scalable. So it is difficult to work on a couple of million records, or even customers, using the presented algorithms.
06:41
Therefore, the aim of the study was to build and evaluate two artifacts. The first one was the contextual personality-aware recommender system, based on pre-filtering. I have chosen this technique because it is a starting point and it is simple to implement
07:00
even at big data scale. And the second one was trained on the whole data set, ignoring the personality traits. I have called it the big data recommender system. And then I compared those two artifacts. So here is the design of the experiment.
07:21
I was working on the Amazon reviews data set, which I will talk about on the next slide. I did some pre-processing. Then I extracted the personality information from the text reviews in the Amazon data. And at the same time, I extracted the customer purchase history.
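The extraction step can be sketched roughly as follows. This is an illustration under stated assumptions, not the speaker's pipeline: the two records are made up, and the field names (`reviewerID`, `asin`, `overall`, `reviewText`, `style`) are assumed rather than taken from the talk.

```python
import json

# Two made-up records in JSON Lines form; field names are assumptions.
raw_lines = [
    '{"reviewerID": "u1", "asin": "p9", "overall": 5.0, '
    '"reviewText": "Great.", "style": {"color": "red"}}',
    '{"reviewerID": "u2", "asin": "p9", "overall": 2.0, '
    '"reviewText": "Broke fast."}',
]

def flatten(record, prefix=""):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

records = [flatten(json.loads(line)) for line in raw_lines]

# Purchase history: (user, item, rating) triples for the recommender.
purchases = [(r["reviewerID"], r["asin"], r["overall"]) for r in records]

# Review text per user: the input to the personality prediction step.
texts = {}
for r in records:
    texts.setdefault(r["reviewerID"], []).append(r["reviewText"])
```

The two outputs correspond to the two branches described in the talk: ratings feed the recommender, and review text feeds personality identification, so they can later be merged on the user ID.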
07:41
I merged the purchase history with the identified personality traits. Then I divided the whole data set into training and test subsets. And here we can see two lines. On the first line, I created the recommender system
08:04
that was homogeneous according to the identified personality traits. So I divided the data into a few homogeneous subsets, and for each subset there was a different recommender system. Of course, during training, I was doing cross-validation
08:24
with hyperparameter tuning to increase the accuracy of the model. And on the second line, I built the big data recommender system that ignored the personality traits. So there was only one big model
08:41
that was trained on the whole data set. And then, in the study, there was a comparison between those two techniques, those two artifacts. The initial data set covered 233 million Amazon reviews.
09:01
The data set was retrieved from the University of California, San Diego. It is publicly available. However, it needed heavy pre-processing because it was nested JSON. For the final subset, I decided to use only the last two years
09:20
of this data set to decrease the computation time. However, it is still a significant advantage compared to the previous work on recommender systems. And I have chosen only verified purchases. So I was sure that this user, in fact,
09:42
purchased this item when writing this review. Okay, so the two main artifacts. The first is the personality prediction engine. For the identification of the personality, I used a pre-trained model published by previous researchers
10:02
that was trained on four different data sets. Three of them are publicly available, but the last one, with data from Reddit, is not publicly available. However, it is a very diverse data set. So I think it is a good starting point for training.
10:23
The accuracy of the model is within the range of the state-of-the-art models for identifying personality. I used this model to identify the personality traits from the Amazon text reviews. On the other hand, there is the second artifact,
10:41
the product recommender engine. The technique behind this engine was matrix factorization with alternating least squares. I was using PySpark for big data scalability. And so those are the two approaches. The first with the pre-filtering.
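For readers unfamiliar with alternating least squares, the sketch below shows the core idea on a toy matrix. It deliberately uses NumPy instead of PySpark so it stays self-contained; it is not the speaker's implementation, and the rank, regularization, iteration count, and ratings are all assumed values.

```python
import numpy as np

def als(ratings, mask, rank=2, reg=0.1, iters=20, seed=0):
    """Factorize ratings ~ U @ V.T using only observed entries (mask True)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Fix item factors; solve a small ridge regression per user.
        for u in range(n_users):
            obs = mask[u]
            if obs.any():
                Vo = V[obs]
                U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ ratings[u, obs])
        # Fix user factors; solve a small ridge regression per item.
        for i in range(n_items):
            obs = mask[:, i]
            if obs.any():
                Uo = U[obs]
                V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ ratings[obs, i])
    return U, V

# Toy 4x3 rating matrix (made up); 0 marks an unobserved rating.
R = np.array([[5, 3, 0],
              [4, 0, 1],
              [1, 1, 5],
              [0, 1, 4]], dtype=float)
M = R > 0
U, V = als(R, M)
pred = U @ V.T  # predicted ratings, including the unobserved cells
train_rmse = np.sqrt(np.mean((pred[M] - R[M]) ** 2))
```

In the pre-filtering setup, one such factorization would be fitted per personality subset, and a single one on the full data for the big data baseline. In Spark, the corresponding estimator is `pyspark.ml.recommendation.ALS`, whose regularization details differ from this simplified version.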
11:01
So the homogeneous subsets that were created according to the identified personality traits. And the second one ignored the personality traits, using the whole data set to train the model. Evaluation results show that the personality-aware recommender system,
11:23
in fact, using the homogeneous subsets, performed significantly worse. So the root mean square error was worse than for the big data recommender system, when we're talking about the disjoint data sets.
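As a reminder of how such a comparison can be computed, here is a small sketch of root mean square error per subset, combined into one weighted average. The ratings and predictions are invented, and weighting each subset by its number of test ratings is an assumption; the talk does not state which weights were used.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length rating lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def weighted_average(values, weights):
    """Average of values, each weighted by the matching weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical test ratings and predictions for two personality subsets.
subsets = {
    "extraversion=1": ([5.0, 3.0, 4.0], [4.0, 3.0, 5.0]),
    "extraversion=0": ([2.0], [4.0]),
}

per_subset = {name: rmse(a, p) for name, (a, p) in subsets.items()}
sizes = [len(a) for a, _ in subsets.values()]  # assumed weights: test-set size
overall = weighted_average(list(per_subset.values()), sizes)
```

The single `overall` figure is what would be compared against the RMSE of the one model trained on the whole data set.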
11:40
So here, the threshold was set to 0.5. So if extraversion was, for example, more than 0.5, it was scored as one in this table, and if it was below 0.5, it was zero. So those are the homogeneous data sets according to personality. And those are the two techniques,
12:01
the personality-aware and the big data recommender system. And we are talking about the weighted average. Unfortunately, the personality-aware system was doing a little bit worse than the big data recommender system. When we're talking about the overlapping data sets, the asterisks here are
12:21
representing either zero or one, it doesn't matter. So if we are talking about overlapping data sets, the personality-aware recommender system was doing a little bit better. However, it is still a little bit worse
12:40
than the big data recommender system. So in fact, the training size matters very much. It significantly influences the results here. And regarding the limitations: the study also has limitations.
13:01
So first of all, I was using only verified purchases because I didn't have the Amazon purchase history. So I was only working on verified-purchase reviews. I was treating the personality traits as binary variables, not continuous. So this is another limitation of the study.
13:24
And the reviews that I gathered might come from an account that was shared with others, for example family members. So this is another limitation of the study.
13:40
And the last limitation is that the model for the personality identification was trained on slightly different text than the Amazon reviews. However, I think those texts were pretty diverse, from essays to social media.
14:01
So I think it is a pretty good starting point for the personality identification. But this is, of course, another limitation. So what about the future work? This research is still ongoing. I already have some promising results, but they are not published yet.
14:21
But the future work covers three main points. The first point is finer-grained segmentation of the personality traits. So rather than just having binary states, like zero or one for extraversion, I'm planning a more fine-grained approach,
14:40
like a high, medium, or low level of each personality dimension, or even treating them as continuous variables. I'm planning to explore different techniques for incorporating personality traits into the recommender system. So I started with pre-filtering,
15:02
but right now I'm also trying to implement techniques other than pre-filtering. However, those techniques then need to be implemented at big data scale, which is another difficulty here. And the last thing: I'm also trying to infer some similarities
15:23
between the way users write reviews, and to see whether those similarities can be correlated with purchasing behavior. So NLP techniques need to be applied here. Okay, so that's it from me. I'm now happy to take your questions.