FluRS - A Library for Streaming Recommendation Algorithms
Formal Metadata
Number of Parts | 43
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/38181 (DOI)
Production Place | Erlangen, Germany
Transcript: English (auto-generated)
00:06
Good morning, everyone. My name is Takuya Kitazawa. The slides are already uploaded, so you can find them on Twitter. All right. Today, I'm going to guide you through the world of recommender systems, while introducing
00:21
my experimental Python library for recommender systems. As I said, my name is Takuya Kitazawa, and I'm a data science engineer at a startup company. I'm also a committer of the Apache Hivemall machine learning library project, but unfortunately that library is written in Java,
00:41
so I can't talk about it here. So, all right, let's move on to Python. I'm going to talk about recommender systems. One of the most famous examples is the recommender on Amazon. Is anybody familiar with what's going on behind that system? OK, I can see a few people.
01:02
So I'm going to talk about it. In this talk, I will cover the past, present, and future of recommender systems based on my personal experience. Historically, recommender engines have been built upon a matrix representation of user-item interactions.
01:21
For example, take one of the most famous algorithms, collaborative filtering. We can represent user-item interactions as a matrix, and by computing the similarity between users (rows of the matrix) or items (columns of the matrix),
01:41
we can find similar users or similar items. Based on that information, we can make recommendations, for example by finding which users are similar to this user. In Python, collaborative filtering can be implemented very easily. The similarity function just computes the cosine similarity
02:04
between two vectors. If you define each vector as a NumPy array, you can simply compute the similarity between them, and based on that value we can find users who have the same taste in that item set. Another algorithm, which is more mathematically tractable,
02:22
is singular value decomposition (SVD). Through a certain mathematical operation, we can decompose the original matrix into smaller components that represent user or item characteristics. This can also be implemented in Python very easily: we define a matrix, and NumPy has an SVD method.
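Both matrix techniques described so far can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a small made-up 0/1 interaction matrix `R`; the names `cosine_similarity`, `R_hat` and `best_item` are hypothetical and not taken from the talk's slides.

```python
import numpy as np

# Toy implicit-feedback matrix: rows are users, columns are items,
# 1 means "interacted", 0 means "no observed interaction".
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# User-based collaborative filtering: similarity of user 0 to every user (rows).
user_sims = [cosine_similarity(R[0], R[u]) for u in range(R.shape[0])]

# Item-based variant: similarity of item 0 to every item (columns).
item_sims = [cosine_similarity(R[:, 0], R[:, i]) for i in range(R.shape[1])]

# SVD: R = U @ diag(s) @ Vt. Keeping only the k largest singular values
# gives the best rank-k approximation of R in the least-squares sense.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Recommend to user 0 the unseen item with the largest reconstructed value.
unseen = np.flatnonzero(R[0] == 0)
best_item = int(unseen[np.argmax(R_hat[0, unseen])])
```

Cells that are 0 in `R` but large in `R_hat` are the "promising" user-item pairs; on real data `R` would be large and sparse, so a sparse truncated SVD would be used instead of a dense one.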
02:46
So we call it, and as a result we get some factor matrices. By computing their product, we obtain an approximation of the original matrix. That approximated matrix has a value for every cell, so based on those values we can figure out
03:01
which user-item pairs are most promising in the data set. The third example is matrix factorization, which is more of a machine learning approach. The basic concept is very close to singular value decomposition: we want to decompose the matrix into factor matrices, that is, characteristic matrices, by using a certain operation,
03:23
but the code is written in a machine learning style. In Python, for the same data set we've seen before, matrix factorization can be implemented like this. I don't want to go into detail, but we first randomly initialize the factor matrices P and Q.
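A minimal sketch of the procedure being described, assuming plain stochastic gradient descent on squared error with L2 regularization. The toy ratings, the hyperparameters and the function name `matrix_factorization` are illustrative assumptions, not the code from the slides.

```python
import numpy as np


def matrix_factorization(R, k=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Approximate R with P @ Q.T, fitting only the observed (non-zero) entries via SGD."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    # Step 1: randomly initialize the factor matrices P (users x k) and Q (items x k).
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    observed = list(zip(*np.nonzero(R)))
    # Step 2: iterate over the observed elements and nudge P and Q to reduce the error.
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            p_u = P[u].copy()  # keep the old user vector for Q's update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q


# Toy explicit ratings; 0 marks an unknown rating.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

P, Q = matrix_factorization(R)
R_pred = P @ Q.T  # dense predictions, including the originally unknown cells
```

Unlike SVD, the loss is computed only on observed cells, which is why unknown ratings can be treated as missing rather than as zeros.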
03:44
After that, by iterating over each element in the original matrix, we update the factor matrices. Eventually, similarly to singular value decomposition, we get some predictions, and based on them we simply make recommendations
04:01
based on which user-item pairs are promising. Importantly, you probably know Netflix as a company, and Netflix has played a really important role in this field. In 2006, they held a competition
04:21
to improve the recommendation algorithm on their data set. Some of the winning models are very close to what I've just talked about, the matrix factorization approach. But someone later reported that Netflix never implemented the winning solution itself. Why was that?
04:41
Because a single recommendation algorithm cannot scale enough for their real-world situation. And of course, users' characteristics, their interests, and item properties can change as time passes. In that sense, one of the most critical problems
05:01
in the field is how to handle more complex, real-time data arriving as a series of transactions. I have focused on this kind of topic: I implemented a Python package and wrote some research papers on it. More formally, I define this topic
05:22
as streaming recommendation. As usual, a recommender first recommends the top-N items to a user, like Amazon's recommender does, based on some kind of recommendation technique. After that, users interact with some items on the web service.
05:41
Here's the important thing: that interaction can be fed back immediately to the model behind the system, and the model is incrementally updated on the fly. This is the difference between the classical approach and state-of-the-art recommendation techniques. My package, FluRS, F-L-U-R-S, implements
06:04
this kind of streaming recommendation technique. You can install the package with pip, and it's also on GitHub. There are three basic concepts behind the package. Most importantly, it provides a unified data representation.
06:22
A recommender system strongly depends on the data set, that is, on what kind of data your company has. In order to handle such complex data, FluRS provides a single, unified data representation in the form of user and item classes.
06:42
My implementation is also algorithm-agnostic. To implement recommender systems in Python, we usually end up writing very similar code repeatedly to make recommendations or to register users
07:01
in the recommender system. I don't want to write that kind of code over and over, so I completely separated the recommender-specific implementation from the model code itself. You can extend the package by implementing new algorithms on top of it. We also want to evaluate the accuracy of recommendations in a streaming environment.
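The separation between generic recommender bookkeeping and algorithm-specific model code can be sketched with an abstract base class. This is a hypothetical design for illustration only: `BaseRecommender`, its hook names and `PopularityRecommender` are not FluRS's actual classes or API.

```python
from abc import ABC, abstractmethod


class BaseRecommender(ABC):
    """Generic bookkeeping shared by every algorithm (hypothetical, not FluRS's class)."""

    def __init__(self):
        self.seen = {}   # user_id -> set of item_ids the user interacted with
        self.items = {}  # item_id -> None, used as an insertion-ordered set

    def register_user(self, user_id):
        if user_id not in self.seen:
            self.seen[user_id] = set()
            self.init_user(user_id)

    def register_item(self, item_id):
        if item_id not in self.items:
            self.items[item_id] = None
            self.init_item(item_id)

    def update(self, user_id, item_id):
        """Register unknown entities, record the event, then let the model learn from it."""
        self.register_user(user_id)
        self.register_item(item_id)
        self.seen[user_id].add(item_id)
        self.update_model(user_id, item_id)

    def recommend(self, user_id, n=10):
        """Rank items the user has not interacted with yet by the model's score."""
        seen = self.seen.get(user_id, set())
        candidates = [i for i in self.items if i not in seen]
        candidates.sort(key=lambda i: self.score(user_id, i), reverse=True)
        return candidates[:n]

    # Algorithm-specific hooks: only these differ between algorithms.
    def init_user(self, user_id):
        pass

    def init_item(self, item_id):
        pass

    @abstractmethod
    def update_model(self, user_id, item_id): ...

    @abstractmethod
    def score(self, user_id, item_id) -> float: ...


class PopularityRecommender(BaseRecommender):
    """Example plug-in algorithm: score = how often an item has been seen so far."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def init_item(self, item_id):
        self.counts[item_id] = 0

    def update_model(self, user_id, item_id):
        self.counts[item_id] += 1

    def score(self, user_id, item_id):
        return float(self.counts[item_id])


rec = PopularityRecommender()
for user, item in [(0, "a"), (1, "a"), (1, "b"), (2, "b"), (2, "c")]:
    rec.update(user, item)
```

A new algorithm only overrides the hooks (`init_item`, `update_model`, `score`); registration, filtering of already-seen items and ranking stay in one place.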
07:22
There is an appropriate evaluation scheme for this in the research field, so I implemented that scheme in the package. This is an overview of the package's internal design. We have separate user and item classes,
07:41
and each user-item interaction is represented as an event, which contains both the user and the item. Importantly, as I said, recent input to recommender systems is more complex: each user and item can have a feature vector, and an event can have a context vector.
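The structure just described (users and items with optional feature vectors, and events pairing them with an optional context vector) can be sketched with dataclasses. These are illustrative classes, not FluRS's own; the one-hot-plus-features encoding in the hypothetical `to_input_vector` helper is an assumption, chosen because it is the usual input format for feature-based models such as factorization machines.

```python
from dataclasses import dataclass, field

import numpy as np


def _empty() -> np.ndarray:
    return np.zeros(0)


@dataclass
class User:
    index: int
    feature: np.ndarray = field(default_factory=_empty)


@dataclass
class Item:
    index: int
    feature: np.ndarray = field(default_factory=_empty)


@dataclass
class Event:
    user: User
    item: Item
    context: np.ndarray = field(default_factory=_empty)


def to_input_vector(event: Event, n_users: int, n_items: int) -> np.ndarray:
    """Concatenate one-hot user/item ids with user, item and context features."""
    user_onehot = np.zeros(n_users)
    user_onehot[event.user.index] = 1.0
    item_onehot = np.zeros(n_items)
    item_onehot[event.item.index] = 1.0
    return np.concatenate([user_onehot, item_onehot,
                           event.user.feature, event.item.feature, event.context])


event = Event(
    user=User(1, feature=np.array([0.3])),       # e.g. a normalized age
    item=Item(0, feature=np.array([1.0, 0.0])),  # e.g. a one-hot genre
    context=np.array([0.5]),                     # e.g. time of day scaled to [0, 1]
)
x = to_input_vector(event, n_users=3, n_items=2)
# x == [0, 1, 0, 1, 0, 0.3, 1, 0, 0.5]
```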
08:02
These are the input to the recommender, and behind it we have a model. By injecting the recommender itself into the evaluator, we can see evaluation results. How to feed feature vectors into a recommendation algorithm is really important, and one of the most famous example algorithms is the one
08:23
named the factorization machine, an algorithm that realizes feature-based recommendation. I extended this algorithm into an incremental variant, and you can find the paper on arXiv. I also implemented it in FluRS, this package. Using this package is relatively easy,
08:42
and you can write the code linearly, something like this. First, you define a user and an item using the corresponding classes. Next, you create a recommender instance. In this case, I chose the factorization machine, so I use FMRecommender. We need to initialize it and register the user and item.
09:04
Next, we update the model based on the event. Finally, we can make a recommendation, passing some context in feature-vector format. This is called context-aware recommendation, and that's why I wanted to support it.
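The slide uses FluRS's own classes, which are not reproduced here. As a library-independent sketch of the same workflow (initialize, update from each event as it arrives, then recommend with a context vector), here is a plain factorization machine (Rendle's model: a global bias, linear weights, and pairwise interactions through latent vectors) updated with one SGD step per event. This is not FluRS's FMRecommender nor the incremental algorithm from the speaker's paper; `IncrementalFM`, `build_x` and `recommend` are hypothetical names, and the hyperparameters are arbitrary.

```python
import numpy as np


class IncrementalFM:
    """Factorization machine updated with one SGD step (squared loss) per event."""

    def __init__(self, n_features, k=4, lr=0.05, reg=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.w0 = 0.0
        self.w = np.zeros(n_features)
        self.V = rng.normal(scale=0.1, size=(n_features, k))
        self.lr, self.reg = lr, reg

    def predict(self, x):
        # Pairwise term via the O(nk) identity:
        # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
        vx = self.V.T @ x
        pairwise = 0.5 * np.sum(vx ** 2 - (self.V ** 2).T @ (x ** 2))
        return self.w0 + self.w @ x + pairwise

    def update(self, x, y=1.0):
        err = self.predict(x) - y
        # d(prediction)/d(V[i, f]) = x_i * (V.T @ x)[f] - V[i, f] * x_i^2
        grad_V = np.outer(x, self.V.T @ x) - self.V * (x ** 2)[:, None]
        active = x != 0  # only parameters of non-zero features change
        self.w0 -= self.lr * err
        self.w[active] -= self.lr * (err * x[active] + self.reg * self.w[active])
        self.V[active] -= self.lr * (err * grad_V[active] + self.reg * self.V[active])


def build_x(user, item, context, n_users, n_items):
    """One-hot user, one-hot item, then the context features."""
    x = np.zeros(n_users + n_items + len(context))
    x[user] = 1.0
    x[n_users + item] = 1.0
    x[n_users + n_items:] = context
    return x


def recommend(fm, user, candidates, context, n_users, n_items, n=3):
    """Score each candidate item for this user in this context; return the top n."""
    scores = np.array([fm.predict(build_x(user, i, context, n_users, n_items))
                       for i in candidates])
    order = np.argsort(-scores)[:n]
    return [candidates[j] for j in order], scores[order]


n_users, n_items = 3, 4
fm = IncrementalFM(n_features=n_users + n_items + 1)  # +1 context feature
# Stream of (user, item, context) events; the context could be e.g. a weekend flag.
stream = [(0, 1, [1.0]), (1, 1, [1.0]), (2, 3, [0.0]), (0, 2, [0.0])]
for u, i, c in stream:
    fm.update(build_x(u, i, np.array(c), n_users, n_items))  # learn from each event immediately

top_items, top_scores = recommend(fm, user=1, candidates=[0, 1, 2, 3],
                                  context=np.array([1.0]), n_users=n_users, n_items=n_items)
```

The sketch trains only on observed (positive) events; practical implicit-feedback setups usually also sample negative examples, which is omitted here for brevity.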
09:23
The evaluator can show a graph like this one. This graph comes from my previous paper, but you can plot something similar by using my evaluator, which is particularly appropriate for streaming recommender systems.
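The talk does not name the evaluation scheme. One common choice in the streaming-recommendation literature is prequential "test-then-learn" evaluation with incremental recall@N: each incoming event is first used as a test case for the current model and only then used for training. A minimal sketch, assuming that scheme and using a hypothetical popularity recommender (`MostPopular`) as a stand-in for a real model:

```python
from collections import Counter


class MostPopular:
    """Tiny stand-in recommender: ranks items by how often they have been seen so far."""

    def __init__(self):
        self.counts = Counter()

    def recommend(self, user, n):
        return [item for item, _ in self.counts.most_common(n)]

    def update(self, user, item):
        self.counts[item] += 1


def evaluate_test_then_learn(recommender, events, n=10):
    """For every (user, item) event, first test whether the item is in the
    top-n list, then update the model with that event.
    Returns the running recall@n (hit rate) after each event."""
    hits = 0
    curve = []
    for t, (user, item) in enumerate(events, start=1):
        if item in recommender.recommend(user, n):  # test on the not-yet-seen event...
            hits += 1
        recommender.update(user, item)               # ...then learn from it
        curve.append(hits / t)
    return curve


events = [(0, "a"), (1, "a"), (2, "b"), (0, "a")]
curve = evaluate_test_then_learn(MostPopular(), events, n=1)
# curve == [0.0, 0.5, 0.3333333333333333, 0.5]
```

Plotting `curve` against the event index gives an accuracy-over-time graph, which shows how quickly a model adapts as the stream evolves.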
09:41
As I said, extending this package is relatively easy: after implementing it, I wrote another paper in which I implemented a new algorithm on top of the package. Finally, I hope I have a couple of minutes to talk about the future. Scaling a recommender implementation
10:01
into production is very difficult. Netflix says that everything is a recommendation these days: you can see personalization everywhere, in a wide variety of formats. So we need to combine several, or even numerous, recommendation algorithms
10:22
or techniques in real-world systems. So, okay, I created this package, it's awesome, so you can just use it in production? No way. You need to combine it with other components somehow, and that's why I implemented the package in Python. One aspect: in order to handle
10:42
a huge amount of streamed data, you probably need to build some streaming infrastructure using something like Spark Streaming or Amazon Kinesis. Python combines really well with these kinds of components and middleware.
11:00
Also, in order to improve the algorithm itself, we want to go through a trial-and-error procedure repeatedly, and Python obviously makes that easy. And as most of you know, Python itself is really portable, so integrating it with production code is relatively easy.
11:23
So my package will be updated with these aspects in mind. Finally, I want to mention that there are numerous other open-source software packages for recommender systems. You can get inspiration from them, and you can customize, or rather you need to customize,
11:41
your own recommender system by using some of them. That's it, thank you for listening. There is time for maybe a couple of quick questions.
12:00
So I was actually really curious: the system you've built is in Python, and it seems that each individual user has its own feature space, so how well does it scale to really large data sets, potentially millions of rows and columns?
12:21
Yeah, that's something I'm thinking about. Actually, I ran some experiments with this package to write this kind of research paper, and when the feature space is very large, or the data set gets large, I ran into running-time problems.
12:42
So yes, I need to improve that. I have no clear solution yet, but it might involve something Cython-related, some distributed processing, or multi-core support. I'm thinking about it,
13:01
but it's not done yet.