
NLP based Recommender System for Plone


Formal Metadata

Title
NLP based Recommender System for Plone
Series title
Number of parts
44
Author
Contributors
License
CC Attribution 3.0 Germany:
You may use, modify, reproduce, distribute and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided that you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Year of publication
Language
Production location: Namur, Belgium

Content Metadata

Subject area
Genre
Abstract
Find out how to level up the recommendations on your website using scikit-learn, the open-source machine-learning library for Python. Now that we have tried its simple and efficient tools ourselves, we will show you hands-on how you can benefit from them. We developed a useful add-on for both Plone Classic and Plone Volto. Get smart content recommendations by using basic Natural Language Processing to integrate this content recommendation system, which is accessible to everybody.
Transcript: English (auto-generated)
Test, test. All right. Now I present to you Mr. Gautreaux... sorry, we start again. I present to you Jan Mevicin and Richard Brown, Plone developers at Interactive GmbH from Germany. They will talk about an NLP-based recommender system for Plone.
Hello and welcome to our talk about an NLP-based recommender system for Plone. We will give this talk together. My colleague is Richard Brown.
My name is Jan Mevicin. We are Plone developers working at Interactive in Germany. We like Scrum and we love Plone. This talk is about the content recommendation system we built for Plone and Volto,
where we use natural language processing capabilities of scikit-learn to automatically suggest content based on text similarity. So the talk will be in three parts. First, we take a look at how you can use scikit-learn to calculate text similarities.
Then Richard will demonstrate our add-on in Plone and in Volto. At the end, we will talk about future plans and other implementations.
First, to start off, scikit-learn is an open source machine learning library for Python. And we will show an example of using scikit-learn to calculate text similarities. For this presentation, we use Jupyter Notebook, so the code will all be real-life Python.
First, we define some known text. We have a text A and a text B. Text A is the house is green, the house is nice, the house is house. Text B is a green frog in the greenhouse.
The goal is to calculate the similarity to an unknown text. We define an unknown text here as the greenhouse on the street and the house is full. To make the text calculable, we use vectorization.
The vectorization process basically runs like this. First, we create a vocabulary of all unique words, also known as a bag of words, and use each word as a dimension of a vector space.
Then we can transform the text to vectors of this vector space and can then calculate differences between these vectors. So with scikit-learn, we use count vectorizer.
First, we initialize it and fit it to the known texts, text A and text B. Fitting the vectorizer creates a vocabulary of all unique words in both of the texts.
With this vocabulary, we can transform the text to vectors. For example, in this picture, we plotted the vectors for the word house and the word green.
We can see that the count vectorizer basically counts the words. For example, the word house occurs four times in text A and only one time in text B. All of the values we put in the table.
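The counting step described here can be sketched in a few lines of scikit-learn (a minimal sketch using the talk's example texts; the add-on's actual code is not shown in the talk):

```python
from sklearn.feature_extraction.text import CountVectorizer

# The two known texts from the talk
text_a = "the house is green, the house is nice, the house is house"
text_b = "a green frog in the greenhouse"

# Fitting builds the vocabulary (bag of words) of all unique words
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform([text_a, text_b])

# Each column is one word dimension, each row is one text
vocab = vectorizer.vocabulary_
print(sorted(vocab))                        # the unique words
print(counts.toarray()[0][vocab["house"]])  # "house" occurs 4 times in text A
```

Note that with the default tokenizer, "greenhouse" counts as its own word rather than as an occurrence of "house", and single-character tokens such as "a" are dropped.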
Now that we have a means of transforming text to vectors, we can start calculating the differences.
For calculating the differences between the vectors, we consider the angles between the vectors. So basically, the smaller the angle between the vectors, the higher the similarity between the texts. This approach is practical because the length of the vectors, therefore the length of the text doesn't matter.
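The claim that text length doesn't matter can be checked directly: scaling a count vector, as if the text were simply repeated, leaves the cosine similarity unchanged. A tiny illustration with made-up vectors:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

v = np.array([[1.0, 4.0, 3.0]])  # made-up word-count vector
w = np.array([[2.0, 1.0, 0.0]])

# Tripling v (repeating the text three times) keeps the angle the same
print(cosine_similarity(v, w))      # same value...
print(cosine_similarity(3 * v, w))  # ...as this
```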
So we only consider the angles between the vectors to calculate the similarities. For calculating the similarities, we use the nearest neighbors module of scikit-learn.
The nearest neighbors module is a tool to get the most similar vectors from a list of known vectors. It returns the indices of the nearest known vectors and their distances to the unknown vector.
In code, it basically looks like this. We import the nearest neighbors module and initialize it with the cosine metric so that it considers the angles between vectors, not the Euclidean distances.
Then we use our vectorizer to transform the known text, text A and text B, and use it to fit the nearest neighbor module. Then we transform our unknown text to an unknown vector and then calculate
the distances using the nearest neighbor module we already fitted to the known vectors. In the example, you see the results.
The distance from the unknown text to text A is calculated as 0.3, and the distance to text B as 0.6. So the similarity of the unknown text to text A is higher than the similarity of the unknown text to text B.
That is basically how you can use scikit-learn to calculate the distances and the similarities of texts.
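Putting the pieces from this section together, the similarity lookup might look roughly like this (a sketch with the talk's example texts and default preprocessing; the exact values depend on tokenization settings such as stop-word removal):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestNeighbors

text_a = "the house is green, the house is nice, the house is house"
text_b = "a green frog in the greenhouse"
unknown = "the greenhouse on the street and the house is full"

# Build the vocabulary from the known texts and vectorize them
vectorizer = CountVectorizer()
known_vectors = vectorizer.fit_transform([text_a, text_b])

# Cosine metric: only the angle between vectors matters, not their length
neighbors = NearestNeighbors(metric="cosine")
neighbors.fit(known_vectors)

# Transform the unknown text with the *same* vocabulary, then query
unknown_vector = vectorizer.transform([unknown])
distances, indices = neighbors.kneighbors(unknown_vector, n_neighbors=2)

print(indices[0])    # index 0 (text A) comes first: it is the nearest neighbor
print(distances[0])  # cosine distances, sorted ascending
```

Words of the unknown text that are not in the fitted vocabulary (such as "street") are simply ignored at transform time, which is why fitting on the known texts first matters.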
We will use this in our Plone add-on. This was just about how the recommendation system works underneath. Now we will take a look at how it looks in Plone Classic and Volto.
After you have installed the package, you have to add the recommendations behavior to the content type of your choice manually. The behavior won't be added by the package setup because we wanted to leave the decision to the user.
For the presentation, we have added the behavior to the default document content type. The page is already open; you can reach it via the control panel and just check the box for the recommendations behavior.
After adding the behavior, a viewlet appears below the content. We have already created an example. Here are the recommendations. We have text for testing purpose, my car is nice and slow, and there are the recommendations.
The first one is nice gift code, selling a car through a car hunter is the second one, and so on. And you can also configure the appearance of the viewlet in the control panel.
Here are the settings for the recommendations. First you can see some information noting that it's still work in progress; that's why it's not perfect yet. We can see the count of all the document elements we have, which were indexed by the recommendations algorithm, and we have a count for new elements.
The new elements will be added if you click on the refresh button. It's still necessary to click refresh manually if you add new content, but we plan to add something to do this automatically.
For now you have to click on the refresh button to refresh the recommendations and update the index.
And there's another button to import a data set for testing purposes. It's offered by scikit-learn, and we are using it in our presentation. The recommendations use a lot of data; that's the reason why the number here is 983.
And yes, you can change the number of elements appearing below the viewlet and activate debug mode to see more information.
Right now we have three elements and we see more information. For example, which position the element has in the index and which distance it has to the content we are looking at.
So the first element is the nearest neighbor in this case. Right. In Volto it's quite similar.
After adding the package to the buildout and package.json, you have to enable the behavior just like I showed before. Afterwards you can add a block called recommendations to your page.
I will just do it. We already have a recommendations block, and now we can see the recommendations. You can also configure the block.
For example, change the title or the maximum element count and save it. Right. We also have a settings page in the Volto control panel, where you can do the same as in the Plone Classic backend.
Right. Now the future plans.
Other future plans are auto-tagging, to create tags based on the content, and dimensionality reduction, to reduce the size of the bag of words, which Jan mentioned before: the bag of words is the list of unique words.
We want to shrink it to increase the performance. Dimensionality reduction is possible with part-of-speech filtering, for example, which categorizes the words by type,
types like verbs and nouns. The dimensions can also be reduced by lemmatization, which breaks words down to their base form, for example, walk instead of walking.
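The talk proposes shrinking the bag of words linguistically (part-of-speech filtering, lemmatization). As a rough illustration of what dimensionality reduction buys, here is a different, purely numerical technique that scikit-learn ships with, truncated SVD (latent semantic analysis), applied to the example counts; it is not the method named in the talk:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

texts = [
    "the house is green, the house is nice, the house is house",
    "a green frog in the greenhouse",
    "the greenhouse on the street and the house is full",
]
# One dimension per unique word in the corpus
counts = CountVectorizer().fit_transform(texts)

# Project the word-count vectors down to 2 dimensions
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(counts)
print(counts.shape, "->", reduced.shape)  # fewer dimensions, same number of texts
```

Fewer dimensions make the nearest-neighbor search cheaper, at the cost of some fidelity to the raw counts.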
Yes, right. The package is already available in our GitHub repository. You can find it at github.com slash interactive. recommendations. And for Volto you also need the second package Volto recommendations.
Feel free to test and use it. Thanks for your attention. Are there any questions?
So the question was whether execution time is a problem. We tested it with the data set we got from scikit-learn.
And it has about 6 to 7,000 pages which we automatically created. And the performance of rendering for example the viewlet is very good.
So everything is already there. And updating the indices and vectorizing, I think it took about 20 seconds when I wasn't speaking to a colleague of mine on GitHub.
Not on GitHub, on Discord. With other applications running on my laptop it was a bit slow. But it's quite fast because it's kept quite simple.
Does it also make sense to extend it to the search of Plone to have this recommendation system? Maybe, but it's a bit different. There was a talk yesterday about searching in Plone.
I think that our approach would be much too simple for really good searching or searching that might be better than using Solr. And this is I think best used really for simple similarity comparison of text.
So I guess if you want to get a better searching you need to... I think it was nuclear DB? I think you should have a look at that.
So this is basically... The idea is basically to keep it simple and make it easy installable and easy to use.
The question is whether we already use it in production. This is the first time we've presented it, so I think it needs to be tested first,
by others and by us. The code is already available on GitHub and testable. No more questions?
Jan, Richard, thanks.