
NLP based Recommender System for Plone


Formal Metadata

Title
NLP based Recommender System for Plone
Series title
Number of parts
44
Author
Contributors
License
CC Attribution 3.0 Germany:
You may use, modify, reproduce, distribute and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided that you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Year of publication
Language
Production location: Namur, Belgium

Content Metadata

Subject area
Genre
Abstract
Find out how to level up the recommendations on your website using scikit-learn, the open-source machine-learning library for Python. Now that we have tried its simple and efficient tools ourselves, we will show you hands-on how you can benefit from them. We developed a useful add-on for both Plone Classic and Plone Volto. Get smart content recommendations by using basic Natural Language Processing to integrate this content recommendation system, which is accessible to everybody.
Transcript: English (auto-generated)
Test, test. All right. Now I present to you Mr. Gautreaux... sorry, we start again. I present to you Jan Mevicin and Richard Brown, Plone developers at Interactive GmbH from Germany. They will talk about an NLP-based recommender system for Plone.
Hello and welcome to our talk about an NLP-based recommender system for Plone. We will give this talk together. My colleague is Richard Brown.
My name is Jan Mevicin. We are Plone developers working at Interactive in Germany. We like Scrum and we love Plone. This talk is about the content recommendation system we built for Plone and Volto,
where we use natural language processing capabilities of scikit-learn to automatically suggest content based on text similarity. So the talk will be in three parts. First, we take a look at how you can use scikit-learn to calculate text similarities.
Then Richard will demonstrate our add-on in Plone and in Volto. At the end, we will talk about future plans and other implementations.
First, to start off, scikit-learn is an open source machine learning library for Python. And we will show an example of using scikit-learn to calculate text similarities. For this presentation, we use Jupyter Notebook, so the code will all be real-life Python.
First, we define some known text. We have a text A and a text B. Text A is the house is green, the house is nice, the house is house. Text B is a green frog in the greenhouse.
The goal is to calculate the similarity to an unknown text. We define an unknown text here as the greenhouse on the street and the house is full. To make the text calculable, we use vectorization.
The vectorization process basically runs like this. First, we create a vocabulary of all unique words, also known as a bag of words, and use each word as a dimension of a vector space.
Then we can transform the text to vectors of this vector space and can then calculate differences between these vectors. So with scikit-learn, we use count vectorizer.
First, we initialize it and fit it to the known texts, text A and text B. Fitting the vectorizer creates a vocabulary of all unique words in both of the texts.
With this vocabulary, we can transform the text to vectors. For example, in this picture, we plotted the vectors for the word house and the word green.
We can see that the count vectorizer basically counts the words. For example, the word house occurs four times in text A and only one time in text B. All of the values we put in the table.
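The counting step described here can be sketched in a few lines of scikit-learn (a minimal sketch using the talk's example texts; the add-on's actual code is not shown in the talk):

```python
from sklearn.feature_extraction.text import CountVectorizer

# The two known texts from the talk
text_a = "the house is green, the house is nice, the house is house"
text_b = "a green frog in the greenhouse"

# Fitting builds the vocabulary (bag of words) of all unique words
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform([text_a, text_b])

# Each column is one word dimension, each row is one text
vocab = vectorizer.vocabulary_
print(sorted(vocab))                        # the unique words
print(counts.toarray()[0][vocab["house"]])  # "house" occurs 4 times in text A
```

Note that with the default tokenizer, "greenhouse" counts as its own word rather than as an occurrence of "house", and single-character tokens such as "a" are dropped.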
Now that we have a means of transforming text to vectors, we can start calculating the differences.
For calculating the differences between the vectors, we consider the angles between the vectors. So basically, the smaller the angle between the vectors, the higher the similarity between the texts. This approach is practical because the length of the vectors, therefore the length of the text doesn't matter.
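The claim that text length doesn't matter can be checked directly: scaling a count vector, as if the text were simply repeated, leaves the cosine similarity unchanged. A tiny illustration with made-up vectors:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

v = np.array([[1.0, 4.0, 3.0]])  # made-up word-count vector
w = np.array([[2.0, 1.0, 0.0]])

# Tripling v (repeating the text three times) keeps the angle the same
print(cosine_similarity(v, w))      # same value...
print(cosine_similarity(3 * v, w))  # ...as this
```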
So we only consider the angles between the vectors to calculate the similarities. For calculating the similarities, we use the nearest neighbors module of scikit-learn.
The nearest neighbors module is a tool to get the most similar vectors from a list of known vectors. It returns the indices of the nearest known vectors and their distances to the unknown vector.
In code, it basically looks like this. We import the nearest neighbors module and initialize it with the cosine metric so that it considers the angles between vectors, not the Euclidean distances.
Then we use our vectorizer to transform the known text, text A and text B, and use it to fit the nearest neighbor module. Then we transform our unknown text to an unknown vector and then calculate
the distances using the nearest neighbor module we already fitted to the known vectors. In the example, you see the results.
The distance from the unknown text to text A is calculated as 0.3, and the distance to text B as 0.6. So the similarity of the unknown text to text A is higher than the similarity of the unknown text to text B.
That is basically how you can use scikit-learn to calculate the distances and the similarities of texts.
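Putting the pieces from this section together, the similarity lookup might look roughly like this (a sketch with the talk's example texts and default preprocessing; the exact values depend on tokenization settings such as stop-word removal):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestNeighbors

text_a = "the house is green, the house is nice, the house is house"
text_b = "a green frog in the greenhouse"
unknown = "the greenhouse on the street and the house is full"

# Build the vocabulary from the known texts and vectorize them
vectorizer = CountVectorizer()
known_vectors = vectorizer.fit_transform([text_a, text_b])

# Cosine metric: only the angle between vectors matters, not their length
neighbors = NearestNeighbors(metric="cosine")
neighbors.fit(known_vectors)

# Transform the unknown text with the *same* vocabulary, then query
unknown_vector = vectorizer.transform([unknown])
distances, indices = neighbors.kneighbors(unknown_vector, n_neighbors=2)

print(indices[0])    # index 0 (text A) comes first: it is the nearest neighbor
print(distances[0])  # cosine distances, sorted ascending
```

Words of the unknown text that are not in the fitted vocabulary (such as "street") are simply ignored at transform time, which is why fitting on the known texts first matters.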
We will use this in our Plone add-on. This was just about how the recommendation system works underneath. Now we will take a look at how it looks in Plone Classic and Volto.
After you have installed the package, you have to add the recommendations behavior to the content type of your choice manually. The behavior won't be added by the package setup because we wanted to leave the decision to the user.
For the presentation, we have added the behavior to the default document content type. The page is already open; you can reach it via the control panel and just check the box for the recommendations behavior.
After adding the behavior, a viewlet appears below the content. We have already created an example. Here are the recommendations. We have text for testing purpose, my car is nice and slow, and there are the recommendations.
The first one is nice gift code, selling a car through a car hunter is the second one, and so on. And you can also configure the appearance of the viewlet in the control panel.
Here are the settings for the recommendations. First you can see some information noting that it's still work in progress; that's why it's not perfect yet. We can see the count of all the document elements we have, which were indexed by the recommendations algorithm, and we have a count for new elements.
The new elements will be added if you click on the refresh button. It's still necessary to click refresh manually if you add new content, but we plan to add something to do this automatically.
For now you have to click on the refresh button to refresh the recommendations and update the index.
And there's another button to import a data set for testing purposes. It's offered by scikit-learn, and we are using it in our presentation. The recommendations use a lot of data; that's the reason why the number here is 983.
And yes, you can change the number of elements appearing below the viewlet and activate debug mode to see more information.
Right now we have three elements and we see more information. For example, which position the element has in the index and which distance it has to the content we are looking at.
So the first element is the nearest neighbor in this case. Right. In Volto it's quite similar.
After adding the package to the buildout and package.json, you have to enable the behavior just like I showed before. Afterwards you can add a block called recommendations to your page.
I will just do it. We already have a recommendations block, and now we can see the recommendations. You can also configure the block.
For example, change the title or the maximum element count and save it. Right. We also have a settings page in the Volto control panel, where you can do the same as in the Plone Classic backend.
Right. Now the future plans.
Other future plans are auto-tagging, to create tags based on the content, and dimensionality reduction, to reduce the size of the bag of words, which Jan mentioned before: the bag of words is the list of unique words.
We want to shrink it to increase the performance. Dimensionality reduction is possible with part-of-speech filtering, for example, which categorizes the words by type,
types like verbs and nouns. The dimensions can also be reduced by lemmatization, which breaks words down to their base form, for example, walk instead of walking.
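The talk proposes shrinking the bag of words linguistically (part-of-speech filtering, lemmatization). As a rough illustration of what dimensionality reduction buys, here is a different, purely numerical technique that scikit-learn ships with, truncated SVD (latent semantic analysis), applied to the example counts; it is not the method named in the talk:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

texts = [
    "the house is green, the house is nice, the house is house",
    "a green frog in the greenhouse",
    "the greenhouse on the street and the house is full",
]
# One dimension per unique word in the corpus
counts = CountVectorizer().fit_transform(texts)

# Project the word-count vectors down to 2 dimensions
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(counts)
print(counts.shape, "->", reduced.shape)  # fewer dimensions, same number of texts
```

Fewer dimensions make the nearest-neighbor search cheaper, at the cost of some fidelity to the raw counts.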
Yes, right. The package is already available in our GitHub repository. You can find it at github.com slash interactive. recommendations. And for Volto you also need the second package Volto recommendations.
Feel free to test and use it. Thanks for your attention. Are there any questions?
So the question was whether execution time is a problem. We tested it with the data set we got from scikit-learn.
And it has about 6 to 7,000 pages which we automatically created. And the performance of rendering for example the viewlet is very good.
So everything is already there. And updating the indices and vectorizing, I think it took about 20 seconds when I wasn't speaking to a colleague of mine on GitHub.
Not on GitHub, on Discord. With other applications running on my laptop it was a bit slow. But it's quite fast because it's kept quite simple.
Does it also make sense to extend it to the search of Plone to have this recommendation system? Maybe, but it's a bit different. There was a talk yesterday about searching in Plone.
I think that our approach would be much too simple for really good searching or searching that might be better than using Solr. And this is I think best used really for simple similarity comparison of text.
So I guess if you want to get a better searching you need to... I think it was nuclear DB? I think you should have a look at that.
So this is basically... The idea is basically to keep it simple and make it easy installable and easy to use.
The question is whether we already use it in production. This is the first time we've presented it, so I think it needs to be tested first,
by others and by us. The code is already available on GitHub and testable. No more questions?
Jan, Richard, thanks.