Plone 5 and how to use machine learning with it.

Bilbao, Euskadi, Spain

Ramon Navarro Bosch - Plone 5 and how to use machine learning with it. Plone is a Document Management System and Content Management System that has been part of the Python ecosystem for more than 10 years. Plone 5's features allow us to manage content, define various kinds of content and provide a nice, useful UI to work with semantic web technologies. In this talk we are going to explain our approach for using Plone with the Python machine learning toolkit sklearn to enable clustering and classification of content in a scalable content management system. We will also add some sophisticated front-end gloss using the new front-end development functionality added in Plone 5, and show some real use cases of CMS/DMS with machine learning using sklearn and Solr.
So, good morning. Sorry, because it's late in the morning, but I think this is going to be more like a family, because I know most of you, and that's really great. Who does not know what Plone is?

I am Ramon Navarro Bosch. I'm a Plone Foundation member and I'm on the framework team, which meets every week; it's really boring. First I'm going to talk about Plone 5, because we are now at a good moment for it: what's new and what's going to change. Then I'm going to talk about machine learning.

So let's talk about Plone from the user's point of view. We have nearly 12 years behind us. We have a lot of sites around the world, the Brazilian government for example, and it's being used a lot by companies. Why are they using it? It has good quality and it's really stable. You can grow from a really small site, only for your personal use, to really big intranets and document management systems.

Everyone here knows Plone, so this is going to be boring, because I'm going to talk about what Plone is from the user's point of view. Plone from the user's point of view is content. Content is the key. What does that mean? Everything is content-oriented: you have pages, you have documents, you have events, you have whatever you want to extend it with. Anything that is a piece of text or a file is a document. So content is the key, content is the center of Plone.

We have full internationalization, in Catalan also: it is translated into more than 50 languages. And it is multilingual, which means that you can have the content in all the languages, with a connection between one translation and the other.

One of the most important things about Plone is security. Why is it being used by governments? Because they know that it's really secure: all the content, all the fields, nothing is going to be seen by anyone that you don't want to see it. I'm going to talk a bit about that later. And we have workflows: all the content goes through different states, and each state has different permissions for different users and groups.

If you did theming on an older version of Plone, you will surely remember that it was really difficult to understand. Now we're using rules, with something called Diazo, so you can design your site without knowing anything about Plone, using rules to move things from one place to another. In addition we have integrated RequireJS and LESS, so if you want to build a nice front end with Angular, React or whatever, Plone provides you with the tools that you need. And you can work through the web, so let me show you how you can create content.
content around this was a
project is 1 scene of that's
that's us to work in progress as we will be given this great
content you can just greater
local minutes of the web and American robin which is what we want in debates this updating 1 we can try let's say all you know this is the
new qualified content this recall because it's using a single word so that this letter from Barcelona but colonial all also just as you can see at things with it it so there is some kind of cool things to have become roles all your content
here of this resolution is not good for the things
that you can go to the followers
you can go to the content of the signal that we have is the great thing
about creating new page whatever you want where you were this
that the forward through the cool accomplished images which files and see if so that you can go
to there is a lot of good features that if that we still have to promote yes sharing and
cycle connected with the is to about but eventually the who
had it with the friends the
So now, as this is a technical conference, I'm going to talk a bit more about how Plone is built, because that was the commercial part.

Plone is based on the ZODB. That's the base, and it's a specific kind of database. The ZODB has more than 10 years behind it. At the time nobody was talking about NoSQL, and this is a NoSQL database where you store objects inside objects, in a tree structure. So if you are going to use Plone, it should be because you need to store objects that have some kind of relation between them. The ZODB is responsible for storing everything. Content is not relational: you can't just take an object and put everything into the fields of a table. You need a structure that is semantic. You have a folder, and inside it you have a lot of contents, a document and so on. Technically speaking, that's one of the main points.

Then what are we using to define content? We are using a framework called Dexterity. If you have used Archetypes, sorry for you, but Dexterity is a really cool framework because it allows you to define content types as simply as possible. You decide, OK, I want this field, and you give it a title and a label that is going to be shown on the UI. You can define different kinds of fields, files, images. You can also define permissions for specific fields: OK, I want only this role to see that field, or only this role to write that field. That's cool.

So with this single file there is no strange code: you only define in the interface the fields that give the semantic meaning to your content type. You can define your own content, and then it's going to map to the URL. One of the good things we have is traversal. That means that all these folders that I explained to you, the folder that contains a document and an image for example, you can access by URL: you write the path of the content, and then the view, which is the representation that renders the content as HTML or JSON or whatever you want to use.
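As a rough illustration of how a tree of content objects can map onto URL paths, here is a minimal, dependency-free sketch. It is not Plone's or Zope's traversal code; the `Content` class and the `traverse` helper are hypothetical names invented for this example.

```python
class Content:
    """Hypothetical content object: one node in the object tree."""

    def __init__(self, title):
        self.title = title
        self.children = {}

    def __setitem__(self, name, obj):
        self.children[name] = obj

    def __getitem__(self, name):
        return self.children[name]


def traverse(root, path):
    """Resolve a URL path by stepping into one child per path segment."""
    obj = root
    for segment in path.strip("/").split("/"):
        if segment:  # an empty path resolves to the root itself
            obj = obj[segment]
    return obj


# A folder that contains a document and an image, as in the talk.
site = Content("Site")
site["folder"] = Content("A folder")
site["folder"]["document"] = Content("A document")
site["folder"]["image"] = Content("An image")

found = traverse(site, "/folder/document")
print(found.title)
```

In a real system a view would then be looked up for the object that the path resolves to; the sketch stops at finding the object.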
Here we have bundles. That's one of the good things: you register CSS and JavaScript as resources, and they are automatically loaded on your page when they are needed. So you can register, for example, a bundle for tables, and when a view renders an element that needs it, the view says which resource needs to be loaded. You just add the resource on request, writing the name of the resource that you want to add, and the bundle is applied when the page is rendered. There are a lot of technical details there.

Here is how we register views. I've been asking a lot of people here what the worst thing about Plone is, and the answer is ZCML. But I think it's OK, because it allows us to have a really clear list of the configuration and the definitions that we have. So I think it's great to have this way of defining all the views and the other components. Then you can create a class for the view and a template to render it. One of the things that we have in Plone 5 is that we can use Chameleon, so there is a new way of defining variables and using them in the template. That's really interesting.

From Python you get access to the object of your site, and everything is in the database. You can just go to the attribute of your site that's called folder, go to the attribute of the folder that's called document, and then access the attributes of the document.

I wanted to talk about the component architecture, because it may be used in a lot of other places. It dates from around 2005, but it's a really interesting infrastructure. We have zope.interface and zope.component, which are packages for managing design patterns. Normally when you write Python code you write a small program, but if your program is going to grow you will need some kind of design patterns, and there are not so many good libraries of patterns for developing big software. This is a really good one; I really love it. In our company we use the registries a lot: a global registry for what is common to all the websites, and a local registry for what is specific to each site.

The first pattern I want to point to is the adapter. It's one of the patterns that is really good for software. We were using class inheritance a lot, and we ended up with types that inherited from many classes, maybe 30 different classes. You go to the parents of the class and you never know what's going to be there. It was a mess. So we decided to use adapters instead. An interface is defined, in this case IPerson, with the methods and the attributes of that interface, and you have implementations of that interface. Then you define the function that you want, and you implement an adapter with a name that returns one thing.
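The adapter idea can be sketched without any framework. The registry below is a deliberately tiny stand-in written for this example; it is not the zope.component API, and `register_adapter`, `get_adapter`, `Person` and the two greeter classes are all hypothetical names.

```python
_adapters = {}  # (adapted class, adapter name) -> adapter factory


def register_adapter(adapted_class, name):
    """Class decorator that registers a named adapter for adapted_class."""
    def decorator(factory):
        _adapters[(adapted_class, name)] = factory
        return factory
    return decorator


def get_adapter(obj, name):
    """Find a named adapter for obj, checking its class and base classes."""
    for cls in type(obj).__mro__:
        factory = _adapters.get((cls, name))
        if factory is not None:
            return factory(obj)
    raise LookupError("no adapter %r for %r" % (name, type(obj).__name__))


class Person:
    def __init__(self, name):
        self.name = name


@register_adapter(Person, "catalan")
class CatalanGreeter:
    def __init__(self, context):
        self.context = context

    def greet(self):
        return "Hola, " + self.context.name


@register_adapter(Person, "english")
class EnglishGreeter:
    def __init__(self, context):
        self.context = context

    def greet(self):
        return "Hello, " + self.context.name


person = Person("Ramon")
greeting = get_adapter(person, "english").greet()
print(greeting)
```

The point of the design is that `Person` stays small: new behaviour is added by registering another adapter rather than by adding another base class.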
And there is another implementation with another name. That's a very interesting feature: you can use any of these adapters to adapt an object and get, at the end, the methods that you want.

The subscriber is another kind of cool pattern. It's the observer pattern. We have some kind of object that we want to watch: OK, I want to check if someone modifies this kind of object. If you modify an IPerson object, you subscribe for IPerson and for the modified event, and you decide which function is going to be executed when it is modified.

And utilities: if you want to store some kind of stuff without a context, you define an interface and you register an object for it. That's what we use utilities for.

Then the last thing about Plone stuff that I want to talk about is the JavaScript and CSS integration. Writing our own framework would be the worst error, because it's going to be hard to maintain. What we have right now is really good: it is using patterns. Patterns are a way of defining, OK, I have this element, and you configure it with attributes on the HTML element, and the JavaScript automatically renders the widget for you. For example, this is the date picker that we have, and this is its configuration. Everything is integrated with RequireJS and LESS. That means that if you want, you can start your Plone in development mode, and then everything is compiled at run time in the browser, so you will see the source code of everything and you can go
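Going back to the subscriber pattern described in this passage, here is a minimal, dependency-free sketch of subscribing a function to "this kind of object was modified". It is not the zope.component or zope.event API; `subscribe`, `notify`, `ObjectModified` and the content classes are hypothetical names for illustration.

```python
from collections import defaultdict

_subscribers = defaultdict(list)  # (object class, event class) -> handlers


class ObjectModified:
    """Hypothetical event carrying the object that was modified."""

    def __init__(self, obj):
        self.object = obj


def subscribe(object_class, event_class):
    """Decorator: run the handler for this event on this kind of object."""
    def decorator(handler):
        _subscribers[(object_class, event_class)].append(handler)
        return handler
    return decorator


def notify(event):
    """Call every handler whose object class and event class both match."""
    for (object_class, event_class), handlers in list(_subscribers.items()):
        if isinstance(event.object, object_class) and isinstance(event, event_class):
            for handler in handlers:
                handler(event.object, event)


class Person:
    def __init__(self, name):
        self.name = name


class Document:
    pass


log = []


@subscribe(Person, ObjectModified)
def person_modified(obj, event):
    log.append("modified " + obj.name)


notify(ObjectModified(Person("Ramon")))  # handler runs
notify(ObjectModified(Document()))       # no handler registered for Document
print(log)
```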
and debug whatever you need. I think Plone has nearly 1 million lines with all the libraries that it includes, so if you want, you have the option to have them uncompressed, and then you need to take care of a lot of things. So we created a Python module that takes care of that, creating the configuration for JS and compiling, so you get the compiled code.

Then there is Diazo, which is just a way of defining themes. You do your design and say, OK, I need this look. The design is done by a designer, who gives you static HTML. Then you give that to Plone with rules: I want this here, this is the logo, drop this part. And you produce the theme without doing a hard integration.

A little update on the new things that we are working on for the coming months. It's really clear that all the web frameworks are going to move to JavaScript: all the UI needs to be done in JavaScript. So what's really needed is a really good REST API that the front end can interact with. The integration of Plone with REST is not released yet, but it's going to be released. We have a way to define HTTP verbs, like GET or DELETE, on a specific content type. I have an IPerson, so I can define a specific HTTP verb for deleting this object and which function is going to be executed. Here we use the same example as before.
OK, testing. We improved a lot. Plone 5 has 101 acceptance tests and more than 9 thousand integration and unit tests. If you commit something and you break the tests, then this guy comes to you and tells you to never go to sleep before the tests are green. So we have really good testing, and we want to thank the people who take care of the testing, because we have really, really good testing.

That matters to companies. When we create a new release you know that it's going to work, because we have tests for everything, down to the name of a button in a control panel: if it changes, the test fails. That's really great for companies that rely on Plone and want to invest money in their sites, so that's really important.
Now we created a specific Git repository, so if anybody wants to try Plone 5 and thinks, OK, it's going to be difficult because it's complex: it's 1, 2, 3, 4 lines. You just go there, you do a git clone, you run the bootstrap, which is already running on Python 3, then you run the buildout, and then you run the instance, and you have Plone running from 4 lines. You need to go for a coffee during the buildout, because it might take 10 or 15 minutes depending on the machine and the network.

We have really good documentation, for developers and for users, and a lot of training material that helps you to understand each feature.

And machine learning. They said 10 minutes, so I'm going to talk about machine learning also. What we did in Plone is integrated as a proof of concept. Our main goals in using machine learning in a CMS are these. First, when you get to a piece of content, you want to see the related items on the site, but people don't go and say this is related to this one and this is related to that one. So we wanted a smart way in which everything is related automatically. Second, auto-classification of content: if you have a site where people have already uploaded stuff, you just want to run a classification on what's new, on what's added. Third, recommendations of tags: when you are creating new content or editing content, the UI suggests, OK, this content is about that, and asks the user if they want to tag it that way. And finally semantic search. We have a really nice search: you can search the content and it is completely full-text, it looks in all the files that we have. But we are trying to make it a semantic search, so that related stuff comes up and it's more useful.

So I started trying to understand what machine learning is. This is the scikit-learn map, which walks you through the questions. Do I have more than 50 samples? We normally have more than 50 samples. Do I want to predict a category? Yes. Do I have labeled data? No, because people have not tagged the content. Do we know how many categories we have? We let the administrator define that. So then we are going to use KMeans, which is one of the ones that we have implemented and that I'm going to show you. This is a really good site; it helps you a lot to decide which algorithm you need for each kind of usage.

scikit-learn is maybe not the best option, but the advantage is that it is integrated in Plone and you don't need anything external. There are other implementations that work fine, but you need an external API and you need to export the content, and our clients are really strict about security, so they don't want to export the content to external applications. So we need to have everything embedded in Plone.

So we implemented this collective machine learning package, and for now it has clustering, as one of the initial ones. What did we do? We created an adapter called ILearningString. You get from that adapter, say for this IPerson, which text you want to use in the machine learning stuff. Maybe you want to join the name, the first name and whatever else is relevant, and you create a text that is going to be useful for analyzing. Then we normalize it and vectorize it.
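The "learning string, then normalize, then vectorize" step can be sketched as follows. This is a dependency-free illustration, not the package's code: the talk uses scikit-learn, and the choice of TF-IDF weighting here is an assumption, as are the names `learning_string`, `tokenize` and `tfidf` and the sample records.

```python
import math
import re
from collections import Counter


def learning_string(person):
    """Join the fields judged relevant into one text (hypothetical choice)."""
    return " ".join([person["first_name"], person["last_name"], person["bio"]])


def tokenize(text):
    """Normalize: lowercase and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(texts):
    """Return one L2-normalized {term: weight} vector per text."""
    docs = [Counter(tokenize(text)) for text in texts]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        # Term frequency times a smoothed inverse document frequency.
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


people = [
    {"first_name": "Ada", "last_name": "Lovelace", "bio": "Writes about Python and Plone"},
    {"first_name": "Grace", "last_name": "Hopper", "bio": "Writes about compilers"},
]
vectors = tfidf([learning_string(p) for p in people])
print(sorted(vectors[0].items(), key=lambda item: -item[1])[:3])
```

Terms that appear in every document ("writes", "about") end up with lower weights than terms that distinguish a document ("plone"), which is what makes the vectors useful for clustering.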
From that we get the corpus of each document, and we store that corpus in the database so we can reuse it later, so it's not so expensive to use. Then KMeans takes all the vectors from across the site. We tried with more than 150 thousand documents: you need 64 GB of RAM to run that in memory. We were running on a single process. If memory is a problem, there is a variant of the algorithm that allows you to work in mini-batches.

So we define the number of clusters on the front end, and then we just use KMeans to decide the groups of content. We store the model that we get in the database, and then we use that model to predict which cluster the next content that you create belongs to, so new content is automatically assigned to its cluster. The weird thing is that all this implementation has to respect security.
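The "fit clusters once, keep the model, predict the cluster of new content" flow can be sketched with a small k-means written from scratch. The talk uses scikit-learn's implementation; this dependency-free version only illustrates the idea, and the function names, the toy two-dimensional vectors and the seed are assumptions made for the example.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centroids, point):
    """Index of the nearest centroid: the cluster a new item belongs to."""
    return min(range(len(centroids)), key=lambda i: _dist2(centroids[i], point))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, label for each point)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        labels = [predict(centroids, p) for p in points]
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, [predict(centroids, p) for p in points]


# Toy document vectors: two about one topic, two about another.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
model, labels = kmeans(points, k=2)

# The stored model (the centroids) is enough to place new content later.
new_label = predict(model, [0.8, 0.2])
print(labels, new_label)
```

Only the centroids need to be persisted to classify new items, which matches the talk's point that the model is stored and reused rather than recomputed on each request.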
The objects and the catalog are secured, so nobody is going to get any information that they are not supposed to see. OK, the future: we are now working with naive Bayes for classification, recommendations and semantic search.
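For readers unfamiliar with naive Bayes classification, here is a minimal multinomial naive Bayes with add-one smoothing, written without dependencies. It is a generic textbook sketch, not the speaker's implementation (which is described as work in progress); the labels and token lists are made up for the example.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (tokens, label). Returns the counts the model needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def classify(model, tokens):
    """Pick the label with the highest log prior plus log likelihoods."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        # Add-one (Laplace) smoothing so unseen words do not zero the score.
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


samples = [
    (["plone", "theme", "diazo"], "frontend"),
    (["less", "requirejs", "theme"], "frontend"),
    (["kmeans", "cluster", "sklearn"], "ml"),
    (["bayes", "classifier", "sklearn"], "ml"),
]
model = train(samples)
suggested = classify(model, ["sklearn", "cluster"])
print(suggested)
```

Unlike the clustering step, this needs labeled examples, which fits the tag-recommendation use case once editors have tagged some content.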
We also want to be able to experiment with pushing content out of Plone, but that is going to be difficult because of the security issue. And I wanted to show you a demo.
say OK so this site has reason that because it was doing this morning's lately so this it so
here I for example some
content of the copy of this London here and a cycle any content here I was that belongs to cluster 1 of this and when the city was my friend yet
So it automatically gets and classifies all my content into different clusters. It automatically defines the groups of content that are semantically close. So how is it computed?

Here, in the control panel, you define the main settings. Then we define how many clusters you want, the maximum number of clusters, and the name of the index that we're going to use to store the cluster. Then you can press compute, and it gets all the content, processes the content, stores the result on the objects and updates the catalog, so the specific index allows us to see which object belongs to which cluster.

Now it automatically shows you the content grouped by cluster. Then it's a matter of defining, for each cluster, what it is about. If you have real content it's really easy to see, because you see the titles and you can tell that this cluster is talking about this kind of stuff, so you can give that cluster a name.

This is a proof of concept. It is in production on some sites because it was needed, but we are working a lot on it. So if there are data science people here who want to help us to understand what's going on behind all this stuff, help us. The Plone community is a cool community, and if anyone wants to join it, right now it's much easier than in the past.

Any questions? Thank you.
Thank you for the presentation. Just one question: how does this handle content in different languages?

That's something we have. In the control panel there is a dropdown that allows you to select the language you want to use. It is implemented for most of the languages; it's just a matter of choosing that option.

Can you give some numbers on how long the computation takes if you have, let's say, a hundred thousand or a million objects in the database?

We tried it with 150 thousand. The only problem is that when the computation is finished it has to write down on all the objects the cluster that each one belongs to, and that takes a long time.

But once that's done, the computation on the 150 thousand documents, is it read into memory on every request? What is the size of the model that's generated at the end of the computation, and is that read into memory on every request?

No, no. We're storing the model, and we're storing the vectors and the matrix. We store the model because when you create new content you want to ask the model which cluster it belongs to. If you want to compute it again you need to recalculate everything. Normally, when you are using the site, when you are doing a request, nothing is computed, because everything is stored in attributes on the objects. So it's not real-time machine learning. It's more like we run the algorithms when you trigger the action.

Is there something that can do real-time machine learning, not only this but some other tool?

I don't think it is possible: it takes some time to build all the indexes for machine learning, and incrementally updating the indexes and the algorithm might be hard. Talking with someone from a company, I learned that there is a line of investigation called online learning. These are algorithms that you can train incrementally, designed for online applications. So we need to investigate that.

Is there time for two more questions? If not, it's lunchtime.