
From basic distance search to a complex multi criteria search

Hi everyone. Today we will talk about how to go from a basic distance search to a complex multi-criteria search. But first, let's talk a bit about me.
I'm Antonin Lacombe, I'm French, and I have been a Python lover since 2009. I currently work at JeLoueMonCampingCar.com, which is the French website. We also have other websites, one for Spanish and one for English, whose names mean exactly the same thing in different languages, and we also cover other European countries, including Italy.
What do we do at JeLoueMonCampingCar.com (I will just say JLM for the rest of the talk)? We do private camper hire: basically, we try to connect camper owners and travellers. Owners post ads on our website and hope to make money when their vehicle is not in use, and travellers look for a vehicle and hope to enjoy their holidays travelling all over Europe, or France, or wherever. Search is one of the primary features of the website, because we have to match the offer and the demand. To put it simply, we are like the Airbnb of campers.
Today we will talk about search, but I want to be clear: we will talk about search based on Elasticsearch and Haystack. We will not talk about how to install Elasticsearch or how to configure your Elasticsearch indexes, and we will not go deep inside the Elasticsearch implementation.
Three years ago, when we wrote the first version of the JLM site, we had to write the search page and the search backend. We didn't have a lot of time, because we had to write the whole website, so we made something quick for the search: we said a simple distance search would be OK, at least for the first months. We used GeoDjango to do that, with Google autocompletion in the search form. Google gives us a latitude and a longitude, we create a geo point from that, and we ask GeoDjango to compute the distance between that point and each camper and to sort the results by distance.
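The distance-only ranking described above can be sketched in plain Python. This is not the GeoDjango code from the talk: it is a minimal illustration that assumes each camper is a dictionary with hypothetical `lat` and `lon` keys, and it uses the haversine formula in place of the database-side distance computation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def sort_by_distance(campers, lat, lon):
    """Return the campers ordered from nearest to farthest from (lat, lon)."""
    return sorted(
        campers,
        key=lambda camper: haversine_km(lat, lon, camper["lat"], camper["lon"]),
    )


# Hypothetical sample data, for illustration only.
campers = [
    {"name": "camper in Lyon", "lat": 45.7640, "lon": 4.8357},
    {"name": "camper in Versailles", "lat": 48.8049, "lon": 2.1204},
    {"name": "camper in Paris", "lat": 48.8600, "lon": 2.3400},
]
nearest_first = sort_by_distance(campers, 48.8566, 2.3522)  # search point: Paris
```

Distance is the only ranking signal here, which is exactly the limitation the rest of the talk addresses.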
And searching is hard, maybe harder than you think. Let's see what happens with this search.
Here is an example of a search for Paris using this kind of search. As you can see, the first camper is 2 kilometres from Paris, the second is 3 kilometres away, and the third is close to Paris as well. The first is a van, the second is another van, and the third is a proper camper. You can see that there are no reviews for any of them and, worse, that we don't have any description for the third one. Here I'm not confident that these are the right campers to show to people who want to rent a camper in Paris. On this kind of slide you will see blue lines: they are there for people who want to know how the search engine computes the score. Here the score depends only on distance, because we only sort by distance.
The problem is: what do we want? We want to deliver a great experience to the guest. The other question is: what is a great experience for a guest? Guests want a vehicle close to the place they searched for; that's important. Guests also want to rent from an owner who will answer them. Guests probably want an available vehicle: they don't want to book a vehicle whose owner will never answer or will probably reject the request. And guests probably want an ad with a lot of pictures, because the more pictures you have of something you want to rent, the better idea you have of what you will rent. So let's do
things better. At JLM we chose Haystack. In this talk we also have Elasticsearch, which is a fully functional search engine that provides autocompletion, "more like this", full-text search, faceting and things like that. I don't know who here knows Elasticsearch? OK. So it provides all that kind of thing, and some people also use it as a kind of NoSQL store. Haystack is a Django app that provides an API like query sets, as the Django ORM does, on top of different search engines such as Elasticsearch, Solr, Xapian or Whoosh. The idea of Haystack is to make it easy to start with and easy to understand for people who already know Django, because it has the same API, with queries and query sets. It is convenient for keeping the code clear for everyone, even for someone who is, for example, a front-end developer who just takes a look at your back-end code: if they have already seen what a Django query set looks like, they will know exactly what it is doing. So let's
rewrite the same function with Haystack and Elasticsearch. I will not spend time on how to declare your indexes; I will assume that the Haystack and Elasticsearch documentation is clear enough to explain how to index your models. Here is the function, the same as before, with the point instantiation at the top. The difference is that we ask Haystack to give us a search query set for a given model, which is the ad, and we ask it to compute the distance between the location field of each ad and the search point, and to order by
distance. As I said, we need something better than just a distance search. And if you read the
Haystack documentation carefully, you will see that the problem is explicitly written there: you cannot specify both a distance and a lexicographic ordering together. So you can put as many fields as you want in the ordering, except if one of them is the word "distance": if there is anything besides "distance", Haystack assumes that "distance" is the name of a field and not the name of the computed distance, and it does not work
like that. So after some Googling, reading the documentation and asking myself whether Haystack was the right choice, I found something in the Elasticsearch documentation called the function score query. The function score query is used to take control of the scoring process in Elasticsearch: it allows you to apply a function to each document that matches the main query, in order to alter or totally replace the original score. That looks like what we need. The problem is that function score and the decay functions are not implemented in Haystack.
Before going deeper, let's have a quick look at how Elasticsearch, Haystack and Django work together. Elasticsearch, on the right side, provides only an HTTP API. So basically, when you write your query set with Haystack, the Haystack Elasticsearch backend generates an HTTP request and sends it to Elasticsearch. Elasticsearch returns a JSON response, and the backend transforms that JSON response into Python objects and puts those objects in the query set. This is important for what happens next. I
already told you that Elasticsearch works with an HTTP API, and here is the documentation example of what a function score looks like. It is just a curl request. What's interesting here is that everything is inside a function_score object, or dictionary, call it whatever you want.
That function_score has a query. The query is the original query, what you want to search for: maybe it's a full-text search, maybe it's a filter on a given field, whatever you want to ask Elasticsearch for.
The important things happen after that: the score mode. The score mode is here to tell Elasticsearch how to compute the score that will be used to sort your response. The score mode can take different values: it can be multiply, which is the default, it can be sum, it can be avg, it can be max or min, or it can be first, which means the score generated by the first function is used. Here we will use sum, because the default, multiply, has one problem: if one of your functions returns a score of zero, then whatever the other functions return, your object will end up at the bottom, because zero multiplied by anything is zero. So let's use sum, and you will see that it works well.
Then you have the functions. In functions you can put as many functions as you want. A function always has the same form. First you define a curve: on the right you have the three different kinds of curve, Gauss, exponential and linear, shown on the graph. Here we apply a Gauss curve on the location field; the origin is a given geo point, and we have an offset of 2 kilometres and a scale of 3. What does that mean? Imagine a map with a central point and a 2-kilometre offset: all objects inside these 2 kilometres will have the same score for that function, and points outside will start to decay. The scale says how fast the decay will be.
That works for geo fields, but it also works for other fields. The second example is another curve, but based on the price field. There is a little trick: a price can't be negative, so the origin is set to 50 and the offset to 50, which means the score does not decay for any price between 0 and 100, and all prices more expensive than 100 will see their score decay. In this example the assumption is that the less expensive a camper is, the more relevant it is.
The last thing, and it's quite important: for each function you can define a weight. Here the price weight is twice as big as the location weight. So you can tune your curves, your weights, your origins, scales and offsets as you want.
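To make the roles of origin, offset and scale concrete, here is a small sketch of the three decay curves in plain Python. It follows the formulas given in the Elasticsearch function score documentation, with the documented default `decay` of 0.5 (the score a document gets at a distance of offset plus scale from the origin). Distances are treated as plain one-dimensional numbers, and the price scale of 20 is a hypothetical value, since the talk does not give one.

```python
from math import exp, log


def _excess(value, origin, offset):
    """Distance from the origin that lies beyond the no-decay offset."""
    return max(0.0, abs(value - origin) - offset)


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    sigma_sq = -scale ** 2 / (2.0 * log(decay))
    return exp(-_excess(value, origin, offset) ** 2 / (2.0 * sigma_sq))


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    lam = log(decay) / scale  # negative, so the score shrinks with distance
    return exp(lam * _excess(value, origin, offset))


def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    s = scale / (1.0 - decay)  # distance beyond the offset at which the score hits 0
    return max((s - _excess(value, origin, offset)) / s, 0.0)


# Location example from the talk: offset of 2 km, scale of 3 (taken as km here).
inside_offset = gauss_decay(1.5, origin=0.0, scale=3.0, offset=2.0)
at_offset_plus_scale = gauss_decay(5.0, origin=0.0, scale=3.0, offset=2.0)

# Price example from the talk: origin 50 and offset 50, so no decay in [0, 100].
cheap = gauss_decay(80.0, origin=50.0, scale=20.0, offset=50.0)
expensive = gauss_decay(120.0, origin=50.0, scale=20.0, offset=50.0)
```

Everything within the offset keeps the full score of 1, and the score has dropped to `decay` once the value is one scale beyond the offset. Only the linear curve ever reaches exactly 0.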
And so, because Haystack doesn't provide that kind of
functionality, let's write a custom Elasticsearch backend for Haystack.
Actually, it's not just a backend: we need to write a search engine, which ties together the search backend and the search query, and we also need to write a search query set. Don't be afraid.
Let's see what it looks like.
Here, at the top, is the search query. Search queries, like Django queries, are lazy and are executed only when needed. The idea of this class is just to keep track of the decay functions that were added. The build_params method is used by the backend to generate the HTTP request to Elasticsearch, so if we have decay functions, we just return them in our search parameters. The method that adds a decay function will be used by the query set internally to add a new decay function. And the _clone method is here to give the query the ability to clone itself, like a Django query. That means every time you add something, as in the Django ORM, a new filter, an order by or whatever, the query clones itself with the new parameters. We need the same capability here: every time we clone the query, we just copy the decay functions to the newly cloned query.
Here is the search backend. The search backend is in charge of building the parameters to send to Elasticsearch. Here we have a shortcut: if we don't have decay functions, we just pass through; we have nothing to do, we just let Haystack compute the rest. The difference is when we do have decay functions: we replace the query built by Haystack with a function_score, with the structure that we saw before, in which we put the decay functions, the query generated by Haystack, and the score mode.
And here is the search query set, which is how you use the query. We just add a new method to the query set, decay, which takes a function; the decay function is only a dictionary.
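The three pieces described above can be sketched without Haystack. The classes below are toy stand-ins, not the speaker's code and not Haystack's API: `DecayQuery`, `DecayQuerySet`, `build_search_body` and their attributes are hypothetical names chosen for this illustration. The decay dictionary uses the shape of a function score decay function as found in the Elasticsearch documentation.

```python
import copy


class DecayQuery:
    """Toy search query that keeps track of the decay functions added to it."""

    def __init__(self, query=None):
        self.query = query or {"match_all": {}}
        self.decay_functions = []

    def add_decay_function(self, function):
        self.decay_functions.append(function)

    def clone(self):
        # Cloning must carry the decay functions over to the new query.
        clone = DecayQuery(copy.deepcopy(self.query))
        clone.decay_functions = copy.deepcopy(self.decay_functions)
        return clone


class DecayQuerySet:
    """Toy query set: every call returns a new object, like a Django query set."""

    def __init__(self, query=None):
        self.query = query or DecayQuery()

    def decay(self, function):
        clone = DecayQuerySet(self.query.clone())
        clone.query.add_decay_function(function)
        return clone


def build_search_body(query, score_mode="sum"):
    """Backend step: wrap the original query in function_score when needed."""
    if not query.decay_functions:
        return {"query": query.query}  # shortcut: nothing to change
    return {
        "query": {
            "function_score": {
                "query": query.query,
                "functions": list(query.decay_functions),
                "score_mode": score_mode,
            }
        }
    }


base = DecayQuerySet()
scored = base.decay({
    "gauss": {"location": {"origin": "48.8566, 2.3522",
                           "offset": "2km", "scale": "3km"}},
    "weight": 2,
})
body = build_search_body(scored.query)
```

Because `decay` works on a clone, `base` is left untouched, which is the same chaining behaviour that Django query sets have.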
All this code is available on the Internet at this address; I will give it to you
later. Now, let's assume we are looking for a vehicle close to Paris with parking assistance and GPS. Let's instantiate a search query set for the ad model, filter on parking assistance and GPS, and compute the distance between the search point and the ad's location.
Now you see what a decay function looks like. Here we apply an exponential curve on the field named location, with the origin, scale and offset, and we set the weight to 2, because location is something important: not the most important, but quite important.
At indexing time we compute and add more fields. There is a field named picture count, which is the number of pictures the ad has, and here we add a Gauss curve on the picture count with an origin of 50, plus an offset and a scale. That just means that the more pictures you have, the better your score will be. We give it a lower weight than the location, because it's important, but maybe less important than distance. Let's add this dictionary to the decay search query set.
And there is more. At indexing time we also compute something called owner quality, which goes from 0 to 1. It is computed arbitrarily from the acceptance rate, the number of reviews, the answer time and the number of bookings. 1 is the best owner you can imagine; on the other hand, 0 is an owner who does nothing on our website. We want good owners, so if your owner score is low, you will be lower in the search. We don't set a weight here, which means a weight of 1.
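The effect of the weights and of the score mode can be illustrated with a few lines of Python. This is only a sketch of how per-function scores are combined: each function score is multiplied by its weight, then the results are summed or multiplied. The per-function scores and the picture weight of 1.5 below are hypothetical numbers; only the location weight of 2 and the default weight of 1 come from the talk.

```python
def combine(scores_and_weights, score_mode="sum"):
    """Combine (function_score, weight) pairs the way a score mode would."""
    weighted = [score * weight for score, weight in scores_and_weights]
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "multiply":
        product = 1.0
        for value in weighted:
            product *= value
        return product
    raise ValueError("unsupported score_mode: %s" % score_mode)


# (decay score, weight) for location, picture count and owner quality.
near_but_inactive_owner = [(1.0, 2.0), (0.2, 1.5), (0.0, 1.0)]
farther_but_active_owner = [(0.8, 2.0), (0.9, 1.5), (0.9, 1.0)]

sum_near = combine(near_but_inactive_owner, "sum")
sum_far = combine(farther_but_active_owner, "sum")
product_near = combine(near_but_inactive_owner, "multiply")
```

With multiply, a single zero wipes out the whole score, whereas with sum the ad with the inactive owner is still ranked, just below the ad that does well on every criterion.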
Now let's see what happens. You can see that the first result is 12 kilometres from Paris. 12 kilometres is OK: it's inside the no-decay circle that we defined. It has reviews, it has 8 pictures, and it has a good owner quality rate. You can see that the score now depends directly on the distance, the owner quality rate and the picture count. You can also see that all of them, the first, the second and the third, have reviews, a good quality rate and pictures. So we can now say that if you want to rent a vehicle from Paris, the first one will probably give you a better holiday, and maybe a better experience of our website, just by modifying that search query.
OK, so now I'm finished. Thank you for your attention.
If you have any questions, you are welcome. Thank you.

Q: This seems like a pretty interesting feature. Have you discussed it with the Haystack developers?

A: Actually, I had a lot of things to do. I just know that there is no functionality like that in Haystack, and I didn't see any pull request for something like that on GitHub. The code is now available on GitHub, so if someone wants to make a pull request, or ask the Haystack team to do that, that would be great. I think the Haystack developers try to keep consistency between all the backends, so if they make something for Elasticsearch they will try to make the same thing for the other search engines, and actually I did not find any documentation on how to do that with the other search engines.

Q: What about performance? Are the results computed at each request, or do you have a cache?

A: Adding decay functions to our search didn't degrade our search response time. Adding Elasticsearch and Haystack to our search made it faster compared to the GeoDjango search. So it offers more functionality and it's faster; I think it's a good compromise.

Q: I was just wondering: you had to explore really deep into the stack and all the different layers, and write a lot of code to actually make that happen. Why do you stick with Haystack instead of just going with raw queries to Elasticsearch?

A: Actually, we already use Haystack for another part of our website, which is an FAQ. We don't use it really intensively, but we use Haystack for things like highlighting. And before going to the raw API, I checked whether it would be easy with Haystack. Actually, all the code I showed you is exactly the code on GitHub; there is no more code than that. And working with query sets allowed me to add even more things to our website, which are not open source for now. For example, we developed a more relevant "more like this" functionality for campers, because normally "more like this" only works on full-text indexing. Now "more like this" works for campers: for a given camper, you get other campers nearby with the same options, the same type and a similar quality.

Q: On the parameters you pass into the decay functions: are there tools that help you understand the natural groupings in your dataset, to know whether a decay function with a particular set of parameters is really going to give you a good result?

A: If I understand the question, this is something we do manually. We put all the functions together to make something that looks right, but it's like playing with screws: we take a given city and look at the results manually. OK, this one is not good, this one is not really relevant, so maybe we have to decrease that weight by 0.1. And we adjust our set of decay functions until we have something that is acceptable for us. But we have to do that manually.

Q: Do you allow the users to sort search results?

A: No. If we allowed them to sort by price, for example, we would totally lose our sort by distance, and today I don't know how to give users that functionality. So today we assume that our score is good enough. It may not be the easiest thing for users to understand, and maybe we need a blog post to explain it to them, but we consider it good enough.

Formal Metadata

Title From basic distance search to a complex multi criteria search
Series title EuroPython 2015
Part 94
Number of parts 173
Author Lacombe, Antonin
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, modify, and reproduce, distribute and make publicly available the work or its content, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and that you pass on the work or its content, including in modified form, only under the terms of this license
DOI 10.5446/20084
Publisher EuroPython
Release year 2015
Language English
Production place Bilbao, Euskadi, Spain

Content Metadata

Subject area Computer Science
Abstract Antonin Lacombe - From basic distance search to a complex multi criteria search This case study shows how to start from a simple distance search on elasticsearch and haystack and implement a production ready search like airbnb. The talk will explain how decay functions work with the different curves (linear, exponential, gauss) and how to send them with query scores to elasticsearch. With that you will be able to mix the distance, the price, the user activity, the number of pictures and whatever you want. Additionally I will show how to write a custom ElasticsearchSearchQuery and ElasticsearchSearchBackend because this is not yet supported by haystacksearch.
Keywords EuroPython Conference
EP 2015
EuroPython 2015
