
Beyond the basics with Elasticsearch


Speech Transcript
Hello everyone, and good morning. I'm here to talk about what's beyond the basics with Elasticsearch. I work for Elastic, the company behind it, so we've seen a lot of use cases, and some of them actually surprised us, and definitely surprised many people who are familiar with Elasticsearch as just the full-text search solution. But before we get beyond the basics, we first need to know what the basics are. So very quickly, this is where we come from: we are a search product, an open source search product, and search is not a new thing. It's been around for a long while, and the really down-to-earth basics haven't changed that much since those days. We still use the same data structure that you find at the end of any book: the index, specifically the inverted index, and it looks the same in a book as it does in a computer. It is a list of words that actually exist somewhere in our dataset; notice that they are sorted. For each of these words we have, again sorted, the list of documents, or files, or pages when it's a book, where that word actually exists. And we store some additional information there too: for example, how many files actually contain the word Python, how many times it is present in file 1, at what positions, and so on. That information, those statistics, will be very important for us as we go on. So this is the data structure that we use.
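To make that concrete, here is a minimal sketch in Python of such an inverted index, with sorted postings lists carrying the statistics mentioned above. This is my own illustration, not Lucene's actual implementation:

```python
# A toy inverted index: term -> sorted postings list.  Each posting
# records the document id, the term frequency in that document, and
# the positions at which the term occurs.
from collections import defaultdict

def build_index(docs):
    """docs: {doc_id: text}. Returns {term: [(doc_id, tf, positions), ...]}."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        positions = defaultdict(list)
        for pos, word in enumerate(docs[doc_id].lower().split()):
            positions[word].append(pos)
        for word in positions:
            index[word].append((doc_id, len(positions[word]), positions[word]))
    return index

index = build_index({1: "python django python", 2: "django framework"})
# index["django"] -> [(1, 1, [1]), (2, 1, [0])]
# index["python"] -> [(1, 2, [0, 2])]
```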
How does search work? It's super simple. If we're looking for Python AND Django, it's the same search you would do if you were looking for those things in a book: you locate the list for Django and the list for Python. You can do that efficiently, both as a computer and as a person, because again the lists are sorted. You just walk the lists, and if you find a file or document that is present in both, that's your result. Naturally, if you want an OR search instead of AND, you just take everything from both lists. But that's not enough, because this only gives you the information about what matches. It doesn't give you the most important thing for us: how well does it match?
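A sketch of that AND walk over two sorted postings lists (again my own illustration, not Lucene code):

```python
def intersect(postings_a, postings_b):
    """AND search: walk two sorted lists of doc ids in lockstep."""
    results, i, j = [], 0, 0
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            results.append(postings_a[i])
            i, j = i + 1, j + 1
        elif postings_a[i] < postings_b[j]:
            i += 1
        else:
            j += 1
    return results

# documents containing "python" AND "django"; OR would merge both lists
print(intersect([1, 3, 5, 8], [2, 3, 8, 9]))  # -> [3, 8]
```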
What is the difference between the Django book that talks specifically about Python and Django, and the biography of Django Reinhardt that mentions in one passage that he had an encounter with a python? Obviously there is a big difference between those two books, and the difference is in relevancy. Relevancy is a numerical value, a score, essentially telling you how well a given document matches a given query.
A lot of research has gone into how best to calculate this score, and again, it hasn't changed that much since the beginning. At the core of it there is still the TF-IDF formula. Those fancy shortcuts stand for term frequency and inverse document frequency, and together they represent how rare the word you're looking for is and how many times it was found in the document. If you find the word "the" in a document, that doesn't really mean much: practically every document in the world, if we're talking English, will contain "the", so that's not good information. That's the idea of inverse document frequency; it tells you that this is not a specific word, that it appears in almost every document. If, however, you find the word "framework" or something like that, that is fairly specific. That's the IDF part. The TF part is just how many times the word was found: if it's mentioned only once in a book, that doesn't mean much, but if it's there a hundred times, it probably means more. And we can keep building on top of that. Lucene, for example, adds another factor: a normalization for the length of the field. That's essentially the equivalent of saying "there is a fish somewhere in the ocean": probably true, not really relevant or surprising. But if you have a bucket of water and you say there is a fish in it, that is much more actionable information. That's the second part, the normalization for the field: if you find something in a very long field, fine; if you find it in a much shorter field, for example the title compared to the body, it probably means more. So already we have a formula, baked into Lucene and into Elasticsearch, that works very well for text and for search.
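As a rough illustration, a simplified scoring function along these lines might look like this. The exact formula Lucene uses differs in its details, so treat this only as the shape of the idea:

```python
import math

def score(tf, docs_with_term, total_docs, field_len, avg_field_len):
    """Simplified TF-IDF with field-length normalization.

    tf             -- occurrences of the term in the field (TF part)
    docs_with_term -- in how many documents the term appears (IDF part:
                      rare terms like "framework" outscore "the")
    field_len      -- a hit in a short field (title) weighs more than
                      a hit in a long one (body)
    """
    idf = math.log(total_docs / (1.0 + docs_with_term))
    norm = 1.0 / math.sqrt(field_len / float(avg_field_len))
    return tf * idf * norm

# a rare term found 3 times in a short field scores high
print(score(tf=3, docs_with_term=5, total_docs=1000000,
            field_len=50, avg_field_len=200))
```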
But sometimes even that is not enough. For example, you're not dealing with text but with numerical information, or you have some additional information that Elasticsearch is not aware of: a quality rating for your documents, some user-contributed value, or somebody even paid you to promote a piece of content. Or you want to penalize or favor things based on a distance, say from a geo-location or from some numerical range. So how do you do that? We have a few ways of expressing it, and the best way to show them is with an example.
This is a standard Elasticsearch query using the function_score query type. The function_score query takes a regular query: we're looking for a hotel, let's call it the Grand Hotel. So far so good. Then we want the hotel to have a balcony, but we don't want to just filter down to the hotels that have one, because then we would be robbing ourselves of the opportunity to discover something else. Instead, if the hotel has a balcony, we favor it: we just add to the score, so all the hotels with balconies float toward the top. Then we want the hotel to be in central London, within one kilometer of the center. If it's within one kilometer it's a perfect match; the further away it is, the more the score decreases. It will still match, the score will just be smaller. That means a hotel that perfectly matches our criteria will be at the top, but a really good match outside the radius will still show up. Then we also have the popularity, how happy people have been with the hotel, and we take that into account: there is a special function, field_value_factor, which essentially says "there is a numerical value stored in Elasticsearch that reflects the quality; fold it into the score." And finally we add some random numbers. This is actually taken from a real-life example: people use it to mix things up a little, to give users a chance to discover something new, something they wouldn't otherwise see. All these things together make sure you find your perfect hotel. We're not limiting your choices: just because you said you want a balcony, we will still show you the hotel that is almost perfect for you except for the balcony part. And we're not just sorting by popularity, where something that's really popular but not that good a match would sit at the top. We're taking all these factors and combining them. This is one of the main things we can do with the score, and how we can use it in a more advanced way: take all the factors that go into the perfect result and just combine them. You're not limited to picking one and sorting by it; you can combine them all, and then it's just a matter of figuring out what the weights are supposed to be and what will actually give your application the best results. Some people actually use machine learning techniques to figure out the best ones: they have a training set and everything, and it's not that hard, because you have a limited number of options and they're typically numerical values, so if you know what a good match looks like, you can actually train the perfect query for your application.
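A sketch of what such a function_score query could look like using the Python client. The index name, field names, weights, and coordinates are all made up for the example and would need tuning:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
response = es.search(index="hotels", body={
    "query": {
        "function_score": {
            # the regular query: we are looking for the Grand Hotel
            "query": {"match": {"name": "grand hotel"}},
            "functions": [
                # boost hotels with a balcony instead of filtering on it
                {"filter": {"term": {"balcony": True}}, "weight": 2},
                # perfect score within ~1 km of the center, decaying beyond
                {"gauss": {"location": {"origin": "51.507, -0.128",
                                        "scale": "1km"}}},
                # fold the stored popularity rating into the score
                {"field_value_factor": {"field": "rating",
                                        "modifier": "sqrt"}},
                # a pinch of randomness so users discover new hotels
                {"random_score": {"seed": 42}},
            ],
            "score_mode": "sum",
        }
    }
})
```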
That works if you're doing search where you already know what you're looking for. But sometimes it's the other way around: you don't have the documents yet, you have the query, and you want to find the documents as they arrive. Imagine you want to do something like alerting or classification. For example, you're indexing documents, say stock prices, and you want to be alerted whenever a stock price rises above a certain value. Sure, you could keep running a query in a continuous loop and check if there's something new, but what we can do instead, with the percolator feature of Elasticsearch, is to actually index that query into Elasticsearch, and then we just show it a document and it tells us all the queries that matched. That is very powerful, especially because it can use all the features of Elasticsearch. The obvious use case is the alerting, the "stored search" functionality: if you supply your users with search and you want them to be able to store a search and be alerted whenever a new piece of content matches it, with the percolator you get that essentially for free. You just index the query, and whenever there is a new piece of content, you run it by the percolator, and it tells you that you should send an e-mail to the user who stored this search the other day. That's the stored-search side. You can also use it to do live search: if you've ever been on a website, done some searching, and while you were looking at the results a pop-up appeared saying "5 new documents match your query since you started looking", that's the same thing. Once you execute a query, you also store it as a percolator, and whenever a new piece of content arrives during that time, you can push it to the browser: hey, there are new, more recent results. Again, something that would otherwise be fairly hard to do, or would require some busy loop.
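A minimal sketch of that flow, using the percolator API roughly as it looked in the Elasticsearch 1.x era of this talk (the stocks index and its fields are assumptions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# 1. Store the query itself, under the special .percolator type.
es.index(index="stocks", doc_type=".percolator", id="price-alert",
         body={"query": {"range": {"price": {"gt": 100}}}})

# 2. When a new document arrives, ask which stored queries it matches.
result = es.percolate(index="stocks", doc_type="stock",
                      body={"doc": {"symbol": "ESTC", "price": 101.5}})
for match in result["matches"]:
    print("notify the owner of stored search:", match["_id"])
```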
But we can go a little bit further than that and look at classification, which is essentially using percolation to enrich the data in your documents. Imagine you're indexing events, and all you have as far as location goes is a set of coordinates, and you want to find the address. This is something that's easy to do the other way around: if you have the address and you want to find all the events in that location, you just use a geo_shape filter, looking for anything that falls within this shape, the shape of the city of Warsaw, and that's a super simple search. So with the percolator we can turn it into a super simple reverse search. Say we get our hands on a dataset with all the city polygons in Europe, or in the world; it's not that much. We index the cities, so we don't have to construct the polygon every single time: we store them in an index called shapes, under the type city. Then we create a query for each city, registered under its name, and when a document comes along whose coordinates, the location field, fall within that shape, we will know that it is actually happening in Warsaw, Poland. So something that is super simple to do one way but difficult the other way, we can do with percolation, essentially using brute force, but in a smart way, and outsourcing the brute force to Elasticsearch, which can do it very efficiently and in a distributed fashion. That's geo classification.
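A sketch of how such a registration might look, again with the 1.x-era percolator API; the index names, the shape document, and the field names are all assumptions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# One stored query per city: "does the event's location fall within the
# city polygon?"  The polygon lives in the shapes index (type city), so
# we reference it instead of rebuilding it for every document.
es.index(index="events", doc_type=".percolator", id="warsaw", body={
    "query": {"filtered": {"filter": {"geo_shape": {"location": {
        "indexed_shape": {"index": "shapes", "type": "city",
                          "id": "warsaw", "path": "shape"}
    }}}}}
})

# Percolating an event returns the names of the cities it falls into.
result = es.percolate(index="events", doc_type="event", body={
    "doc": {"location": {"type": "point", "coordinates": [21.01, 52.23]}}
})
print([m["_id"] for m in result["matches"]])  # -> ["warsaw"]
```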
Another thing that's easy to search for but not that easy to do the other way around is language classification. Any language has a few words that are specific to it and don't exist in any other; the slide shows some examples (this is essentially just a test of how many Polish people are in the audience). The assumption is that if we look for those specific words and find at least four of them (four, because four is always a good number; 42 would be too high), then this is actually a document that contains Polish. Sure, it's a simplification, it's a heuristic, but it actually works fairly well. It just depends on the quality of your words, and mine work well for Polish at least. So if you have a set of words for each language, you can just store a collection of queries like this.
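A sketch of one such stored language query; the word list and the threshold of four are illustrative assumptions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# "If at least 4 of these Polish-specific words occur, call it Polish."
es.index(index="events", doc_type=".percolator", id="lang-pl", body={
    "query": {
        "terms": {
            "description": ["gdzie", "jest", "właśnie", "środa", "żółty"],
            "minimum_should_match": 4,
        }
    }
})
# One such stored query per language; percolating a document then
# returns e.g. ["lang-pl"] for Polish text.
```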
Then when a document comes along, with the location and description of an event, you can immediately get back the classification: you get the location in a human-readable format, that it's actually Warsaw (note that the coordinates on the slide, 47.3, -74.1, are by the way not Warsaw, but whatever), and you also get the language back, that it's in Polish. You can use similar classifiers to determine the topic: if the keywords contain something like "programming", "Python", "Django", it's a fairly accurate assessment to say that the conference is probably about Python. So this is how we can use percolation to enrich our data and determine something that would otherwise be hard to do. Another use case: imagine you have a blog CMS where each category is defined as a search. That's super easy to do one way, but if you have a blog post and you want to see which categories that blog post falls into, that's the harder part. Again, with percolation and something like this, it's a breeze: you can tag blog posts with their categories as they come in. And you can obviously do a little bit more with the percolator. You can attach metadata to the percolators, and you can filter on that metadata and aggregate on it, so in the response you not only get the percolators that matched but also, let's say, their distribution across categories. You can even use them to highlight: you can search for some words in your documents, highlight the fragments that actually contain those words, and store them separately in the document for easy presentation, et cetera. You can get the top ten hottest categories for a piece of content, things like that.
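A sketch of attaching metadata to a percolator and then filtering and aggregating on it. Whether your version supports this exact request shape should be checked against the percolate API docs of the release you run, so treat this as an assumption-laden outline:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# A percolator document is a normal document: anything next to "query"
# is metadata we can later filter and aggregate on.
es.index(index="blog", doc_type=".percolator", id="cat-django", body={
    "query": {"match": {"body": "django python web framework"}},
    "topic": "programming",
})

result = es.percolate(index="blog", doc_type="post", body={
    "doc": {"body": "Testing Django applications the easy way"},
    "filter": {"term": {"topic": "programming"}},       # restrict candidates
    "aggs": {"topics": {"terms": {"field": "topic"}}},  # their distribution
})
```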
All of those work with individual documents, but we can also look at many documents at the same time. This is the traditional search interface: you search for something and you get back the top hits. What we also have here is something called faceted search. The search part is really good when you know what you're looking for; the facets show you what is actually in your data, so you can immediately see the distribution. You can see that if you're looking for something related to Django, most results are in Python and some in other languages, so it allows you to discover your data. Some people have taken it even further, and we have enabled that with aggregations, multidimensional aggregations, where you can aggregate over multiple dimensions at the same time. But that is still boring: that is still just counting things, and that's not really interesting, any database can do that.
What we need is to actually use the data we have, the statistics. To do that, let's look at how we would do recommendations using search. This is our dataset: we have a document per user, and for each user we have a list of artists, musicians that they like. We want to do recommendations: assuming I like these things, what should I listen to next? In this case we have two users, they have a lot of artists in common, and there are three other artists. The naive way to do it is to just aggregate, just ask for the most popular things: give me all the users who like the same things I do, and then give me the most popular artists in that group, minus the ones I already know. That way I will get the most popular artists, but not necessarily the most relevant ones. It's like asking what is the most common website that you go to: probably Google, not interesting, because everybody goes there. But if I ask the people in this room what is more specific for this group compared to people somewhere on the street, it will be something like GitHub: you probably all go to GitHub, hardly anybody in the outside world goes there, most don't even know it exists. That is relevant; that would be a good recommendation. And we can do that with Elasticsearch. We have all the information, we have the statistics about how rare a term is and what its distribution is across the population. So instead of asking for the most popular terms, we ask for the significant terms. It compares the frequencies in our group against the background, and the results will look something like this plot. The diagonal line is important, because that's where everything would sit if I had a random sample; the further a term moves away from the line, the more specific it is. That is how we can do relevant recommendations: we can see that this star here is obviously much more common in this group than in the general population, where it would be down here, so it moves far from the line. Because we have all the information, because we've analyzed the data, because we are the search people, we understand the text, we understand the frequencies, and we can use them to actually produce something meaningful.
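A sketch of that query plus aggregation with the Python client (the users index and artists field are assumptions; you would still subtract the artists I already know from the resulting buckets):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
my_artists = ["radiohead", "portishead", "massive attack"]

response = es.search(index="users", body={
    # foreground: users who like the same things I do
    "query": {"terms": {"artists": my_artists}},
    "aggs": {
        # not the most *popular* artists in that group, but the ones
        # unusually frequent there compared to the whole population
        "recommendations": {"significant_terms": {"field": "artists"}}
    },
    "size": 0,
})
for bucket in response["aggregations"]["recommendations"]["buckets"]:
    print(bucket["key"], bucket["score"])
```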
There is an obvious danger here: if I like a very popular band, say One Direction, it will skew my results, because everybody likes One Direction, right? So I need a way to combat this, because otherwise I would just get completely irrelevant recommendations. Again, we are the search people, we understand data, we understand documents, so we can sample: find just the users that are most similar to me, and we already have all the tools at our disposal. Remember TF-IDF and the normalization and everything: TF, the people who like more of the things I like are a better match; IDF, the people who share the rarer things I like get pushed to the top. Then just take the 500 best results and only build the recommendations based on that group. It makes the whole thing both faster and more relevant: it allows you to discard all the irrelevant connections you might find and only focus on the meaningful ones, the things that are relevant for your group, in this case the group of people who like the same things that you like. That will provide you with relevant recommendations.
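A sketch of that two-step idea using the sampler aggregation, which, if I recall correctly, arrived as an experimental feature around Elasticsearch 1.5; field names are assumptions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
my_artists = ["radiohead", "portishead", "one direction"]

response = es.search(index="users", body={
    "query": {"terms": {"artists": my_artists}},  # TF-IDF scores the users
    "aggs": {
        "best_matches": {
            # keep only the ~500 best-scoring users per shard ...
            "sampler": {"shard_size": 500},
            # ... and look for significant artists inside that sample
            "aggs": {
                "recommendations": {
                    "significant_terms": {"field": "artists"}
                }
            },
        }
    },
    "size": 0,
})
```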
So just by applying the concepts we've learned from search to other things, like aggregations, we can get much more out of them. Another example would be Wikipedia articles, where the labels and links play the role of words. Apply the same concept and you get meaningful connections between different concepts. If you tried to do it based on popularity alone, everything would always be linked through something generic: yes, this person and that person are both people, OK, not exciting. But if you apply this principle, you get something more out of it.
So if you combine aggregation and relevancy, all the statistics that we have, that is actually how we as humans look at the world. If I ask you what is the most common website you go to, you'll probably not say Google, because you know that's not interesting. We as humans have been trained from the very beginning to recognize patterns and to spot anomalies at the very same time. And this concept can be used for other things as well. For example, use the same principle, the significant terms aggregation, but per time period: you split your data into time periods and you ask what is significant for each period. What do you call that feature? It's a very common feature that we now see everywhere: it's "what's trending". That's all it is. It's not necessarily more popular than in any other period, but it is more specific to this one time period, the current one, let's say, compared to yesterday, compared to the general background.
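A sketch of "what's trending" as a date_histogram with a significant_terms sub-aggregation (index and field names assumed):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
response = es.search(index="posts", body={
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "published", "interval": "day"},
            "aggs": {
                # unusually frequent on this day vs. the whole index:
                # in other words, trending
                "trending": {"significant_terms": {"field": "tags"}}
            },
        }
    },
    "size": 0,
})
```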
Now, once you're doing these aggregations, there is one common caveat you can run into: you can have too many options, too many buckets, too many things to calculate. Imagine you're looking for the combinations of actors that star together very often: I'm looking for the top ten actors, and then, for each of those, the set of actors that appear together with them. If I just ran that, what would happen in the background is that I would essentially get a matrix of all the actors by all the actors, and it would be huge. It wouldn't fit into memory, it would probably blow up the cluster; actually, Elasticsearch would probably refuse to run this query, saying: hey, I would need too much memory, this is not going to fly. What you can do is tell it to go breadth-first: first get the list of the top ten actors, which greatly limits the matrix that you will need to calculate, and then go ahead. It will be a little slower, it has to run through the data essentially twice, but it will actually finish, and it will still finish in quite a reasonable time. So that's the common caveat people run into when they start exploring aggregations, especially multidimensional ones.
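A sketch of that hint: the terms aggregation accepts a collect_mode of breadth_first, which prunes to the top buckets before computing sub-aggregations (index and field names assumed):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
response = es.search(index="movies", body={
    "aggs": {
        "top_actors": {
            "terms": {
                "field": "actors",
                "size": 10,
                # prune to the top 10 actors first, then compute the
                # sub-aggregation, instead of materializing the full
                # actors x actors matrix
                "collect_mode": "breadth_first",
            },
            "aggs": {"co_stars": {"terms": {"field": "actors", "size": 5}}},
        }
    },
    "size": 0,
})
```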
So just to wrap things up, because we're approaching the end and questions: the lesson here is that information is power. We have a lot of information about the data, all the statistics, all the distributions of the individual words, and if you understand this, and if you can map your data onto this model, you can get a lot more out of Elasticsearch than just finding a good hotel in London or the conference events in Warsaw. So that's it for me, and if you have any questions, I'm here to answer them.
Question: When looking for people like me, instead of taking a fixed number like the 500 best matches, can you select the users that are, say, 90 per cent like me, a relative threshold rather than a fixed number?

Of course. You can do that with a simple query, because aggregations are always run on the results of a query. Remember the example I gave with the language classification, where I was looking for at least four words: I could do the same here and say "give me only the users that share at least 70 or 90 per cent, or at least 9, of the artists that I like", and use those as the basis for the aggregation. I can use both relative and absolute numbers. So yes, absolutely, and it would actually be much simpler: you wouldn't even need the sampler aggregation.
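A sketch of such a relative threshold, using minimum_should_match with a percentage on a terms query (field names assumed):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
my_artists = ["radiohead", "portishead", "massive attack", "björk"]

response = es.search(index="users", body={
    "query": {
        "terms": {
            "artists": my_artists,
            # relative threshold; an absolute number like 9 works too
            "minimum_should_match": "90%",
        }
    }
})
```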
Any other questions? Still awake? OK, going once, going twice. Ah, there: are there any performance implications of running, say, hundreds of percolators?

It scales way beyond hundreds. I've seen people doing millions of percolations, and it still works and scales very well with the distributed nature of Elasticsearch. Essentially the only resource that percolation consumes is CPU, so add more CPU, either to a box or by adding more boxes, and it will scale fairly linearly. The more boxes and the more CPU you have, the faster it will get. You don't need anything else: not much memory, no fast storage, only CPU, so it's very easy and fairly cheap to scale. To give you an idea, if you want to run hundreds of thousands or millions of percolations, you will need something like four or five reasonable boxes, and you will get responses within milliseconds. So it actually does scale very well.
Question: Could you give us some examples of customer use cases that were really impressive for you, the ones you didn't expect?

So, some of the ones we didn't expect. There was the percolator example: some people run clusters of Elasticsearch without storing any data on them, clusters of 15 or 20 machines with no data at all, which is a weird experience for what is essentially a data store. That's definitely one of them. We also keep running into situations where we ship a feature, recommend that people use it, and then find out we underestimated the people in the wild. For example, we introduced the idea of index aliases: you can have an alias for an index, so you can decouple the design of your indices from what the application sees. You could have an alias per user, with all the users living together in one big index, and the alias just points to that index with a filter. That works very well, until we encountered a customer who had millions of users, and suddenly we had millions of aliases, and we never thought that would happen. As with anything else in computer engineering: assumptions, assumptions, assumptions. So we encountered something like that, had to go back and fix it, and we reworked the aliases. Those are the two most notable examples where we were really surprised by how our users use the product in ways we didn't foresee. And it's good, because we always learn something new, and it allows us to orient ourselves better toward what the users actually need.
Question: I have a question regarding the reverse queries for language classification. Elasticsearch supports n-gram indices; could you use those for classification of languages?

N-grams have the problem that they have a very wide spread, so they might give you some correlation with the language, but they will definitely not be precise. Just to explain: n-gramming essentially splits a word into all the tuples of letters, so for the word "the" I would have "th" and "he", and then I would query for those pairs. That will obviously have some correlation, but it will in no way be decisive enough, especially for something like language classification, where you're really interested in the probability. N-grams are very good as an addition to something else; by their nature they always match something, and that's why you typically don't want to use them alone. They're fine if you have some stricter methods, like exact matching and then regular fuzzy matching, and then you throw n-grams into the mix to push the signal when they match and to catch something if nothing else matches. So I definitely wouldn't use n-grams for language classification on their own; I would typically only use them in combination with other query types and other analysis chains. Does that make sense? I think we're running out of time, so thank you very much, and if you have more questions, I'll be outside.

Metadata

Formal Metadata

Title Beyond the basics with Elasticsearch
Series Title EuroPython 2015
Part 65
Number of Parts 173
Author Král, Honza
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, change, and copy the work or its content for any legal, non-commercial purpose, and distribute and make it publicly available in unchanged or changed form, provided that you credit the author/rights holder in the manner specified by them, and that you pass on the work or its content, including in changed form, only under the terms of this license
DOI 10.5446/20127
Publisher EuroPython
Publication Year 2015
Language English
Production Place Bilbao, Euskadi, Spain

Content Metadata

Subject Area Computer Science
Abstract Honza Král - Beyond the basics with Elasticsearch. Elasticsearch has many use cases, some of them fairly obvious and widely used, like plain searching through documents or analytics. In this talk I would like to go through some of the more advanced scenarios we have seen in the wild. Some examples of what we will cover: Trend detection - how you can use the aggregation framework to go beyond simple "counting" and make use of the full-text properties of Elasticsearch. Percolator - percolator is reversed search, and many people use it as such to drive alerts or "stored search" functionality for their websites; let's look at how we can use it to detect languages and geo locations, or to drive live search. If we end up with some time to spare, we can explore some other ideas about how we can utilize the features of a search engine to drive non-trivial data analysis, including geo-enabled search with relevancy.
Keywords EuroPython Conference
EP 2015
EuroPython 2015
