Lessons learned from building Elasticsearch client

Speech transcript
OK, so welcome back. I hope everybody had a nice lunch break. And now I'm very happy to introduce Honza Král, who is, most importantly for this talk, an employee of the Elasticsearch company.

So I'd like to tell you a story. It's a story about how we developed five clients for Elasticsearch in five different languages without losing our minds in the process. And like any good story, it starts a long time ago in a galaxy... It actually starts when we looked at the landscape of clients for Elasticsearch at the time. There were some things that we liked and that we saw were good, and some things not so much. For example, in the Python landscape there were very many clients, but none of them actually implemented the entire set of APIs. None of them did everything that we would like to see in a client, and none of them did it on a scale that we would be comfortable with. As a result, users had an inconsistent experience with Elasticsearch itself, and naturally they blamed Elasticsearch, because the way they interfaced with it was not ideal.

So we decided to create our own clients, sort of to control the last mile, how people talk to Elasticsearch, so we can make sure that their experience is good and consistent. We started with the design, obviously. We sat down and listed all the things that we wanted our clients to be, and for that we need to start with Elasticsearch itself.
Elasticsearch is distributed, which brings a lot of problems with it and a lot of opportunities. It talks a REST API over HTTP, which is both good and bad, because it makes it deceptively easy to create your own client. The number of clients just for Python or just for Ruby was staggering, just because everyone thought, "Oh, it's just HTTP, right? I can just do an HTTP request and everything will be good." But there are a lot of cases that those clients didn't help with.

There is also a lot of diversity in the deployments of Elasticsearch. Some people just deploy the cluster in their network and talk to it directly, others use a load balancer, some people use an alternative transport like Thrift to gain some speed, or use a set of client nodes in a distributed multi-rack setup, or something like that. The set of endpoints that Elasticsearch has is also quite staggering: it's almost 100 API endpoints with almost 700 parameters that the clients all need to support and document.

But more than that, we wanted the clients to be true to the language. That's why we only developed four of the five in the beginning, because those were the people that we had. We had a Python engineer, yours truly; we had an excellent Ruby developer and a Perl developer; we even managed to find a really great PHP guy, believe it or not. So those were the clients that we started with, because we felt confident that we could actually make each one feel like, say, a Pythonic library, not like a library written by a Java guy in his spare time. And we wanted the clients to be for everyone; we wanted people not to have any excuse not to use them.
That's the first lesson that we learned: no opinions, no decisions. In order to make sure that everyone would use the client, we had to abstain from making any assumptions, any decisions, because whenever you have an opinion there is someone out there who will disagree with you. So the only way to make sure that won't happen is not to have any opinions. So we decided that the clients should be low level, essentially just a one-to-one mapping to the REST API, and that they should be extensible. We should design them in such a way that when you don't like some aspect of the client, you are able to replace it, or just plug into it and change it.

So we came up with this. For an HTTP client it is a kind of complicated diagram, and it specifies how the client works. You have the client itself, which has a transport class, which has a serializer to serialize and deserialize the data as they go over the wire. Then you have a connection pool that actually stores a list of connections. "Connection pool" in this case is a misnomer, because it doesn't actually pool connections; it just holds a collection of connections to individual nodes in the cluster. You can see why we would have a naming problem there. Then we have the connection itself; by default we use urllib3, because that ended up being the best one for us. And we also have a connection selector: when you connect to multiple nodes, it controls the strategy for load balancing. Do you use random, or do you use round robin? By default we do round robin over a randomized list of nodes. The reason we designed it this way is so that we are able to give you the option to override any single component just by subclassing the default implementation and filling in the blanks.

So, some examples. If you want to create your own selector, just create a class and pass it in. Everything essentially uses dependency injection, so you can just pass it in as a constructor parameter and we will use that instead. You see three examples here. The first one is not really injecting your code, it is just setting options. It constructs a client that talks to the cluster and gets the list of current nodes on startup, then whenever a node fails, and then also every 60 seconds. This is excellent for a long-running process, let's say a website, so that even when you keep changing your Elasticsearch cluster, you keep adding nodes and nodes keep dropping out, you still talk to all the nodes that are available.

The second one is where you want to control the load balancing. For example, imagine a scenario where you have two racks and by default you want to talk only to the Elasticsearch nodes in the same rack as the application server, and only fall back to the nodes in the other rack if none of those are available. You can do that; you can just write a simple class that will do this. That's what we mean when we say that we're modular and extensible.

The last example is just using a Thrift connection, which we actually provide as an optional plugin, and using a different serializer, in this case YAML, because some people like YAML better than JSON for some reason.

So that was the first lesson: no opinions, so that people have no excuses not to use the client. The second lesson was to prototype everything.
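To make the pluggable design from the first lesson more concrete, here is a minimal, self-contained sketch of a connection pool that takes its selector by dependency injection, with a rack-aware selector like the one in the two-rack scenario. All class and method names here (`Connection`, `ConnectionPool`, `RoundRobinSelector`, `RackAwareSelector`, `select`, `mark_dead`) are hypothetical stand-ins chosen for illustration; this is not the code of any real client library.

```python
import random


class Connection:
    """Hypothetical stand-in for a connection to one cluster node."""

    def __init__(self, host, rack):
        self.host = host
        self.rack = rack

    def __repr__(self):
        return f"Connection({self.host!r}, rack={self.rack!r})"


class RoundRobinSelector:
    """Default strategy: cycle through the given connections in order."""

    def __init__(self):
        self._counter = 0

    def select(self, connections):
        connection = connections[self._counter % len(connections)]
        self._counter += 1
        return connection


class RackAwareSelector(RoundRobinSelector):
    """Prefer nodes in the local rack; fall back to all nodes otherwise."""

    def __init__(self, local_rack):
        super().__init__()
        self.local_rack = local_rack

    def select(self, connections):
        local = [c for c in connections if c.rack == self.local_rack]
        # An empty list is falsy, so this falls back to every live node.
        return super().select(local or connections)


class ConnectionPool:
    """Holds live connections and delegates the choice to a selector."""

    def __init__(self, connections, selector=None, randomize=True):
        self.connections = list(connections)
        if randomize:
            # Round robin over a randomized list of nodes.
            random.shuffle(self.connections)
        # Dependency injection: any object with a select() method works.
        self.selector = selector or RoundRobinSelector()

    def mark_dead(self, connection):
        self.connections.remove(connection)

    def get_connection(self):
        return self.selector.select(self.connections)


nodes = [
    Connection("es1", "rack-a"),
    Connection("es2", "rack-a"),
    Connection("es3", "rack-b"),
]
pool = ConnectionPool(nodes, selector=RackAwareSelector(local_rack="rack-a"))
print(pool.get_connection())
```

The point of the sketch is the shape of the design: the pool never needs to know which strategy it was given, so replacing the load-balancing behavior means writing one small class and passing it to the constructor.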
You just don't come up with something like this without some preparation, without some prototyping, and more importantly, you don't come up with it for a single language and expect it to be applicable to all the others. It's very difficult to find a pattern that would work for both Python people and Ruby people, for example. That's why we created a spike, a prototype implementation, in both Python and Ruby: to make sure that the design would work, that it would hold for both of these languages, and also so that we would have a reference that we could talk about, so that we could share the same terminology. Then when we talk about a connection pool, we know what we mean, even though it means different things in different languages, and even though how we use the term is not exactly correct; we have code that actually shows what it does. So at that point we could have a clear conversation, even with the PHP and Perl people, even with the JavaScript people who came on later, and even later with the .NET people who are developing the new client now. So prototype everything, not just to see if your design works, but also so that you have something to talk about, so that you are absolutely certain that you're on the same page, because you should never trust humans and just their understanding if you can do more.

That's the next lesson. The next lesson is: don't send a human to do a machine's job. Humans are amazing at a lot of things; consistency is not one of them, consistency and repetitive tasks. So you want something that doesn't get tired, that doesn't get frustrated, that doesn't mind doing the same thing over and over again. That sounds like a computer. So this lesson states that you should automate as much as possible.

The reason I'm talking about this is, as I already mentioned, that we have almost 100 APIs with 700 parameters. That's very difficult to track. That's a lot of work, a lot of boring, tedious work that you don't really want to be doing. You don't hire a decent Python engineer and force them to maintain a list of 100 APIs and 700 parameters if there is any other way.

So what are the other ways? First we thought that we could do a reference implementation: just arbitrarily choose one of the clients and decide that this is the reference implementation for our API, the authoritative collection of all the APIs, all the parameters, all the possible values and descriptions. But that doesn't really scale that well. First of all, we only have one person per language.
We only have one Python developer, we only have one Perl guy, we only have one PHP guy. So what if he leaves? What if he's on vacation and there is a change that needs to be made? And also, how does that person make sure that everything is in sync? We found that even with the spike implementation of the transport layer, maintaining it was very difficult when we added more features and needed to add them to both Python and Ruby, even though it was just the two of us, and the two of us lived in the same city, which is not true for any other two people on the project. It was difficult to keep in sync. So we discarded the reference implementation as an option.

Next we looked at documentation, because obviously Elasticsearch has documentation, and all the APIs are documented, and all the parameters are documented as well. But again, they are documented for humans. It's documentation that's intended for developers to read, to understand, and to make sense of. So again it would require a tremendous amount of manual labor just to make sure that everything we need is there. Someone would actually have to read all the documentation and collect all that stuff, and not just one person: each and every one of the client authors would have to do that. That's a very tedious job, a job that I haven't signed up for, and I doubt we would ever find a person who would sign up for a job like that.

So what are the options? We had followed the progression from the reference implementation to the documentation, which had some problems of its own. So we decided to take it one step further: to actually extract all the information that's already in the documentation, that's already in the code, and present it in a structured format. We chose a format that's human readable and machine parsable: we decided that we should document everything in JSON and create a REST API specification for our API. This is one case where I was super happy that Elasticsearch is written in Java, a statically typed language, because it provides you with a bunch of tools. We were actually able to write a tool that would just parse the source code for all the APIs and extract 80 to 90 percent of all the APIs and parameters in an automated fashion.
We then just had to go over it once, and we could actually share this effort among all the client people, and just fill in the gaps: fill in the documentation for each of the parameters, fill in the options and the type, whether the option is required or not, whether it's a list or single valued, boolean or integer. That made the effort so much easier, and going forward also so much easier to maintain.

So what did we choose to capture in this document? First of all, the URL paths, all the different variants of the URL path. Most of the URLs in Elasticsearch are dynamic: a URL can optionally include an index name, or a list of indices on which to perform the action, and it can contain other dynamic parts. So we had to document those, including all the different options for how the URL can look. As part of it we also had to capture the HTTP methods.
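As a rough illustration of the information listed so far (URL path variants, their dynamic parts, and the HTTP methods), the sketch below holds one specification entry as a Python dictionary and picks the path variant that matches the parts a caller supplies. The entry's field names (`methods`, `url`, `paths`, `parts`) and the helper `build_path` are assumptions made for this example, modelled on the description in the talk rather than copied from the actual specification files.

```python
import re

# Hypothetical spec entry: two URL variants, one optional dynamic part.
SUGGEST_SPEC = {
    "methods": ["POST", "GET"],
    "url": {
        "paths": ["/_suggest", "/{index}/_suggest"],
        "parts": {"index": {"type": "list", "required": False}},
    },
}


def build_path(spec, **parts):
    """Return the path variant whose placeholders match the supplied parts."""
    supplied = {name: value for name, value in parts.items() if value is not None}
    unknown = set(supplied) - set(spec["url"]["parts"])
    if unknown:
        raise ValueError(f"unknown URL parts: {sorted(unknown)}")
    for template in spec["url"]["paths"]:
        placeholders = set(re.findall(r"\{(\w+)\}", template))
        if placeholders == set(supplied):
            # A list of indices is sent as a comma-separated string.
            values = {
                name: ",".join(value) if isinstance(value, (list, tuple)) else str(value)
                for name, value in supplied.items()
            }
            return template.format(**values)
    raise ValueError("no URL variant matches the supplied parts")


print(build_path(SUGGEST_SPEC))
print(build_path(SUGGEST_SPEC, index=["tweets", "blogs"]))
```

A real client would also need to URL-escape the values; that is left out here to keep the sketch short.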
So is it a GET or a POST? We decided to do little more than that: just list all the parameters and list all the ways to combine them into a URL. We didn't actually capture the dependencies, such as "this parameter is only valid if that other parameter is set". We found out that we don't really want to have this information, this validation, in the client, and that we would instead rely on our users to use it correctly. That way we have less overhead maintaining it, and the code is much simpler.

So how does it look? This is an example for the suggest API. This is just a fraction of it, and you can see that we have a link to the documentation, we have all the possible HTTP methods, in this case POST or GET, and we have all the different forms of the URL with a description for each of the parts. Here there is only one dynamic part, which is the optional index. We have a description of all the parameters. We also have a description of the body: what it contains, and whether it is required or not. And this is all the information I need to write, or in my case to generate, a Python method. I have the name, I have the list of parameters, I know which ones are required and which ones are optional, so I can choose that these ones will be positional and this one will be a keyword. And I know all the ways to put this information together to create the URL and send it over to the server.

The last major thing about this is that it minimizes the maintenance effort, because we put it into the same repository as the Elasticsearch code base, the same repository where the documentation is. That means that whenever you make a change in Elasticsearch itself, whether you are adding a new API or just adding a parameter, you also provide the changes to this REST API specification inside the same commit or pull request. Then all the client people need to do is monitor this one directory on GitHub to see all the changes that they need to implement in their clients. But again we approach the difficult thing: people need to watch something and do something, and whenever you rely on people you will get into trouble sooner or later. Hopefully later, but you will probably get into trouble.

So that brings us to our last lesson: test everything. Don't trust, just verify everything. In our case we needed to verify that all the clients are consistent and that they work well with the server. So again we created our own solution: a unified test suite. We again chose a machine-parsable language, in this case YAML, and created a simple test suite with a setup, a bunch of actions and a bunch of assertions, which can be run not only against Elasticsearch itself but also against all the clients.

So this is how it looks. This is again a test for the suggest API. You can see there is a setup that will call an action, index, with the parameters index, type, ID and body, and it will then do a refresh, which makes the document available for search. Then there is one test, the basic test for the suggest API. It will actually perform the suggest operation and then run two assertions. So this test validates that the suggest API is actually capable of correcting our typo. And this test is run as part of the Java test suite, as part of the integration tests.
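The idea of a language-neutral test file plus a small per-client interpreter can be sketched as follows. To stay self-contained, the test is written as the Python data structure a YAML loader would produce, and it runs against an in-memory fake instead of a real server. The step names `do` and `match`, the dotted-path lookup, and the `FakeClient` class are assumptions for illustration only; the fake implements `index` and `get` rather than the suggest API from the talk.

```python
class FakeClient:
    """Hypothetical in-memory stand-in for a real client."""

    def __init__(self):
        self.docs = {}

    def index(self, index, id, body):
        self.docs[(index, id)] = body
        return {"created": True}

    def get(self, index, id):
        return {"_id": id, "_source": self.docs[(index, id)]}


def lookup(response, dotted_path):
    """Walk a nested dict using a path such as '_source.title'."""
    for key in dotted_path.split("."):
        response = response[key]
    return response


def run_test(client, steps):
    """Interpret a list of single-key steps: 'do' calls an API, 'match' asserts."""
    last_response = None
    for step in steps:
        (kind, payload), = step.items()
        if kind == "do":
            (api, params), = payload.items()
            last_response = getattr(client, api)(**params)
        elif kind == "match":
            for path, expected in payload.items():
                actual = lookup(last_response, path)
                assert actual == expected, f"{path}: {actual!r} != {expected!r}"
        else:
            raise ValueError(f"unsupported step: {kind}")
    return last_response


# What a parsed test file might look like after loading.
TEST_STEPS = [
    {"do": {"index": {"index": "test", "id": "1", "body": {"title": "hello"}}}},
    {"do": {"get": {"index": "test", "id": "1"}}},
    {"match": {"_source.title": "hello"}},
]

run_test(FakeClient(), TEST_STEPS)
```

Because the steps name APIs and parameters rather than language constructs, the same file can drive an interpreter written in any client language, which is what keeps naming and coverage aligned across clients.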
That means that it's version specific. It's in the same code base, so whenever you have a branch of Elasticsearch it can have its own tests, just like any other code. And all the clients have an interpreter for these tests. That makes sure that we have the same naming, that we have the same API coverage, and that we have the same exception handling, because we can write not just positive assertions but also assertions that something should fail with a given error code. Just by specifying it once, we can make sure that all the clients are consistent. This set of tools together meant that when we decided to develop the fifth client after the original four, it only took a few weeks for the JavaScript guy to write an interpreter for the test suite and for the API specification and make sure that everything was working.

So these are roughly the lessons that we learned during this process. There were good times and bad times, but we made it through, and I believe that the clients are working well for people. Now we're approaching the next stage, and that is to actually create more high-level, opinionated clients that would be more helpful to our end users; but for those clients, we are OK with people not using them. So thank you very much, and if you have any questions, I'll be happy to hear them.

Thanks. So, questions?

Thanks for that. I was trying to think whether it would be possible to generate the clients completely, since you already have everything here: you have the parameters, you have the values.

Some clients actually do that. For example, the JavaScript client doesn't actually contain much source code; it just internalizes the JSON specification and generates the methods on the fly. But for Python I actually wrote a script to generate the entire client and used that as the first draft, and then I edited it manually. It's great to automate everything, but usually it's OK to automate just 90 percent, not try to catch it all, and do the remaining 10 percent manually. It's the classic 80/20 problem. So I started with the generated code and then I filled in all the exceptions to the rule. Now when there is a change, I run the generation process again and manually look at the diff to see which parts represent an actual change and which were just manual edits that I had to make in order for the API to actually feel more like Python.

OK. Have you considered using Protocol Buffers from Google?

Yes, we have actually considered that as an alternative transport. Currently we're fine with just HTTP, and we provide some alternatives: there is Thrift, and there is an experimental transport with the memcached protocol. We haven't looked at it much further because the trade-off didn't seem to be worth the investment. However, we're still looking for more effective transports and encoding schemes, so it might still happen. It's definitely something that you can implement yourself as a plugin for both the clients and the server.

Questions? No more questions? OK, then that's it. Thank you again.
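The first answer above mentions clients that load the JSON specification and create their API methods on the fly. A minimal sketch of that idea in Python follows. The spec layout, the `make_method` factory, the `Client` class and the `RecordingTransport` are all hypothetical and exist only for this example; they are not taken from any of the real clients.

```python
import re

# Hypothetical spec with a single API entry.
SPEC = {
    "suggest": {
        "methods": ["POST", "GET"],
        "url": {
            "paths": ["/_suggest", "/{index}/_suggest"],
            "parts": {"index": {}},
            "params": {"routing": {}, "preference": {}},
        },
        "body": {"required": True},
    }
}


class RecordingTransport:
    """Returns the request it was given instead of sending it anywhere."""

    def perform_request(self, method, url, params=None, body=None):
        return {"method": method, "url": url, "params": params or {}, "body": body}


def make_method(name, spec):
    part_names = set(spec["url"]["parts"])
    param_names = set(spec["url"]["params"])
    body_required = spec.get("body", {}).get("required", False)

    def api_method(self, body=None, **kwargs):
        if body_required and body is None:
            raise ValueError(f"{name}() requires a body")
        # Split keyword arguments into URL parts and query parameters.
        parts = {k: v for k, v in kwargs.items() if k in part_names}
        params = {k: v for k, v in kwargs.items() if k not in part_names}
        unknown = set(params) - param_names
        if unknown:
            raise TypeError(f"{name}() got unexpected parameters: {sorted(unknown)}")
        for template in spec["url"]["paths"]:
            if set(re.findall(r"\{(\w+)\}", template)) == set(parts):
                url = template.format(**parts)
                break
        else:
            raise ValueError("no URL variant matches the given parts")
        return self.transport.perform_request(
            spec["methods"][0], url, params=params, body=body
        )

    api_method.__name__ = name
    return api_method


class Client:
    def __init__(self, transport):
        self.transport = transport


# Attach one generated method per spec entry.
for _name, _spec in SPEC.items():
    setattr(Client, _name, make_method(_name, _spec))

client = Client(RecordingTransport())
print(client.suggest({"my-suggestion": {"text": "helo"}}, index="tweets"))
```

Generating methods at import time keeps the client small, at the cost of the hand-tuned exceptions the answer describes; generating source code once and then editing it trades the other way.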

Metadata

Formal metadata

Title Lessons learned from building Elasticsearch client
Series title EuroPython 2014
Part 118
Number of parts 120
Author Král, Honza
License CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and copy, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they have specified.
DOI 10.5446/19982
Publisher EuroPython
Release year 2014
Language English
Production place Berlin

Content metadata

Subject area Computer science
Abstract Honza Král - Lessons learned from building Elasticsearch client
Lessons learned when building a client for a fully distributed system and trying to minimize context-switching pains when using multiple languages.
Last year we decided to create official clients for the most popular languages, Python included. Some of the goals were:
* support the complete API of Elasticsearch including all parameters
* provide a 1-to-1 mapping to the REST API to avoid having opinions and provide a familiar interface to our users, consistent across languages and environments
* degrade gracefully when the ES cluster is changing (nodes dropping out or being added)
* flexibility - allow users to customize and extend the clients easily to suit their, potentially unique, environment
In this talk I would like to take you through the process of designing said client, the challenges we faced and the solutions we picked. Amongst other things I will touch on the difference between languages (and their respective communities), the architecture of the client itself, mapping out the API and making sure it stays up to date, and integrating with existing tools.
Keywords EuroPython Conference
EP 2014
EuroPython 2014
