
From HTTP to Kafka-based microservices


Formal Metadata

Title
From HTTP to Kafka-based microservices
Subtitle
How we enabled switching our microservices from HTTP to asynchronous, Kafka-based communication
Title of Series
Number of Parts
118
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
HTTP or asynchronous communication in microservices? This question is frequently repeated and discussed. Obviously, HTTP-based communication is easier for developers and architects. Even if your developers have no prior experience with microservices, they will probably understand how to implement an HTTP service well. While asynchronous communication has a lot of advantages that allow you to design and implement a really robust microservices system, it also brings new challenges that are not so obvious to people who have not had a chance to work in such an environment before. At FLYR we mostly have HTTP-based Inter Process Communication (IPC) in our infrastructure. At some point, we realized that to provide the functionality required by our product we needed something more flexible and more… asynchronous. We designed and implemented a Python library to facilitate switching to asynchronous IPC, supporting one-way, two-way or even single-request, multi-response communication. An important goal in the design process was to give developers with HTTP experience a tool that would ease the process of switching to asynchronous communication. Consequently, switching an HTTP server-side endpoint to asynchronous IPC is a straightforward task. We selected Kafka as our message broker, not surprisingly by comparing its performance reports with our very rough, but no less demanding, performance requirements. But we also took care to hide the details of the broker logic from developers. Yes, we don't use all Kafka capabilities; if you need e.g. Kafka Streams, you will have to use another solution. But we can decide what capabilities are used in our microservices, and we can change the way our services communicate in a single place. There are also hooks, Kubernetes health checks and more, with a lot of flexibility available out of the box. We plan to open-source our library.
At the time of writing, open-sourcing is still a work in progress; we have not had sufficient time for it because of the strict time constraints on delivering functionality to our customers, but we hope to be able to do it soon. In this talk I would like to describe how we solved particularly important problems, what solutions we developed, how we use them and what problems still need to be addressed by developers. In other words, I would like to describe to you the journey we made from HTTP to Kafka-based microservices.
Transcript: English (auto-generated)
Hi, nice to see you all here. How was the coffee? Great, nice. Okay, so I'm a person who always wanted to be an IT guy, a computer science guy and so on. I worked for 15 years in the academic world, doing science, teaching students and so on. In the meantime I did my PhD, but the practical things were always most important for me, so even when doing science I was always trying to make it more practical and available for developers. You can google something about it, too.
Now I'm working for FLYR, doing microservices for them. I was always interested in distributed systems, so it's perfectly aligned with what I was doing before. I'm also one of the people who organize a Ruby user group in my country, so please be friendly, okay?
Okay, what about FLYR, what do we do at FLYR? We do a revenue management system for airlines. We take a lot of data from them, so we do big data, ETL pipelines and so on. Then we do machine learning on this data, and we tell them at what prices they should sell the tickets to earn the most. That's our goal. We have an office in San Francisco and an office in Krakow, Poland, so you can work for FLYR from a European time zone and have work-life balance.
Okay, so those are some of the things we use, not everything. And one more thing that my colleagues from the Krakow office told me I must bring to you. Some of us were recently at a Python conference in the Czech Republic, and we met a very interesting guy there, so we hired him. You know the guy in the middle of the photo? Who knows this guy? Yeah, that's right, Krtek; Krecik in Polish. For those who haven't seen it, it's a mole, a character from a cartoon of our childhood, very popular in our part of Europe. So we met this guy and took him to the conference. He went to different talks.
He even gave his own lightning talk, so if you are wondering whether you should or not: he did it, you can do it. And he is actually an exceptional data digger, as a mole, so that's why we hired him.
But to the point. The history begins about a year ago, a little less than a year ago, when I started to work at FLYR. By then all microservices in our product were communicating over HTTP, and FLYR felt quite comfortable with this solution, but they also had the feeling that for some applications it wouldn't be the best solution and that at some point we would probably have to switch. We already had some places where the services communicated over RabbitMQ, but it was not perfectly implemented, and we knew it. So there was a feeling that we should find an asynchronous way of communication, and the thing that actually caused the decision was a new requirement for our e-commerce use case.
The requirement can be described more or less like this. We had a UI interacting with the user, and there was a backend for this UI. At some point the interaction between the user and the UI caused the UI backend to issue a query to the service our team was maintaining. To respond to this request, we knew we would have to fan out a number of requests to other services, most of them external. So we knew it would be time-consuming to get the responses, but we wanted to have something to show to the user as soon as we had anything useful for him, and when we got anything better, we would update whatever we were showing.
So that was the use case. We wanted to implement something like this: we get the original query, we fan out the sub-queries, and when we get the first response to a sub-query, we send a partial response to the original query to show something to the user. Whenever we get another response to a sub-query, we send another partial response to update the information shown to the user, and so on. Okay, so that was the use case we wanted to implement.
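The fan-out pattern described here can be sketched in plain Python. This is only an illustration of the single-request, multi-response idea using threads inside one process, not the talk's library and not Kafka; the `BACKENDS` table, `sub_query` and `partial_responses` are hypothetical names invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for slow external services: name -> (delay in s, price).
BACKENDS = {
    "airline_a": (0.03, 120),
    "airline_b": (0.01, 95),
    "airline_c": (0.02, 140),
}


def sub_query(name, query):
    """Simulate one slow sub-query to an external service."""
    delay, price = BACKENDS[name]
    time.sleep(delay)
    return {"source": name, "query": query, "price": price}


def partial_responses(query):
    """Fan out sub-queries and yield an updated partial response
    every time one of them completes."""
    offers = []
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = [pool.submit(sub_query, name, query) for name in BACKENDS]
        for future in as_completed(futures):
            offers.append(future.result())
            # Each yield corresponds to one partial response sent to the UI.
            yield sorted(offers, key=lambda offer: offer["price"])


if __name__ == "__main__":
    for partial in partial_responses("KRK-SFO"):
        print([offer["source"] for offer in partial])
```

Each yielded list plays the role of one partial response: the UI can render the first one immediately and replace it as later sub-queries finish.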
So that's the first thing. The second thing was related to performance. I can't give you the precise numbers, but what I can tell you is that we wanted persistence in our communication, and when I looked at RabbitMQ, at the 5,000 messages per second that can be handled when you turn on persistence, it was definitely not enough. So the requirements were quite high. That was the second important thing. Of course, you can do it using HTTP.
Okay, but we already felt that we would need asynchronous communication anyway, so that was a good point to start. And how did we approach this situation? We decided: yes, okay, we need to do it well. But what, and how?
We have an HTTP-based infrastructure, and it works okay. We have experience with it. We have developers experienced with this way of communication. They have habits of implementing this way of communication, and, you know, competence is important, but old habits die hard, so that's the hardest thing to overcome in some situations. Of course, we knew we lacked experience with this kind of communication, because we had always done it using HTTP. One more requirement: of course we must do it well, and do it well the first time. Well, it's hard to do it well the first time, but maybe.
And of course we knew we would get all these goodies when we switched to asynchronous communication, and even more, for example more opportunities. Does anyone know what the first opportunity we get in this situation is?
Anybody? What do you think? Sorry? Sure, you can always refactor the code. But I have some planes for you. Can you catch your plane? Thanks. Oh, sorry, but it's still perfectly operational.
Anybody else? That was probably on the previous slide, okay.
That's not along my line of thinking. Sure, yes, you can have all these things, but you also have the perfect opportunity to make new mistakes.
Isn't it true? Completely new mistakes; completely new things can go wrong when you go to asynchronous communication. We can have different concurrency issues and race conditions, because we do asynchronous things in places where we didn't have them before.
There is also a problem: we have to choose a broker. We don't have experience. We can read a lot, we can do research, but there is always a chance we will choose the wrong broker because we didn't research the right thing. If we choose the right broker, there's probably more than one driver we can choose.
So which one should we choose, and on what basis? How should we decide? We can choose the correct driver but use it incorrectly. If it's just in one simple service, that's easy to fix, but if it spreads all over your system and then you realize you have to do this and that to make the communication stable, now you have to find all the places where other people just copied and pasted the incorrect code. That's hard to overcome.
And finally, we can have the correct driver and the correct broker, but we can use the broker incorrectly, and a lot of other things may also go wrong. So we decided to contain all these horrible things in one place, a library, and called this library async calls. And there's another hard question. I have a cup for you. I won't be throwing it; I'll walk and deliver it, you know, not by plane.
Why async calls? Okay, don't answer, don't answer, don't answer. There's a cup for you. No, we don't use asyncio below, for some reasons. Okay, the reason I wanted to give you the cup before you tried to answer is that the answer is so strange that you didn't have a chance, okay?
There's always this naming thing in computer science. Okay, so we wanted to create a library that meets our functional requirements. We wanted this library to resemble, wherever possible, what developers already know. And of course we wanted asynchronous communication below. So wherever possible we wanted to join these three requirements.
And why a library? For the maintainers of a library, or of this switch: you know Sauron, the guy with one ring to rule them all? So yes, one place to fix all the bugs. One place to change decisions, so it's much easier to change decisions; you don't have to trace them all over the microservices implementation, just one place.
And if we need to apply good patterns, it's not that we have to teach all these people how to use, say, the Kafka driver well. We just use it well in one place, and we don't have to change what we taught people, which of course is harder than just updating the code. But, you know, Sauron had to sell these rings to the people somehow. So how do we sell it to developers? There must be something in it for them.
Yeah, sure: they don't have to know Kafka, because it's hidden. The complexity will be hidden if we build a good abstraction above it, so they won't have to think about all the difficult things related to it. And a lower entry barrier, because it's a way different kind of communication than HTTP. So let's try to do it this way.
The decisions we had to take were easier because they were no longer final. We were more comfortable with the thought that we might have to change a decision at some point. The decisions were that we would choose Kafka as the message broker, not surprisingly for performance reasons, for the performance it offers just out of the box, and that we would choose Confluent Kafka as the driver, also for performance reasons, and we also hoped that something supported by Confluent would be really stable. So, well, that's what we use.
We wanted to make it just a library: no framework approach, no putting everything inside, just a communication library. Make it simple. If we need something more complicated, maybe we will put another library above it. So that was the first thing. We also wanted to make it testable. Okay, this Kafka is nice, but how do I curl to this Kafka? Well, you don't. You can't issue an HTTP request to a Kafka queue, so how do I test it? We need to give people a way to send something ad hoc, just to let them test their service. And we wanted to make it testable automatically, so we wanted to provide some reasonable mocks for unit testing, to make implementing tests easier. And if possible, maybe we could make it resemble Flask, to let developers get used to the new approach more easily.
Okay, but it's like half of my time, and I'm talking, talking, talking, and it's a developer conference. That's probably what you think, okay? So yes, okay, I'm showing you the code.
How do you use it? We create an object, giving the service name as a parameter. It's just an identifier of the service; it should be unique across your system. When you have this object and you want to create a server endpoint, that is, an endpoint that will respond asynchronously to some requests, you just create a function and decorate it with the async calls server "callback for" decorator. The parameter of the decorator is the name of the endpoint. It resembles an HTTP endpoint, but it doesn't have to; the slash is not necessary there, it's just a convention.
This function will get the request object as a parameter. You can do with the request whatever you wish, and you can use this request object to create a response, or more than one response, and each of these responses can be sent back and will be delivered to whoever sent the original request.
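The slides are not part of this transcript, so the following is only a rough in-memory sketch of the shape described above: a service object, a decorator that registers a callback for an endpoint, and a request object that can create any number of responses. Every name here (`ServiceSketch`, `callback_for`, `create_response`, `send`, `dispatch`) is a hypothetical stand-in, not the library's actual API, and no broker is involved.

```python
class Response:
    """One response; sending it appends the payload to the caller's outbox."""

    def __init__(self, payload, outbox):
        self.payload = payload
        self._outbox = outbox

    def send(self):
        self._outbox.append(self.payload)


class Request:
    """Request handed to an endpoint callback."""

    def __init__(self, payload, outbox):
        self.payload = payload
        self._outbox = outbox

    def create_response(self, payload):
        return Response(payload, self._outbox)


class Server:
    def __init__(self):
        self.callbacks = {}

    def callback_for(self, endpoint):
        """Decorator: register a function as the handler of an endpoint."""
        def register(func):
            self.callbacks[endpoint] = func
            return func
        return register

    def dispatch(self, endpoint, payload):
        """Stand-in for message delivery: call the handler, collect responses."""
        outbox = []
        self.callbacks[endpoint](Request(payload, outbox))
        return outbox


class ServiceSketch:
    def __init__(self, service_name):
        self.service_name = service_name  # should be unique across the system
        self.server = Server()


app = ServiceSketch("pricing")


@app.server.callback_for("/quotes")
def quotes(request):
    # Zero or more responses may be sent for a single request.
    for fare in ("basic", "flex"):
        request.create_response(
            {"route": request.payload["route"], "fare": fare}
        ).send()
```

In this sketch `app.server.dispatch("/quotes", {"route": "KRK-SFO"})` returns both responses in order; a real implementation would publish each one to the broker instead of collecting them in a list.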
So you can send zero or more responses to a request. Now, to send a request: this is about identification. This is the ID of the service, the address of the service, and this is the name of the endpoint. So you send the request to a service ID and a specific endpoint; you can have a lot of endpoints in a single service. To send a request, you use the async calls client to create a new message that will be sent.
There's a destination ID in this message, a target endpoint and of course a payload, and maybe some more things, and you send this request. But wait, it's all asynchronous. The sending is asynchronous; it is non-blocking. So how do we get a response to this request? Before we send the request, we should define a callback to handle the response we expect. So we define a function; this time the decorator comes from the async calls client, not the async calls server, and we define a callback for the service that will be sending responses to us, that's the first name, and for the endpoint that will be queried and will respond to our queries.
So we can handle the response this way. That's how you can use it in the most basic approach. Of course, the last thing you should do is start listening.
And it's all in the client and in the server; actually you can have client and server endpoints in a single service, so you can get some requests, send some requests to fulfill them, and then, when you get the responses, you can respond to the original request. It's all feasible.
So you just call the async calls listen at the end of your program, and it will make async calls receive messages and route them to the correct callbacks. So what do we have? We have a server, which is like an HTTP server: event-driven, with the callbacks we know from Flask, for example. We also have a client, which is not like an HTTP client, because it's not blocking. It's asynchronous: you send a request and just nothing happens. You should have a callback to handle the response.
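The non-blocking request, response callback and listen loop can be illustrated with a small in-process message bus. This is a sketch under stated assumptions: `Bus`, `Service`, `server_callback_for`, `client_callback_for`, `send_request` and `send_response` are hypothetical names, and a `deque` stands in for the Kafka topics that a real implementation would use.

```python
from collections import deque


class Bus:
    """In-memory stand-in for the broker: a single FIFO of messages."""

    def __init__(self):
        self.queue = deque()
        self.services = {}

    def listen(self):
        # Deliver queued messages, routing each to the matching callback.
        while self.queue:
            msg = self.queue.popleft()
            service = self.services[msg["to"]]
            if msg["kind"] == "request":
                handler = service.server_callbacks[msg["endpoint"]]
            else:
                handler = service.client_callbacks[(msg["from"], msg["endpoint"])]
            handler(msg)


class Service:
    def __init__(self, name, bus):
        self.name, self.bus = name, bus
        self.server_callbacks, self.client_callbacks = {}, {}
        bus.services[name] = self

    def server_callback_for(self, endpoint):
        def register(func):
            self.server_callbacks[endpoint] = func
            return func
        return register

    def client_callback_for(self, service, endpoint):
        def register(func):
            self.client_callbacks[(service, endpoint)] = func
            return func
        return register

    def send_request(self, to, endpoint, payload):
        # Returns immediately; nothing is delivered until listen() runs.
        self.bus.queue.append({"kind": "request", "from": self.name, "to": to,
                               "endpoint": endpoint, "payload": payload})

    def send_response(self, request, payload):
        self.bus.queue.append({"kind": "response", "from": self.name,
                               "to": request["from"],
                               "endpoint": request["endpoint"],
                               "payload": payload})


bus = Bus()
pricing = Service("pricing", bus)
frontend = Service("frontend", bus)
received = []


@pricing.server_callback_for("/quotes")
def quotes(request):
    pricing.send_response(request, {"price": 95})
    pricing.send_response(request, {"price": 90})  # a second, better offer


@frontend.client_callback_for("pricing", "/quotes")
def on_quote(response):
    received.append(response["payload"]["price"])


frontend.send_request("pricing", "/quotes", {"route": "KRK-SFO"})
sent_without_waiting = list(received)  # still empty: sending did not block
bus.listen()
```

After `bus.listen()` returns, `received` holds both prices in the order they were sent, while `sent_without_waiting` shows that nothing had arrived at the moment the request was sent.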
So if your request requires a query to another service, you get the request, you send a query to the other service, and your process is not blocked waiting for the response. It can actually serve another request while waiting for the response to the previous one. A single process can be a server and a client, of course. And we can handle one request and any number of responses, so we can have more than one response to the original request, but we can also have no response to the original request. We can just send notifications this way, and if the receiver is expected just to handle the notifications, it's not a problem; you don't have to send a response.
So okay, now that I've shown you the basics: I could talk for a long time about different things and the details that are inside, but I just want to tell you about one single thing. For me, it's one of the most important things: how do you test it?
Is there a way to easily test this with unit tests? Yes, async calls has a testing mode. You need to enable the testing mode, and then you can use it in your unit tests. To enable the testing mode, you just set the testing flag to true. Then you import your application, which defines all these callbacks. And finally you must have a fixture that will reset the testing mode between the tests, because we need to clear some buffers. When you have the testing mode enabled, you can start testing.
We can have different use cases that we want to test. The most basic is that we have a server endpoint and we want to check whether it gives correct responses. So we want to send it a request and verify what responses we got and whether they are correct. That's the most basic thing. So we need a way to send a test request to an arbitrary service. We can do that because, if you turn on the testing mode in async calls, you get something which is shown here: a test client. You can use this test client to create arbitrary messages and send them wherever you want in your unit tests.
Okay, so that's how you send a request to the tested service. When you send it, the test client will receive the responses for you, so right after you send the request you can check what responses were received, and then you can just assert the correctness of these responses. So we don't have to think about Kafka below, about messages or anything. You just send a request according to business rules and verify the response according to business rules.
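A minimal sketch of the testing-mode idea follows, assuming a testing flag, a buffer that must be cleared between tests, and a test client that sends a request and immediately returns the responses it collected. `TestingApp`, `SketchTestClient`, `respond` and `reset` are hypothetical names for illustration; the real library's names are not given in this transcript.

```python
class TestingApp:
    """Hypothetical stand-in for a library object that supports a testing mode."""

    def __init__(self):
        self.testing = False
        self.callbacks = {}
        self.outbox = []  # buffer of responses produced by endpoints

    def callback_for(self, endpoint):
        def register(func):
            self.callbacks[endpoint] = func
            return func
        return register

    def respond(self, payload):
        # In testing mode responses are buffered instead of going to a broker.
        self.outbox.append(payload)

    def reset(self):
        """What a per-test fixture would call to clear the buffers."""
        self.outbox.clear()


class SketchTestClient:
    def __init__(self, app):
        if not app.testing:
            raise RuntimeError("testing mode is not enabled")
        self.app = app

    def send(self, endpoint, payload):
        """Send a request to an endpoint and return the collected responses."""
        self.app.reset()
        self.app.callbacks[endpoint](payload)
        return list(self.app.outbox)


app = TestingApp()
app.testing = True  # enable the testing mode before using the test client


@app.callback_for("/discount")
def discount(payload):
    # Business rule under test: a 10% discount on the given price.
    app.respond({"price": payload["price"] * 90 // 100})


client = SketchTestClient(app)
responses = client.send("/discount", {"price": 200})
```

The test then asserts only on business-level payloads, for example that `responses` equals `[{"price": 180}]`, without touching any broker details.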
That's all. That's the simplest thing to do. More complicated is when we have another service and we are testing the service in the middle, a money broker. When this service serves our original request, it is expected to notify another service, a notification receiver, about some things. So we are testing the money broker, but while testing it we want to verify that correct notifications would be sent outside. Of course, we don't want to spin up all the infrastructure; we want to have it mocked in our unit tests. We don't want real communication there. So we need a way to mock this other service, just to verify what it would receive if it were really started.
So, aside from the test client, async calls in testing mode also gives you a test server. In this test server we can register an endpoint for the service we want to mock. It was called notification receiver, and there was a notify endpoint. The test server will just receive these messages for us and will allow us to retrieve them and check whether the correct messages were received. So once this is set up, we trigger the endpoint we want to test; we can do some assertions about the responses, as in the previous example, and we can use the test server's received requests to obtain the requests that were received by the mocked service.
The most complicated thing is when this third service is actually expected to respond to some queries, and these responses should be used to generate the response we are testing. So we want to mock this whole service together with its responses, to check that the correct output will be produced. Here we need to mock the random service. To do it, we just define a generator function, and when we register the endpoint in the test server, we give it the fake responses generator. It will generate the responses, and we will be able to verify that the payloads on the output are correct.
So we have testing tools out of the box. When async calls is in testing mode, the IPC is mocked, so the tests are deterministic. We don't need a message queue or a broker. We don't have to think about how this IPC is actually done below; in testing we just think about the business level.
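The two mocking cases described above, recording what a mocked service would receive and feeding fake responses from a generator function, can be sketched as follows. All names (`SketchTestServer`, `register`, `deliver`, `received_requests`, `fake_random_numbers`, `sum_endpoint`) are hypothetical, and the sketch is synchronous for brevity, whereas the real communication is asynchronous.

```python
class SketchTestServer:
    """Records requests sent to mocked services and plays back fake responses."""

    def __init__(self):
        self.received_requests = []
        self._fakes = {}

    def register(self, service, endpoint, fake_responses=None):
        # fake_responses is an optional generator function: payload -> responses.
        self._fakes[(service, endpoint)] = fake_responses

    def deliver(self, service, endpoint, payload):
        """Record the request and return whatever the fake generator yields."""
        self.received_requests.append((service, endpoint, payload))
        fake = self._fakes[(service, endpoint)]
        return list(fake(payload)) if fake else []


def fake_random_numbers(payload):
    """Generator function standing in for the mocked random service."""
    for number in (4, 8)[: payload["count"]]:
        yield {"number": number}


test_server = SketchTestServer()
test_server.register("random-service", "/numbers", fake_random_numbers)
test_server.register("notification-receiver", "/notify")  # records only


def sum_endpoint(payload, send=test_server.deliver):
    """Service under test: queries the random service, then sends a notification."""
    replies = send("random-service", "/numbers", {"count": payload["count"]})
    total = sum(reply["number"] for reply in replies)
    send("notification-receiver", "/notify", {"total": total})
    return {"total": total}


result = sum_endpoint({"count": 2})
```

Because the fake generator always yields the same values, `result` and `test_server.received_requests` are the same on every run, which is what makes such tests deterministic.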
We also have many more features, like before-receive and before-send hooks, and like endpoint context managers: if you want to measure the performance of your endpoints, you can hook a context manager around the endpoint. We have error handlers for endpoints, and a Kubernetes health check, because we run it on Kubernetes. We have a curl-like client and a lot more things. Of course, if we hide some complexity, we also hide some opportunities, and not only opportunities to make errors. So if you want Kafka Streams, for example, sorry, we won't be able to deliver it, because we hid the specifics of Kafka below the abstractions.
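The endpoint context manager mentioned among the extra features can be approximated with the standard library alone. This is a generic sketch of wrapping a callback in a timing context manager; `measure`, `with_context` and the `timings` dictionary are hypothetical and do not reflect how the library actually attaches context managers to endpoints.

```python
import time
from contextlib import contextmanager
from functools import wraps

timings = {}  # endpoint name -> list of measured durations in seconds


@contextmanager
def measure(endpoint):
    """Record how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(endpoint, []).append(time.perf_counter() - start)


def with_context(endpoint, context_factory):
    """Wrap an endpoint callback so every call runs inside the context manager."""
    def decorator(func):
        @wraps(func)
        def wrapper(request):
            with context_factory(endpoint):
                return func(request)
        return wrapper
    return decorator


@with_context("/quotes", measure)
def quotes(request):
    return {"route": request["route"], "price": 95}


reply = quotes({"route": "KRK-SFO"})
```

The endpoint body stays free of measurement code; swapping `measure` for another context manager (for example one that opens a tracing span) needs no change to the callback itself.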
Okay, so we can still have these problems. Concurrency issues we will have, because we have asynchronous communication; we won't run away from them. But that's all for developers; it's all on the level of the business logic they are implementing. They don't have to think about Kafka usage patterns, driver usage patterns and so on, and if there are some problems below, they can be solved in one place, and actually were solved in one place, without bothering a lot of developers. So switching from HTTP to async calls is straightforward for a server; it's not a problem. For a client it's a little bit more complicated.
We support one-way communication. If we have more complex use cases, yes, it's a matter of doing it well. We have callbacks, so we can always have a callback hell, but we also know that there are patterns to do it well, so we can build something above it if we need to.
We have easily testable services, because we have tools to do it. And we now have a standard, project-wide layer for asynchronous communication between the services. Thank you. I have three more cups. I won't be throwing them, but please grab them, and don't make me take them back by plane, you know, to my place.
Any questions? We have time for a few questions, one or two maybe.
Yeah, thank you for the talk. An interesting thing is how to design exceptions and errors. How did you approach that?
As I said, we have exception handlers, so you can register an exception handler for specific exceptions for your whole service. An exception handler is a function that will be called when an exception is raised in your endpoint, of course.
Okay, so the HTTP developers don't see much of the Kafka specifics. But there's a layer, your library layer, in between, throwing [unclear] specific errors?
Exception handlers are for the exceptions that go out of the callbacks. As for Kafka exceptions, well, they shouldn't see them, because we should debug the library; if they do, we must fix it.
Yeah, like the message length is fixed, and stuff like that; you have too much payload, and stuff like that. You need to abstract that, probably.
You know, in message-driven communication the errors are handled in a different place, on the level of the receiver rather than on the level of the sender of the request. That's also the tricky thing when you switch from HTTP: you don't get an error 500 because your service is down. Your message is just waiting, and when the service is up after, like, 10 minutes, it will serve your stale requests, and maybe it will also send responses to you.
Unfortunately, that's all the time we have, but maybe you can grab... I'm around, I'm around. So, questions after this. One big round of applause. Thank you very much.