Live build: How to harness streaming data in real time
Formal Metadata
Title: Live build: How to harness streaming data in real time
Series: Berlin Buzzwords 2022, part 31 of 56
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/67173 (DOI)
Transcript: English (auto-generated)
00:07
So, hi everyone, and welcome to the most heartbreaking demo ever, because you are going to see today how to work with real-time data. But it's not going to be just any real-time data: it's going to be my heart rate, right here.
00:21
I'm wearing a heart-rate band right here. So we are going to be showing you the differences between streaming, batch and much more with my own data. So I am Javier Blanco. I'm the senior data scientist at Quix. I previously worked as a senior data scientist at Orange in Spain.
00:42
It's a telecom company, and before that at Jaguar Land Rover in the UK. And I have Thomas here with me today. Hi, everyone. So I'm Thomas Neubauer. I'm the CTO and co-founder of Quix, and previously I worked at McLaren. At McLaren we were getting data from F1 cars to the cloud, so people in the factory could start building decision insights in real time, and
01:08
it was quite a challenging use case, because there were 30 million values per car per minute. No database could handle that, and that was how we kind of ended up using streaming technologies like Kafka and Kubernetes.
01:22
But what we found out is that those technologies were actually quite difficult for the people in the data teams to use, and this is how the idea of building something like Quix started to grow. Yeah, Javi?
01:40
Yeah, so Thomas has been talking about streaming and we are in the streaming track of the conference. But we wanted to ask you to raise your hands if you are working or have worked with streaming technologies. Okay. Okay, good. And with Kafka? Okay, pretty much everyone. So I'm gonna go through the next slides quite quickly because we have a
02:07
room of people who know about this, but let's reflect on the differences between streaming and batch for a second. You know that the river metaphor is one that is used quite frequently to describe this. So
02:21
we have a flow of data. This is live data coming in, as any system would generate it. You know that under the batch processing approach, a set of data is collected over time and then it is fed into the processing system. So yeah, the metaphor is you fill a bucket of water and then you throw it into your static data lake,
02:46
where you are going to perform the processing. This is good in some ways because you have historic data here. If your transformations need that historic data, that's good. But you are also spending resources here in the loading,
03:01
in the saving. You are also inserting some lag, some delay, between your processing and the moment when the data was generated, right? Following that metaphor, stream processing would be building a pipe: you process the data piece by piece as it is generated. Again, just to reflect a bit more on this, imagine we want to perform an aggregation operation.
03:24
That's our original data coming in. Under a batch system, you have to perform this intermediate operation where you load the data. You then process it into the aggregation. So you are spending some resources here and inserting some delay that you don't need with stream processing.
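To make the aggregation example concrete, here is a plain-Python illustration (not from the talk, just a sketch of the idea): in batch you collect everything first and then aggregate; in streaming you update a running aggregate as each piece arrives, with no intermediate load step.

```python
data_stream = [4, 8, 15, 16, 23, 42]

# Batch: collect everything first (the "load" step that costs
# extra time and resources), then aggregate.
collected = list(data_stream)
batch_sum = sum(collected)

# Streaming: update the aggregate piece by piece, as data is generated.
running_sum = 0
for value in data_stream:
    running_sum += value  # no intermediate load step, no extra delay

print(batch_sum, running_sum)  # 108 108
```

Both arrive at the same answer; the difference is that the streaming version has an up-to-date aggregate after every single data point.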
03:46
Okay, I'm gonna skip the next slide because we have a knowledgeable room. Thomas, do you want to keep talking about this in architecture terms? Yes. Thank you, Javi. So let's start with architecture. Basically, with streaming we need to change our mindset a bit when building the
04:03
architecture for the project. So originally, when you have a batch system, you have a database in the middle: you send data straight to the database, then use some batch framework like Spark to load the data, transform it, and then put it back into the database. And then you, for example, use dashboards, or you consume data from the database with your consumers.
04:24
Nothing wrong with that really, but the problem is that the database has to persist data and also serve queries at the same time. And this is where the scalability issues arise when you get more and more data into the system. And databases are not really great at scaling,
04:43
even the modern ones. So when we do a streaming architecture, we still have a database in the picture. It's still very useful for historical analysis and maybe training of your model, etc. But this time it's on the side, and we are basically syncing data from a broker to the database
05:03
for later. And we build the pipeline in memory using a broker and microservices. So in this diagram, every arrow is one topic in a broker and every rectangle is one microservice consuming from a topic and publishing results back to a different topic. We'll talk about that a bit more later.
05:25
So what's our approach? Our approach is using Kafka, Kubernetes, and Python married together to solve these problems. The microservices approach is quite well established now in software engineering teams, but it isn't really in data teams, and people are not using it there:
05:45
you know, the data engineering teams and data scientists are not really into this at the moment. So what we're trying to do is make it a bit easier, with a bit less friction, to start using it. Then we use Kafka to get scalability,
06:03
resilience, and low latency for our pipeline. Kafka is very, very fast. It introduces around a 10-millisecond delay between microservices if they are working on the same network. It's very well designed for scaling, which I will talk about a bit later, and
06:22
it also includes a lot of resiliency features. And Python, because Python is the number one language for data people: data scientists, data engineers, etc. There's a massive ecosystem of all the different libraries; the ML ecosystem is really there. So that's why Python.
06:42
Okay, sorry, my mouse. So the main concept in streaming is pub and sub. It's a microservice that's subscribing to an input, getting the data into the microservice, doing some transformation that could be very simple: data cleaning, data
07:02
normalization, data filtering, up to a very complex model that's, for example, using ML, and then publishing the result to another topic. So this is everything that the microservice has to do. Why is this a good idea? Well, first of all, it's very scalable, because those topics,
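The pub/sub shape Thomas describes can be sketched in a few lines of plain Python. This is only an illustration of the pattern, with in-memory queues standing in for Kafka topics; it is not the Quix SDK or a real broker client.

```python
from queue import Queue

# In-memory queues standing in for Kafka topics (illustration only).
input_topic = Queue()
output_topic = Queue()

def transform(message):
    # Any transformation goes here: cleaning, normalization,
    # filtering, or even a call into an ML model.
    message["value"] = message["value"] / 100.0
    return message

def run_service(max_messages):
    """Subscribe to the input, transform each message, publish the result."""
    for _ in range(max_messages):
        msg = input_topic.get()      # subscribe / consume
        result = transform(msg)      # process
        output_topic.put(result)     # publish

input_topic.put({"value": 80})
run_service(max_messages=1)
print(output_topic.get())  # {'value': 0.8}
```

The microservice never touches a database on the hot path: it only consumes, transforms, and produces.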
07:20
the input and output, are formed from smaller topics; we call them partitions. And those partitions basically redistribute the load of your topic to different nodes in a broker. A broker is formed from 5, 10, 20 nodes. So imagine if you have 10 partitions in this input topic: that means that
07:40
10% of your traffic will be in each partition, roughly, and those are then redistributed around. They are also replicated, so you can set replicas to 2, for example, which will then send every message twice, but to different nodes in your broker. That means if you have a hardware failure in one of the nodes,
08:01
then the stream processing is ensured. Then the middle part is where we use Kubernetes, and our microservice is scaled using the replica system. This is basically how you scale the compute part of your processing. So when you have more and more instances of your processing in a consumer group,
08:23
what will happen is that the partitions will be redistributed equally across your consumer group. So if you have nine partitions, each instance will get three, and essentially 33% of your traffic goes to each instance of this consumer group. It's also resilient, because if you have
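The partition arithmetic here can be illustrated with a simplified model of how a broker spreads partitions over a consumer group. This is a round-robin sketch for intuition, not Kafka's actual rebalance protocol.

```python
def assign_partitions(partitions, consumers):
    """Distribute partition ids evenly across a consumer group,
    roughly the way a broker rebalances when instances join or leave."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Nine partitions, three instances in the consumer group:
# each instance handles three partitions, i.e. about 33% of the traffic.
group = assign_partitions(list(range(9)), ["inst-a", "inst-b", "inst-c"])
print({c: len(ps) for c, ps in group.items()})
# {'inst-a': 3, 'inst-b': 3, 'inst-c': 3}
```

If one instance dies, rerunning the assignment with two consumers gives each survivor half the partitions, which is the resilience behaviour described above.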
08:42
a bug in your code or maybe a hardware failure in the Kubernetes cluster, the others will take the load until the failed one gets restarted. Let's build a simple demo together today to show you how this looks in a
09:06
real example. So I will get to our platform. Cool. So here we have our topics and as you can see
09:21
the phone data topic is already getting data, and that's because I have here my phone, which has been converted for the purpose of this demo into an IoT device. So it's sending acceleration data, temperature, GPS location, but it's also sending a heart rate from Javi's sensor,
09:42
the strap sensor, using a Bluetooth connection. So if I go to our topic, you can see here the active stream. First of all I can start with the g-force data. Cool. So if I shake it, it goes off the scale, because we get some g-force data there, and now it's
10:05
calm because it's on the table. I can get rid of that and look at the heart rate, which is now coming from Javi: 79, 80, et cetera. So what you're looking at right now is data being streamed
10:26
from this heart rate sensor through Bluetooth to my phone, and from the phone to the cloud, using a WebSocket gateway service, to Kafka. And you might ask why we're using a gateway service
10:41
with a WebSocket connection. Kafka is not really designed to connect to end devices. It's an internal backbone rather than an IoT hub. So then it's going to the cloud and we subscribe to the data in this browser. So quite a journey. So now, if we have this data, usually what you need to do
11:05
is analyze it. You need to look at what's going on, what's coming in, what you can do with this data. So that's why we have this persistence here, and this is exactly what we showed you in the slides. If I go back to the slides, to that diagram, that column is this. When I enable it,
11:25
you get the database and you sync that topic into it, which means that you can then basically go here and observe data that had been streamed before. So you can see that we were practicing this morning. I can go here and you can see g-force data that happened at 11:22, when we were
11:49
testing here. Now, this is something that gives you more of an idea of what's going on. You can also bring it to a Jupyter notebook, but Javi will show you that later. So, assuming that we have an
12:01
idea of what we want to build, how do we build this pipeline? Let's start with something super simple, like the 101 example of a pub and sub service. So you go to transformations, because we've solved the source: we have data in the platform already. And here we have the template
12:20
for a pub and sub service. Now this library, this is the idea of making the microservices a bit easier. It's like a project generator where you just answer some questions and you get the whole microservice, with the Docker image, with all credentials, everything pre-configured to work with your infrastructure out of the box. So that means that if I press next and I call
12:43
it something like g-force total, and I listen to phone data and send it to g-force total. Basically here you are calling out the topic names, right? Yes. That we are listening to or
13:01
producing to. So basically what you see here is a full Git repository with the microservice code populated in the master branch. And now you can clone it locally to get it into your IDE and debug it line by line. Or, if you want, you can also use our web browser to
13:25
write the code here. So without any changes, if I just press run, this will connect to the topic, and you see g-force data. It's a lot of messages, heart rate too, but you see that we're getting data on this line number 18.
13:46
And now we can do some magic. I quite often show this line. So, data scientists and data engineers like to use data frames, so you can do this. And why is this actually possible?
14:10
Because the Quix SDK, which is what we're using here, introduces a tabular format for data in the broker. So you don't have to deserialize it and serialize it by yourself. It's using
14:23
Protobuf under the hood, but you don't have to care. It's basically doing all the serialization for you. So you have it served as a data frame. So now this is working. So let's create a new feature, something super simple. I have here a small code snippet, which I'm just going to place here. And there we are. And what this
14:52
does is, if there's g-force data, I calculate a total acceleration for all dimensions
15:02
in absolute value. Just a simple feature. So if I press run, as you can see in the console, we're getting 10 Gs. And if I shake it, it's getting higher. And now you can also check the data output, how the message looks. There we are.
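The feature Thomas adds, total acceleration as the sum of absolute values across the three axes, is a one-liner on a pandas DataFrame. The column names below are assumptions for illustration, not necessarily the ones used in the demo.

```python
import pandas as pd

# Example accelerometer readings; the column names are assumed.
df = pd.DataFrame({
    "gForceX": [0.0, -2.0, 4.0],
    "gForceY": [1.0, 0.5, -3.0],
    "gForceZ": [2.0, 0.25, 3.0],
})

# Total acceleration across all dimensions, in absolute value.
df["gForceTotal"] = df[["gForceX", "gForceY", "gForceZ"]].abs().sum(axis=1)
print(df["gForceTotal"].tolist())  # [3.0, 2.75, 10.0]
```

Shaking the phone makes the per-axis magnitudes larger, so the total rises, which is exactly the behaviour seen in the console.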
15:27
And here we have g-force total as a new feature. And it is in the output topic because of this line. So this is how you send a data frame to the output topic
15:43
very easily, in one line. So I hope that this makes sense, because Javi will now use this approach to build something more real. So Javi, do you want to, yeah, take it over? Sure. So I'm going to take my laptop. Thomas, can you connect that there?
16:07
No. Okay. Good. So you can see my screen there. Perfect.
16:28
Let's say that up to this point, we followed a quite typical approach here. You were playing the role of the data engineer; I'm going to be playing the role of the data scientist. So Thomas has created this system that ingests data in real time. We have just seen it. He's been able to
16:44
do some transformation. And now it's my turn as a data scientist to try to add some value to this data. So what I'm going to try to do is calculate the calories that I'm burning. We have all the data we should need, which is basically my heart rate and some biological
17:04
metrics that I know. And obviously I'm not an expert in calorie burning, but I did a little research, and apparently there are these two ways to burn calories. One is really just by keeping alive, and it is unintuitively expensive just to be alive. We have to keep up
17:24
our thermoregulation. We have to respire. We have to keep our blood circulating. And that is really expensive in terms of energy. And then we also burn calories by moving. I'm not going to go through the details, but you see we have a formula here by
17:43
Knox et al that gives us the amount of estimated calories burned over a minute, depending on the average heart rate in that period and those other metrics, like my weight, my age, etc. So I'm going to create a transformation that will generate the calories that I'm burning
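A formula of this shape, heart rate plus weight and age in, calories per minute out, can be sketched as below. The coefficients are those of the published Keytel et al. heart-rate formula for men; whether that is exactly the formula on the slide is an assumption, since the transcript does not show the slide's coefficients.

```python
def calories_per_minute(avg_heart_rate, weight_kg, age):
    """Estimated calories burned per minute from average heart rate.

    Uses the Keytel et al. coefficients for men as a stand-in for the
    formula on the slide (an assumption; the talk's exact coefficients
    are not in the transcript).
    """
    kj_per_min = (-55.0969
                  + 0.6309 * avg_heart_rate
                  + 0.1988 * weight_kg
                  + 0.2017 * age)
    return kj_per_min / 4.184  # kilojoules per minute -> kcal per minute

# e.g. an average heart rate of 80 bpm, 80 kg body weight, age 30:
print(round(calories_per_minute(80, 80, 30), 2))  # 4.14
```

To get calories between two data points rather than per minute, you scale the result by the time delta between the points, which is exactly the pre-processing Javi does next.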
18:02
as I'm speaking here to you. What Thomas has just built is something like this, right? We have the real-time data coming from the phone in that topic. That topic we are persisting to a database. This will allow me to do a historic analysis if I want it. I'll show you that in a
18:22
second. Basically, if I access the data here in this database, I will work with a traditional batch approach. But what I want to do is to react to that real-time data and build this transformation there, in our microservices approach. Again, we are following all the time
18:41
this batch versus streaming comparison. I'm going to build this transformation first in a batch approach and then we'll turn it into streaming. So, to work in a batch approach, where am I? IoT demo. Here it is. Good. So as Thomas said earlier, we are saving the data in our database. So I could
19:09
come here, check the heart rate and look at the latest points of data, and you see there how I got a little bit nervous when I started talking, and I was a bit calmer while Thomas was doing it.
19:24
No, actually, here is where I started and here is where I'm back to talking. We could keep refreshing that, and you see I get a little bit nervous when I speak. We could explore more data in this table view or do any other visual analysis here, but we
19:44
could also query that data from the database with this code and take it anywhere, really. So because I'm a data scientist, I love Jupyter notebooks, so I'm going to come here to my Colab. I'm going to paste this piece of code that I have just shown you and have just
20:03
copied, and there it is. That is the data from the database, loaded now in my Jupyter notebook. So this is the traditional batch approach: we have the data stored in a database and I'm bringing it here to my processing system, and I could do some processing now. Here, for instance,
20:23
I'm bringing in some other historic data to compare, merging those two together, and I could look at the distribution plot of my heart rate. One is me practicing; the other one is me giving a talk to you. You could tell which one is which, right?
20:43
Yeah. Or we could do another type of aggregation. Again, look at my average heart rate versus normal. Or we could do what we are here to do, which is calculating the calories. As you can see there, that is exactly the formula that I showed you in the slides.
21:03
It doesn't have anything complicated in it. The only pre-processing that we need before applying that function is that, because that function is estimated for a period of time, we will need to calculate the time between each pair of timestamps in our data. So I'm doing that
21:24
here, and we have the time delta between each point of data, and then I have the heart rate between those two points. So you see here it was like two hundredths of a second
21:43
between this data point and that one, and the average heart rate over that period was 80.5, because the previous one was 81 and now we are at 80, right? Now with that we can apply the function that I was showing you. So I'm gonna create this new column in the data frame
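The pre-processing described here, the time delta between consecutive timestamps and the average heart rate over each interval, is a couple of lines in pandas. The column names and sample values are assumptions chosen to match the numbers in the talk (0.02-second gaps, readings of 81 then 80).

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-06-13 11:22:00.00",
        "2022-06-13 11:22:00.02",
        "2022-06-13 11:22:00.04",
    ]),
    "heart_rate": [81.0, 80.0, 82.0],
})

# Seconds elapsed between each data point and the previous one.
df["delta_seconds"] = df["timestamp"].diff().dt.total_seconds()

# Average heart rate over each interval: the mean of the previous
# and current readings, e.g. (81 + 80) / 2 = 80.5.
df["hr_interpolated"] = df["heart_rate"].rolling(2).mean()

print(df[["delta_seconds", "hr_interpolated"]])
```

The first row has no previous point, so both derived columns are NaN there; every later row carries the interval length and the interval-average heart rate that the calorie function needs.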
22:02
called cal. For the heart rate, I'm passing the data frame column with the interpolated heart rate, and for the time delta, yeah, the time delta column that I have generated. Then this is some oxygen capacity when resting, this is my weight, my age,
22:22
and here we have the calories that I'm burning between each of the data points. And I could do something like calculating the sum for the current data. So this is my
22:41
stream ID. I'm going to copy that and, look, put that there, and this says that I've spent pretty much 19 calories, right? But is this right? Well, it was some time ago
23:06
when we loaded the data into the Jupyter notebook, and in the time that it has taken us to process the data, generate the function, check everything, this is now outdated. This is the traditional batch problem, right? So let's bring this same transformation
23:25
to a streaming system, and for that we are going to come back to Quix. Here I am, and the first thing that I'm going to be doing is to create that time delta interpolation. So I want to create a new column in our phone data
23:43
that tells me the seconds that have passed since the last data point. We don't need to build it ourselves. As Thomas introduced earlier, we have this library. This is open source; we have a community contributing here. There is something called interpolation, so I can preview that code.
24:08
I can check the documentation, and yeah, this does exactly what I would like, right? So I'm going to keep it as interpolation. The input topic is going to be phone data. There it is.
24:23
The output topic is going to be phone data heart, and the parameter that I'm going to be wanting to interpolate on is heart rate. And this is a no-code sample that I can just deploy. I could
24:41
save it as a project and then edit it and use it as a template, or I can just hit deploy and this should be working. Let's wait for it to get built and we'll see if this is working. Again, what this is doing behind the scenes is creating the microservice on
25:05
Kubernetes, so that I as a data scientist don't have to bother to do this. It's taking a little bit longer than I expected, so I'm going to show you briefly what we have done. Remember,
25:22
with the batch approach, we really have a set of data like this, with times and heart rate, and what we did was a window operation where we calculated the time delta. So for t2 it was t2 minus t1, and then the average heart rate between those two points. Then, having that transformation
25:42
done, we could do the calories calculation. Now with streaming data, we only get the latest point, right? Like that. At t1 we get that, at t2 we get that, t3. So how are we doing this calculation of t4 minus t3? That's what the library item that we have shown does for you.
26:03
So it has memory built inside that only keeps the little amount of data that we need for the calculation. So when we are at t2, we are also saving in memory the information from t1, so we can perform that operation. But as we only need it at t3, we will only save t2,
26:22
and so on. So this is what the library item that I'm building is doing for us. I don't know what has gone wrong, so I'm gonna check again. Here, great: phone data, great, phone data heart, deploy. Yes. While this is building: is it clear
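A minimal sketch of the stateful idea Javi describes (not the actual library code): a transformer that keeps only the previous data point in memory and, on each new point, emits the time delta and the average heart rate over the interval, then forgets everything older.

```python
class Interpolator:
    """Streaming time-delta / interval-average: remembers only the previous point."""

    def __init__(self):
        self._prev = None  # (timestamp_seconds, heart_rate) of the last point

    def process(self, t, hr):
        """Return (delta_seconds, avg_heart_rate) for the interval between
        the previous point and this one, or None for the very first point."""
        result = None
        if self._prev is not None:
            prev_t, prev_hr = self._prev
            result = (t - prev_t, (prev_hr + hr) / 2)
        self._prev = (t, hr)  # overwrite: t1 is dropped as soon as t2 arrives
        return result

interp = Interpolator()
print(interp.process(0.0, 81))   # None (first point, nothing to compare)
print(interp.process(0.02, 80))  # (0.02, 80.5)
```

Whatever the length of the stream, the state held in memory is a single point, which is why this scales where a full batch window would not.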
27:07
how we are doing this transformation? How we are keeping just the amount of memory that we need in place? Yeah? Cool. Any idea, Thomas? I'm not sure. That's why it's called the heartbreaking
27:37
demo. No, we have an interpolation there. Yeah, I don't know what this is showing,
27:44
because, yeah, it is actually working. Here we see the deployment; you see it's listening to the phone data and it's outputting to that heart rate phone topic. There you see the data coming in. We are not persisting the data coming out, but it is working. So if we check the data view,
28:06
as Thomas did earlier, I can check the inputs and I should be able to check the outputs as well. There we are. So check: heart rate, yeah, that was original, but now we also have
28:22
delta time in seconds, which has been calculated, and the interpolated heart rate. This is the average heart rate between the previous data point and the current one. And with this information we are now able to do the calories transformation. So I'm gonna click there to add
28:41
a new transformation. This will take me to the library again. I'm gonna select Python and hello world, to keep it simple, yeah? Good. I'm going to call this calorie calculation, and what is this listening to? The topic that I was outputting to before, right? That's where
29:03
I have the latest transformation. And then I'm gonna output this to phone data heart calories. I'm saving this as a project; I'm not deploying straight away this time, and this takes us to this Git project. Just to keep it simple, I'm gonna copy and paste the code from here, but
29:23
the first thing that I will need is this function that I was using, right? So I'm pasting that function there that does the calorie calculation and now I'm just gonna copy and paste this here. So that and we'll see in a second what this is
29:47
doing. Good. So check there line 26. We are listening to the latest data. We are converting
30:01
that to a data frame because it just makes our life easier and then I'm calculating the calories burned over the last two data points using the exact same function that we saw in Colab. Then I'm printing in the console the calories burned. I'm adding those calories burned to a new
30:22
column into the data frame that I created and then I'm sending that data frame to the output topic and check how simple it is, line 47, to send this modified data to an output Kafka topic, right? I'm gonna run this in this live IDE. Let's see. Cross fingers.
30:49
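The transformation just described — take the latest heart-rate data, turn it into a DataFrame, estimate calories between the two most recent points, add a column, and send the result on — can be sketched roughly like this. This is a hypothetical reconstruction, not the actual Colab function or the Quix SDK: the column names, the weight/age defaults, and the calorie formula (a published heart-rate estimate, Keytel et al. 2005) are all assumptions.

```python
# Hypothetical sketch of the calories transformation -- not the speaker's
# actual Colab code; column names and the formula are assumptions.
import pandas as pd

def calories_burned(avg_hr, minutes, weight_kg=75, age=30):
    """Rough calories-per-interval estimate from average heart rate
    (Keytel et al., 2005, male formula), standing in for the Colab function."""
    per_min = (-55.0969 + 0.6309 * avg_hr + 0.1988 * weight_kg + 0.2017 * age) / 4.184
    return max(per_min, 0.0) * minutes

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Add delta time, interpolated heart rate, and calories columns."""
    df = df.sort_values("timestamp").copy()
    # Seconds elapsed between consecutive data points.
    df["delta_seconds"] = df["timestamp"].diff().dt.total_seconds()
    # "Interpolated" heart rate: average of the previous and current samples.
    df["interpolated_hr"] = (df["heart_rate"] + df["heart_rate"].shift()) / 2
    df["calories"] = [
        calories_burned(hr, secs / 60) if pd.notna(hr) else 0.0
        for hr, secs in zip(df["interpolated_hr"], df["delta_seconds"])
    ]
    return df

if __name__ == "__main__":
    df = pd.DataFrame({
        "timestamp": pd.to_datetime(["2022-06-14 10:00:00", "2022-06-14 10:00:10"]),
        "heart_rate": [100, 120],
    })
    print(transform(df)[["interpolated_hr", "calories"]])
```

In the real pipeline the resulting DataFrame would then be serialized and produced to the output Kafka topic rather than printed.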
Okay, listening to stream. This should start in a second. There it is. Nice. So those are my calories burned as we speak in real time. That's what
31:09
I'm burning here right now. So we built this transformation in real time. Now, Thomas, I'm gonna take it back to you so you put the data into some place, right? Thank you, Jave. So basically what you saw was Jave building two microservices,
31:24
although he wasn't really building the first one, but every microservice was deployed as a container into Kubernetes, and this second one I'm just gonna deploy now as well. Now, because it's code, I have to first build it into a Docker image. Luckily that's not that
31:41
hard. So hopefully you can see my screen in a minute. Yeah, cool. So going back to the home screen here, into the pipeline view. Yeah, we have two interpolations here. I'm gonna delete
32:02
one of them and I'm gonna go back to Jave's code. Yeah, I'm deploying it. So what you see now is that this is being built into a Docker image and then deployed as a pod in Kubernetes, running as a container. So what we want to do now, well, we want to
32:24
consume this data somehow. So I'm gonna build more microservices and more stuff. So going through this again. Now we have interpolation, then we have a calories calculation and now we
32:43
want to see how many calories we have burned for each session. So as Jave is here today, this is one session from the start, and we're accumulating calories burned, and we want to see it in real time as it's moving, as it's accumulating. So for that we're gonna create an in-memory view. What does an in-memory view mean? It's a stateful processing microservice. So this time
33:05
we're adding a state into the mix. So we discussed this before: we have an input topic, an output topic, a microservice in the middle, but this time we're also gonna have a persistent volume under the hood. Now, how does it work? We're relying on the Kafka checkpointing system.
33:23
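That commit-and-checkpoint mechanism can be simulated in a few lines of plain Python. This is a toy, not real Kafka or the Quix SDK: the topic is just a list, the "checkpoint" is a JSON file, and all the class and field names are made up.

```python
# Toy simulation of Kafka-style offset commits plus state checkpoints --
# illustrative only; the topic is a list and the checkpoint a JSON file.
import json
import os
import tempfile

class CheckpointingConsumer:
    """Processes a topic and periodically commits: the committed offset and
    the in-memory state are persisted together, so a restart resumes from
    the last checkpoint instead of from zero (at-least-once semantics)."""

    def __init__(self, checkpoint_path, commit_every=3):
        self.checkpoint_path = checkpoint_path
        self.commit_every = commit_every
        self.offset, self.state = self._load()

    def _load(self):
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                snap = json.load(f)
            return snap["offset"], snap["state"]
        return 0, {"total": 0.0}

    def _commit(self):
        # Persist offset and state atomically-in-spirit, only at checkpoints,
        # not on every message -- saving per message would be wasteful.
        with open(self.checkpoint_path, "w") as f:
            json.dump({"offset": self.offset, "state": self.state}, f)

    def run(self, topic):
        # Resume after the last committed offset; anything processed but
        # not yet committed before a crash simply gets reprocessed.
        for value in topic[self.offset:]:
            self.state["total"] += value
            self.offset += 1
            if self.offset % self.commit_every == 0:
                self._commit()
```

A restart mid-stream then picks up at the last committed offset with the matching persisted state, which is exactly the behaviour described next.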
What does the checkpointing system mean? Well, the way it works is that each topic is like a log, and as you're consuming data through it, you're committing an offset, which means: I have processed my data up to this message. Now when this happens, which is by default at a 10 to 20
33:44
second interval, though you can configure a different one, we also persist the state to disk. So we are not saving it on every message we process; that would be wasteful and slow. We only save when the checkpoint is committed. What does that mean? When the service gets restarted,
34:02
it will start from the last checkpoint, and we have the state at that checkpoint persisted as well. So we load it and we continue in memory only. So going back to Quix, I'm gonna press here and, you know, we have another template called
34:22
in-memory view. There we are. It's another microservice and here I'm gonna listen to this topic we've pre-selected and I'm gonna call it calories view topic. There we are.
34:42
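The job of this in-memory view — fold every incoming calories message into one running total per rider/stream — might look like the sketch below. This is a hypothetical shape, not the actual Quix template code; the message fields and class name are assumptions.

```python
# Sketch of the stateful "in-memory view": keep one accumulated value per
# rider. Hypothetical, not the real Quix library template.
from collections import defaultdict

class CaloriesView:
    def __init__(self):
        # rider -> accumulated calories for the current session
        self.totals = defaultdict(float)

    def on_message(self, message: dict) -> dict:
        """Fold an incoming calories message into the per-rider total and
        return the updated view to publish to the output topic."""
        self.totals[message["rider"]] += message["calories"]
        return dict(self.totals)

view = CaloriesView()
view.on_message({"rider": "jave", "calories": 1.5})
latest = view.on_message({"rider": "jave", "calories": 0.5})
print(latest)  # → {'jave': 2.0}
```

In the deployed service, `self.totals` is the state that the checkpointing mechanism just described would persist alongside the committed offset.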
So now we have yet another microservice, and we can look at the code here. So on the left you see the Dockerfile, as I was talking about before, and the dependencies here in a requirements.txt. So you can add any pip packages you want. It could be numpy, pandas,
35:06
or tensorflow, you name it. So going back to our function, I'm gonna make some changes here. So we're not gonna listen to engine RPM but to calories. So we don't need this,
35:27
and we're gonna group it by rider. So rider is actually the person collecting the data. It's a bit difficult with one hand, but hopefully it's gonna work. Cool. So it's almost there
35:49
and I will just run it first to see if I have everything correct. Run. So, what this code does, and I will try to make it a bit smaller. There we are. It's working. So what it does
36:04
is basically accumulate every message as it's coming from the calories microservice and add it to one value per stream, per device. So we're just accumulating this state. So now, because it's working, I'm just gonna stop it and deploy it as a service with one core,
36:25
and there we are. So we are almost there. So now we have this whole pipeline here and you can see that it's being built. So I'm gonna create a destination. What does a destination mean in this context? It could be a database, so you're just sinking your results into Snowflake,
36:45
Redshift, etc. It could be your application, so you stream results back to your product; it could be, for example, shopping recommendations, and for that you'd probably use WebSockets if it's a website, or other services. Or it could
37:02
be a dashboard. But this time it is an in-memory dashboard, so we're not going to touch a database here. We're just gonna have this state in memory and, in a similar fashion, bring it to a website. So I'm gonna use the library Dash. Anybody using Dash here? Oh, that's unexpected. Okay.
37:29
So, real-time dashboard. What Dash is, is a simple Python library that helps you build dashboards without writing JavaScript.
37:43
That's all it is. So I'm gonna listen to the calories view and I'm gonna use rider as an index. So that's it. This will now install Dash and its dependencies, and when it's there
38:01
we will run it and get the website address from Quix so we can open it in a browser. This would take like 10 or 20 seconds. What this is basically doing now is that we got a pod in Kubernetes for ourselves, for this IDE session, which has all the
38:22
runtime and all the dependencies that you need for building Python code, and it's now inside that pod installing all the pip packages. It's done. So I will press run. Cool. So it's working. So if I press this button, there we are, and you can see it's moving. This is how we're
38:45
accumulating Jave's calories as he's standing here. Obviously it's catching up with the leading edge; that's how Kafka works: you have a queue and that's now being processed. So going back to our slides, this is the final result. We have a pipeline that's not touching disk anywhere,
39:09
and that's why it's very performant. If we go to our pipeline, I can just show the resources that one of the services is using. So, for example, the
39:24
calorie calculation is using zero point... yeah, it's almost nothing, and that's because basically what's happening is that data is being deserialized, processed and sent to the topic. There are no I/O operations, it's using protobuf, and that's why it's so performant. Also, if it would
39:50
have more data, if all of you were sending data, what we'd basically do is scale all those services to multiple replicas with the replica slider you saw in the dialog, and that would mean
40:01
that they will share the streams coming from the devices, and this is how you can basically scale it pretty much forever. So that's it for this demo. I believe we have ten minutes. So,
40:25
do you have any questions regarding this? Was it clear? We are here to answer them.
40:47
This is pretty neat. I'm actually trying to understand how I would use this exact thing
41:05
in e-commerce search. So, in e-commerce there are various applications for streaming:
41:22
it could be recommendations, for example. So you're tracking people in your e-shop, in your application, you're tracking their behavior. That doesn't necessarily mean only what they're putting into their basket, but maybe how they're clicking buttons, how they're moving,
41:40
and you can detect patterns. Like, if a person is about to bail out, there could be some pattern in their behavior, and you can, for example, give them a discount in real time, or maybe suggest some other goods in your shop that might keep them in your system instead of leaving. And this all basically works in a way that you're sending data through the WebSocket gateway,
42:06
as we did with our application, into Kafka, where you have a set of microservices, as we showed, which are looking at the data, trying to understand it, predicting certain situations. And when the situation happens, when for example your model detects, okay, this person is about to leave, I need
42:25
to save them, you send an event with a 10% discount over the same WebSocket connection to the browser, and in the browser it suddenly applies the 10%, or shows a dialog: look, we're just giving you a 10% discount. And this is how you can do it in real time, before the person actually leaves.
42:45
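The flow just described — score each session's behavior events and, when the model flags a likely exit, emit a discount event for the gateway to push back over the WebSocket — could be sketched like this. Every name here is hypothetical, and the "model" is a deliberately trivial stand-in.

```python
# Hypothetical sketch of the churn-discount flow: all names are made up,
# and is_about_to_leave is a toy rule, not a real churn model.
from typing import Optional

def is_about_to_leave(events: list) -> bool:
    """Toy stand-in for a real model: flag sessions with several rapid
    back-clicks and nothing added to the basket."""
    backs = sum(1 for e in events if e["type"] == "back_click")
    added = any(e["type"] == "add_to_basket" for e in events)
    return backs >= 3 and not added

def handle_session(session_id: str, events: list) -> Optional[dict]:
    """Return the event to push down this session's WebSocket, if any."""
    if is_about_to_leave(events):
        return {"session": session_id, "type": "discount", "percent": 10}
    return None

print(handle_session("s1", [{"type": "back_click"}] * 3))
# → {'session': 's1', 'type': 'discount', 'percent': 10}
```

In the pipeline described in the talk, this function would sit inside a microservice between the input and output topics, with the gateway routing the returned event to the right browser connection.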
And something I was going to say, for you and everyone: we are terrible salesmen, Thomas, because our marketing team prepared something special for this occasion. I don't know if we said it earlier, but Quix is free for developers; it has an amount of credits that you
43:03
can use to try it. But for the people attending the conference, our marketing people have prepared a nice double-credit thing. So if you sign up now, or if you're planning to give this a go, I would sign up just now, because that will give you double credits. You will be able to
43:22
build any type of streaming service, as we showed. So yeah, if you are interested, I would sign up there, and you can contact us through the live help in there. So if you are stuck or don't really know how to use it, there is that live help where Thomas, I and many others will answer.
43:42
Yeah. So, do you have any other questions? Was something not clear? Or... cool. So that means we thank you. Thank you for coming here, and I hope you liked it,
44:00
and if you have any questions or you want to just chat with us, just click on that button we showed you and we're gonna be around. Yes, we are gonna be around. So thank you very much. Thank you.