Uncovering Arcon: A state-first Rust streaming analytics runtime
Formal metadata

Title: Uncovering Arcon: A state-first Rust streaming analytics runtime
Series: FOSDEM 2022, talk 32 of 287
Author: Max Meldrum
License: CC Attribution 2.0 Belgium: You may use, modify, and reproduce the work in unmodified or modified form, distribute it, and make it publicly available for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifier: 10.5446/56824 (DOI)
Language: English
Transcript: English (auto-generated)
00:49
Hello everyone, my name is Max Meldrum, and today I'll be talking about Arcon, a state-first Rust streaming runtime. So let's get started. The agenda is: I will briefly introduce myself,
01:03
then we'll get on to something more interesting, which is the motivation behind Arcon, and also a bit about the vision for the future. Then I'll talk about the Arcon runtime itself, and then I'll dive into something more unique to Arcon, which is the TSS language, something that is a work in progress, and
01:24
then I'll show a short demo, and end off with a summary. Okay, a short bit about me. I'm a PhD student in the distributed computing group at KTH in Sweden, and I have experience working with data analytics systems in the past, such as Apache Flink,
01:42
and my interests lie at the intersection of a few fields: distributed systems, streaming systems, and databases. In short, I like building things, and I want to build things that are actually used, so they don't end up as toy research systems.
02:03
Okay, so I'm assuming that most of you are aware of what streaming systems are. Some of you have probably heard of Apache Flink and Apache Spark Structured Streaming. So I'll just do a short recap of what streaming systems do. Usually on the left side you have sources in a dataflow graph
02:20
that ingest data from systems such as Kafka or Pulsar, and then this data flows through the graph, through operators, which run transformations and pass the data along the graph. And usually you have a sink at the end that outputs the data to an
02:41
external system. Okay, so let's look at how they're used today, at least in the context of a large organization or an end-to-end pipeline. So you have the ingestion layer at the bottom, and then you have the stream processor, and then typically
03:02
it's connected to some form of data lake, whether it's Apache Hudi, Delta Lake, or Iceberg, that is, a table format, and this data is backed by lake storage, which data scientists can then run analytics on using their framework of choice.
03:21
Okay, so why another system? Why are we building Arcon? One of the main reasons is that existing systems are output-centric. If we look at the figure down here, usually they ingest data, they have some state inside, but this state is only used to produce outputs. We're not really working with the state directly.
03:43
And one reason we're not doing this is that in existing systems, state is more or less a black box. Yes, there are some systems that provide a raw API or even SQL nowadays, but there's no way to work with state using time semantics while supporting
04:04
state that is updated by out-of-order records; you have to provide guarantees. So why should we take a state-first approach? Well, if we look at this figure, we can see that there's a form of mismatch, because
04:22
in an end-to-end pipeline, this might be how the system is used. You work with the outputs and you send the outputs to other systems for other forms of analysis. But the problem when you do this is that when the data leaves the system, you no longer have the same time semantics or consistency semantics across the systems.
04:46
So what we want to do is move basically everything into the first system. We want to do as much as possible within the streaming system itself. So let's talk about the state context. Let's say that we have Anna here, who is a data scientist, and she wants to work on state inside the streaming system
05:08
using some SQL or DataFrame API. So she might ask herself: okay, did I access consistent state, and what version of the state am I looking at? And what I mean by version here is
05:23
what version in time. So is it the latest, or is it the state at one o'clock, or at one past one? That is what I mean by version. And the ambition in Arcon, with the TSS language that we will get into later on,
05:42
is that you should be able to run queries to extract the state. So let's say that Anna wants to look at the state at 13:00 sharp today, or maybe she wants something more advanced, such as collecting the state every five minutes between one o'clock and two o'clock.
06:01
So let's look at the Arcon vision. It's a bit similar to the other figure. We have the ingestion layer, and then we have the Arcon streaming ecosystem. In this ecosystem you have streaming applications that run,
06:21
supporting regular stream analytics using event time, windows, operators, and so on. But then you have the TSS language, which can be used to extract state from a single application or even to do time alignment across applications, and this is something quite unique.
06:44
And then this state is backed by object storage, perhaps in an open format such as Parquet, which is what we're currently using. And then you can have your frameworks on top that run analytics on it.
07:02
So Arcon is definitely inspired by other systems, most notably Apache Flink, so we have features that you may recognize. We're also using epoch snapshotting for guaranteeing exactly-once processing. And then we also do out-of-order processing, and
07:21
we have event time and watermarks, and of course we also have managed state, because we want to have state inside the system that we can extract and work with using the TSS language. So let's talk about the runtime itself. I'll be talking about the data formats that we have inside the runtime, and what kind of middleware we're using.
07:46
I'll talk a bit about state management, briefly discuss the current API, and then mention what's in progress for the distributed mode of Arcon.
08:04
Okay, so when we started looking at different data formats for Arcon, we wanted a row-based format for the main OLTP-style workloads, and we ended up picking protobuf. These are the main reasons why.
08:23
If you've worked with other formats such as FlatBuffers or Cap'n Proto, they are not so easy to work with in memory, at least in Rust. With protobuf we can easily work with in-memory Rust structs, convert them to protobuf-serialized data, and then convert them back.
08:40
Well, this is not so easy with the others. And protobuf has quite good space usage, and while it doesn't have the best serialization and deserialization speeds compared to the other formats, it's okay. But we also have a first-class integration with Apache Arrow, and of course also Parquet.
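As an aside on the "good space usage" point: protobuf encodes integers as base-128 varints, so small values take one byte on the wire. This is a stdlib-only illustration of that encoding, not Arcon code (in practice a crate such as prost would handle this):

```rust
// Encode a u64 as a protobuf-style base-128 varint: 7 data bits per
// byte, with the high bit set when more bytes follow.
fn encode_varint(mut value: u64, buf: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            buf.push(byte);
            return;
        }
        // Continuation bit: more bytes follow.
        buf.push(byte | 0x80);
    }
}

// Decode a varint, returning the value and how many bytes it consumed.
fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated or over-long input
}

fn main() {
    let mut buf = Vec::new();
    encode_varint(300, &mut buf);
    // 300 fits in two bytes instead of eight.
    assert_eq!(buf, vec![0b1010_1100, 0b0000_0010]);
    assert_eq!(decode_varint(&buf), Some((300, 2)));
    println!("varint round-trip ok");
}
```

The same idea applies to the field tags in a protobuf message, which is where much of the format's compactness comes from.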
09:01
Right now, we're mainly using it when we're extracting state out of the system. So you can think of it as: we can convert protobuf data to Arrow or Parquet, but we cannot convert Arrow to protobuf. But there are plans to
09:21
try to integrate Apache Arrow further into the runtime. This could be, for example, window operators, maybe running on the GPU, or sinks where you batch up table data and output it to object storage,
09:40
or it could even be regular operators that work with Arrow data. So I think there are some cool things you can do there. And then the middleware that we're using in Arcon is called Kompact, which is a hybrid actor and component framework. Some of you may be familiar with the actor model.
10:02
The component model is a bit different, but it's similar in that you can build abstractions and connect things by channels. But I will mainly be referring to the actor model. What Kompact provides for Arcon out of the box is scheduling, networking, and timers.
10:26
So I'm not going to talk about Kompact in detail, because that would require a whole different talk, but I would highly recommend checking out the GitHub repository. The project is also developed in our research group, so go check it out.
10:44
So an Arcon node is one of the main concepts in the runtime, and it's basically responsible for driving the execution of operators. In the figure you can see that it receives messages and then executes its operator, which has a user-defined function,
11:03
and it may have some state if it's a stateful operator. And then when it's done with the transformation, it will output a stream message. So you can think of the Arcon node as a part of the dataflow graph, one of the nodes.
11:20
And yeah, it has a context with state and timers and so on. So how are these Arcon nodes scheduled? We're currently relying on Kompact's scheduler, which uses a work-stealing approach by default.
11:40
So this scheduler has a bunch of worker threads, which try to steal work from different components or actors. In this example, the first thing that happens is that the worker accepts a schedule request,
12:00
then it dequeues and processes a number of events from the message queue. This is configurable, so it's X here. And when it has processed X events, it will request scheduling again if the queue length is larger than zero. Right, so now I'll talk about managed state in the runtime.
12:27
So Arcon has a modular state backend interface, and we're currently supporting Sled by default. Sled is a Rust-native state backend, or embedded database. And it's the default because it's a native Rust system, so it's fast to compile.
12:48
But it's an experimental system. Then again, Arcon is also an experimental system right now, so that's why it's the default. And then you have the classic RocksDB, which is more battle-tested.
13:02
Something that is quite unique in Arcon, I think, is that you can have a backend per operator, not a backend per job. So let's say that you have a stateful operator whose workload is doing a lot of reads. Then you could pick a state backend that is optimized for reads.
13:21
And a similar thing goes for a workload that is very write-heavy; then maybe you would want RocksDB. Right, so let's talk about the different state types. Some of them you may find similar to Flink's. You have value, map, appender, aggregator, and time log.
13:43
So maybe I can talk about value. A value just represents a single value, or a value per key if it's keyed state. And the map is quite similar: it's a hash map, and it could be a hash map per key. And then a time log is a work-in-progress feature.
14:04
It's similar to a vector you append to, but you can do time series operations on it. And I think it's quite well known that backends like RocksDB are not really optimized for streaming workloads.
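To make the value and map state types concrete, here is a stdlib-only sketch of keyed state. The real Arcon state types sit behind its state backend interface; the type and method names here are simplified assumptions, not the actual API:

```rust
use std::collections::HashMap;

// Value state: one value per key (the keyed-state case described above).
#[derive(Default)]
struct ValueState<V> {
    per_key: HashMap<String, V>,
}

impl<V> ValueState<V> {
    fn put(&mut self, key: &str, value: V) {
        self.per_key.insert(key.to_string(), value);
    }
    fn get(&self, key: &str) -> Option<&V> {
        self.per_key.get(key)
    }
}

// Map state: a whole hash map per key.
#[derive(Default)]
struct MapState<V> {
    per_key: HashMap<String, HashMap<String, V>>,
}

impl<V> MapState<V> {
    fn insert(&mut self, key: &str, field: &str, value: V) {
        self.per_key
            .entry(key.to_string())
            .or_default()
            .insert(field.to_string(), value);
    }
    fn get(&self, key: &str, field: &str) -> Option<&V> {
        self.per_key.get(key)?.get(field)
    }
}

fn main() {
    let mut clicks: ValueState<u64> = ValueState::default();
    clicks.put("anna", 3);
    assert_eq!(clicks.get("anna"), Some(&3));

    let mut prefs: MapState<String> = MapState::default();
    prefs.insert("anna", "theme", "dark".to_string());
    assert_eq!(prefs.get("anna", "theme").map(String::as_str), Some("dark"));
    println!("state sketch ok");
}
```

In a real backend the per-key maps would of course live in Sled or RocksDB rather than in memory; the point is only the shape of the state types.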
14:26
So I think a cool piece of future work here is building a specialized backend in Rust for streaming, and for Arcon. Okay, so let's look at the API. Currently we have a traditional dataflow API.
14:42
So if you're familiar with other dataflow systems, then it will probably look familiar. On the right, you can see that we set up an application, apply a source, and give it a bunch of operators on the stream. Future work would be to add some form of SQL API as well,
15:02
so you can compile from SQL down to Rust and Arcon. Arcon is currently just executing locally. We're closing in on making it distributed, and distributed would mean you have keyed streams and keyed state, but also cloud deployment onto Kubernetes and such.
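The dataflow API just described (set up an application, apply a source, chain operators on the stream) can be mimicked with plain iterators. This is not the Arcon builder API, just a stdlib sketch of the same pipeline shape:

```rust
// Source -> operators -> sink, as in the dataflow API described above.
fn run_pipeline(source: Vec<u64>) -> Vec<u64> {
    source
        .into_iter()            // source: ingest events
        .filter(|x| x % 2 == 0) // operator: keep only even events
        .map(|x| x * 10)        // operator: transform each event
        .collect()              // sink: gather the outputs
}

fn main() {
    let out = run_pipeline(vec![1, 2, 3, 4]);
    assert_eq!(out, vec![20, 40]);
    println!("{:?}", out);
}
```

In Arcon the equivalent stages run as scheduled nodes with managed state rather than as a single eager iterator chain, but the user-facing shape of the program is similar.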
15:25
Okay, so now we finally get to the most interesting part, which is the TSS language. So it stands for Temporal Stream State, and it's a language and protocol for declaring or capturing stream state that is updated by out-of-order data.
15:43
And the goal of the language is to be very simple, so it's a very SQL-like language. You have high-level statements such as create, show, and describe. The supported state types are tables and change logs, and if you just want the raw state in protobuf format, you can do that too.
16:02
In this talk, I will not go into much detail about how the actual protocol execution occurs, because that is still a work in progress. But I'm happy to take any questions after the talk. So let's look at the create statement.
16:22
So first of all, creating a TSS is about two things, plus an optional third. It's about extracting state: what state do you want to fetch? And then at what time: you can define the time. And then you can also apply an optional periodic statement or expression:
16:42
how often do you want to collect the state? Down below, you see the current grammar for a create statement, just to give an example of what we'll see on the next slide. So the first example is just a traditional, quite classic extract and load.
17:03
We create the TSS and give it a name. And then we execute an extract command. We extract state from two different applications, so you can see app one and app two, and then we have to specify some form of state ID. And then we give it a time.
17:21
So this is when to execute. And then we say: okay, we want a new version for each hour, and we want to run it until the next day. And another example is that, because we're working with keyed streams and keyed state,
17:41
you can specify a specific key. So here we use a key command and define a key, so it should only fetch state for that key. And then, for each iteration of a periodic statement, you can apply a diff command.
18:01
So this is if you want to see the difference between each hour, in this case, so you don't collect the full state each hour. And then something that is under development, but I think would be a very nice feature as well, is an optional transform command.
18:21
So in the picture, you can see that you can add a transform, and then you have a SQL statement that has access to all the states in your extract expression. So you can do an extract, transform, and load into your object storage. This is if you want to do a transformation before you actually store it in storage.
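Putting these pieces together, a full create statement with a key filter, a periodic diff, and a transform might look something like the sketch below. The TSS grammar is still a work in progress, so the keywords and layout here are illustrative assumptions only, not the actual syntax:

```sql
CREATE TSS student_activity
  EXTRACT app1::clicks, app2::steps     -- what state to fetch, and from where
  KEY 'anna'                            -- only state for this key
  AT 2022-02-05 13:00                   -- when to execute
  PERIODIC 1h UNTIL 2022-02-06 13:00    -- a new version each hour, for a day
  DIFF                                  -- only the change since the last version
  TRANSFORM (SELECT c.name, c.count, s.count
             FROM clicks c JOIN steps s ON c.name = s.name)
```

The transform's SELECT would see both extracted states (clicks and steps) as tables, so the joined result, not the raw state, is what lands in object storage.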
18:45
Okay, so the demo will be showing a teaser of TSS, and it will showcase these three things. And just to give a bit more context for the demo, this is what the overview of the architecture will look like.
19:02
So you have the applications on the bottom, they contain state, they're connected to Arcon's control plane, and then the TSS language or terminal is communicating with the control plane. Right, so in this demo, we will be mocking two streaming applications in Arcon.
19:21
So the first one will be a streaming pipeline where we ingest click events, as if we were monitoring the clicks of students on their computers. So we have ID, name, and timestamp. And then we will maintain some click state: a click counter per student, the amount of clicks. And we will hold this state inside a value state.
19:44
And this is what we will be querying using the TSS language. And then we have a basic quite naive click source where we randomize data and generate source data. Okay, so if we jump down to the main function of the Rust binary,
20:01
we have the application config in Arcon. And here we give some important configuration, such as the application name. We also give it the address of the control plane. Okay, so if we jump to the application itself, we give it the configuration and we define the source to be the click source.
20:21
And we're filtering out my name, because we're not monitoring me. And then we will do a key-by on the stream, keying on the student name. And then we will simply have a quite naive storage operator which will aggregate the clicks per student.
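The pipeline just described (filter out one name, key by student name, aggregate clicks per student) can be mocked in plain Rust. The real demo uses Arcon's source, key-by, and operator API; the names here (ClickEvent, aggregate_clicks) are illustrative assumptions:

```rust
use std::collections::HashMap;

// One ingested click event, as described in the demo (ID omitted).
struct ClickEvent {
    name: String,
    timestamp: u64,
}

// Filter out the excluded user, key by name, and count clicks per key.
fn aggregate_clicks(events: Vec<ClickEvent>, excluded: &str) -> HashMap<String, u64> {
    let mut counts: HashMap<String, u64> = HashMap::new();
    for event in events {
        if event.name == excluded {
            continue; // filter: we're not monitoring this user
        }
        let _ = event.timestamp; // carried along; used by the runtime's time bounds
        *counts.entry(event.name).or_insert(0) += 1; // key-by + aggregate
    }
    counts
}

fn main() {
    let events = vec![
        ClickEvent { name: "anna".into(), timestamp: 1 },
        ClickEvent { name: "max".into(), timestamp: 2 },
        ClickEvent { name: "anna".into(), timestamp: 3 },
    ];
    let counts = aggregate_clicks(events, "max");
    assert_eq!(counts.get("anna"), Some(&2));
    assert_eq!(counts.get("max"), None);
    println!("{:?}", counts);
}
```

In Arcon the per-student counter would live in a value state behind the state backend, which is exactly the state the TSS query later extracts.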
20:42
And then that's about it for this application. Okay, so we're jumping directly into the TSS terminal. The other application will be running a very similar pipeline, but it's monitoring the steps of the students. So how active they are, how many steps they're taking.
21:03
So we're in the terminal, and the first thing we'll do is show the applications. We can see that we have two applications and the latest known time. Then we can do a describe app and pick an app, and you can see the state that we have inside the application.
21:23
So here we have clicks and we have the state that we showed earlier. And similarly, we can describe the other application as well. But let's create a TSS. So we do create TSS, my TSS, let's call it that. And then we will extract state from both applications.
21:43
So app one, clicks, and app two, which will be called steps. And then we have to specify the time. The time is the system time right now, on the computer. So let's see, it should be 21.
22:04
And then let's take it in one minute. And we specify the time zone. Okay, and let's do a periodic 30 seconds.
22:22
And then we'll just execute that. We can see that it has been scheduled. Okay, so now we're in the future, and we will be looking at the state that we created. We're using a local file system storage here.
22:41
And we can go to the TSS directory, and we can see that my TSS has been created. And then you have the state from the different applications. So let's go to the first application, and then the click state as we defined it. And then you can see all the different timestamps, the versions, basically.
23:04
So let's go into one. And then you have some parquet data. So we could look inside and tada, we have some clicks per student here and the aggregated clicks.
23:21
So we can see that each state row has an embedded timestamp that is managed by the Arcon runtime; it's when the state was last updated. We can see that it has not violated the time bound that we gave the TSS query for this version: because we manage out-of-order data, we must guarantee that it does not include any timestamp above this bound.
23:44
Okay, so while Arcon is still a prototype system and it's in development, why would you be interested in using Arcon for your future analytics stack? Well, one example is that you could bridge streaming workloads together with time series analytics using the TSS language.
24:00
And another thing is that perhaps you have an end-to-end pipeline, you're using streaming today, and you're outputting this data to other systems. Perhaps you could fuse these things together and use just Arcon and the Arc ecosystem. And the last thing is that you can kind of build a real-time data warehouse
24:21
with Arcon, using both the outputs and the state via TSS. Okay, so on the Arcon roadmap: currently we're prioritizing functionality over performance, but of course performance is both important and a fun topic, so this is something we'll be looking into in the near term.
24:42
In progress is the TSS language: we will improve the protocol and the syntax, and add new features. We'll also try to finish the distributed Arcon prototype. The aim is to build a community, and Arcon has a lot of different development areas
25:02
that could use help. So please reach out if you're interested in Arcon, either on GitHub or personally by email. So I'll leave this up as a summary, and I'm happy to take any questions. And while we stare at the summary slide for a few seconds,
25:21
I would like to point out that Arcon is no solo effort, so I would like to give a shout-out to all the contributors so far.