
Serverless 2.0 with Cloudstate.io - stateful functions with Python


Formal Metadata

Title
Serverless 2.0 with Cloudstate.io - stateful functions with Python
Subtitle
Imagine billions of functions, with in-memory state, distributed across a Kubernetes cluster!
Title of Series
Number of Parts
130
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Serverless is revolutionary and will dominate the future of Cloud. Function-as-a-Service (FaaS), however, with its stateless and short-lived functions, is only the first step. What's needed is a next-generation Serverless platform and programming model for general-purpose application development in the new world of real-time data and event-driven systems. What is missing are ways to manage distributed state in a scalable and available fashion, support for long-lived virtual stateful services, ways to physically co-locate data and processing, and options for choosing the right data consistency model for the job. This talk will discuss the challenges and requirements, and introduce you to our proposed solution: Cloudstate, an open source project building the next generation of Stateful Serverless, running on Kubernetes, Akka, gRPC, Knative, and GraalVM, with polyglot support for Python, Java, Go, JavaScript, Swift, Scala, Kotlin, and more.
Transcript: English (auto-generated)
So, Sean's talk is about Serverless 2.0 using Cloudstate.io and stateful functions with Python. Yeah, really interesting. Yeah, I'd like to talk about... should I start?
Yeah, you can start, but I don't know if you are sharing your screen. All right, cool. So, yeah, I'd like to talk today about CloudState.io. It's an open source project around what we like to call serverless 2.0, which is actually stateful serverless versus the stateless serverless we've been used to with things like Amazon
Lambda function as a service. I'm Sean Walsh. I'm field CTO and Cloud Evangelist with Lightbend, who is behind this effort. So, Berkeley recently made a prediction that serverless computing is going to dominate the future of cloud, and we agree.
So, why serverless 2.0? Why the next iteration? Function as a service was a great start. It gave us the mechanisms, a way of thinking around creating these components that we can begin to manage and take away the operational difficulties on behalf of developers, but
it was only the first step and we need to iterate. Function as a service is not equal to serverless. Serverless could be much more. We need to be able to allow coarse-grained, what we call general purpose applications to exist in serverless. So, not exactly what you would call a little fine-grained function, but maybe an entire
application might be able to be deployed to a serverless platform. So, Function as a Service, to revisit: great for embarrassingly parallel processing, orchestration, stateless applications, job scheduling, things like that, especially things that are very low impact on the database, quickly being able to retrieve
data, make a decision, and write data back. What it's bad at is reasoning about, as a holistic application, making any sort of serverless platform guarantees around two reactive tenets. One is called responsiveness and the other one's called resilience.
You need to be able to make these assumptions that these characteristics exist to be able to have any kind of a serverless platform, and again, general purpose applications. So, Function as a service gave us the abstraction of communication, and it works great as long as everything is fast flowing and smooth, and any given function isn't
probably trying to do too much. So, the message is input, the function is hosted somewhere, it does some thinking, and then a message comes out, simple as that. And the operational concerns are handled for us. It's the first steps of being opsless.
So, here's a little bit of a beginning of the problem. So, message in the function now is doing something in the middle. It's reading from a database, maybe more than one database, maybe it's doing joins, and then a message goes out. The big problem here is that that database interaction is a really big black box.
We have no idea what's going on, there's no guidelines to manage it. That means if you're equating one function to another, you really can't do it, because they do very different things. There's no systematic way to reason about what each one's doing.
The function is a black box. What is missing here is state. So far, when we talk about stateless applications, they really are stateful, but that state exists in your database. It's a little bit unnatural, because things in space like us and our cars and our phones, those things have a current state.
They're not separate from their state. I think that's a problematic concept from the beginning, but something we're very used to as developers. So, Serverless 2.0, what we propose is that real-time database access has to be removed to allow this sort of autonomy and reliability of our functions, to be able to reason about them in a way that's uniform.
We can't make these guarantees if we're passing an entire data set to a function, saying, hey, here's a little data set, do what you need to do, because we're trying to create an abstraction, or to allow unbridled reads from within those functions, as can exist in Function as a Service.
So Function as a Service, again, abstracting over communication, the message comes in, the function does some thinking, reads some data, and the message comes out. Stateful Serverless, we do the same sort of thing. The message comes in from a user or another system, the function does some thinking, and the message comes out.
But also, we are sending state in at some point in time, probably at initialization time, we're sending state in, and the user function is then holding its individual state on behalf of whatever domain it's serving. It's able to make the decision without having to talk to a database,
and then when it makes a decision, the new state goes out somewhere, because it will need to be re-instantiated at a future point in time, or it might have to be re-instantiated because it's on an unhealthy node, and it has to be re-instantiated somewhere else. So we've really just introduced this concept of state, but that's not quite enough.
Again, we can't pass the entire data set in as part of this flow, we have to figure something out. Enter Cloudstate. So Cloudstate is a distributed, clustered, and stateful cloud runtime, providing a zero-ops experience with polyglot client support.
What we'd like to say is essentially serverless 2.0. It's open source, best of breed, harnessing all the power of open source technologies, while removing the complexity as much as possible from things like Kubernetes and whatever database you're going to be using, be it Spanner, SQL, or NoSQL.
We really just lift it up to make it so developers don't need to think about the ins and outs of all these things, you leave it to this platform. So you wouldn't worry about the complexity of distributed systems, high scale systems, managing your service meshes, your databases, the state,
how does the state get to the function? Those things are all managed for you. Routing, recovery, failover, all those things are inherent. And then operationalizing and running your applications. It's really just a matter of hooking into a CLI, into your build process, and it automatically will go into whatever environment
and be running. And then you'll have all the benefits of a stateful platform that is elastic and scalable and all this. So some of the technical highlights of Cloudstate: it's polyglot. Any computing language that has gRPC capabilities is fair game
to be a client for Cloudstate. So no longer do you need to have a team that is comfortable in a language and you need to find platforms for that language. This is a language-agnostic platform. Everyone should be able to play. And I think that's a really important concept.
It's got really great state models. Event sourcing, that's really important for us. I alluded to the fact that you can't pass the entire data set in. There's one useful constraint that we found to make this all possible. And that is event sourcing. I'll talk more about that in a few. Command query responsibility segregation, which we're also calling domain projections.
So your reads are separate from your writes in your system. So your events are modeled and the events are the events. And then any number of interested parties could take those events and paint whatever picture across the system they want, asynchronously. Key-value store: create, read, update, delete.
And as an advanced topic, one of the advanced things I find interesting is CRDTs, conflict-free replicated data types. If you're not familiar, they're a highly available, distributed sort of data structure. It's multiple replicas that keep themselves in sync. So when you go to read something, it'll be in memory. And if you're talking in a cluster, it'll be very highly available on almost every single node that you're running. And we're also poly-DB: whatever database you choose, it'll hook into Cloudstate seamlessly.
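Going back to CRDTs for a second: to make the "replicas that keep themselves in sync" idea concrete, here is a minimal hand-rolled Python sketch of one of the simplest CRDTs, a grow-only counter. This is purely illustrative and not Cloudstate's own implementation, which supports richer CRDT types.

# Purely illustrative: a hand-rolled G-counter (grow-only counter), one of
# the simplest CRDTs. Cloudstate's CRDT support is richer; this just shows
# why replicas can take writes independently and still converge.

class GCounter:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}  # per-node increment totals

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Merge is a per-node max: commutative, associative, idempotent,
        # so replicas converge no matter the order gossip arrives in.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

# Two replicas take writes independently, then gossip and converge.
a, b = GCounter("node-a"), GCounter("node-b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5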
So at a high level, the technologies that we're using are Akka, an open source concurrency toolkit; gRPC, which is the way we're able to have low-level communication between the Cloudstate mechanisms and internals and whatever language you're implementing it in, as well as a contract with the outside for your users.
You could have people call into Cloudstate services using gRPC or REST or anything else. Knative, and GraalVM. GraalVM is important because across these different languages, some of them are JVM languages. They have a little bit of baggage. There's garbage collection, things like that.
We need to be able to compile everything with a native image that will be able to guarantee sub-second startup of pods in Kubernetes, which really again gives us these guarantees that we could be elastic and quickly scale up new nodes. And all this, of course, running on Kubernetes.
So I alluded to the fact that we've got a useful constraint, and this quote is by a theologian, believe it or not. He said that freedom is not so much the absence of restrictions as finding the right ones, the liberating restrictions. So sometimes a restriction can actually set you free, as we think in this case.
So this constraint for us is event sourcing for Cloudstate. So some of the benefits of event sourcing: it's a single source of truth, the full history. It allows for this memory image, this durable state running inside of some encapsulation.
In this case, it's what we call in Cloudstate an entity: an event-sourced, CRDT, or CRUD entity. And it allows the building of the state from the events over time, because events are a time series. It avoids object-relational mismatch. A lot of that's also in combination with CQRS,
which is the way to separate your reads from your writes. I don't know how many of us have gone and designed a system and we've laid everything out according to domain and we're very proud of it. And then the UI people come over and say, hey, I need this and I need that. And we just pollute our domain. The read and write concerns of your system are two completely different things
and they're equally important. One shouldn't affect the other. And it allows subscriptions to the state changes. So you subscribe to events and the event is useful for different parties, for different reasons. I like to use the term state is in the eye of the beholder. You could have a state of something,
let's just say an airline flight, and there's all kinds of characteristics of a flight, but ground control cares about very different things than in-flight control. So it's important to bear that in mind. State is not something to be shared across different processes. It has mechanical sympathy. You're only ever appending with events.
So this is how event sourcing works with Cloudstate. We have our user function. We're also calling it an entity. The entity is that holder of state. Now, when you instantiate it, the event log is replayed. So all the events in the past on behalf of this entity are replayed to it
and it's building up its current state. I don't know if I talk about snapshots here, I probably don't, so in case you have the question: if there are a whole lot of events, there's a concept of a snapshot. You start with a state snapshot and then you overlay the events since, just in case. So the events come in, build up your state,
and now you're ready for business. Your commands are coming in. Somebody's saying, hey, add a contact to a customer and you're looking at your state. You're saying, okay, I can do that. You add the contact and then you say, hey, I added the contact or I'm about to add the contact. And then the event is contact added
and it's now in the system of record. And when you instantiate again, that'll then be played back to you so you can then build up that state. So you'll see that contact in memory in the future. And the state will also be reflected inside the entity that has just written the event. It'll update its state in memory as it does so.
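As a toy Python sketch of that idea, independent of the actual Cloudstate API: the entity's current state is just the result of folding over its event log, and each handled command appends a new event and applies it in memory. The names here (Customer, ContactAdded) are made up for illustration only.

# Toy sketch of the replay idea, not the Cloudstate API: state is a fold
# over the event log, and each command appends and applies an event.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Customer:
    contacts: List[str] = field(default_factory=list)

def apply_event(state: Customer, event: Dict) -> Customer:
    # Event handler: the only place state is ever changed.
    if event["type"] == "ContactAdded":
        state.contacts.append(event["contact"])
    return state

def recover(event_log: List[Dict]) -> Customer:
    # On (re)instantiation, replay the log to rebuild current state.
    state = Customer()
    for event in event_log:
        state = apply_event(state, event)
    return state

def add_contact(state: Customer, event_log: List[Dict], contact: str) -> None:
    # Command handler: decide against in-memory state, then emit an event.
    event = {"type": "ContactAdded", "contact": contact}
    event_log.append(event)    # written to the system of record
    apply_event(state, event)  # and reflected in memory right away

log: List[Dict] = []
customer = recover(log)
add_contact(customer, log, "alice@example.com")
assert recover(log).contacts == ["alice@example.com"]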
So the happy path for one of these functions is the user issues a command to do something on the domain. It goes into a mailbox. All of these entities are bounded by a mailbox, so there's no issues with concurrency. There's no blocking at all. These functions fully process a message and emit the event out
before even thinking about going in and getting another message off the mailbox. And so that command does some thinking, looks at its state, issues a new event, which goes to the event log, which may be subscribed to through some event bus. Now let's talk about the unhappy path.
So the sad path, this is recovering from failure. So we have our event log, we're replaying our events. It's actually building up state in the function. And now we're ready for business again, in comes our command and out goes events.
And you can also do CRUD. So in some cases, event sourcing, CQRS, CRDTs, they're all pretty advanced concepts. You might have just a subsystem, which is just a user, maybe a user and a phone number or something like that. How many things do you really need to do on that? Does it really need events?
You can use the current state model and you can use CRUD, we can handle that. So in that case, we just use snapshots. Your snapshot comes into you, you put it in memory, and then you're processing messages and you're sending the snapshot back out every time.
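A minimal sketch of that snapshot-style CRUD flow, again just illustrative Python and not the Cloudstate API: there is no event log, the whole state comes in, you mutate it in memory, and you hand the whole updated state back out to be persisted. UserProfile and handle_set_phone are made-up names.

# Toy sketch of the CRUD/snapshot model: whole state in, whole state out.
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserProfile:
    user_id: str
    phone: Optional[str] = None

def handle_set_phone(snapshot: UserProfile, phone: str) -> UserProfile:
    snapshot.phone = phone   # mutate the in-memory snapshot...
    return snapshot          # ...and hand the whole state back to be persisted

profile = UserProfile(user_id="42")              # snapshot delivered on activation
profile = handle_set_phone(profile, "555-0100")  # updated snapshot stored for next time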
So what is the architecture for something like this? So again, we're running on Kubernetes and we've got a series of pods that represent your user functions. So you've got replication here. You can have any number of these as required. If you're running one user function, you probably would never need more than a couple of pods, or one pod, but you can host multiple user functions in one image.
And therefore it's useful to have more than one pod and you can scale it up and down. So your user functions live on these pods in whatever language you've implemented, communicating via gRPC. And then we have the cloud state proxy, which is the Akka sidecar on these pods, which spans these pods. And so Akka is actually receiving the messages from the users
and it's communicating with your user code via gRPC. Your user code is doing all the logic, all the thinking. The sidecar does all of the traffic control, and it has the ability to write the events and to play back your state.
All that is on the left side. User functions doing all the business logic. And the Akka sidecar is also communicating with the data store whenever necessary, in real time or asynchronously. So that Akka sidecar lives on each pod alongside your user code. It spans those pods,
but it is also a cluster in itself. So an Akka cluster exists for your application, which is a series of these functions. You've got a cluster that could be expanded and contracted to do its work. And so that's how communication and location work: you might have a user function on pod three, but all of these are singletons, really. They might be represented by your user function in multiple pods, but in Akka you've got a counterpart, which is a persistent actor, which is also a singleton. And so it needs to be quickly located within that cluster across these pods.
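To make the sharding and location idea concrete, here is a purely illustrative Python sketch of mapping an entity key to a shard, and a shard to whichever node owns it. Akka cluster sharding's real mechanics in Cloudstate are more involved; this only shows the shape of the idea.

# Purely illustrative: route an entity key (say, a user ID) to a shard, and
# a shard to its current owner. Akka cluster sharding does the real work.
import hashlib
from typing import Dict

NUM_SHARDS = 100

def shard_for(entity_key: str) -> int:
    digest = hashlib.sha256(entity_key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def node_for(entity_key: str, shard_owners: Dict[int, str]) -> str:
    # Every node keeps a gossiped table of which node owns which shard,
    # so any node can forward a command straight to the right entity.
    return shard_owners[shard_for(entity_key)]

owners = {shard: f"pod-{shard % 3}" for shard in range(NUM_SHARDS)}
print(node_for("user-12345", owners))  # all nodes agree where this entity lives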
Akka takes care of all of that. So again, gRPC communication, gossip, location routing to wherever things are located, talking to your data store, all that's happening. So if we look at Cloudstate as a managed service now,
you could pay as you go, as you can with function as a service. On-demand instance creation, passivation, failover, auto-scaling up and down of pods, only paying for what you're using at any given time, just like function as a service. Zero ops, so automated,
really automated everything. All the state, failover, provisioning, routing, deployment, all the upgrading, canary deployments, things like that, all would be part of the platform. A little bit about multi-tenancy. In my opinion, function as a service has inadequate bulkheading.
Maybe not in all cases, but I know it has happened in AWS, where your neighbor's function can hog your resources. In Cloudstate, if you're doing Kubernetes correctly, if your hardware is set up right, you've really got this clean separation of things via the pods and really good bulkheading.
And even at the data level, where you're assigning different databases to different tenants, you don't share big databases across tenants. I think that's probably the wrong way to go. And complete security, to the extent that Kubernetes has security, due to these clear separations.
So a quick look at what a three-tier architecture looks like, what we're so used to, what we call a stateless application; it looks a lot like this. You've got the middleware and the backend running in the middle there. In a number of pods, you've got a load balancer in the front on the left there,
and you've got a big database on the right. Every single request will have to go to the load balancer to one of the pods, one of the nodes in the middle. It's gonna definitely have to hit a database at least once. It's probably gonna have some chatter in the middle. It's probably gonna hit the database again after,
and then it's gonna return some data. So it's very noisy. Noise equals risk. Reactive architecture is a lot different in that your database is still there. It's needed as an event log, but it's not needed in real time. The database interaction is never needed
for your functions to do their job in real time. The data is already in the functions, the state's already in the functions. They're doing all the work, and when they've done some work, they say, hey, by the way, database here for next time. So you really have a much reduced risk in a noise factor here.
So just a very high-level view of the architecture. From the bottom up, you've of course got any number of databases, whichever you like, that could be part of a Cloudstate instance: Spanner, SQL, NoSQL, all running on Kubernetes. Knative, I'm gonna have to take that box away.
We're not actually utilizing that right now. We are utilizing GraalVM for native images, and relying very heavily on Akka, because Akka's clustering capabilities and its persistent actors are the underpinnings of the Cloudstate entities.
Now, above that are the actual methodologies we're able to create with that, which is the event sourcing, CRUD, domain projections, which are views based upon events happening across your system that are kept in sync and built for you by cloud state. That way when you go to read for a display, it's already waiting for you in memory
or in a record in a database somewhere. Key-value store, similar to Redis, and these conflict-free replicated data types; if you have the need, you know what it is, CRDTs. And then all these languages, plus any language that has gRPC, can be supported.
Istio for your load balancer, and then any mainstream communication protocol, gRPC, HTTP/REST, Kafka, what have you, will work. So let's look at some code. I'm gonna make a little admission before I go into this. I did a lot of Python. It was in health and wellness. It was for loading a new system
with billions and billions and billions of records. That's the extent of my Python expertise, but I did take some Python sample code from our Python cloud state library. So you can glean something from it a little bit, but please do check out the GitHub repo. When I'm finished, I'll give you the link.
So to be able to have a Cloudstate application, regardless of what language you're gonna be implementing it in, you've got to set up your protobufs for the gRPC protocol. This is gonna be all the behavior of your application at a functional level. So in this case, we're talking about a shopping cart, and for the shopping cart we're gonna model the messages
for interacting with a cart. One thing you'd like to do with a shopping cart is add a line item, and it would have a product ID and name and a quantity and a user ID. You'll see that we're also marking the user ID as a Cloudstate entity key. So any entity in Cloudstate needs to be sharded
with a unique key. So what you're doing is instead of having a database on disk that has a unique key, which would be user ID, you've got these distributed functions in memory in a cluster that are sharded by user ID. So it's a similar concept to a database table. If you're not familiar with protobufs,
you'll see that when you say string user_id = 1, you're saying what the data type is (string, of course), what you're calling it, and what its ordinal is: one, two, three, and four are the ordinals of these attributes. Remove line item: again, you have to tell us the user ID, which is the ID of the function.
And then what is the mandatory attribute for removing a line item? It's a product ID in this case. There's the ability to get the shopping cart if you'd like to view it. In that case, you just need to be able to give the user ID key. You have line items here, which are part of a cart. And we model that here with line item,
and then it's used as a repeated collection of items inside of cart. And so now we can have our service that uses those messages in our gRPC service called ShoppingCart. So we can add an item, and we can remove an item,
we can get our cart. If you're not very familiar with protobufs, this is a really cool feature. So out of the box, if you do a Cloudstate implementation using Python or any other language, you're going to implement this gRPC backend for this service. What you're going to get for free here is, if you include this optional section, you're going to get REST just by default.
So you get gRPC and REST at the same time. That's what those things mean. And then there's also another file here, which is actually your domain. So you can model your domain objects
also, again, in protobuf. That makes them much easier to share and return as data when you have callers. So again, a LineItem, and an ItemAdded event. So I talked about event sourcing. This is our event for item added. It's got a line item inside of it. This is our event for item removed, below that. And then we also have our shopping cart state.
So we'd like to be able to return that to a user. So we just have a message called Cart with repeated line items. So this is what it looks like in Python to actually model one of these now. So it's not a whole lot of code. It's a shopping cart. You're going to be able to have billions of them in memory,
given that you've got enough cloud hardware in place and cloud state installed. It's fairly guaranteed that you could model the world. There's no limitation if you've got enough of these nodes. And all you need to do is mark it appropriately as an entity, a data class. There's a little more code in the sample that I think you should probably take a look at
where it specifies what this underscore shopping cart and the file descriptor are. It's a little bit busy. But this is some of the code, what it looks like to create a shopping cart in there. You'll see that you'd like to snapshot it. This is a callback function that is going to actually call for your snapshot
when appropriate. If we look further, we'll see that there's a snapshot handler, handle_snapshot. This is your callback from Cloudstate saying, hey, here's your state, and you'd set your internal cart state to that. And then you'll have the event handlers.
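Roughly, the shape of those handlers looks like the hand-written sketch below. This is not the exact decorator-based registration from the cloudstate Python library (check the GitHub sample for that); it just shows the snapshot and event callbacks operating on in-memory cart state, with names chosen for illustration.

# Hand-written sketch of the handler shape only; the real cloudstate Python
# sample registers these via decorators on an event-sourced entity.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class LineItem:
    product_id: str
    name: str
    quantity: int

@dataclass
class Cart:
    items: Dict[str, LineItem] = field(default_factory=dict)

def handle_snapshot(state: Cart, snapshot: Cart) -> Cart:
    # Snapshot callback: Cloudstate hands you your last saved state.
    return snapshot

def item_added(state: Cart, item: LineItem) -> Cart:
    # Event handler: replayed on recovery and applied after each command.
    existing = state.items.get(item.product_id)
    if existing:
        existing.quantity += item.quantity
    else:
        state.items[item.product_id] = item
    return state

def item_removed(state: Cart, product_id: str) -> Cart:
    state.items.pop(product_id, None)
    return state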
For item added, every ItemAdded event would be called back into you and you would add it to your state. So it's all very callback-based. You just implement these annotations and you're in business. And remove item, if you're interested, that's what remove item would look like: you take your item out of the cart
and you return empty. So on behalf of the cloud state team, we'd like to say thanks. We'd love to see your interest. We are always looking for contributors. Any questions you might have, we'd love to hear about them. The full sample, I'm gonna leave this up for a few secs. This is where the Python support
with the shopping cart sample exists. I encourage you to pull it down from GitHub and take a look and run it and play with it. So that's all I have. And let me start my volume here.
Thank you very much. Okay, so we have time for questions.
There's a first one, it's from Andre. He's asking if it only runs in Kubernetes. Yes, yeah. Kubernetes in our experience is the way the world has gone. We made a little bit of a bad bet using DC/OS some years ago and it's very clear that Kubernetes
is where the world is going for cloud. Okay, so any other questions? If anyone wants to use a microphone to ask a question, we can also do that. Just raise your hand, click the raise hand button and I can enable that
or just click in the Q&A and ask a question. Let me check on the channel just in case. Okay, so last call, no takers.
All right, thanks everybody. Yeah, so thank you very much. Thank you for presenting and enjoy the rest of the conference. See you. All right, thank you.