
Stairway to Cloud: Orleans Framework for building Halo-scale systems


Formal Metadata

Title
Stairway to Cloud: Orleans Framework for building Halo-scale systems
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
Orleans is the next most popular open source project of the .NET Foundation after CoreCLR/CoreFX/Roslyn. It was created in Microsoft Research and is now developed within Microsoft Studios. Orleans has already redefined how modern scalable interactive cloud services and distributed systems should be built by introducing the Virtual Actor model. Orleans has been running in production since 2011, powering high-throughput cloud services for Halo Reach, Halo 4, Age of Empires, Skype, Azure Monitoring, and several other Microsoft products. It is a bedrock technology for the cloud services of Halo 5: Guardians. Since the public preview at Build 2014 and going open source in January 2015, Orleans has been used by a significant number of customers ranging from small startups to large enterprises. At the same time, it attracted a group of talented engineers from companies around the globe that have formed a vibrant community around the project. The Orleans core team maintains strong ties with Microsoft Research to keep the stream of innovations going. Come hear how you can leverage Orleans today, what's been recently added to it, what new functionality is coming soon, and about our future plans.
Transcript: English (auto-generated)
OK, welcome everyone. So I'm going to be talking about Project Orleans, the Orleans framework that I've been the dev lead of since the beginning. So it's very dear to my heart. Who in the audience has ever played a Halo game?
Oh, that's good. So the experience you're dealing with is very different from what's behind the scenes. So instead of explaining what I mean by saying Halo scale, I have this video that I'd like to show you at the beginning, also to wake you up a little bit. So let's see if it plays.
Halo is a rich, immersive story with millions of loyal and dedicated fans. We deliver an exciting and engaging experience to these fans. They need to know what the hot playlist is today. They need to know what the challenges are. They need to know where their friends have been, what their friends have been playing. Have their friends gotten more medals than them? They need to know all of this, and they need to react to it and interact
with their friends in real time. We need to deliver hundreds of thousands of updates per second to millions of players across the Halo universe. We need to get them the right information to the right device at the right time. There was nothing off the shelf that solved the problems we needed to solve at the scale we needed to solve them. So we turned to Microsoft's Extreme Computing group.
Hundreds of thousands of requests per second across thousands of servers in real time? These guys are crazy. But in Extreme Computing, those are the kinds of challenges we like to tackle. As you can probably tell, that video was made a couple of years ago, maybe more than a couple, so I was younger.
But I think it gives a very good idea of what we're actually talking about: the scale of Halo and those kinds of services. So we're gonna be talking about the cloud, obviously, and people give these definitions of the cloud. By the way, we're also gonna be playing a game. We're gonna be playing the game Name the Tune. Who knows Name the Tune?
No? So when you see a sentence in quotes in the top right corner, if you know what song it's from, or at least the band that played it, just yell it. And whoever gets the most answers right
will get beer at the party. So there's a prize, so just yell. This one's the hardest one, I promise you. Anybody know who's got the power? No, that's actually from David Bowie. Actually, that's also my test for the age of the audience, just to get a sense. Well, no. No Justin Bieber, no Taylor Swift, no Abba either.
You don't want me to sing on stage next time. So when we talk about the cloud, really the essence of the cloud is that you get these enormous resources available to you to rent. So that's why everyone's got the power
to get an almost infinite amount of resources. So as long as you have a credit card you can charge, you can pay for the services. This power has been available to major corporations or governments for a decade, but now anybody can do it. A small startup can suddenly grow from nothing to a so-called unicorn. I hate the term, but they call them unicorns, right?
But with great power, as they say, comes great responsibility. So to build systems at that scale, you face new challenges, or old challenges in a new form. Like for example, concurrency. Who in the audience enjoys debugging multi-threaded code and data races and deadlocks? I don't, I'm just kidding.
But now, who likes to do that in a distributed setting, when you have logs from, say, 20 machines and you're trying to figure out what happened? That's an order of magnitude more difficult than just attaching a debugger and finding that deadlock or that data race. So you have these issues of distributing your computations, concurrency, scale.
Failures are the norm, right, in the cloud. What used to be happening maybe every few years or a few months, now those failures happen every day, depending on your scale, because machines get rebooted, they get patched. You see it as a failure oftentimes. So there is a set of new challenges that we haven't seen before.
And when businesses look and try to figure out what to do with this... all that glitters is gold, name the tune. Thank you, that's great, one point. So you hear this cacophony of these analysts and consultants and talking heads saying,
here's the solution. Like for example, a few years ago, people were saying: you see, Facebook was built with PHP and MySQL, so if you use these technologies, you can build anything, right? They built Facebook. And a few years even before that, web services and SOAP were supposed to solve all the problems in the world.
All good technologies, don't get me wrong, these technologies are fine. But when somebody says that this technology will help you build a cloud-scale solution, I look at it as if they're trying to sell you this elevator (or, if you watched Willy Wonka and the Chocolate Factory, the Wonkavator) where there's a button, and you go up and out, and that solves all the problems.
Like for example, Go, right, is the new hipster programming language. Because Docker is written in Go, the idea is that you need to learn Go, and if you write in Go, that will solve your problems. Of course not, that's not the case. And then you see other comments like, oh, you have to be stateless,
or the observation that microservices as a term (a good term, a good architectural term) got abused too fast. And this is my favorite: Mary Jo Foley, thanks, Mary Jo, said that Orleans would solve all the cloud problems, back in 2010. That's my favorite one. But then you see this picture.
Who has heard Kyle Kingsbury talking about Jepsen, Call Me Maybe? Great. If you have never watched it, go to YouTube, search for Kyle Kingsbury, Jepsen, Call Me Maybe. You will not regret it. Everyone who deals with the cloud has to watch this talk. He's a brilliant guy.
He just single-handedly showed that pretty much all the open source distributed databases that are available don't maintain their guarantees in case of network partitions. He got his beefy machine in his apartment and ran all this commercially available open source software inside of VMs, and he recorded reads and writes
to these databases while he was partitioning connections between those VMs, simulating actual network partitions and node failures. And he shows that every single one of them (MongoDB, Redis, Elasticsearch, all these technologies) breaks down, loses data, violates its guarantees.
So he showed this picture of a tire fire. And he explains that at the top of it, at the API level of the database, you have these rainbows and unicorns. Everything's fine from the API perspective. But if you look underneath, under the covers, there's this tire fire of code that doesn't really maintain its guarantees.
So you look at that, and it's very hard to decide what to do. That's the reality of our industry. In my view, if you step back, there's this triangle of real concerns: you have compute, you have state, and you have connectivity. And there are many choices; you have to make trade-offs. Who are you, what have you sacrificed? Name the tune.
Jesus Christ Superstar. Because you need to sacrifice something to get something. For example, batch processing is very efficient. If you can afford high latency, if you can process within minutes, hours, you can be extremely efficient by putting a lot of data and processing in the MapReduce way with Hadoop.
But if you need sub-second latency, that doesn't work. You have to sacrifice this efficiency for low latency. And these challenges and trade-offs go on and on. Like databases: SQL is very good at transactions and guarantees, but doesn't scale well. Key-value stores are very good at partitioning and scaling, but they usually don't provide the secondary indexes that SQL gives you for free.
So again, you need to sacrifice something to get something. I just highlighted what we were concerned with in Project Orleans. But then, if you've heard of the CAP theorem (I hope everyone has heard of the CAP theorem), it says that you cannot get consistency and availability at the same time in a distributed system.
That's pretty much an axiom. So this is the real challenge we deal with when we talk about the cloud. And the solutions are different, right? So we can hire hero developers. Years ago at Microsoft, in the developer division, we had a different term for that category: Einstein developers.
These are people that can build very complex systems. So somebody built Google, somebody built Facebook, somebody built MSN and Hotmail and those kind of systems. So it is possible to tackle these challenges and to build stuff, but those developers are rare and they're expensive and they're all happily employed. So if you try to build business
by hiring a bunch of hero developers that can solve all these problems, you can run out of budget very fast. But most likely, you won't be able to hire them, because they would have to leave a job they like to join your company. So in reality, when you try to hire people, you need to look at the available pool.
Like, who here has programmed in Erlang? Okay, there's a couple of people. Yeah, I know Jan did. Scala? Yeah, one person, two people. F#? Okay, more, but still a minority.
So I have really sincere, deep respect for people that master these technologies. Really; Joe Armstrong is giving a talk here, I think, about Erlang. But if you look at reality, you cannot find enough people that have mastered these technologies. Try to hire them into your company, and you'll likely fail.
But also, even at the hero level, these developers are not immune to mistakes. And the pattern of successful high-scale services (look at Twitter, look at LinkedIn, look at Facebook) is the same: they re-architected and rewrote their systems three or four times as their usage grew.
So they had to throw away the whole solution, essentially, not just incrementally improve, not just refactor, but throw away the architecture and put the new solution in place at the most critical time where the business was growing. And some people argue that that was the failure of Myspace, that the reason Myspace lost competition to Facebook is because they weren't moving fast enough.
They couldn't scale with their users, and their experience suffered. They were too slow. So I would argue it's not a scalable solution to try to hire more than a handful of hero developers for a company. We look at the problems as engineers, but if you talk to business people, they look at them through a very clear business lens.
They see time to market, the return on investment. Those are the terminologies that they use, which means I need to build systems fast, I need to build them cheap, and they need to be reliable so they're cheap to operate. So capital expenditure versus operation expenditure.
That's why my mental picture is of those people trying to sell you this one elevator, where you push a button and you go up into the cloud, which is not realistic. Oftentimes it's a bunch of people that don't know what they're selling, or charlatans trying to sell you a bridge to nowhere. In reality, you need a stairway, where you can walk or you can run,
because you're in a competition. If you walk and your competition is running, then you're losing the competition. So you have to run to stay in the competition. That's an interesting quote from Alice in Wonderland, where the queen says: it takes all the running you can do to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that,
which, I think, is about our business. It's not about Alice in Wonderland; it's not for kids. So that's my mental picture: we need a stairway, something realistic. Not climbing with ropes, not a magical elevator, but something real. If we step back and see how we've been building services:
so we didn't call them cloud services a decade ago, but for 15 to 20 years, we've been building them as N-tier, three-tier architecture. This picture must be familiar to everyone, I assume. So you have a stateless layer of front ends, so the web servers that terminate client connections, do authentication, DDoS protection, admission control,
and then forward the request to the middle tier (or several tiers, but still a stateless middle tier) that goes and talks to storage to pull data in, performs an operation, and potentially writes data back to storage. So if a request comes for a user profile,
the middle tier calls storage: give me the profile for that user. Then it does some update and writes back to storage. Or maybe it doesn't even write an update; it returns to the front end to render a webpage or respond to the mobile client. So this is a wonderful model. It's beautiful, it's very simple, and you can scale easily by adding more servers in the middle tier,
more servers in the front end. The problem is storage, which is much more difficult to scale. Especially if you have a database, like a SQL or Oracle database: at some point you exceed its capacity and it burns out, so you cannot scale. And as an industry, we realized that a long time ago, so we put a solution, the cache layer, in front of it:
memcached, Redis, all those solutions. They reduced the load on the storage, because now, the first time you read data, you put it in the cache, and after that, you read it from memory, which is much faster, and you only go to storage to update. But in reality, that complicated the solution so much. Now you talk to two storages:
you have your cache storage, and you still have your core storage, and you need to coordinate them, and you need to write updates to both. And as you probably know, the cache invalidation problem is fundamentally one of the hardest problems in distributed systems. So programming this is not really nice.
Really, I think this is what we want: a stateful middle tier, where data would be cached, but also the compute would execute. So this is what I call the stateful middle tier. Have the benefits of both. Instead of putting data in a cache and running compute somewhere else, can we have them together? Name the tune, anybody know? Could be so good together?
No? The Doors? So I would argue that's what we want. And that's how we approached Orleans when we started working on it. We really tackled two challenges. We wanted a programming model which is easy and attainable for a wide range of developers, so you don't have to be a hero developer
to understand and write successful software with it. But also, we didn't want to turn away expert developers, those heroes. They should like the model as well. And the model needs to be flexible enough and powerful enough to empower those developers as well. So that's the trade-off between simplicity and power.
But also, we didn't want to make developers 20% or 30% more productive. We wanted to have qualitatively better productivity. And which means three times, five times, ideally 10 times more productive. And the main way we know how to make developers more productive is to have them write less code.
Because the best code you write is the code you don't write, because you're going to have bugs there. That's paradoxical, but it's true. If we can eliminate code from our code base, we eliminate the bugs we would have introduced there. So that's what we targeted. Our goal was to reduce the amount of code you write, but also to make the code you write much simpler,
so you're much less error-prone and more productive in writing, debugging, and testing it. The second pillar of the project was to make this code scalable by default. Which means that if you write code following some simple guidelines, there's a good chance you'll build it in such a way that it will scale. So if you suddenly have 10 times more business,
10 times more customers, or 100 times more customers, your code will work. You may need to tweak and optimize a few places, but you wouldn't have to go and re-architect and throw away the whole thing, like in those cases with LinkedIn and Twitter and Facebook. So those are kind of conflicting goals, in a way.
Who has heard about the actor model? Excellent. So I hope people attended yesterday's talk by Roger Johansson. For those who don't know, you can think of the actor model as just a distributed object model. So you have these isolated entities that do not have direct access
to each other's memory. They have to send messages to say, hey, do this for me, or give me this value. And of course, they can create other actors. So the model was invented in 1973 by Carl Hewitt a long time ago, and you can imagine there was no cloud and it was built for a very different purpose.
So Hewitt invented it as a concurrency model for single-machine, single-process systems for AI, artificial intelligence applications. But as often happens in our industry, nothing is new under the sun. So the approach got rediscovered in the late 80s and 90s by Joe Armstrong at Ericsson.
They built Erlang, a new implementation of the actor model, with which they built control plane systems for their telco equipment. Then later, some distribution features were added through OTP. In the cloud space, people rediscovered this model yet again. Because if you think of it, you
have these independent entities and they exchange messages, so they don't have any assumption of locality. If I am sending a message from actor A to actor B, I don't assume that they are on the same machine. A particular runtime implementation might happen to put them together. But fundamentally, the model allows these actors to run anywhere, as long as messages can be delivered
between them. So it's easy to distribute these actors. So that's what we took as the base approach for Project Orleans. Name the tune? No? Also The Doors.
So we didn't want to just go and blindly adopt the existing models. We took an independent approach. And we came up with what we later called the virtual actor. As we worked on the system, as we tried different approaches and threw away some of the earlier versions
and worked with early customers, we realized there are these fundamental challenges in the existing approaches: the Erlang approach, and Akka, which is a JVM clone of Erlang. The fundamental difficulty is that in a distributed, highly concurrent system, it's very expensive to write the code that coordinates these actors.
Say you need to create an actor for a user the front end receives a request for. But what if three of your front ends receive requests for the same user? First, they need to go check: do we have an actor for this user in some registry? They do it concurrently. And they all get a response: no, we don't. All three of them independently realize: I need to create a new actor for this user.
And of course, they try to create the actor in parallel. And then they need to register it in the registry. And all of them but one should fail and should handle that gracefully. There's a lot of coordination to get right. And of course, that kind of code works fine in a simple unit test. But when you run at scale, suddenly you have these concurrency and race conditions.
And that's what we heard from Erlang developers later: that is indeed one of the biggest challenges of building distributed systems with Erlang. So the idea behind a virtual actor is very different. The analogy is virtual memory. When you write code to, say, touch or update the value at array index x, you never
check with the operating system: is this memory page in memory, or is it in the page file? You don't write code that says, load this page from the page file for me, and then I'll set the value. You just set the value. And it's the operating system's job to realize: oh, this page was in the page file; I'll bring it in and let you update the value. And then once the page gets cold, it gets written back to the page file.
So it's the same basic idea. In Orleans we call actors grains instead, to differentiate them, because actors in Orleans are very different from what people are used to thinking about actors. So that's why we call them grains. Those grains kind of always exist, virtually. So you can always make a call to any actor in the system
so long as you know the identity of the actor. And a call will generally always succeed, regardless of whether the actor is in memory, in storage, or in the process of being activated. All this complexity of coordination is done by the runtime. So the Orleans runtime really performs the heavy lifting. It's interesting what we discovered:
people equated Erlang's approach with the actor model itself. So when we started talking about Orleans, the first reaction was: what you built is not an actor model, because you don't have supervision trees. So they thought it was an axiom of the actor model that you have to have a supervision tree to be an actor model. Which is actually not true, as Carl Hewitt, no less,
said: no, that's not the case. His complaints about Erlang were kind of similar; we didn't know at the time that they matched what we eventually did. So you remove all this complexity of managing the lifecycle of these actors and give it to the runtime. As a result, you write less code,
and you write simpler code. So that's how we're achieving the goal of developer productivity. So let's look at how the code looks in reality. When I'm asked to explain what Orleans is in one sentence, I say: distributed C#. Of course, any two-word or one-sentence or 30-second description is not accurate; it's not really about C#, it's distributed .NET. But that kind of works for people, because you're programming with the same paradigm. You have interfaces, and you have classes. So you start with an interface. You define what we call a grain interface. And you define it by extending one of the marker interfaces.
In this case, I use IGrainWithGuidKey, which says that actors of this type, these grains, will have a GUID as their identity. And then within this interface, you can have one or more methods. The one requirement for those methods is that they're asynchronous: they return a Task, a promise for a value. Who is familiar with TPL: task, async, await?
Great. Great. So the majority. For those that are not: I think that's the best innovation in C# 5.0. When I first started talking to JVM people, they didn't believe me that it was true until they saw the code. It's just brilliant how it works.
So that's the requirement: all calls are asynchronous. Whenever you make a call, in this case SayHello, you get the result right away. Before anything happens, you get this promise; Task<string> means a promise for a string that will arrive later. Maybe milliseconds later, maybe seconds later, but later. So you're not blocking on this line.
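As a sketch, a grain interface along those lines might look like this (the IHello name and method are illustrative, in the style of the Orleans 1.x API):

```csharp
using System.Threading.Tasks;
using Orleans;

// A grain interface extends a marker interface that fixes the identity type;
// IGrainWithGuidKey means grains of this type are identified by a GUID.
public interface IHello : IGrainWithGuidKey
{
    // Every grain method must be asynchronous: it returns a Task,
    // a promise for a value that will arrive later.
    Task<string> SayHello(string greeting);
}
```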
That's why the one requirement is that everything is asynchronous. So when we invoke the grain, this is an example; I just need three lines. I use the static class GrainFactory and say: give me a grain that implements the interface we defined above, for a user with this identity.
So I pass a GUID, and what I get back, under the covers, is a proxy object (the variable user) which implements the interface that I asked for. It's returned immediately. It's constructed locally. There are no messages involved. And then I can make a call, in this case user.SayHello,
right away. So the first two lines will take probably nanoseconds to execute, because they do nothing. You just say: OK, here's a promise for a future result. And then, through the magic of the await keyword in C# 5.0, you can say: execute the rest of the method when that result comes back, without blocking the thread.
So this is very simple. The code looks straightforward and sequential. But in reality, it executes very efficiently, because we're not blocking the thread. Under the covers, the compiler rips out the remainder of the method as a continuation and executes it asynchronously later. So that's all I need to write to make a call to a grain.
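Roughly, those three lines look like this (a sketch against the Orleans 1.x-era static GrainFactory; userGuid and the greeting are illustrative):

```csharp
// Get a logical reference to the grain with this identity.
// This constructs a proxy locally; no messages are sent.
IHello user = GrainFactory.GetGrain<IHello>(userGuid);

// Start the call; a promise for the result comes back immediately.
Task<string> promise = user.SayHello("Good morning!");

// await frees the thread; the rest of the method runs as a
// continuation once the response arrives.
string reply = await promise;
```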
And once I get the response back, once the await returns, I can do something with the value. When I implement the grain class, it's also very simple. I extend the base class Grain that's in the library, and then I implement one or more of the grain interfaces that I defined. So again, it looks just like your normal object-oriented
programming, unlike one-way message passing, state machines, and things like that. You just implement interfaces and classes. But notice also that this method SayHello has a counter; it increments a counter on the last line. And the reason I can do this without any locks, any synchronization, is because every method in the grain executes with a single-thread guarantee. The Orleans runtime guarantees that your code never runs in parallel on more than one thread within a single grain. So you always have full control of your private state. You can always assume that nothing else is touching it. So you don't need to put in any locks, semaphores, or other synchronization mechanisms, which simplifies your code and removes lots of, again, the bugs. That's sort of the reflection of the original idea of the concurrency model of the actor model: that you can write safe code. Nobody else will go and touch your variables as you execute.
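A minimal sketch of such a grain class (the counter and message are illustrative):

```csharp
using System.Threading.Tasks;
using Orleans;

// The grain class extends the Grain base class from the library
// and implements the grain interface defined earlier.
public class HelloGrain : Grain, IHello
{
    // Plain private state, safe without locks: the runtime never
    // runs more than one thread inside a single grain at a time.
    private int _counter;

    public Task<string> SayHello(string greeting)
    {
        _counter++; // no lock, no Interlocked needed
        return Task.FromResult($"Hello #{_counter}: you said '{greeting}'");
    }
}
```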
Even your own methods will not touch them, because they only run one at a time. So what happens behind the scenes? A grain is kind of a logical construct; it always exists. But its physical incarnation goes through this lifecycle. It can be in persistent storage, and probably most of the time it's there, not in memory. And only when a call arrives for a particular grain
does the runtime instantiate a physical incarnation of that logical construct, which we call an activation. It goes through initialization, through an activation process where, if needed, it loads its state, and the runtime calls a method that is kind of like a constructor: hey, I'm activating you; do your initialization. And then it delivers the request that triggered the activation.
So for a while, that grain stays in memory. And the runtime checks the last time that activation of the grain got touched, got a message to process. If it hasn't been called for a while (by default it's two hours, but it's configurable; you can set one minute, five minutes, for different types), if nobody sent a message to this grain, there's no need to keep it in memory, so the runtime garbage collects it. Again, it goes through a deactivation process: hey, I'm about to deactivate you; if you want to do something, here's your chance. And then it removes it from memory. So that's the model behind the scenes. On the caller side, you program as if it's always in memory.
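For reference, a sketch of the activation and deactivation hooks a grain can override (Orleans 1.x-style signatures; the class is the earlier illustrative one):

```csharp
public class HelloGrain : Grain, IHello
{
    public override Task OnActivateAsync()
    {
        // The "kind of like a constructor" moment: called once,
        // before the first request is delivered, e.g. to load state.
        return base.OnActivateAsync();
    }

    public override Task OnDeactivateAsync()
    {
        // "I'm about to deactivate you; here's your chance":
        // flush caches, persist state, release resources.
        return base.OnDeactivateAsync();
    }

    // ... grain methods as before ...
}
```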
But in reality, the runtime manages resources and does this distributed, asynchronous garbage collection of your resources. And I'll stress again: with no code from the application, maybe just configuration for how fast, how aggressive you want this garbage collection to happen. So if we go back to this picture with the actor-based middle tier,
because of this lifecycle, what we're really getting is that what's in memory is just a sliding window over all possible grains, actors: only those that were recently used, within that period before they get garbage collected. An example would be a major game like Halo or Call of Duty.
They sold probably 30, 40, 50 million copies. That doesn't mean that all those users are active. In fact, you can find very few days in the year when more than one million of them are playing at the same time. So there is no reason to keep the state of 50 million players in memory. You can just have the grains activated automatically for those players that actually turn on their console and start the game. And as they stop playing the game or shut down their console, their grains become cold and get deactivated. So the runtime does this resource management for you for free, again, without you writing application code
for that. So the Orleans runtime runs like an overlay over physical resources or virtual machines. On every virtual machine that you run in the cloud (or physical machine, if you run on premises), there is usually one process of the Orleans runtime, called a silo.
And those silos form a cluster automatically. And they start pinging each other to see who is up and who is down: if this silo didn't respond to me three times, I suspect it's probably dead. So it does all this magic of tracking hardware status, essentially.
So if one of the machines blows up, the runtime automatically detects it and understands which grains were running on that machine. They're gone; they're lost, because the machine disappeared. Maybe it was a physical hardware failure, or a network cable got cut; there are many reasons why a machine disappears. What's important is that once the runtime, this distributed logic, realizes we lost this machine, it knows that the grains that were running there are not running anywhere anymore. So when new requests arrive for a grain that used to be there, it can place it on a different machine. So you can operate the cluster without that machine for a while. And then if that machine gets repaired or restarts and comes back, it joins the cluster again and becomes another resource to place these grains
and execute more. All of that is done by the runtime, again, so you don't need to write any code. Your individual request may fail: you make a call to your grain, and you may get an error back, for many different reasons, like storage being unavailable or something else. Or the machine just died, and in the window before the runtime realizes it's dead, you may get a failure. But fundamentally, you can keep repeating the request, and eventually it'll succeed, once all these conditions recover. So you don't need to write code to understand where things run or what state they're in. You just write your code in a simple manner, as if they always exist and are always in memory.
So besides Hello World, let's look at something more complicated. This is a made-up social network example. So I have the notion of a friend. I have this IUser interface, and I have a method to add a friend to myself.
Notice that in the method signature, I can use IUser as an argument type. The runtime knows how to serialize these references and pass them around without you writing any code. In fact, at compile time we generate serializers to efficiently pass data types
and preserve them as if nothing happened, as if everything were on the same machine. So we defined the interface; now let's see how we implement executing this method. The first two lines get two references to two grains: for me and for my friend.
Like in the Hello World example, we just say: give me a reference to the grain with this type and this identity. And then what I do is just call my grain, say add a friend, and pass this reference directly. What's important to understand here is that the reference is logical; it's always valid. It doesn't point to a physical machine, a physical IP address, a URL, nothing like that.
It just encapsulates the type and identity of the grain. So it's always valid: I can save it in a database, I can shut down my system, I can restart it a week later, read that record, and make a call to this grain. And the call will succeed, because the runtime will activate the grain with that identity, deliver my request, execute it, and deliver a response to me.
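As a sketch, the interface and the calling code might look like this (the names are illustrative):

```csharp
public interface IUser : IGrainWithGuidKey
{
    // A grain reference can be passed as an argument like any other
    // data type; Orleans serializes the logical reference for us.
    Task AddFriend(IUser friend);
    Task<string> GetStatus();
}

// Two logical references: no physical addresses involved.
IUser me     = GrainFactory.GetGrain<IUser>(myId);
IUser friend = GrainFactory.GetGrain<IUser>(friendId);

// A horizontal, grain-to-grain call within the middle tier.
await me.AddFriend(friend);
```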
So unlike physical references, these logical references are always valid. One thing here which is not so obvious is that we're making horizontal calls. These grains live at the same level, in this middle tier. If you go back to the picture of the three-tier architecture, if I had logic for one user
to make a call to another user, I'd have to go all the way out to the web service, make a call to the user service, pass the target user ID as one of the arguments, and go all the way through the front ends to another middle-tier server to execute this request. Here, the whole call happens in the middle tier. It's just direct communication
between those grains on the same layer. The other thing that is interesting here is that we put in a try-catch. But as you can imagine, the caller, my grain, and my friend's grain can be on three different machines. So how come we can catch an exception here?
So here's a picture to demonstrate it a little better. Say we have a front end that receives a request and makes a call to grain A to process it. As part of the processing logic, that grain calls another grain, B, of a different type, maybe on another machine, which in its turn makes a call to grain C to do its part of the logic.
And imagine grain C threw an exception. For example: the friend you passed to me is already in your friend list, so you're not allowed to add him twice. Or this person cannot be your friend for whatever reason. So traditionally, what you have to do is
analyze the return result and then propagate it back, and then propagate it back again. What happens in Orleans, if you write zero code for error handling here, is that the exception from C will be delivered to caller B. And if B has no try-catch, it will automatically propagate to A. And if A doesn't have a try-catch,
it will propagate to the original caller. We call code that runs on the front end, not within the grain space, Orleans clients. So this exception will be automatically propagated all the way up, with no code. I can put a try-catch anywhere I want; for example, I may decide to put it in C or in B. But by default, if I write no code, it propagates.
And as I mentioned before, error handling code is usually the buggiest code, because it's the hardest to test. So what we get here is essentially distributed, synchronous-looking try-catch semantics: a very powerful construct, where I only put code where it's actually needed. In most of these cases, I can do nothing; I cannot retry or do anything to fix the error. I just need to report to the end user that the request failed, with the error code or description received from the exception. So I can do that at the front-end layer and just render a web page or respond to the mobile client.
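A sketch of what that looks like at the call site (the caller needs no knowledge of where the failure happened):

```csharp
try
{
    // This call may hop through grains on several other machines.
    await me.AddFriend(friend);
}
catch (Exception ex)
{
    // An exception thrown by a grain several hops away surfaces here
    // automatically; the intermediate grains contained no error-handling
    // code. Often all we can do is report the failure to the end user.
    Console.WriteLine($"Request failed: {ex.Message}");
}
```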
So that's actually a very powerful feature of the runtime. Let's look at another example, still staying within the social network theme. But when you say social network, don't think of just Facebook or Twitter or those kinds of things. Gaming, a multiplayer game, is a social network.
It's just much more fluid: these relations are formed for a multiplayer session, they dissolve, and users join different sessions. But it's essentially a social graph. If you're talking about IoT devices, it's kind of the same, but much more static. You have sensors, you have rooms, you have buildings. So you have these relations, social-graph kinds of relations.
So it's not limited to just a traditional notion of a social network. So imagine I need a method to return the status of all my friends. For example, my stupid UI wants to render a table with friend, status, friend, status, friend, status. And let's say I'm very popular, have 1,000 friends.
So if I were to do it naively and call one friend at a time, get response back, then call another friend and get response back, even if the latency of a single call is very short, if it's, say, 10 milliseconds, if I call 1,000 friends serially, the minimum latency of the whole series of calls
would be 10 seconds: 10 milliseconds times 1,000. So of course I don't want to do that; I want to call them in parallel. And that's very easy to do in Orleans: to fan out calls. There's a two-line foreach where we call friend.GetStatus. And remember, GetStatus returns a promise, a task for a result,
which we put in a list right away. So this whole foreach, again, will execute within nanoseconds or microseconds. It doesn't really do anything; it just prepares those messages to be sent. And then, through the magic of TPL and async/await, we can join, in my example, 1,000 promises into one: a task that will resolve when all of them have been responded to, and then await it.
So with this one line, we await all the responses. And once all the responses arrive, we can process the results and render my web page, my stupid table, with friend statuses. So in a few lines of code, we fan out requests and process the results very easily.
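A sketch of that fan-out (friends is assumed to be a collection of IUser grain references):

```csharp
// Start all calls without awaiting each one: GetStatus returns a
// promise immediately, so this loop takes microseconds.
var tasks = new List<Task<string>>();
foreach (IUser friend in friends)
{
    tasks.Add(friend.GetStatus());
}

// Join the 1,000 promises into one and await all responses at once;
// the ideal latency is that of a single call, not the sum.
string[] statuses = await Task.WhenAll(tasks);
```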
It's just very easy to do these kinds of patterns. The ideal latency is the latency of a single call. But also notice that, again, we didn't put in any multi-threaded code, no blocking. We do nothing here that would be out of the ordinary. We write as if it's single-process code, a single application running on a single machine. But we get a lot of parallelism: if you have enough cores, all these calls will execute in parallel. So it feels like a desktop app, but it actually runs on a cluster. Who's familiar with MPI? A few people. So that's a library for very efficient,
distributed computations. There's this famous professor, Dennis Gannon. He told me a couple of years ago: we don't want to teach our students MPI anymore, because it's very hard to get it right. With Orleans, it's so much easier. You can implement the same patterns, but with far fewer lines of much simpler code.
And I was so happy when Carl Hewitt, the inventor of the actor model, wrote this last year in his paper. In his couple of paragraphs about Orleans, he said that it's an important step toward the goal of the actor model: that application programming need not
be so concerned with low-level system details. So that's exactly what we tried to achieve: to raise the level of abstraction, to make developers more productive and the code they write simple. I think I tweeted at the time that I was ready to retire. Then I checked my savings account and decided to stay at work; not ready yet.
But interestingly enough, in another survey, Carl Hewitt pointed to Erlang's deficiencies: lack of error propagation, which I just showed you, and which is exactly what we put in Orleans, not knowing that that was his concern; and also lack of resource management. Those are his two complaints about Erlang. And you see, without knowing that, we implemented exactly those things in Orleans. There are many features; I'll just highlight a couple of them. So there's declarative persistence. You can declare the state for your grain class as a property-bag class, just a very simple POCO class.
And then you pass it as a type argument to the base class Grain when you declare your grain class, in this example the user grain class. And then you get this State property of the type that we declared as the POCO class. And you usually use a single method, WriteStateAsync. This is where you say: persist my state. I set my properties, and they get persisted to storage. And how it works is there's a plug-in model: there are persistence providers. So you don't have to write code against a specific storage, like Azure Blob, or SQL, or S3 in AWS. You just write a single line, and the provider
will know how to deliver that state update to the specific storage. How you link them is through this attribute: you say, I want to use the provider with this name. And then in the config, you can declare that this name is for Azure Table storage. So you can change the storage your code targets without changing application code. You may need to migrate your data if you decide to move, but you don't have to change your code at all; you just change your config. But this is an opt-in feature. You don't have to use it. You can just write code where you talk to storage directly yourself. So it's up to you; it's just a convenience feature.
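A sketch of that declarative persistence (the state class, provider name, and method are illustrative; Orleans 1.x-style API):

```csharp
// A plain property-bag (POCO) class describing the persisted state.
public class UserState
{
    public string Status { get; set; }
    public List<Guid> Friends { get; set; }
}

// The attribute links the grain to a named provider; the config maps
// that name to a concrete storage (Azure Table, SQL, S3, ...).
[StorageProvider(ProviderName = "UserStore")]
public class UserGrain : Grain<UserState>, IUser
{
    public async Task SetStatus(string status)
    {
        State.Status = status;   // typed State property from Grain<T>
        await WriteStateAsync(); // persist via the configured provider
    }
}
```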
So we included a few providers with the code base, but there are others built by the community, for storages we wouldn't even consider building ourselves. Another feature that we added maybe a year ago, slightly more, came from this need: when people use Orleans and see this RPC pattern, where
you call and you get a request response remote procedure call pattern, you say, well, I want to return a series of values, or I want to subscribe to values that somebody will produce. I want to produce a series of values. So you're talking about streams. And we have a stream API, which
is a single API over different delivery mechanisms. There are three categories. There's direct TCP messaging, where you just want to deliver updates directly over the connections between silos, just by sending messages, no persistence. Or you may do the same over durable queues, like Azure Queue or SQS. Or Event Hub; actually, Event Hub is in the third category, with Kafka. Event Hub and Kafka are in a different category by themselves, because they're not really queues. They are distributed partitioned logs, where you can say: I want to go back to this offset in the log and redeliver messages from that point.
It's a very different, very powerful model. But we have a single API that works across all three of them. I would say it's a controversial decision to have one API over three, because they have different enough semantics. So we questioned that decision, but that's what we did: we put in the single API. And if you look at how it works,
I get a provider, again, by name, because it's config-driven, like with the persistence providers. And I say: give me a stream of integers with this ID. The ID is a GUID. So again, we took the virtual actor model and made virtual streams. As long as you know the identity of the stream, you can always produce to it or consume from it. You don't need to create it.
You don't need to find it. You just say: I want to produce to the stream with this ID, or consume from the stream with this ID. You have a GUID and a namespace, so it's easy to model things like user with user ID X, or device with device ID Y, and then produce or consume messages. And you produce by just calling OnNextAsync.
We modeled the API on Rx — or on the async version of it that was supposed to be coming. That's not a controversial decision: we took the naming from Rx, which may not be the most obvious or the best choice of names, but we just tried to be consistent with Rx.
Regardless: you call OnNextAsync and produce a value, or you can produce a batch of values. On the consumer side, you define your handler and subscribe — you say, for that stream, I want to subscribe my handler — and it gets invoked for every value, every event that arrives on the stream.
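A hedged sketch of both sides, written as grain code; the provider name "SMSProvider", the "device" namespace, and the grain types are illustrative:

```csharp
using System;
using System.Threading.Tasks;
using Orleans;
using Orleans.Streams;

public interface IDeviceGrain : IGrainWithGuidKey
{
    Task Produce(int reading);
    Task StartConsuming();
}

public class DeviceGrain : Grain, IDeviceGrain
{
    // Producer side: push a value into the virtual stream.
    public async Task Produce(int reading)
    {
        var stream = GetStreamProvider("SMSProvider")
            .GetStream<int>(this.GetPrimaryKey(), "device");
        await stream.OnNextAsync(reading); // or OnNextBatchAsync for a batch
    }

    // Consumer side: attach a handler; it runs for every event.
    public async Task StartConsuming()
    {
        var stream = GetStreamProvider("SMSProvider")
            .GetStream<int>(this.GetPrimaryKey(), "device");
        await stream.SubscribeAsync((value, token) =>
        {
            Console.WriteLine($"event: {value}");
            return Task.CompletedTask;
        });
    }
}
```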
And that's it — very few lines of code. And again, those streams virtually exist the whole time; you don't have to do anything to manage them. The streams also work not just between grains: they work between the client — the front end — and the grains, in both directions. It's a symmetrical model. So if you have a front end that terminates WebSocket or AMQP connections,
it's easy for that client to subscribe to event streams from grains and deliver updates in these low-latency, interactive scenarios. That's what it was built for. There is a lot of complexity under the hood to make this work — these pulling agents, for instance, because you need to distribute the work. If you run a cluster of 100 nodes, each node needs to pull from the queues, if you're using Event Hub or Azure Queues. And if machines go up and down, you need to redistribute that work. You need caching to be efficient. So there's a lot of complexity there, and again, that complexity is handled primarily by the runtime, so the application code can stay simple while the system remains performant and robust, dealing with failures and redistribution automatically. For that, we leverage a bunch of other Orleans capabilities to run it smoothly. Name the tune.
Anyone know "When Tomorrow Comes"? No? Eurythmics. I'm going to drink the beer myself. So years ago, when we built the first couple of applications on Orleans, we started talking internally to product groups, saying: look, this thing seems to work.
But as usual, people in our industry are skeptical. They look at it and say: it's too simple, it's too good to be true; if it's that simple, there's probably a lot it cannot do. So there was a lot of disbelief, until these guys came. Hoop Somuah and his colleagues came to us and said: look, we designed this architecture for our future services. But then we discovered Orleans, read the paper, and it looks like you implemented 80% of it — and much better and deeper than we thought we would. So why don't we join forces and work on the remaining piece? Oh, and by the way, we need to be in production in three months. And this is where the quote in the video is real — true story.
Once they left the room, I turned to my team and said: these guys are crazy. If you want to take technology from Microsoft Research and put it in production in three months, I don't know what you're thinking — but let's drop everything and help them be successful. And we put that service — the first one, I think it was for Halo: Reach — into production in three months, and it worked fine.
We worked out a couple of bugs after launch, but nothing broke the experience. And they said: this far exceeded our expectations; we're going to standardize on Orleans for the next major release — which was Halo 4. So all the services for Halo 4 were built with Orleans, within six or seven months, by a very small team. It was a very productive, successful launch, at high scale, all of that. We proved that it worked. That removed pretty much all the concerns that Orleans is a toy, that it's too simple. People were saying: if it works for Halo, it must work for me, because I'm at smaller scale. And then other game studios came.
Anybody played Age of Empires: Castle Siege? Its back end runs on Orleans. And of course, in the fall we had the Halo 5 release, which was very smooth. We were asked to be on call for the weekend, and on Friday we were told nobody needed to come in — it was running smoothly.
I think that's good. And then came the non-gamers. We have a couple of services that were built for Skype. We have several services in Azure monitoring and security. There was a fancy IoT project that launched a device into the stratosphere, to 40 kilometers — very high. And these applications that you have on Windows or on a Windows phone — if anybody still has a Windows phone — may not look as sexy, but they all have hundreds of millions of users behind them. So it's still a lot of scale, a lot of data to deliver. Another game, Gears of War, is
going to be released this fall. It's also using the same back end. We never designed Orleans for gamers, which is paradoxical. People keep asking, oh, you built it for Halo? No, we didn't build it. We didn't even have it in mind. They came to us when we already had the system. But I think why gamers come first,
is typical of our industry: they live in a very different environment. They're always on the bleeding edge, always under a lot of pressure, always rewriting a lot of code for the next release. And their load is unforgiving: they get this spike in the first few hours and days after launch, which is very different from any other service. Whatever you hear about the Snapchats of the world — their user bases grow over time, so they have time to fix things that don't scale. If performance drops, they have months, even years, to improve and re-architect. But if a major game has problems in the first few hours or days after release, you've lost the business. The users will just trash it; it's unforgiving.
So it's a very risky business. And the economics are shifting: the business of selling DVDs through Best Buy and other retailers is slowly going away. It's moving toward virtual goods, virtual currency, content delivered through the cloud. A lot of the logic moves to the cloud, so they need to be in the cloud to stay in business, to stay competitive. They're good customers to work with, because they're very fast and very ambitious. Name the tune? Great. So when you talk to industry analysts, they talk about quadrants — magic quadrants. I thought, why can't I have Sergey's magic quadrants
and define my own? Yes — it's also Queen. So, interactive entertainment goes beyond gaming: you have interactive TV and other similar types of applications, where you have sub-second latency requirements at high scale, and you need to deliver things tailored to a specific user and analyze things on the fly. That bleeds into near real-time analytics, which has a different angle but similar requirements: getting data in and quickly making decisions. And then, funnily enough, look at fraud detection. Fraud detection for credit cards is actually not that different from cheat detection in games — very similar approaches.
IoT is the hottest area — that's probably why I subconsciously made it red. There's a project I'm most proud of, where the thermostats people built for Honeywell run on Orleans. Or the project where they literally built a system to control up to two million mousetraps — because the company services other businesses with mousetraps, and they need to know when to go out, when there's a mouse in the trap. A funny IoT project. And another one is a green power storage facility in Hawaii, on Oahu, which stores up to half a gigawatt of power — which, as some people wrote, is like a small nuclear power plant, except it's just storage for wind turbines and solar panels. But many more things are possible. I showed you just a glimpse of these patterns, but there's much more: you can build all kinds of scale-out computing applications with these primitives.
We open sourced Orleans in January 2015, and the experience far exceeded all our expectations. It's a very different experience. Thank you — yes, it is Sting, "If You Love Somebody Set Them Free." It's just a great experience, dealing with all these people out there who collectively are much smarter than you. You have to be very humble once you go through it, because you can never be as smart as all of them. And they're all passionate — they come because they want to contribute, not because somebody asked them to. That helped hiring.
I had no problem hiring five people in just the last couple of months, because I could say: look, you'll be paid to work on an open source project, and you'll be building your GitHub profile for your future employers. The best deal in town, I think. That worked. It will also help move Orleans to CoreCLR and make it cross-platform, because there are people who want to do this work with us. We don't have to do all the work ourselves; we have to coordinate with the community, but a lot of the work can be done by the community itself. One important thing about Orleans is that it runs everywhere. It's not locked into Azure. There's this misconception that Orleans is for Azure.
No, it's not. You can run it anywhere: in your closet, in your garage, on hardware you bought off eBay. You can run it on AWS — some people do. Thanks to the flexible configuration and provider models, it's not tied to anything; you're not constrained in where to run it. And while Microsoft has usually been viewed as a fast follower in a lot of technologies, I'm proud to say that in this case the JVM people were the fast followers. There's Orbit, a JVM clone of Orleans. They told us very explicitly: they read about Orleans and were blown away by the model, but because they were a JVM shop — BioWare, one of the Electronic Arts companies — they implemented the same model on the JVM. And they like it. And Roger Johansson, who is somewhere here, is trying to do something similar in Go. We moved out of research — great, thank you, people know this one.
We moved out of research about a year and a half ago, to a product group, but we continue working with research. Here are a couple of projects we've been doing together recently. One is geo-distribution. All the pictures I was showing assume a single cluster of machines running an Orleans service. So we went further: from a single node and a single cluster to multi-cluster. Instead of one cluster, you run a kind of constellation of clusters, and you can geo-distribute them — put them in different geographies for locality, but also for availability, so that if one of them goes down, the model stays the same. You still program against grains that are always available. The fact that a data center went down shouldn't be a concern of your application logic: the code should keep working, and the grain will be reactivated somewhere else, in a different geography if needed. But you can also serve local customers from the nearest data center automatically. And the famous Phil Bernstein, who co-invented ACID transactions, is working on adding ACID cross-grain
transactions to Orleans. They're very far along in that project and have some very promising numbers. We also had a paper on optimizations published at EuroSys this year in London, a few months ago. But looking at why I think this model works,
I would say there are just a couple of things to consider. One is context orientation. The Orleans model works when you have lots and lots of independent contexts: users, sessions, devices. If your application requirements look like that, this is where the model works. If you want a distributed database — lots of rows, with operations that go across them — I would advocate against using Orleans for it; that will not be efficient. But when you have these independent contexts, it's easy to scale them out, and it's easy to express the logic in the isolated manner of actors. I would also argue that this approach brings
the object-oriented view back — and that it's more natural. The world is not service-oriented. I use this example: in the African savanna, when a lion is talking to a gazelle through his claws and teeth, he's not calling a gazelle service with ID X. These two actors interact independently of the other lions and gazelles in the savanna. That's the reality of the natural world: things are not service-oriented; they're object-oriented, instance-oriented. And this model fits it well. In the paper, we have a graph showing that we scale linearly — and actually, the numbers now are 50% higher.
But that's the graph from the paper. So if we get back to the business-requirements picture — time to market, return on investment — I would argue that we more or less hit the first three requirements. I hope I demonstrated developer productivity; for linear scalability, you can find the details in the paper. I didn't touch much on high efficiency, but the Orleans code is very efficient — that's one of the reasons we built our own serialization layer. People measured it against some of the competition and found it, I forget, 23 or 26 times faster than something out there. So we didn't sacrifice efficiency for simplicity. We didn't solve all the world's problems, but I think we gave you enough tools to address a class of applications in a very easy and very powerful way. That's my claim to you.
And as a takeaway, I would encourage you to take a look at Orleans. Take a look at open source, if you've never done that. If you're a JVM shop, look at Orbit. Even if you cannot apply these kinds of technologies now, maybe the approach will resonate later in your work. And when you build your own systems, learn from our experience and our mistakes — but also learn that questioning established wisdom sometimes pays off. You don't have to have supervision trees, I would argue. That's all I have for you. Thank you. If you have any questions, I can answer them now or later.
The question is: what is the relationship with Service Fabric actors, and what are the differences? First of all, Service Fabric — the whole name conveys it — is about a service model: running services, managing services. That's the primary purpose of Service Fabric, while Orleans is about implementing services. Yes, Service Fabric includes some libraries they call programming models — they're more like libraries — and one of them is actor-oriented. But even though the simple APIs are very similar, the implementations are very different, because theirs is built to highlight other features of Service Fabric — for example, replicated, locally attached storage, while in Orleans storage is all remote. The partitioning story is different; the placement is different; there are a lot of differences. So I would suggest you look at those differences and see which fits your case.
The question is about my insight into why the Service Fabric team decided to build Reliable Actors. Like I said, Reliable Actors highlight features that are specific to Service Fabric — for example, replicated storage and, in general, in-memory replication. They have these features, and they want you to leverage them when you write service code. They have stateless services and stateful services, and they added this third model that can leverage those features in a different way. So I think that's the biggest reason: to showcase the features of the underlying infrastructure. Any other questions?
The question is about hosting Orleans: can you run it on premises, on a single machine, in the cloud? The answer is yes to all of these. Yes, you can run it on a single machine. The developer experience with F5 debugging is especially easy, because you can run two nodes within app domains of the same process where your client runs, which makes it very easy to debug and develop. You can deploy to a single machine, because it's just a process you start plus the configuration you give it. You can run on premises — in fact, our nightly tests, performance measurements, and reliability runs execute on a private cluster of hardware we inherited for some reason. That's no problem, because the only real dependency is storing membership information, for which we recommend Azure Table anyway — it's very cheap, we write just a few rows there, and you'll pay pennies a month. You can use it even when you run on premises; that's how we run our tests: on a private cluster, with membership information in Azure Table. Then moving this code to, say, a worker role or a VM scale set is very easy, because the whole mechanism stays the same.
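As a hedged sketch of what single-machine hosting looked like with the classic (pre-2.0) hosting API — SiloHost, ClusterConfiguration, and the commented-out Azure Table membership settings reflect that era as I understand it, and the silo name is illustrative:

```csharp
using System;
using Orleans.Runtime.Configuration;
using Orleans.Runtime.Host;

class Program
{
    static void Main()
    {
        // A localhost silo for development; for a real on-premises
        // cluster you'd load a fuller configuration instead.
        var config = ClusterConfiguration.LocalhostPrimarySilo();

        // Even on premises, membership can live in Azure Table, e.g.:
        // config.Globals.LivenessType =
        //     GlobalConfiguration.LivenessProviderType.AzureTable;
        // config.Globals.DataConnectionString = "<storage connection string>";

        var silo = new SiloHost("dev-silo", config);
        silo.InitializeOrleansSilo();
        silo.StartOrleansSilo();

        Console.WriteLine("Silo running. Press Enter to stop.");
        Console.ReadLine();
        silo.StopOrleansSilo();
    }
}
```

Any other questions?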
The question is about messaging and delivery guarantees. Messaging between actors goes over TCP between the two nodes the grains are on — or stays on a single node if they're together. The guarantee is at most once. There is retry logic, and you can enable it, but we turn it off by default, because in the case of failures, if you keep retrying and redelivering messages, you just exacerbate the problem. So we usually don't recommend the built-in retry logic. But there's one thing I didn't mention:
there is a built-in timeout. When you make a call to a grain, internally a timer starts, and if there is no response within the set period of time, you get a timeout exception. So either your message gets delivered — and you get a response, or maybe an exception, which is fine — or you get a timeout exception. Retry we recommend leaving to the application logic, because in many cases you don't want to retry: you want to do it once, and if it failed, it may be too late to retry, for example. So it's all in memory; it's not queued, not persisted — unless you use streams. Streams can go over persistent storage.
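From the caller's side, that looks roughly like this — a sketch reusing the illustrative IUserGrain from the persistence example, with the grain class and method names being assumptions of mine:

```csharp
using System;
using System.Threading.Tasks;
using Orleans;

public interface IAdminGrain : IGrainWithGuidKey
{
    Task Rename(Guid userId);
}

public class AdminGrain : Grain, IAdminGrain
{
    public async Task Rename(Guid userId)
    {
        var user = GrainFactory.GetGrain<IUserGrain>(userId);
        try
        {
            // The call either returns, throws the grain's own exception,
            // or throws TimeoutException when no response arrives in time.
            await user.SetName("Sergey");
        }
        catch (TimeoutException)
        {
            // Nothing was retried automatically: decide here whether a
            // retry is safe for this particular operation (is it idempotent?).
        }
    }
}
```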
But plain messages — method calls — go over TCP. Does that answer your question? Next is a comparison with persistent queues.
I think it's a throughput question. All persistent queues have limits on throughput, on latency, or on both. This model — direct messaging — is the most performant, because you don't write to any storage; you just send directly. But if you need guarantees, you can use streams and go over persistent queues just as easily. That's one of the trade-offs you need to decide on early — or you can change it later, of course. Question? Does that happen inside a grain — if the grain goes to sleep, does the handler then disconnect?
Excellent question — about handlers attached to a stream: when you subscribe to a stream, what happens to the handler if the grain goes out of memory? Is it persisted or not? The handler is not persisted, primarily because we couldn't serialize delegates; we couldn't make that magic work. We wanted it at first, but it wasn't possible. So the typical pattern is this: there's a method, OnActivateAsync, which is like a constructor of a grain — it gets called when the grain is activated. That's where you put the logic to re-attach your handler. When a message arrives and the grain is not in memory, the grain gets activated, OnActivateAsync gets called, you re-attach your handler, and then the event gets delivered. Yes — you need to persist the fact that you subscribed to the stream, and then re-attach your handler. You have to do that yourself, unfortunately.
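A sketch of that re-attachment pattern, reusing the illustrative names from the streams example earlier; GetAllSubscriptionHandles and ResumeAsync reflect the stream-subscription API as I understand it:

```csharp
using System.Threading.Tasks;
using Orleans;
using Orleans.Streams;

public class DeviceGrain : Grain, IDeviceGrain
{
    // Called whenever the grain is (re)activated. Handlers don't survive
    // deactivation, so this is where we attach a fresh one.
    public override async Task OnActivateAsync()
    {
        var stream = GetStreamProvider("SMSProvider")
            .GetStream<int>(this.GetPrimaryKey(), "device");

        // Resume subscriptions made in a previous activation
        // with a new handler.
        foreach (var handle in await stream.GetAllSubscriptionHandles())
        {
            await handle.ResumeAsync((evt, token) =>
            {
                // ...process the event...
                return Task.CompletedTask;
            });
        }

        await base.OnActivateAsync();
    }
}
```

Any other questions?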
Well, thank you then.