
Immutability, State and Scale


Formal Metadata

Title
Immutability, State and Scale
Subtitle
Functional, Distributed Applications in Azure
Number of Parts
170
Author
Mahesh Krishnan
John Azariah
License
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt, copy, distribute, and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is also shared, in adapted form, only under the conditions of this license.

Content Metadata

Subject Area
Genre
Abstract
Traditional approaches to building cloud applications have been very successful, but as we reach the performance limits of computing, perhaps a functional approach will yield the next quantum leap in designing and building high-performance distributed applications. In this talk, we will take a traditional problem - building an accounting system - and develop a radically new functional architecture suited especially for Azure. We will apply functional concepts such as value semantics and immutability to the Azure environment and develop techniques for super-scalable, functional workflows.
Transcript: English(auto-generated)
Good morning, guys. You guys enjoy the keynote? Yeah? You guys looking forward to all the awesome sessions at NDC? Yes? All right, that's good. Both John and I have, I suppose, the privilege of kicking off NDC, at least in this room anyway.
And so, yeah, we're looking forward to it. We've come all the way from Melbourne, Australia. So both of us are from Melbourne. So, you know, compensate for the weird accent by trying to talk slowly. Any Australians here? Anyone from down under? Okay, we should get more people to come here.
Anyway, both of us are from Melbourne, and we're both Windows Azure, I should say Microsoft Azure MVPs. They changed the name. And my name is Mahesh Krishnan. That's my Twitter handle over there. I work as a principal consultant at a consulting company in Australia called Readify.
And my co-presenter is John Azariah. He works for Applied Data Science. He's actually starting a new job over there. And he's got this fancy title of data scientist. I'm not sure what he does. Does it involve wearing lab coats and looking at data through a microscope, John? Actually, there's a talk about data science at NDC. I'll find out what it is and then I'll tell you.
All right. John and I have been working together for a number of years now. And as part of my consulting engagements, I was over at a company called MYOB in Australia. They write accounting applications for small and medium-sized companies. And so some of the concepts in our talk, we touch upon the domain of accounting a fair bit.
As part of our job working together, we also built one of the largest SQL Azure installations in the world. But anyway, the title of today's talk is Immutability, State, and Scale: Writing Functional, Distributed Applications in Azure. It's a bit of a mouthful, isn't it?
First thing, of course, is, you know, we put in a lot of buzzwords. So, you know, it's officially buzzword compliant. The second thing is we thought, you know, if we strung together a series of buzzwords and put them in a slide, then people would be inquisitive enough to find out what the hell these guys are talking about and come to our talk. I think we've been reasonably successful. We've got a fair amount of people over here.
Anyway, jokes aside, more seriously, what are we going to be talking about today? To explain that, I want to take a step back and start with a story. Let's say we've got a store selling anything, maybe auto parts or whatever it may be. In fact, it's not an online store, okay? It's a physical store,
bricks and mortar as they're called these days. You know, you come over with a shopping list and, you know, you push a trolley and you pick up a bunch of things, and then you go to the checkout counter to buy all these things. You pay for it, and the person at the checkout counter, you know, in Australia we call them checkout chicks, does all the billing for you and hands over a bill.
Effectively, what we'd call that is an invoice. So they give you an invoice. It contains all the things that you've actually bought. And this is the 21st century, right? So there's some kind of accounting software in play over here. So all the stuff that you actually purchase, all the data and everything, ends up going to this accounting application.
Now at the back of the store, you know, our accountant turns up every now and then and he wants to look at the health of the organization itself. And he prints out a series of reports to do this. So, you know, the accountants all deal with numbers. So, you know, they look at profit and loss, credit debit, this and that. So, you know, one report doesn't do everything so he actually has to print a number of reports.
So, you know, these reports actually take some time to print. So he kicks off report one and, you know, goes and gets a cup of coffee or something and comes back and prints report two. Now while this is happening, the store is making more sales. You know, there are people rushing down and buying more things, buying more stuff and generating more invoices as well in the process.
It's a good thing for the store, right? It's great. But what happens is when these new invoices actually get generated and the database actually gets updated, the view of the world actually changes for the accountant or the report that he's actually printing. So report one reflects one view of the world and report two suddenly shows another view of the world for him.
So there's no consistency all of a sudden. It may seem like a fairly trivial thing to us, but it's actually a big deal for accountants. And, you know, to explain it another way, all the data ends up stored in some kind of relational database. So, you know, when I actually generate invoice one,
it goes and does a series of inserts and updates on the relational database. And the report pulls data out of this relational database. When you actually start off triggering the printing of report two, more invoices are being generated and that actually changes the view of the world.
So report two now contains the data in green and report one contains just the data in red. And, you know, they don't correlate with each other. What the accountant really wants is for his view of the world to be frozen from the moment he starts printing report one until he finishes all the steps in this workflow,
printing all the reports. Now, this problem is not isolated to printing reports or, you know, the accounting domain. If you're writing large line-of-business applications, you probably have this problem as well. So you can look at it another way, right?
You can look at it as a business operation, as a workflow, which involves a series of steps. So, you know, in step A, you go and update a bunch of tables and, you know, all that stuff gets changed. And then you kick off workflow two, and the steps in workflow two, in operation two,
start updating the database as well. And when you actually continue on to step B over here, notice that it now sees some kind of leakage of data from that second workflow that you have.
And this is actually a big problem. What we really want is a consistent view of your data across the entire workflow. You don't want that changing at all. You also want another thing: some kind of isolation. So, you know, whenever you change data,
it has to be isolated to that single workflow. You also want decoupling. So, you know, you want your workflow to consist of multiple steps. And, you know, right now, for consistency and isolation,
you can have things like, you know, transaction-level locking, or, you know, you can create a snapshot and work off it. But in all these scenarios, they typically work for one step. So you can have transactional isolation for one step, but what happens when this whole workflow is a really long-running operation? Do you lock your database, or, you know,
hold the transaction open for that period of time? That kind of kills scalability. So, how do you actually get this? Effectively, we have three problems we're dealing with here. One is the problem of consistency. You want a consistent view of the data end to end, from when you start the workflow to when you end it.
You also want isolation so that you can freely make changes to the data without having data leak over to other workflows and things like that. And you also want to have decoupling. So, you want to split it up into smaller chunks so you can actually work with it. So, in fact, that is the problem that we want to try and talk about today.
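To make the consistency problem concrete, here is a minimal sketch in Python (all names and numbers invented for illustration; this is not code from the talk) of two reports computed from the same mutable store while invoices keep arriving between them:

```python
# A shared, mutable "database": account name -> balance.
ledger = {"sales": 1000, "tax": 100}

def report(store):
    # Each report is computed from the live store at print time.
    return dict(store)

report_one = report(ledger)

# While report one is "printing", a new invoice mutates the store...
ledger["sales"] += 250
ledger["tax"] += 25

report_two = report(ledger)

# ...so two reports for the same period now disagree.
print(report_one["sales"] == report_two["sales"])  # False
```

This is exactly the accountant's complaint: nothing in the code is wrong step by step, but the shared mutable store means the "view of the world" shifts under the workflow.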
So, what we want to do is we want to offer a solution. Actually, it's not a solution. We want to evolve a solution and in the end give you an optimal pattern or a model to work with. So, I'm going to hand you over to John who is going to walk through the solution. Yeah, thanks Mahesh. So, actually let's just stay on this slide for a bit.
One of the things that we actually see over here and then I'm going to try and step away from the accounting domain as much as possible because I don't want people to fall asleep or something. We'll step back and then we'll try and see if we can apply this in a very general sense, right?
There are actually two kinds of data transfer that are taking place in this diagram. It's very likely that when you kick off the second set of workflows, it wants to actually include the information from the first set.
What you don't want is the first set to have the information from the second set leaking back into it. So, that's kind of a problem. So, there's actually a legitimate use for data going forward from operation one to two. And then there's the stuff that you don't want which is stuff leaking between operation two back to operation one, right?
And then there's the passing of state between the various steps in each workflow. You want that to be consistent. So, as Mahesh pointed out, there's consistency within the one workflow
and then you may want to do a transformation of some state to kick off the next workflow. So, that's actually a legitimate requirement. So, there is an element of wanting to have shared state but we're going to talk about how we can actually achieve that without killing our scalability.
The way we're going to do that is we're going to model the problem a little differently. So, we're going to recognize those two types of data that need to be transferred and we're actually going to embrace that and move forward. So, if we had modeled the workflow to actually pass the data itself,
if you look back in the old database world, what were you actually passing between the stages? You were passing a primary key between the various stages and the primary key was a reference to some data in a mutable store.
So, you guaranteed that you were talking about the same thing but you had no guarantee that the thing didn't change between the time you last saw it and now, right? So, what we're going to try and say is why don't you look at passing the data itself instead of passing a reference to the data
and then maybe we have a general step which takes an octagon, is that it? Yeah, it takes an octagon and converts it into a pentagon. So, there's some transformation that's taking place in operation one between step A and step B and then there's another step that takes the pentagon and gives me a hexagon
or whatever it is that you want. So, there's data transformation and because you're passing the data around, it doesn't matter what else is happening in the world. The view of the world is restricted to the data that you've been given. So, this is actually a very functional concept.
The concept of dealing with a value semantic gives you an enormous amount of power. I mean, people who've done functional programming rave about the fact that immutability is the kind of cornerstone of being able to decouple many, many things and split pieces of work up nicely.
And then we talk about going from D to D prime. Now, that looks kind of interesting because you have some information that you have. You have a hexagon with D and you're doing something to it to convert it to a hexagon with D prime. But the trick there is to make sure that you don't actually modify D itself.
You take a copy of D and introduce the changes into D prime and get the second hexagon which then kicks off its workflows. By doing this, we have actually managed to achieve the isolation requirement that we want. Now, obviously there are a whole bunch of people asking some questions.
Hang on, we have some answers for you. But the first and most important thing is that you've got the isolation piece sorted out. It's consistent for each workflow and each workflow is isolated from every other workflow. So you've achieved both the isolation and consistency.
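The "pass the data itself, and derive D′ as a copy of D" idea can be sketched like this. The talk's demo is in F#; this is an illustrative Python stand-in, with made-up types and field names:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)          # frozen = immutable: no step can mutate its input
class Doc:
    shape: str
    total: int

def step_a(d: Doc) -> Doc:       # octagon -> pentagon
    return replace(d, shape="pentagon")

def step_b(d: Doc) -> Doc:       # pentagon -> hexagon
    return replace(d, shape="hexagon")

d = Doc(shape="octagon", total=100)

# D -> D' is always a copy with changes, never an in-place edit,
# so D is still intact for whoever else is holding it.
d_prime = replace(step_b(step_a(d)), total=150)

print(d.shape, d_prime.shape)    # octagon hexagon
```

Because each step only ever sees the value it was handed, its view of the world cannot be changed by any other workflow: isolation and per-workflow consistency fall out of the value semantics.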
We'll come to decoupling in a minute. But the key thing that we are recapping is that we started off with trying to get these three features or facets of the design into our application in order to make it scalable.
And we discovered that having the thing centered around mutable data store was the thing that was killing it. And we came up with a couple of ideas about how we're going to achieve getting those three things. Okay, old habits die hard.
It's all very nice to talk about theory. Now let's talk about something that hopefully will not put you to sleep. So bear with me, please. I apologize. In the accounting domain, we have this concept of a set of accounts. Now, a set of accounts kind of reflects all the various pieces of your financial state, right?
You know, where your money is coming from, where your money is going to, what are the assets you have, what are the liabilities, who owes you money. All of that stuff is represented in a chart of accounts. And in this handy little application that I've been working on, we have a nice little hierarchy that tells you, you know, there's income and expense
and all these other things in there. And we have a few accounts over there. I'm pretending that I actually had a company that sold fast cars and expensive wine, right? And I have my friendly customer who has a lot of money.
You guys pay taxes in Norway? We pay a hell of a lot of taxes in Australia. So we have separate taxes for luxury cars. So there's a special code for luxury car tax. And we have special taxes for wine. So there's one for that as well. And of course, we don't give anything away free of cost, right? Our smile costs you $180 every time you do business with us.
So we put that on the invoice as well. And what happens in the accounting domain is that in the accounting domain, that thing is called a source document. It is a representation of a business event. Now, some of this may actually directly translate to other domains.
But if it isn't, just think about what I'm actually trying to solve here. We have a representation of a business operation, business event. Something happened. People bought something from us. In the accounting world, when we want to recognize the fact that this invoice has taken place, we convert it into what is known in accounting parlance as a balanced journal entry.
Now, a balanced journal entry is actually a set of instructions of what you want to do to each account. So it says, well, my trade debtors, otherwise known as accounts receivable, should be debited by that amount of money.
And I got that amount of money by selling something from the fast cars account, from the expensive wine account, and then the service with a smile account. And, of course, the government needs a cut of the action, so there's three taxes that I have collected, which I have to account for, right? And now I've transformed a source document into something that is going to affect my state.
And if you think of the chart of accounts as the state that I wanted to mutate, every time a new invoice is created, I get one of those.
I have a workflow that converts it to one of these. And the next workflow goes ahead and updates the account list. So you can see that there's both the ideas of having serial stages of a workflow, having data being passed around, and the fact that you have to cumulatively change and maintain state.
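A balanced journal entry and the step that applies it to the chart of accounts can be sketched like this (a Python sketch with invented account names and amounts; the actual demo is in F#):

```python
# A balanced journal entry: a set of instructions per account,
# where the debits and credits sum to zero.
journal = [
    ("trade debtors",    330),   # accounts receivable, debited
    ("fast cars",       -100),
    ("expensive wine",   -50),
    ("service w/ smile", -180),
]

def apply_journal(accounts, entry):
    # Only balanced entries may touch the chart of accounts.
    assert sum(amount for _, amount in entry) == 0
    # Return a NEW chart of accounts; the input is never mutated.
    updated = dict(accounts)
    for account, amount in entry:
        updated[account] = updated.get(account, 0) + amount
    return updated

empty = {}
after = apply_journal(empty, journal)
print(empty)   # {} -- the empty chart is unchanged
```

The invariant that the balances sum to zero is the "top number is zero" check mentioned later in the demo: it holds for the empty chart, and every balanced entry preserves it.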
Both of those are actually showcased in this example, right? So I'm going to walk you through a demo where we've done precisely this. I have a company file in the cloud just to make life easy. It's got 200 odd transactions on it. And I'm going to take an empty chart of accounts, and I'm going to apply each entry one at a time,
create the intermediate chart of accounts, switch over, and do something interesting. So the accountants among us, I hope I'm not offending anybody, but really the last thing I want is to generate a profit and loss report and try to explain that.
So I'm going to do something interesting. I'm going to draw a graph that shows me what the tree looks like, because we're all engineers, we like that. And then I'm going to draw a plot chart with some interesting data, right? So let's just take that. So that's what my code looks like. So I get my company file, which is there.
And then I generate each of the account activity registers, otherwise known as the chart of accounts, based on every general entry that's inside. I'm just going to comment that out for the moment. I'll tell you why.
And then I'm going to skip over the boring bits, and I'm going to build a DGML file that represents the company file after each transaction was created. And so you can see here that this is the pipeline. So the first pipeline is the incremental addition of state,
and the second part of the pipeline is the actual generation of those two things that are dependent on the account activity register here. So let me run the code. It's going to take a little bit of time because it's going to hit the internet and whatnot. And then I can take a look at the code. One of the key things to note here is,
although this example actually takes the accounting domain into consideration, and everything is based on that, it can apply to any of your own domains. How many of you write boring line-of-business applications? All of you, put your hands up... I want your job. You can actually see some of the concepts that you can use in those applications as well.
So right now it's going off to the internet and pulling some stuff down. And what I'm going to do is show you how that state gets transformed. So using immutable data structures,
the account activity registers an immutable data structure in this case, I basically take all the journal transactions that are there and sequentially unfold them by applying each journal one at a time to generate a sequence of account activity registers.
And this gives us the ability to propagate state in a managed way without violating immutability. So in the meanwhile, the code is compiled and it should be getting ready to spit out something.
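That "sequentially unfold" step is essentially a scan: fold each journal into the previous register and keep every intermediate result. A Python sketch of the same shape (the demo itself is in F#, where `Seq.scan` plays this role; the journals here are invented):

```python
from itertools import accumulate

def apply_journal(accounts, entry):
    # Pure step: returns a new register, never mutates the old one.
    updated = dict(accounts)
    for account, amount in entry:
        updated[account] = updated.get(account, 0) + amount
    return updated

journals = [
    [("bank", 4130), ("historical adjustment", -4130)],  # opening balance
    [("bank", -15), ("bank fees", 15)],                  # the $15 bank fee
]

# One chart of accounts per transaction, starting from an empty chart.
registers = list(accumulate(journals, apply_journal, initial={}))

print(len(registers))            # 3: the initial state plus one per journal
print(registers[1]["bank"])      # 4130
print(registers[2]["bank"])      # 4115
```

Because every intermediate register is kept, the "balance after every transaction" report at the end of the demo is just a read over this list, something that is painful to reconstruct from a mutable store.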
By the way, I want to call your attention to the fact that, okay, I'm not an expert F# programmer. I think I started programming in F# about three months ago. And it's quite easy to see how the workflow structure is represented in the code.
You can actually see that we're taking the data and just pipelining it through the various stages, right? Now this is going to be important because in a minute, when I finished with actually generating the code, I could save the individual intermediate states,
which is going to give us a good position to be in to talk about decoupling, which will happen in the next part of the talk, right? Now in the meanwhile, I'm about to give up on the internet connection here because... So it looks like...
I'm going to just run this. I'm going to hand over to Mahesh. And then when I come back, we'll go back and take a look at what we've achieved. Oh, yes, here we are. Sorry. Here is the first transaction that we got. And I'm going to open the DGML file and look at it. And this is how we start a business, right?
We have some... When we started, we put some money in the bank, and we recognized that... Can you zoom it a little bit? Yeah, sorry. Can everyone see that? A little more? All right. So that's my company file structure right there.
And I've got two heading accounts under assets and liabilities. There's a bank account, which is a kind of asset, and other current liability, which is a kind of liability. And then there's the actual account itself. And that red number says that I have now debited
$4,130.89 to that account and accounted for that as a historical adjustment, right? So it's a neat way of looking at the data. So let me pull the second thing. So what's the first thing that happens to us? Again, you guys might be familiar with this.
I don't know if you are. But the bank just socked us with $15 of fees, right? And you can tell that. And as long as the top number here is zero, you know that it's a balanced chart of accounts. So you can generate a balance sheet from this information. So creating this state, creating this information in a progressive state
allows us to process each of these things the way we want. We can use the information however we want. I'll just show you one more, and then I'll show you something else interesting. Let me go and pick something with about 200 transactions in there, to give you an idea of what a busy company looks like, right?
And, yeah, we don't want to do that. So, again, you can look to see that there's many, many transactions happening and all of that, but it's still balanced.
And at any point in time, we've computed the balances and kept them available for you. So this is quite a useful data structure. It's quite a useful way of thinking about it. And you can do this without using a database, which basically allows you to generate a lot of stuff. One of the things that is very difficult to do in a traditional, database-driven thing
is the fact that you can generate a report like that. Now, that report, I'm going to just zoom in a touch there. That report is actually the report of how the account balance has varied with each transaction.
That's an extraordinarily difficult report to produce if you have a mutable data store because you're changing values as you go along. There's no way to actually recreate the state at every given transaction in order to be able to generate that. And usually when an accountant gets a report like this,
he'll get one slice of this. That's the point-in-time account balance, and that's what he has to recreate the behavior of the business with. Imagine how much easier it is if you show him this, where it's quite easy for him to say, hey, something happened over here. You know, the assets in the company dropped by some amount, and that's something that I need to go and investigate.
The way in which you view the information allows you to actually take advantage of immutability. So as I mentioned, we could actually store the individual intermediate states, and that's going to lead us to the next part of the talk, which is going to talk about decoupling.
All right, let's talk about decoupling. Just to summarize what we've actually seen so far, one of the things that we wanted to achieve
is three things that we actually spoke about in the problem domain: decoupling, consistency, and isolation. Rather than relying on a relational database of some sort, what we did was take the immutable data and pass it around from one step to another within a workflow. So we kind of got consistency across a single workflow.
And similarly, we got the isolation as well, because within a workflow, you had the ability to change things, and it didn't affect other workflows. And because we had this functional way of thinking, we broke the workflow down into steps,
like in a functional way. And we had some kind of decoupling as well. But what John did in his demo was he ran everything in a single process. And the state was in memory all the time.
So all he was doing was passing the state in memory into another function, into another function, and so on. So what we need to do is to have temporal decoupling. So what is temporal coupling really? It's basically the coupling associated with time. So if one component finishes executing,
and the second step actually depends on the first one to actually finish, and it has to execute straight away, so it's kind of like based on the time the first one finishes to kick off the next one, you kind of have temporal coupling. So the way to actually have decoupling, the first thing that you need to do is to take those individual steps over there
and then persist that into durable storage. So now the first thing that you do is, rather than actually pass the memory around, you pass a reference to some kind of a durable store, persist the data. The second thing that you need to do is, you know those individual components that I actually spoke about?
You need to turn those into individual execution units. You know, you can put each one in its own assembly, and so on and so forth. And the third thing that you need to do is, you have to have some kind of an external trigger that actually controls each and every step of the workflow. Now what you can do is, you can execute step one and step two,
and then if you choose, you know you don't have to run step three straight away. You can run it in a completely different time, and the workflow still continues to run along without any problem. Now, we've kind of like solved the problem with, you know, we have the temporal decoupling. We also want to have spatial decoupling.
That is, you know, coupling associated with space, or where in space these components actually exist. Now one of the main reasons why you want spatial decoupling as well as temporal decoupling is the ability to scale the application out. I mean, we are in the age of the cloud, right? I mean, effectively what we want to do is
to throw this application in the cloud and then run it. So to do that, I mean, you look at the original workflow. Effectively you had each and every state stored somewhere in a durable store, and then you actually had each and every step as individual components that you executed.
So what you need to do is throw all these individual, pick these, you know, the states that are actually stored somewhere, and then throw it up in the cloud. You know, potentially you can move it up into blob storage or something like that if you're working with Windows Azure. And the second thing that you want to do is pick those individual steps and, again, throw it up into the cloud.
So you can either run them as worker roles or web jobs or whatever it is that you want to do. And the third thing that you want to do is to move the trigger mechanism across to the cloud. So in the cloud you have queues. So using these queues, you should be able to trigger the next step in a workflow. So effectively what you do is once a workflow step has actually completed,
you have the ability to send a message to the queue to say, hey, you know what, I've finished executing, kick off the next step, and so on and so forth. Now, John and I actually presented a session at NDC London last year. We spoke about modern application patterns. And one of the things that we spoke about is a pattern called queue-centric workflow that is kind of like ideally suited to do this.
So I'm going to hand over to John to quickly touch upon the queue-centric workflow. Thanks, Mahesh. So one of the things that we want to achieve is the fact that we have a way of actually sending messages now through the queues,
and we need to process each stage of the workflow in the queue. So we can do a traditional approach of basically saying everybody listen to the queue. If you find one of my messages, then I'll start doing something and then put something else back on the queue. Or we could take a slightly different approach
and have a generic engine and a self-describing message to be able to figure out which state of the workflow that you're in and dynamically invoke the queue-centric mechanism to perform the workflow stage and then chain the next stages in the workflow together.
The fundamental idea is that you model the workflow as a series of commands in a command-query-type system, and there are some constraints that you have to worry about in how you break the work up into pieces.
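As a rough sketch of that idea (hypothetical stage names, with a `deque` standing in for an Azure Storage queue, not any real Azure API):

```python
from collections import deque

queue = deque()  # stand-in for an Azure Storage queue

def validate(doc):                # stage 1: do some work...
    doc["validated"] = True
    return "post"                 # ...and name the next stage

def post(doc):                    # stage 2: the final stage
    doc["posted"] = True
    return None                   # None means the workflow is done

STAGES = {"validate": validate, "post": post}

def process_next():
    """A generic engine: any worker can pick up any message, dispatch it
    to the right stage handler, and chain the next stage onto the queue.
    The message is self-describing -- it carries its own stage name."""
    message = queue.popleft()
    next_stage = STAGES[message["stage"]](message["payload"])
    if next_stage is not None:
        queue.append({"stage": next_stage, "payload": message["payload"]})

doc = {}
queue.append({"stage": "validate", "payload": doc})
while queue:
    process_next()
```

Because each message names its own stage, any number of workers could run `process_next` against the same queue without any of them needing to know the whole workflow.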
And then you can actually use different mechanisms, and the reason why you want to use different mechanisms is dependent on how scalable you want the system to be. So you can, for example, use web jobs which monitor the queues, or you can use a queue processor that listens to the queues, have any number of them listening to the queues,
and be able to pick off the next message, process it, and then propagate the workflow through. Now, the thing about Azure Queues is that they come with what's called an at-least-once delivery guarantee.
And what that means is, because we're working in an unreliable fabric, the cloud is actually an unreliable fabric, you build resilience into your applications, right? So when you do that, the worker that picks up the next message may actually die because of some transient error or whatever it is.
And the queue makes sure that if a message hasn't been marked as processed within a period of time, it makes it available for someone else to pick it up. So not only do you achieve the ability for being able to process multiple messages yourself,
you also are guaranteed that each message is going to be processed at least once. Now, obviously, the people who've kind of thought through this process a little bit are getting very nervous, because you've suddenly got no guarantee of the order in which the messages are processed, right?
And potentially, you may have conflicting messages that show up, and they may show up in the wrong sequence. Now, how do you handle that? It's a very good question, really. And the way you handle that is you consider the modeling of the workflow step,
and you talk about making workflow steps idempotent. That is to say, if the same message gets applied twice, you want to make sure that the effect of it is exactly the same as if the same message got applied once.
So think about a scenario where we have, you know, I'm transferring money, and my best friend's transferring me a million dollars, okay? And the message gets put on the queue, the worker debits his account and tries to credit my account and dies. Now, the money is gone from his account,
but it hasn't hit my account yet. If the message gets put back on, the same thing will happen, and now you've debited his account two million, right? And to prevent that from happening, you have to make sure that you model the workflow to be based on idempotent operations.
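That transfer scenario can be sketched as follows; the message id and account names are hypothetical, and a set stands in for the "check before you apply" lookup:

```python
accounts = {"friend": 1_000_000, "me": 0}
applied = set()   # ids of messages we've already processed

def apply_transfer(message):
    """Idempotent workflow step: replaying the same message has exactly
    the same effect as processing it once, because the message carries
    an id and the step checks it before touching the accounts."""
    if message["id"] in applied:
        return                      # a redelivered message is a no-op
    accounts[message["from"]] -= message["amount"]
    accounts[message["to"]] += message["amount"]
    applied.add(message["id"])

msg = {"id": "tx-1", "from": "friend", "to": "me", "amount": 1_000_000}
apply_transfer(msg)   # first delivery does the work
apply_transfer(msg)   # at-least-once redelivery changes nothing
```

Without the guard, the second delivery would debit the friend's account a second time, which is exactly the two-million-dollar bug described above.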
Now, this is not only a good idea for queue-centric workflow. This is actually a very good idea for any distributed application because in any distributed application, the very nature of the fact that it's distributed means that you're already dispensing
with the guarantees of order. You're already dispensing with the guarantee that I will never see this message again because there may be some odd case where stuff gets put on the queue twice, and you have to deal with it. So in our case, we actually take care of this in the accounting system. We take care of this by making sure
that inserts are never just blind inserts. They're always guarded: check to see if it's there first, and if it's not there, put it in. That way, even if I receive the same message five times, I do it properly. There's a fair bit of detail that we have to cover about queue-centric workflow if you want to go down that road.
Come talk to us because there are things like poison messages. What happens in the cloud when, God forbid, there's a bug in your code, and you throw an exception in the middle of processing a message, the worker role dies, comes back up, and you'll find the same message waiting for another worker role, which will take that message and then choke.
And if you have three or four of these, and you've got half a dozen worker roles, pretty soon all that your worker roles are doing is picking up these poison messages, choking on them, recycling themselves, and then not actually doing any real work. So there are mechanisms like retry counts and stuff like that that you can use to make that problem go away.
But this talk is not about queue-centric workflow. This talk is actually about how to model the problem, and idempotent functions are actually an interesting way of modeling the problem, and then applying them to this framework. Where have we reached so far, right? I mean, we've basically come to the point
where we achieved consistency within workflows. We have now talked about isolation between workflows. We've talked about managing state so that workflows get incrementally applicable
starting points and so on and so forth. And we've figured out how to store state in the middle and pass references to immutable data, as opposed to the old way of passing references to mutable data, right? So that we can now split up the units of work. We don't care when they run,
because they can run at any point in time. The view of their world hasn't changed. You don't care about where they run, and we have a mechanism that allows you to basically have more than one processor listening to a series of messages in a queue. And we've kind of achieved scalability on this front.
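Concretely, the shape we've arrived at can be sketched like this; the dictionary stands in for durable blob storage, and all the names are illustrative rather than any real Azure API:

```python
import json
import uuid

blob_store = {}  # stand-in for durable storage, e.g. Azure blob storage

def persist(state):
    """Write an immutable snapshot of the state and return a reference."""
    ref = str(uuid.uuid4())
    blob_store[ref] = json.dumps(state)
    return ref

def load(ref):
    return json.loads(blob_store[ref])

def deduct_fees(ref):
    """One workflow step: load its input by reference, compute, and
    persist a new snapshot. It never mutates the snapshot it was given."""
    state = load(ref)
    return persist({**state, "balance": state["balance"] - 15})

before = persist({"balance": 100})
after = deduct_fees(before)   # could run minutes later, on another machine

assert load(after)["balance"] == 85
assert load(before)["balance"] == 100   # the input snapshot is untouched
```

Because each step only passes references to immutable snapshots, it does not matter when or where the step runs: the reference still resolves to exactly the same data.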
But is there a better way of doing it? And there's been a relatively new idea, as Mahesh will point out, which talks about that: the actor model. And I'm going to hand it over to Mahesh. Just to summarize what John summarized, right?
This is a long summary. Effectively, we set out with the three problem areas of consistency, isolation, and... What is that third one? Decoupling, right? So we kind of addressed that. But is there a better way of doing this? I mean, you could kind of individually go and implement what we actually said. Or is there a way of actually doing that,
some model or pattern that allows you to do it? So there is this thing called actor model that kind of like fits into this, and you could use that as well, use that to solve the problem that we're after. Which kind of like brings us to the question of what is an actor? Apart from the obvious person who actually turns up on TV, movies, and plays, right?
So now you've got to go back and look at the actor model. What the actor model is, is a model for concurrent computation, right? And the basic unit of computation in it is called an actor.
So what is a unit of computation, right? That's the first question. The unit of computation is, basically it's got three properties. One is it does some processing. So you can go ahead and do some work. It holds on to some state. So it knows what it needs to do next, or you know what it's working with. And it needs to have the ability to communicate
with other units of computation, basically other actors as well. So that's, if you're from an object-oriented paradigm, you can actually look at objects, and objects have basically state and behavior, right? An actor has state, behavior,
and the ability to message other actors as well. That's effectively a way of looking at it. An actor can do three things. One is, you know, it will do some processing. It can actually create other actors. And it can also send messages to actors that it knows about.
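In code, a minimal (single-threaded, purely illustrative) actor might look like this; the `total` field and the message values are made up for the example:

```python
from collections import deque

class Actor:
    """A minimal actor: private state, a mailbox of immutable messages,
    and references to other actors it knows about. Each actor processes
    its mailbox one message at a time, so its state needs no locks."""
    def __init__(self):
        self.total = 0          # private state -- never shared directly
        self.mailbox = deque()
        self.known = []         # actors this one can send messages to

    def send(self, message):
        self.mailbox.append(message)

    def run(self):
        while self.mailbox:
            message = self.mailbox.popleft()
            self.total += message            # do some processing
            for other in self.known:         # message actors it knows about
                other.send(message)

a, b = Actor(), Actor()
a.known.append(b)   # an actor can also create other actors; here we just wire one up
a.send(3)
a.send(4)
a.run()             # a processes its mail and forwards copies to b
b.run()
```

Note that `a` never reads or writes `b`'s state directly; the only way information moves between them is by exchanging messages.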
Now, you know, John said, oh, you know, this is like a new thing that we're actually going to talk about. And the funny thing is, this actor model has been around for a while. Carl Hewitt proposed it in a paper in 1973. I'm going to give you a second for that to sink in, right? 1973. A lot of people in this room weren't born at the time, right?
So it's been around for a fairly long time. And to be fair, there are a lot of implementations of it. You have, you know, languages like Erlang, and Scala and Java have had implementations of the actor model. Even in .NET, there have been implementations of the actor model.
So it's been around a while, and I think it's kind of like really well suited to solve the problem that we're after. If you look at some of the core concepts, right, actors actually don't share state at all. What they actually do is they exchange messages, and these messages are immutable.
You kind of see a pattern over here, you know, all the stuff that we've been saying about immutable data and so on. This fits in nicely with it. What we can actually do, to further this, is also pass in a reference to immutable data, and that doesn't break any of the tenets of,
you know, the philosophies of how the actor model is supposed to work. Another key thing is that actors are asynchronous, and each actor runs on a single thread, so you don't have any concurrency issues within an actor and things like that. Now, there is an implementation of the actor model that's been released by the Microsoft Research group.
And, you know, they've been working on it for a few years now, but they announced it at Build for public consumption. It's still in preview at the moment. It is basically an implementation of the actor model. It's called Orleans, and it's built on .NET and Azure, right?
So it's really well suited for it. The concept of actors in it is implemented using grains. So grain is nothing more than a virtual actor. So actually you ask, why is it a virtual actor and not just an actor, right? Because there are some changes to it.
I said earlier that an actor can create other actors. In Orleans, you don't create an actor, right? A grain exists all the time, virtually. So if you want to get access to a grain, you basically go to Orleans and say,
hey, just get me this grain, and this is the ID associated with the grain, and it will basically give you a reference to the grain. And you also get location transparency with it. So you really don't know where the grain actually exists. You almost access it as if, you know,
it's sitting in the local machine and you just basically call it. The other thing about actors, as I said, was this messaging thing. So if you wanted to send messages, you need to know, hey, there's this mailbox, I need to send it to the mailbox, and what if it actually doesn't get there? Do I need to retry it? And a whole bunch of other semantics in play here.
What Orleans does is it builds a level of abstraction on all of these things. So effectively, you don't send messages. Instead, you just make a method call. And the Orleans framework, under the hood, will do all of this stuff for you. So it will actually send the message across, and basically when you get a grain,
and you send a message across to it using a method call, it will go and activate a grain, and then it'll do the processing and send you back the output of it. And everything is asynchronous.
So if you're using the async and await keywords and so on that you're so familiar with now with .NET 4.5, it all just works quite easily. And because it's built on .NET, you've got this seamless, automatic error propagation, and you can just use things like try-catch blocks to catch these exceptions and deal with them.
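This is emphatically not the Orleans API, but the shape of the virtual-actor idea, a grain addressed by identity, never constructed by the caller, with every method asynchronous, can be sketched with hypothetical names like this:

```python
import asyncio

class InvoiceGrain:
    """A toy 'grain': addressed by identity, activated on first use.
    Every method is async, mirroring Task-returning grain methods."""
    def __init__(self, grain_id):
        self.grain_id = grain_id
        self.count = 0

    async def record_transaction(self):
        self.count += 1
        return self.count

_activations = {}

def get_grain(grain_id):
    """Like a generated grain factory: callers never use the constructor.
    The grain 'exists all the time, virtually' and is activated on demand."""
    if grain_id not in _activations:
        _activations[grain_id] = InvoiceGrain(grain_id)
    return _activations[grain_id]

async def main():
    grain = get_grain("company-42")
    await grain.record_transaction()
    result = await grain.record_transaction()
    # asking again by the same identity yields the same activation
    return result, get_grain("company-42") is grain

result, same = asyncio.run(main())
```

The real runtime adds the parts this sketch ignores: locating the activation somewhere in the silo cluster, serializing the call as a message, and propagating exceptions back to the caller.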
Now, if you want to play with it, you can just go over to orleans.codeplex.com, download Orleans. It's more than just an SDK; it's also a runtime, right? When you actually install it, you get a bunch of templates that'll get installed in Visual Studio,
so you can go and create grains and so on and so forth. When you compile code, it does a bunch of code generation for you. We'll actually show that in the demo. And what else does it do? Yeah, of course, it provides an SDK that you can actually use. There are two components.
One is, of course, the client, and the other is the server. So using it from the client is quite easy. When you're running it as a server, what you run is actually called a silo. A silo will hold on to all these grains. And once you start scaling this out,
so if you're installing it in a worker role, you can have multiple instances of the worker role running, so effectively you're creating multiple silos, and you have this silo cluster. So when you go over to the worker role and say, hey, just get me a grain, it may create the grain in any of the worker roles
in the silo cluster. And you don't care where it's running, and you shouldn't, really. It's completely transparent to you. So what we may actually do is we may jump into the code and show you what the grain actually looks like and how you can actually activate it and create it. So we started with the idea of pipelining stuff.
Now we're in a fabric that actually knows how to do a whole bunch of stuff with it. I'm going to run this demo first because it takes some time, as I'll explain to you why. And it's kind of frightening what it does, really.
I start up the server over there, and it spews out a whole bunch of stuff. But effectively what it's saying is that it's treating... There's a self-hosted silo now on my machine, and it's figured out a couple of grains that I wrote, and it's ready to serve up the services of those grains.
So pardon me. These guys do not have a better UI at the moment. But if you look at those... You need to zoom in a little bit further. Yeah, I'm trying to see how I can zoom in. Start with this one. Let's do that.
Yeah, it didn't work. No, it didn't work. Hang on. Maybe. So I start the server, and then I'm going to start the client, and then I'm going to walk you through the code while it's actually doing its thing. And hopefully by the time it's done... Okay, look Ma, no hands.
You've got to make sure that... That the container that I've got in my dev storage doesn't have any blobs, so I'm not going to cheat at this point. Oh, that's handy. Took its own time, didn't it? Oh, why?
Sheesh. Right. So we've got a server that's running now, and a client that's sitting there kind of mute, and it's doing its thing. And let's just pretend that it's... Yeah, we know that it's doing the right thing. Let's start there. How do you write a grain, right? The first question. Remember that there's a runtime,
as Mahesh pointed out, right? So what we have to do is define what the unit of work is going to look like. What does it do somewhere? And then the client is going to talk to something that implements that unit of work kind of thing without actually instantiating the grain by calling new, right?
Because you don't know where the thing is going to run. You don't know when it's going to run because it's all async. We've got, like, spatial decoupling, temporal decoupling. All of those good things are happening at the moment. So we'll start by creating two grains, and we're going to set up a workflow now. So the first grain is going to sync a Xero private application.
Xero is a company that also does online accounting. So there's nothing specific to MYOB in this talk. And what this does is pull in the company file from the cloud, like the first demo. It's going to pull all of that information into that account activity register
that we talked about, right? And then I'm going to pass that somehow, and we'll talk about the somehow in a minute, to another grain, which is going to generate a CSV with all the account balances by transaction, right? The same kind of story that we talked about. But instead of generating a plot, I'm going to spit out a CSV just so I can be interesting.
So this is the definition. Now, this lives in an assembly. It's written in C# because they're not cool enough to use F#. And as you see, you know, for all the spin about how wonderful this thing is,
it works really well. Up until the point you try and do something with F# on it. And then the rubber hits the road. So let's go in and take a look at the implementation. But first, right, the key thing over here is that to create the interface, you extend the IGrain interface that's part of the SDK, okay? That's the first thing that you need to do. The second thing is any method that you actually have in it...
Has to be async. Has got to be async. So it has to return... A Task of something. It returns a Task or a Task of T. That's right. And the third thing is that the message that you actually pass to the grain is basically the parameter that you have in that definition.
That's right. Sorry. I took that for granted. Now, what the... So the three parts of Orleans, as Mahesh pointed out, there's the runtime, which we started showing and all that. Then there's the Visual Studio support. And what this does is it generates...
Code generation to generate the proxy classes and so on and so forth. So it, in fact, generates a factory, as we're about to see in a minute. And we'll see the implementation of the piece of work first. And it's very similar to the thing that we saw earlier, which is go and get the company file
and save it to blob storage somewhere. And then this filthy hack is something I had to put in for F#. In C#, there would be one sweet little line of code that says, await that grain activation. And you're done. Unfortunately, the Orleans Task Scheduler
doesn't quite give us that level of support in F#, so I had to put this thing in there. And I probably have to go and talk to someone who knows F# to figure out whether there is a better way of doing this. But in any case, what we're doing here is chaining the result of that.
We're computing a grid, and we're passing it to the next grain. The next grain's probably the most boring grain in the world, apart from Hello World, which basically loads the company file, generates an account balance, and then saves it back to blob. So I expect to see when this whole thing works out that something comes from the cloud, gets saved to a blob,
a reference of that gets passed to the second grain. The second grain does something with it and saves its result to the blob, right? That's what I want to see at the end of this. I want to point out a couple of things in that as well. Notice that what we do is we actually extend grain base over here, inherit from grain base, rather. And that is part of the Orleans SDK as well.
So it contains a bunch of things that you can actually use straight away. It has methods such as what happens when you activate the grain and what happens when you deactivate the grain. And you can provide overrides for all of that stuff. So all that is provided in grain base. So effectively what you need to do, if you're going to create a grain yourself,
create an interface that derives out of iGrain. And then when you're implementing it, you basically extend grain base and implement the interface that you just declared. That's right. And in fact, the client application does not reference the implement...
In fact, nothing references the demo grains DLM. Nothing actually references that within Visual Studio because it's the runtime. It's the Orleans runtime that groks that and makes those services available. In terms of the interface DLL, that gets referenced here.
And the client is the simplest client in the world, right? Basically, what this is doing is cheating. It's starting up the server. And then we get a grain from this grain factory. The grain factory is actually a factory object that was code-generated for you
on the basis of those interfaces. So you don't have to write any of it. You make it look like magic. And mostly it works, right? So you get a grain. This argument is something that you pass if you want to have stateful grains. You can actually get different activations of them.
We like statelessness, so we're going to use stateless grains, effectively treating each of the methods on the grain as a pure function. Takes in something, doesn't do anything to mutate it, and spits out something. The fact that it's doing all of this via side effect is just a hand-waving thing, right? This is not the side effect we're looking for.
When John said that creates a server, that actually creates a client, right? No, no, no, that's the piece that actually runs the server. Okay, you've got a separate server as well. And then we basically wait on the synchronize. So at this point, if I do this, ta-da!
We have that little grid is actually the binary serialized version of the account activity register that got pulled off the cloud, and those category balances, CSV, is actually the source data of the stuff that you saw, the chart.
The very first transaction did the 4,130, and then you had the expense added there, and so on and so forth, up until the point where the final transaction tells us that that's what those categories look like. So we've successfully created, without knowing when the thing ran,
without knowing where it ran, we have now been able to create a workflow and exercise it end to end. So I think that's pretty much what I have for the demo at the moment.
So let's come back to where we started, right? What was the first thing we realized about the system that we had? Mutability was bad. We needed to have isolation. We had to have consistency within a workflow, isolation between workflows, decoupling of the stages of the workflow
so that you can achieve scalability, right? That's the objective. We want correctness and scalability. We've now discovered that immutability is what gives us the consistency across the workflow. We've discovered that the state management is what allows us to have isolation between the workflows.
And we've discovered that decoupling is what allows us to scale the application out. So even though you write very, very simple applications, a couple of functions here, you can now write extraordinarily scalable applications by putting them in the appropriate framework, either as a queue-centric workflow or through the actor framework, right?
So we've discovered now two ways of actually going through and doing the decoupling. So what's the key takeaway that we have? The key takeaway is embrace a functional way of thinking about state. Because if your application uses workflows,
I can guarantee you that this will make your application scalable. What that means is you can't write simple workflow stages anymore. You have to consider very carefully the movement of state through the workflows, the propagation of changes in the state between workflows.
Very important. So obviously the first thing you want to step back from is the old way of, let me design my database so that I can design my application. It doesn't quite work like that. Think very carefully about data, how you're going to pass it around. Persistence is important.
We'll get that more or less for free once we start breaking the problem down in a different way, right? When you start persisting stuff, you can now talk about saying, I don't care when this stuff runs. I don't care if I have to wait for a period of time
before I exercise the next stage in the workflow. I don't care if another workflow does its thing before I do my thing. The order doesn't matter anymore. Consider the Orleans framework and look up the actor pattern in order to be able to find a nifty and easy way
of talking about a scalable platform on which to exercise your applications. Those are the key takeaways. I hope I haven't bored too many people with the accounting part of it. Really this is an application approach that you can apply to virtually any business application.
And there are some serious benefits you get out of this. I'm happy to now take questions.
This is a brilliant question. The question is basically for everyone to hear, is how does all this distributed work of computing, distributed way of programming and everything affect the way you debug code?
Considering that I have never written bug-free code in my life until I started working with F#, right? Right. I think we hark back to the first demo. We talk about modeling the workflows. And if you talk about modeling workflows, you're really talking about pipelining functions,
data through functions, right? So in a very large extent, you can model the workflows in place using functional approach. And that'll get you a long way. Now in flight, right? None of this code is production safe, right?
You want to be monitoring everything. You want to be logging everything. The Orleans framework comes with an enormous wealth of support for observing the behavior of how your grains are behaving and so on and so forth. And so you'll be using a lot of logging to figure out what's going on.
But primarily, for the correctness of the thing, you're going to have to use a good, pure, functional approach. That'll get you a long way towards improving the correctness of your application. Does that answer it? Yeah.
That's what I was going to say. In terms of tooling, yeah, you can still use Visual Studio for a lot of the stuff, right? And obviously, the best thing to do is to test a lot of the stuff in your own machine before you put it into production, right? And you'd be able to debug a lot of those things. For example, even in Orleans, you do get, when an exception happens,
you just do a try catch, and you can have the stack trace and so on. And there are some tricks to get the stack trace of, sorry, the stack that actually gets thrown in the server and so on. Yeah, look, distributed computing is hard. We're trying to find a way to make it easier.
And in effect, starting with a clean, pure, functional application will go a long way. And yes, you can use Visual Studio with F#, right? So a lot of the stuff that I did was debugged that way. Yeah, question?
That's a very, very good question. The question is about whether unit testing is supported in Orleans. And yes, the answer is yes. There's mocking stuff that's available. There's unit testing stuff that's available. Primarily, you start by using, I mean, we come back to modeling your workflow
in pure functions, right? The moment you do that, you can use extremely powerful tools like FsCheck to make sure that your stuff works correctly and adversarially test your code. But in Orleans itself, there's definitely support for mocking frameworks. There's support for being able to say,
hey, under this condition, I want the grain to be able to do this, and you can actually test all of that, yes. I think it also goes back to the fundamentals of what you consider a unit is and how you actually apply the unit test on it, right? For a lot of stuff, you really don't need Orleans to actually test what a unit is, so yeah. Any more questions? No.
The question is about exception propagation in Orleans. Now, if you think about the kind of code we wrote, it looks like we're just calling a method on an object. The fact that the method's actually being executed
somewhere else and God knows which machine, right? And if it throws over there, how do I know what happened? I need to actually get it back, just like I would have got an exception back from just making a function call locally. And Orleans makes it very, very easy to do precisely that.
It simulates exactly that. It kind of abstracts away the whole thing for you, so you just still use try-catch blocks, just as if you would be doing it in C#, right? And yeah, with that, I will probably end the session. Thanks for coming. And if you'll make sure that you put in the light. Yeah, put the green one in there. The other two colors are not so nice.
Yeah, and yeah, that's right. And there are some Tim Tams. If you guys want to come by and try some Australian biscuits, come on by for a chat and have some Tim Tams. Thank you. Thanks, guys.