Concurrency + Distribution = Scalability + Availability, a Journey architecting Erlang Systems
Formal Metadata
Title: Concurrency + Distribution = Scalability + Availability, a Journey architecting Erlang Systems
Title of Series: NDC London 2016 (part 33 of 133)
Number of Parts: 133
Author: Francesco Cesarini
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/48787 (DOI)
Language: English
Transcript: English (auto-generated)
00:13
Okay, welcome everyone. My name is Francesco Cesarini and just to quickly introduce myself,
00:22
I started working with Erlang back in 1994. And in 1995, I started as an intern at the computer science lab working with the actual inventors of Erlang. And I've never looked back. In '99, I moved to London and founded what has today become Erlang Solutions.
00:41
And the first book I wrote on Erlang, for O'Reilly, came out in 2009. Have any of you ever written a book here? Okay. One thing I can tell you is that writing books is usually described as being like having children. As soon as you have your first child, you immediately forget all of the pains, all of the sacrifices, all of the sleepless nights. And you want to go
01:04
in and have another one almost soon after. And that's before reality actually hits you. And it's very, very similar with books. And ever since 2009, I've wanted to write the follow-up book on Erlang. There is a popular belief that by magic, you go in and you start using Erlang,
01:24
your system will never fail. It will scale infinitely and... Well, it will scale infinitely whilst never going down. And one thing I wanted to do in this book is bust a myth and instead go in and actually document what it is and how we do things. So that's
01:41
when I started writing Designing for Scalability with Erlang/OTP. And another tip to give you... As I said, about three years ago, I started writing this book. Another tip: if your partner comes to you and says, hey, I'm pregnant, we're gonna be parents,
02:01
you start jumping up and down with joy, and you have eight and a half months to finish that book. If you don't, it continues on and on and on. And in this book, I start describing the whole programming model, the whole middleware in OTP, which you use when defining Erlang systems.
02:25
And it actually goes in and takes Erlang a step further and then describes how we get the fault tolerance and how we get the scalability. Now, how many of you here are familiar with the Erlang concurrency model? One, two, three. Okay.
02:45
Shall I give you a quick overview over it? Let me do that. So this is another presentation. Let me do it. Now, if you look at concurrency, there are two ways to deal with concurrency. You've got mutable state and you've got immutable state.
03:05
With mutable state, it fits certain types of problems and you work using shared memory. Shared memory brings with it threads, locks, deadlocks, whilst the Erlang way of working,
03:22
the Erlang concurrency model uses immutable state. In immutable state, you don't share memory. Instead, what you do is you copy the data. The processes communicate through message passing. And if they want to share something, you copy it from one process to another.
03:41
And I'm using the word processes here. I should actually be using a more abstract word, workers, because a worker could be an actor, it could be an agent, or it could even be an object. If you think of Smalltalk, Smalltalk has objects. Objects do not share data and objects
04:02
communicate through message passing. That's the foundation of object-oriented programming. And the first problem you start having now with mutable state is if you've got two processes which share memory and a process crashes whilst executing in the critical section, any other process which accesses that critical section terminates because you don't know what
04:23
state the shared data has been left in. So you need to go in and clean up and terminate everything. The second problem with mutable state is assume you want to go in and start doing a distributed system. You've got part of your program running in Seattle, the other part
04:43
running in London. Where do you place your shared memory? Where do you locate your state? And assume you decide to place it in Iceland. So if you're doing something in London, doing something in Seattle, you access the shared state in Iceland. What happens if the connectivity
05:02
between the two fails? All of a sudden, you cannot access that data. So mutable state will work very well for certain types of problems, but what it does not do is deal with failure, and it will not scale because of locks. You need to use locks to actually go in and access
05:24
that data. If you look at the immutable state approach, so the approach Erlang has taken, if you've got something running in Seattle and you've got something running in London and your
05:41
process crashes whilst executing in a critical section, your state does not get corrupted. What you do is you lose your state, but if London has a copy of it, London can still continue executing irrespective of this failure. The second problem with mutable state was where do you locate your data? Well,
06:04
with immutable state, you do not locate state. You actually copy it. So you'll have a copy of the data in London and a copy in Seattle. The third problem is what happens if connectivity between the two sides goes down? Well, having copies of your data means that both sides can
06:24
continue running. What you do is you push the responsibility now onto the application layer and you need to make sure that when the connectivity comes back up again, you're actually able to handle this net split and make your data consistent again. So you could use eventual and causal consistency, CRDTs, or someone else's libraries or databases.
06:48
So databases which handle network splits. So this is a bit of the theory. In Erlang, we've got processes. Processes don't share data. To create a new process,
07:01
what we do is we call a built-in function called spawn. Spawn takes in a module, a function, and a list of zero or more arguments. This will create a new process with its own copy, with its own stack, running that function we've just specified.
07:21
Creating a new process takes up just a few hundred words of memory, depending on whether you're on a 32-bit or a 64-bit architecture. And it takes sub-microseconds to actually go in and create. So it's a completely different programming model.
07:41
It's a concurrent programming model. You need to think concurrently. For every truly concurrent activity in your system, you create a process. You hear that Erlang's great at messaging. Now, messaging could be instant messaging. It could be SMS. It could be enterprise messaging buses. And what happens there is that for every message or every request, a new process is created.
08:02
It handles that request. And then as soon as that request is finished, that process terminates, storing the state somewhere so it can survive until the next request comes in. So imagine an instant messaging server. You go online, you log on, your process is created. Next thing you want to do, you've got 1,000 friends
08:24
who are all online as well. You want to send them a status update that you're online. 1,000 new processes would be created. They each work in parallel to each other, independent of each other. They each send off the status update saying that you've come online. And then once they've done that, they terminate.
08:43
And so what this means is that you've got something which will scale on multi-core architectures. It will scale in distributed architectures, because on multi-core, these processes each have their own copy of the data. The biggest
09:01
issue to scaling on multi-core architectures is memory lock contention when you try to access shared memory. But as each of these processes has its own copy of the memory, they don't need to access shared memory, they don't need to set locks. They can just continue executing in parallel, a process per core, at the same time. And the same applies to distribution.
09:24
Now we see right there, and I'll get back to distribution later, but spawn will return a PID, a process identifier, which we've bound to a variable right there. And that is a unique identifier which will identify that process, not just in a single node, but in a cluster of
09:41
nodes which are all connected to each other. So you could have created a process on a separate node, and you can then send messages to it. And the way we send messages is using the exclamation mark. That's the send construct. We pass the PID on the left-hand side, the unique process identifier which spawn returned, and any valid data type on the
10:05
right-hand side. So you're not bound to one particular type of data. You can send any form of data you want. Here we're sending a tuple. A tuple is similar to an array. It consists of a fixed number of elements. The first element here is an atom, similar to a constant literal,
10:23
and two integers. Now this data right here is copied from this process onto this process, and it's placed in this process's mailbox right here. You're following me so far. And this process is executing its code. As soon as in its code
10:41
it finds the receive clause, it will go into the process mailbox and start trying to match a particular message. So we've sent the tuple data, 12, 13, and we try to match it to the atom start. That fails. We try to match it to the atom stop. That fails. We try to match it to the tuple where the first element is the atom data,
11:05
and the last two elements are the variables x and y. This will succeed, binding the variable x to 12 and y to 13. And even here, you tend to send small messages backwards and forwards, so the copying creates very little overhead, and it usually takes sub-microseconds
11:22
to actually send the message itself. And this is irrespective of the number of processes running in your virtual machine. The Erlang virtual machine can today scale to tens of millions of processes within one single virtual machine. And it puts scalability, vertical
11:41
scalability to the next level on a single machine. And then with the built-in distribution, which I'm going to cover as well, that's how you then scale out horizontally. Is that okay? So this is the distribution model, sorry, the concurrency model in Erlang.
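Pulling those pieces together, here is a minimal sketch in Erlang of what was just described; the module name and the printed output are purely illustrative, not taken from the talk:

    -module(echo).
    -export([start/0, loop/0]).

    %% Spawn a new process running echo:loop/0 and return its Pid.
    start() ->
        spawn(echo, loop, []).

    %% The process sits in receive, pattern matching the messages in its mailbox.
    loop() ->
        receive
            start        -> loop();
            stop         -> ok;
            {data, X, Y} -> io:format("got ~p and ~p~n", [X, Y]),
                            loop()
        end.

    %% In the shell:
    %%   Pid = echo:start(),
    %%   Pid ! {data, 12, 13}.   %% the tuple is copied into the mailbox, X = 12, Y = 13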
12:02
So if we go back to the book, so when I wrote the first book, I was cheating, because I think one of the things we do at Erlang Solutions is a lot of training. I developed all of the training material for the Erlang course, all of the training material for the OTP course, so the
12:20
advanced course on which this book covers. And what I did with the first 10, 12 chapters is I had the examples ready, so I went in and I used the examples, and then when I write, I just start lecturing, lecture, lecture, lecture. And once I'd lectured, I go back,
12:41
I review everything, and I start thinking, okay, where were people struggling? When I was teaching, what concepts were people struggling to understand? And I make sure to add much more text, beef it all up, make sure it's clear, and then from there, go back, check the manuals, documentation, other resources, clean that up, and then I
13:02
start thinking, okay, what questions did all the really bright students who got it ask? And those tend to become side notes, so discussions on the virtual machine, on the underlying protocols, how things work, how you optimize, and so on. And that worked really well until I hit chapter 13. Now, 13, well, it's never a good number,
13:26
but the biggest problem, what I wanted to describe here is how do we architect our systems? How do we actually get this built-in fault tolerance, and how do we get it to scale? And these are things I used to do in my sleep. I do it in my sleep. I've been architecting systems for quite
13:44
a while now, but we had no examples, because we didn't cover this in the training course. What we did in the first 12 chapters took five days. That's heavy enough. Everyone's head is
14:01
hurting already by the end of it. So what I did is, okay, I want a chapter describing how I do things, how I architect things. And we called it node architecture, nothing to do with Node.js per se, even though we were even first with that term.
14:21
And once again, I start writing and I start documenting everything I do. And out of pure curiosity, coming here today, I went in and looked at an email. I was looking for a particular email sent to the editor where I wrote, I'm almost done. Give me a few more days and chapter 13 will be finished. And that email was dated March 3rd, March 4th.
14:47
And probably in August, September, I sent the chapter off to the editor. So another six months. By that point, it had turned into about 80 pages. And my editor gets it in and says,
15:02
this is not one chapter. And the next thing he did was throw it back and suggest breaking it up into four separate chapters. And that's where we started getting the steps, started breaking down all of the steps we do into even smaller steps.
15:23
So we started looking at distributed architectures. We started looking at fault tolerance and scalability. And once again, when that was done, and these chapters are now out for review, we ended up stopping at about 120 pages. So probably three, four times more
15:41
than what I originally envisioned. And the result was actually the steps involved in designing and architecting distributed systems, which is no different if you're using Erlang, if you're using Scala, if you're using Java for that matter. These are the principles one should be following. But the conclusion I ended up coming to here is that concurrency plus
16:02
distribution together will greatly facilitate availability and scalability. And the two of them, availability and scalability, you'll have to do trade-offs along the way as you're dealing with them. So this is the first section. I'll break this up into four parts now.
16:27
The first part, which I'm going to focus on, is looking at distributed architectural patterns. So you go about to creating a system which needs to be both scalable and highly available. The first thing you need to do is take all of your functionality and divide it into what
16:45
we call node types. So each node type is used to describe the overall responsibility of each node. One way of looking at it is microservices. So each node will provide a particular service.
17:01
And to explain some of the terminology we use, we've got front-end nodes which usually face the client. So they would be web servers. They could provide a RESTful API, a REST API. They could provide XMPP. So what you do is when you're dividing up your functionality,
17:20
you try to make it self-contained so that each node can be reused and that each node is either just memory-bound, CPU-bound, or IO-bound, which makes it much easier to then deploy. You focus on different types of hardware. You deploy based on the capability. If it's memory-bound, you deploy it on hardware with more memory. If it's IO-bound, you've got the ability to fine-tune
17:44
the operating system and so on. The next phase, in the middle, we've got a logic node. And the logic node is where all of the business logic of your system happens. And then finally, we've got service nodes. So service node could be a database. It could be the gateway towards
18:02
a third-party API. It could be an authentication server, whatever it is you want. And by breaking it up, you end up having... You simplify the whole problem in your mind.
18:21
And each self-contained part is easier to describe. I won't go into the advantages of it. I think that's fairly evident. Now, once you've got your node types, the next thing you need to do is decide what distributed architectural pattern you're going to use. And there are many different approaches, at least when you're dealing with
18:41
Erlang. The most common distributed architectural pattern is the one where your nodes are all fully meshed. So what happens is when you start off distributed Erlang, you've got two nodes, and you could connect two nodes with each other. And you can then start
19:00
creating processes on remote nodes and receiving messages from these remote processes on a local node. And you do that using the PID. The PID is bound to a variable, so it's completely transparent. So if you've written a program, if you've done it right with very little change, a program which you wrote, which was supposed to run on a single node, can easily be distributed
19:22
to a cluster of nodes. You're following me here? Yeah, good. So the most common architectural pattern is the fully meshed one. And this has its limitations because when you
19:41
start adding nodes, all of the nodes connect to each other and then start sending keepalives alongside the data you're sending. And so usually you start hitting a limit when you reach about 100 nodes. And the limit hits you because of all the overheads you do with
20:01
the underlying message passing, with the underlying tick messages, monitoring, and so on. And once you've hit that limit, you need to take it a step further. And you need to start deciding what other distributed architectures to use. A very common one is the Dynamo principle from Amazon. What you do is you have a set of identifiers, you have your key set,
20:29
and what you do is you break it up into smaller partitions. So assume you're running four nodes in total. You have your key space of 2 to the power of 160 keys, and you break it up into
20:42
smaller subsets. And each little subset will then be passed on to a node. And each node here would represent a physical node. And what you do is you start forwarding your traffic,
21:02
so your job scheduling algorithms, your data partition algorithms, use a hash. So you get your session ID, it could be a user ID, it could be an inbound IP address. What you do is you hash it, and that hash will then forward it to one of the vnodes, you decide. Assuming you have 64 vnodes, you hash it, you pass it on to vnode number two,
21:26
and vnode number two, a simple lookup will tell you, resides on physical node number two, as physical node number two runs vnodes 2, 18, 38, and 50. This is the same principle Amazon uses to scale. It's the same principle Cassandra uses. It's the same principle Riak uses.
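A toy sketch of that hash-to-vnode-to-physical-node lookup in Erlang; the 64 vnodes and the node names are invented for the example, and a real ring (Riak Core, for instance) recalculates ownership as nodes join and leave:

    -module(ring).
    -export([node_for/1]).

    -define(NUM_VNODES, 64).

    %% Hash the key (a session id, user id or inbound IP) onto one of the
    %% 64 vnodes, then look up which physical node owns that vnode.
    node_for(Key) ->
        VNode = erlang:phash2(Key, ?NUM_VNODES),
        owner(VNode).

    %% Static vnode -> physical node assignment, four physical nodes in total.
    owner(VNode) when VNode rem 4 =:= 0 -> 'logic1@host';
    owner(VNode) when VNode rem 4 =:= 1 -> 'logic2@host';
    owner(VNode) when VNode rem 4 =:= 2 -> 'logic3@host';
    owner(_VNode)                       -> 'logic4@host'.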
21:45
And in fact, this is a slide I stole from a Basho presentation. Have any of you seen this slide before? No? Okay. So what happens with Riak Core is you've
22:00
got Dynamo-style scalability where your actual core itself, so all the logic nodes are running Riak Core right here, your actual core itself is still fully meshed, and it starts acting as a large kind of job scheduling algorithm. Requests will come in through the frontend node,
22:21
they get sent to one of the logic nodes, one of the vnodes running on the physical logic nodes right here, and the logic nodes then go in and start using services which are then scattered around. So your actual core is still fully meshed, you can scale to about 70 to 100 nodes,
22:41
and then you start creating islands all around which provide certain types of services. And I can comfortably say an architecture like this with maybe 50 to 100 nodes can very easily handle all of your Twitter traffic. Because the requests come in, you then pass them
23:06
on to your logic nodes, and then you do whatever you need to do with them. You pass them on to your database, you store them in your search engines, your feeds, and so on. So using the actual core of Twitter, the traffic they've got today can easily be handled with about
23:25
a core of 70 nodes plus all the services around it. Obviously it gets more complicated than that. And what happens is you also start getting fault tolerance here for the simple reason that logic nodes will also have partitions and copies of the data as you go around.
23:46
So this is one of the typical distributed architectures we tend to use. A second one is kind of a bit more enterprise-y. It's service buses. It's a service-oriented
24:01
architecture, service orchestration, and all of your nodes are connected by service bus. So requests will say come in by the web server, the web server will then forward them on the service bus onto a logic server, the logic server will handle that request, and it will then
24:21
maybe might need to do some billing, it passes it onto a service node for billing, you might want to log it, it passes it onto a service node for logging, and if it's an item you've just bought, you might then pass it onto another service node to get it shipped. And now I think one of the things which came to my mind when I was writing this book was
24:43
why are there no simple, standard frameworks which everyone uses? You'd think that should be the case. I mean, if you think of Akka, you should be using Akka Cluster, or you should be using something which will automatically get
25:01
everything to scale and elastically scale up and down. And the realization when documenting everything is that distributed systems have so many differences in them. So no two systems will be alike, at least not systems which have to scale massively. And so everyone, all the large companies we've worked with, have ended up doing their own layers
25:24
when dealing with these trade-offs. And I'll get back to those a little later on. Now, you need to ask yourself, do you really need one of the distributed frameworks, or do you just want a simple cluster? There have also been cases where we've gone in and done an architecture review, they needed to handle 10,000 requests per second, but they
25:44
had service buses, they had AMQP messaging buses in them, and they were just adding a lot of complexity because they believed that a simple system couldn't scale to the levels they needed. So most of the time you will get away with a simple distributed cluster, but
26:06
often we start seeing much more complexity being added. Another very common approach, and probably the most scalable distributed system of all, is the peer-to-peer approach. And each client,
26:21
each logic node, each client also acts as a server. Usually they're all very similar to each other. Start thinking in terms of BitTorrent, start thinking in terms of Kazaa. And this is a very interesting architecture which I still haven't seen being used in the Erlang space to reach those levels of scale. But with the whole IoT space, I think there are more and
26:41
more discussions and talks about it right now. And just like all the other distributed architectural patterns, you've got your peer-to-peer architecture right here. You can actually start adding front-end nodes, you can start adding backend nodes, service nodes around some of these logic nodes themselves. And so that's how you diversify and keep it.
27:06
So I've never seen it used in the Erlang space, but I think it would be really, really cool. And we're going to start experimenting very soon with ad hoc, loosely connected devices which connect to each other. But I think it's still a bit futuristic,
27:21
but we're getting there. Now, once you've decided which distributed architectural pattern to use, the next step is looking at what network protocols your nodes need to use, your clusters need to use among each other, and how you deal with the communication across clusters.
27:47
Now, this is a very simple example. We've got our front-end nodes running web servers and our backend nodes running the business logic. And we need to have a separation between the front-end and the backend. The web servers, being client facing,
28:03
we want to put them behind a firewall running in a DMZ. And so in the old days, we used to do that physically: web servers on one machine, business logic on another, and we would then control the networks between them. That doesn't happen anymore. Today we do the same, but logically. If we're running on the cloud,
28:25
we'll simulate this logically. And if we need a firewall between front-end and backend, well, Erlang distribution is completely open. If the front-end and the backend are connected to each other, you get full access rights to these nodes
28:44
from any other node in the cluster. So there's very, very little security built in, for the simple reason that Erlang was meant to run server side behind secure firewalls; it wasn't meant to be customer facing or client facing. So in these cases, you might want to go in and pick SSL,
29:03
you know, SSL connectivity. Stop using distributed Erlang, use TCP/IP. You could use MPI, ZeroMQ, UDP, SSL. And you could use sockets, or take it a step further: you could use REST, AMQP, SNMP, XMPP, MQTT and so on. I'll stop short of saying CORBA as well.
29:26
You can use anything you want except for CORBA. And so the next choice you need to make is what network protocol do you want to use between the two. And that could be a mixture of them. So you might have a cluster using distributed Erlang
29:43
and then that cluster might speak to another cluster, also using distributed Erlang internally, over REST or ZeroMQ or RabbitMQ or any other protocol. So it's usually a mixture of them. Once you've done that, you come to the fourth step. So far, you've gone in and you've divided everything
30:03
into standalone nodes, you've picked the distributed architectural pattern, and as a third step you've picked the network connectivity between these nodes. The fourth step is you need to go in and start defining the interfaces which these nodes provide.
30:21
And the interfaces usually consist of a module, a function, and a list of zero or more arguments. And once you've picked your interface, you need to validate your choices. Picking your interfaces will force you to decide how you split everything up into microservices.
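As an example, a hypothetical authentication service node could expose nothing more than one module and function; the names, the timeout and the gen_server behind it are assumptions made up for illustration:

    -module(auth_service).
    -export([authenticate/2]).

    %% The only entry point other nodes are allowed to call. Internally it is a
    %% call to a locally registered gen_server (not shown); callers only ever
    %% see {ok, Token} or {error, Reason}, never the implementation details.
    authenticate(UserId, Password) ->
        gen_server:call(?MODULE, {authenticate, UserId, Password}, 5000).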
30:42
You want to pick them in a way that reduces the communication between nodes. The less you hit the network, the better. And you also want to use your data redundancy: you have the data for fault tolerance and scalability, but you don't want to go in and start duplicating data just in case. And once you're done with these four steps,
31:05
it's time to start thinking about fault tolerance. So how do you deal with fault tolerance? You need to start thinking in terms of availability. So availability is what we define as the uptime of a system over a certain period of
31:21
time. And high availability refers to systems with very, very small downtime. So I'm sure you've heard the stories of Erlang reaching five nines availability, and nine nines availability, in fact. Have you heard those claims? Okay, so there have been claims that Erlang systems have reached nine nines availability. And it's interesting because Ericsson engineers
31:45
are not allowed to openly talk about the uptime of the systems they've developed. But what happened here was British Telecom. They were running the world's largest voice over ATM backbone as part of British Telecom's 21st Century Network migration, the 21CN project.
32:01
What they'd done is they built the world's largest voice over ATM backbone. So they were going from the point-to-point approach to having a common backbone for all your phone calls. And they went in and all of the systems controlling this backbone were written in Erlang. And what happened is British Telecom went out with a press release which said that under the
32:22
trial period, which lasted about six months, we achieved nine nines availability. So no one really knows how British Telecom calculated this nine nines availability because that's about three milliseconds of downtime in total. What happened during that period was, this was
32:40
2002, it was Pop Idol. There was a surge in calls during the finals. It was before everyone texted. There was a huge surge in calls. Everyone was calling in to vote for the winner. And what happened was the load regulation on one of the switches kicked in too late. So the node crashed. And during that crash, it took three seconds for the standby node
33:05
to kick in. So for three seconds, all existing calls were kept. All of the calls which were being set up were kept and were set up correctly with the standby node taking over. But you could not make any new calls. And that was one of the probably hundred switches.
33:22
So they calculated the three second downtime on that switch over those six months times the number of switches and came down to nine nines availability. Now the truth, unfortunately, is very, very different. We usually tend to say five nines availability, which is a few minutes, about five minutes of downtime per year, is much more like it. And what you do is achieve it at a fraction of the effort. Now, going back to high availability, high
33:47
availability is the result of your system having no single points of failure. So it's the result of a system being fault tolerant, resilient, reliable. So these are all things you need to do. And if we look at no single points of failure, the first thing which comes to mind is fault
34:04
tolerance. So when we talk about fault tolerance, we refer to the ability of a system to act predictably under failure. So assume we've got a client now sending a request to a web server. The web server then forwards the request onto a logic node.
34:20
If that logic node happens to crash, or if the network to that logic node goes down, or the hardware crashes, or there's a software error, the front end node here detects it. So what we do is we can detect it either by receiving
34:41
an error back or through a timeout. And if that happens, we send back an error to the client. That's what we mean by fault tolerance. So front end servers can either return a response or an error back, and then it's up to the client to decide what to do.
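In Erlang that detection often boils down to catching the exit of a call that timed out or never reached its target; a rough sketch, where the registered name, node and five-second timeout are assumptions for the example:

    %% Forward the request to the logic node and turn every kind of failure
    %% (timeout, dead process, unreachable node) into an {error, Reason}
    %% the front end can send back to the client.
    handle(Request) ->
        try gen_server:call({logic_node, 'logic1@host'}, Request, 5000) of
            Reply -> {ok, Reply}
        catch
            exit:{timeout, _}       -> {error, timeout};
            exit:{noproc, _}        -> {error, unavailable};
            exit:{{nodedown, _}, _} -> {error, nodedown}
        end.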
35:00
And what we mean is we need to act predictably. So acting predictably can mean looking for alternative nodes or just returning a visual error to the user saying, sorry, when you clicked buy, the transaction failed, try again. Now, the beauty with Erlang here is
35:22
that if you're dealing with Erlang, you handle all the different types of errors in exactly the same way. So you won't go in and differentiate between network errors, hardware errors, software errors. You send the request and you get back an error; you handle the error in the same way. So it could have been a process crashing,
35:45
your recovery strategy deals with it. It could have been hardware crashing, you have the same recovery strategy. Network error, network issue, you have your same recovery strategy. You're following me here. So what you need to ensure is the requests are
36:01
fulfilled. So that's where resilience comes into the picture. So resilience is the ability from a system to recover from failure. So we send the request, the client receives an error. Now the client can expect an acknowledgement or an error, and it knows that if it does receive an error, it goes in and sends a new request again. The new request hits the front end web server,
36:25
thanks to the resilience, the web server is up again, or it's maybe a separate web server. The request gets forwarded to the logic node, the logic node sends back a reply, and the client receives a reply. And the actual client, which sent the original request, has no idea that things en route have been failing. So it's hiding it all away from them.
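A rough sketch of that failover logic on the front end, assuming it knows about a list of candidate logic nodes (the node list and message shapes are invented for the example):

    %% Try each logic node in turn until one of them answers.
    call_any([], _Request) ->
        {error, no_nodes_available};
    call_any([Node | Rest], Request) ->
        case call_node(Node, Request) of
            {ok, Reply} -> {ok, Reply};
            {error, _}  -> call_any(Rest, Request)   %% try the next node
        end.

    %% Wrap the remote call so crashes, timeouts and unreachable nodes all
    %% come back as a plain {error, Reason}.
    call_node(Node, Request) ->
        try gen_server:call({logic_node, Node}, Request, 5000) of
            Reply -> {ok, Reply}
        catch
            exit:_ -> {error, unavailable}
        end.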
36:44
And thirdly is reliability, which we need to deal with. Reliability of a system is its ability to function under particular predefined conditions. So in this example, we have a client which sends a request, the request gets forwarded to
37:04
a front end node, the front end node parses it and then hands it over to the back end node. So imagine, once again, you're buying a book. When dealing with that request, you get the back end node crashes, the request gets sent back up, the error gets sent back up to
37:22
the front end node, which then forwards the request onto another node. The other node successfully executes it and returns a reply, which gets forwarded on to the client. And what we have here is an example of a single point of failure, where assuming that at least
37:42
we've got at least one front end and one back end node, one logic node, we will be able to handle that request. That's how we do it in Erlang. What's really important with this mindset is that for every interface function in our nodes,
38:03
so as I said earlier on, you need to define your interface. For every interface function in your nodes, you need to pick a retry strategy. And you've got three different retry strategies you need to think about. One is fire and forget. So we send the request,
38:21
but we don't wait for an acknowledgement. And that's usually the approach which is used for SMS, it's used for instant messaging, where you run the risk of possibly losing that request, but if you lose it, there will be no financial penalty. And it's one of the trade-offs you do.
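Fire and forget in Erlang is just an asynchronous send with no wait for a reply; a tiny sketch (the process and message shape are illustrative):

    %% At-most-once: hand the message over and return immediately. If the
    %% receiving process or node is down, the send still succeeds locally
    %% and the message is simply lost.
    send_sms(Pid, Text) ->
        Pid ! {sms, Text},
        ok.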
38:42
The second approach is at least once. So you send off a request and you wait for a response. If you don't receive that response as a result of a timeout or if you receive an error, you go in and you try it on a second node.
39:04
So we go in, we send the request, and if it was fire and forget, we send that request and we just send okay back to the client. But if we have at least once, we'll continue trying until we get a successful acknowledgement. Now that's a bit tricky because a slow node
39:22
in distributed computing is the equivalent of a dead node. So we could have sent that request to buy a book to a logic node, but that node is incredibly busy, it doesn't respond, the network could be busy. So we could have lost acknowledgement coming back. So the front-end
39:42
nodes then go in and send the request again to another logic node. That logic node then successfully manages and handles that request, so you've bought your book and you get back a reply which gets sent back: yes, your book has been ordered. But the next thing you know is that you receive two books, because your first request had actually gone through. And so that's where,
40:03
you know, in those cases you wouldn't use at least once. You might use it for instant messaging, where you want to be guaranteed that the message reaches its recipient, but where you're allowed to deliver that message twice. I don't know if it's ever happened to you
40:21
that you received the same text twice. That's what happened. I don't know if it's ever happened that you've never received a text which someone sent. Yeah, that's at most once. And then there's a third approach, a third retry strategy, which is exactly once.
40:42
Now exactly once means you send a request and you either get back a success or a failure. And when you're dealing with distributed systems, if you get back a failure, you cannot go in and guarantee that that request has been handled because it could have been handled, but you could
41:02
have lost the acknowledgement or maybe it never reached its recipient. So exactly once is usually what's used in banking. And what happens if everything's successful, great, you don't need to think about it, you just think of the positive cases. But if something does go wrong, you need to go back and start analyzing the state and at every step involved to try to figure out
41:27
where that issue went wrong. And you use the exactly-once approach when dealing with money. And there are cases where, if a request which may not fail does fail and scripts can't figure out what went wrong, the escalation point is human intervention, where a human actually goes in and
41:46
starts reading the logs. Imagine you need to send a few billion euro from Brussels to Greece. I'm sure Greece would love the at-least-once approach, delivered many times over, but if that money gets lost along the way, you need to go in and check at every intermediary bank
42:04
where that money's ended up. So the fifth thing is for every interface function, you need to pick the retry strategy. And the next thing you need to do is decide how you're
42:20
going to partition and share your data. What you do is you distribute your data for scale and you replicate it for availability. So you will have a trade-off between the two, and I'm going to cover these trade-offs, between scalability and availability.
42:42
Now there are three different types of data sharing strategies you can take in your distributed system. And I'll start with the share nothing. So the share nothing is equivalent to sharding.
43:00
Each node will have its own set of data, but it's not distributed. So assume we've got an instant messaging server, we've got two clients logging in; the load balancer sends the first client to the cluster on the left, and we store session one in that database. The second client logs on, it goes into another part, and we store session two right down there.
43:25
Now assume we send a request which causes one of the logic nodes to crash, and client one then sends another request. If that request now gets routed to a logic node where we don't have any session data for that particular client, we send back an error,
43:47
an error: unknown session. So what would happen then is the client would kick in and start a new login, and we now have both sessions in the same database. You follow me? So this is the share nothing approach which we use. And obviously this, just like the
44:05
fire and forget, this is the cheapest one. There's no network overhead, there's no need to worry about the consistency of your data. Next in line is the share something. So assume we want to go in and ensure that, once again, you're buying a book and your session must never
44:24
fail, but you can lose items in your shopping cart. And this is how Amazon works. You go in, you start buying a book, you go in, you buy it, and it gets stored in the database on the left.
44:42
And so what we're doing is the only thing we're sharing is the session data across all the nodes, but we're not going in and actually sharing any of the shopping cart items. We then go in and, you know, this back end node here crashes. We then go in and say we want to buy a train set as well. The train set now gets added to this particular node.
45:06
So when we go in and want to do a checkout, the only thing we have now is a train set. But there in between, we've not had to log on. So this is the second choice. And the third choice is share everything. So you go in and you start sharing all of your data.
45:26
This example here is a typical example of your primary-primary replication. We go in, we decide we want to buy a book. The left-hand cluster handles it, but we then copy the fact that we want to buy a book on all nodes. We lose a node, we want to go in and buy a train set.
45:43
And now this database has both a book and a train set. And now assume the other part of the cluster comes up again. We'll now copy the data from this database to this one. And we decide to go in and remove the book. You know, we'll have
46:02
a book and a train set. We remove the book. We only remain with the train set here. And this obviously is the least scalable of them all. And I keep on talking about failure. And with failure, I also mean your network partitions, for example.
46:22
So let's take a case where we go in and we start buying a book. And then we get a network partition. And we go in and we buy a train set. Now, because of the network partition, we have replicated
46:44
the session database, but we've lost the fact that we wanted to go in and buy a book because the two nodes now are not connected. Network comes back up again. You know, if you're using an eventually consistent or causal consistent database, what will happen then is that the data
47:06
is going to be merged. So as soon as these two nodes reconnect, if you've picked the right libraries and the right database, you realize that, oh, our data here is inconsistent, let's merge the two shopping carts. And that's, once again, if you read the Dynamo paper,
47:24
that's the way Amazon does it. If it has two shopping carts which are inconsistent, it merges them. And there might be a risk in that case that you've actually deleted an item from one shopping cart, but it's still there in the other: the deletion didn't take place in the partitioned copy of the database because you couldn't reach it.
47:45
What happens is when you go in and you press pay, you'll actually end up paying even for the items you've deleted. And finally, another word about retry strategies. I spoke earlier about the importance of
48:03
the exactly-once strategy, where you go in and you send the request, the request is successfully executed, and you send back a reply, but then the reply is maybe lost on its way to the client. What happens here is the client never gets a response, it gets a timeout, it says, okay, I've maybe lost that request, and it goes back in and
48:23
sends a new request. And now it's the responsibility of the logic node to realize that this client has maybe already paid for this item, or has already put this book in the cart, and to resend the reply that was originally sent back to the client. And we tend to do this with
48:41
premium rate SMSs, where each SMS will have a unique identifier. So an aggregator tries to send the premium rate SMS to you, and it gets back an error. It tries to send it again, and as long as it uses the same identifier, it's up to the business logic of the system to actually go in and realize that that premium rate SMS has already been handled.
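One way to sketch that deduplication is to remember the identifiers of requests that have already been handled, here in an ETS table; the table name and the shape of the entries are assumptions for illustration:

    -module(dedup).
    -export([init/0, handle/2]).

    init() ->
        ets:new(handled_requests, [named_table, set, public]).

    %% If we have already seen this identifier, return the cached reply instead
    %% of charging (or shipping) twice; otherwise run the work and remember it.
    handle(RequestId, Fun) ->
        case ets:lookup(handled_requests, RequestId) of
            [{RequestId, Reply}] ->
                Reply;
            [] ->
                Reply = Fun(),
                ets:insert(handled_requests, {RequestId, Reply}),
                Reply
        end.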
49:05
So different functions require different retry strategies. And when deciding if you want to use the at-most-once, at-least-once or exactly-once approach, you need to examine all of the possible failure scenarios which can take place in your whole call chain. And one of the things you want to do
49:26
is take particular care when you're dealing with exactly-once strategies. Now, all of these things result in trade-offs. We had one
49:41
customer who said, oh, our system has had 100% uptime, it's never failed, it's never gone down for the last 30 years. And the system was actually running on a mainframe. So what we did is we started digging into what they were saying,
50:00
what they actually meant with uptime and availability. And we quickly realized that they had front-end nodes and they had so many front-end nodes, they always made sure that none of these front-end nodes ever went down. And so, if someone sent a request and the backend was down, someone manually would pick up that request and process it manually.
50:24
And so, in their view, their system had 100% availability because they successfully actioned all of the issues. But what we say is it certainly was available, but it was not reliable because all of a sudden, you know, the reliability hits you on the latency.
50:42
If the request had been taken care of with the system up, it would have taken milliseconds; now it was possibly taking hours. And so, in the choices you've made along the way when architecting your system, you have basically made trade-offs between your consistency and availability. Now, if we look at the recovery strategy,
51:03
if you've got the exactly once approach, the biggest trade-off you've done right there is around availability. Because if something goes wrong, you might have to take your system offline to guarantee consistency. Because you don't know if the request was successful or not. The at-least-once is not as consistent because you might have handled an action, you know,
51:28
on two separate nodes and not be aware of it. But it certainly is much more available. And the most available system, but obviously the least consistent, is the at-most-once. Because you've sent off your data, but you've never received an acknowledgement that it has been received.
51:43
And so that data you sent off, the request you sent off, might have been lost. So the availability is there, but you're not sure, you cannot guarantee the consistency of your data. So you might have inconsistent data, inconsistent state. And the same analogy could be done when sharing data, but when you're sharing data, you're sharing it for reliability.
52:08
And the reliability, you know, if you share everything, that's going to be the most reliable system, but the least available, once again. Because sharing might fail, and so if sharing
52:21
of the data might fail, as soon as you've got a network issue, your system is not available anymore. Because you can't share it to all the nodes. And you're not sure if the remote node you can't access is dead. You're not sure if it's still functioning and handling requests elsewhere. You then have the share-something, which is an in-between, and you've got the
52:41
share-nothing, which once again is the most available, but the least reliable. And this is, once again, the way we reason when we're architecting an Erlang system. These are the stories we walk through and the way we think and reason.
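Before moving on, here is a minimal, hypothetical sketch of what the two ends of the retry spectrum look like at the code level. The process identifiers, message shapes and timeout are invented; the receiver of the at-least-once send is assumed to reply with {ack, Ref}.

```erlang
%% Hypothetical sketch of two retry strategies. Message formats, timeout and
%% retry count are invented for illustration.
-module(retry_demo).
-export([at_most_once/2, at_least_once/3]).

%% At-most-once: fire and forget. Most available and scalable, least
%% consistent: if the message is lost, nobody ever finds out.
at_most_once(Pid, Request) ->
    Pid ! {request, self(), Request},
    ok.

%% At-least-once: keep resending until the receiver acknowledges.
%% More consistent, but the receiver may see the same request twice,
%% so it must deduplicate (for example on a unique identifier).
at_least_once(_Pid, _Request, 0) ->
    {error, no_ack};
at_least_once(Pid, Request, Retries) ->
    Ref = make_ref(),
    Pid ! {request, self(), Ref, Request},
    receive
        {ack, Ref} -> ok
    after 1000 ->
        at_least_once(Pid, Request, Retries - 1)
    end.
```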
53:00
And that brings us to steps five and six: for every interface function in your node, pick a retry strategy, and for all of your data, pick your sharing strategy. And once again, your sharing strategy is decided on a per-data-set basis, so it's not the same for everything. You might decide to share all your session data,
53:23
but not the shopping cart. So once again, this is on a per-item basis. Once we've looked at fault tolerance, we need to start looking at scalability, at scaling out. And what you need to do here is recognize that
53:41
scaling out must be carefully integrated and orchestrated; it's not that easy to do. Back in the good old days, we scaled vertically: we had a single machine, and we could either write faster code or buy a more powerful machine.
54:02
Today, we scale horizontally: we throw more commodity hardware at the problem. And when you reach this point in your architecture, you might not be aware of it, but you have already made all of your trade-offs in your horizontal scalability approach. Because, on the scalability side,
54:23
with a share-everything approach you will have huge overheads copying the data and then guaranteeing its consistency across all your nodes. And if you look at the retry strategies: if you handle
54:44
a request exactly once, it will probably be the most consistent, but once again the least scalable. At-least-once will be somewhere in between. And at-most-once, where you send the request only once and don't care about the result,
55:03
that will obviously be the most scalable. You don't need to store any state, you fire and forget, but it's the least consistent, because you might lose that data anywhere in your software stack: in the network layer, in the hardware because of hardware failures, and so on. Sharing data is the same. When you share everything,
55:22
it's the most available system, but it's the least scalable. With share-everything, if you lose one node you'll have your data elsewhere, so you can just continue handling that request elsewhere, but it will not scale because of all the overheads involved. Share-something is in between, and share-nothing is obviously the most scalable, but it's also the least available,
55:44
because if you lose the node handling a request, you lose the data associated with it. And finally, a really important part to look at is the whole idea of capacity planning, which I won't go into in much detail, but what you need to add to your
56:01
system is back pressure. Back pressure is the point where you start rejecting calls because your system is overloaded, and you know that if you accept more calls, your system is going to fall over. And the second is load regulation. Load regulation is usually handled with queues: you use queues to even out the peaks and troughs
56:26
in your requests. So assume you've got a third-party API which will only allow a maximum of 50 simultaneous requests, and you've got 100 users wanting to make a request at the same time: you queue up 50 and handle the other 50 concurrently.
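A minimal, hypothetical sketch of such a regulator as a gen_server: it lets at most 50 callers run concurrently and queues the rest. Module name, limit and message shapes are invented; a production version would also reject calls once the queue grows too long (back pressure) and monitor callers so slots cannot leak.

```erlang
%% Hypothetical load regulator: at most 50 callers run concurrently against
%% the third-party API; everyone else waits in a queue.
-module(api_regulator).
-behaviour(gen_server).
-export([start_link/0, run/1]).
-export([init/1, handle_call/3, handle_cast/2]).

-define(MAX_CONCURRENT, 50).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Callers block here until a slot is free, then run Fun and release the slot.
run(Fun) ->
    ok = gen_server:call(?MODULE, acquire, infinity),
    try Fun() after gen_server:cast(?MODULE, release) end.

init([]) ->
    {ok, {0, queue:new()}}.                    %% {InFlight, Waiting}

handle_call(acquire, _From, {N, Waiting}) when N < ?MAX_CONCURRENT ->
    {reply, ok, {N + 1, Waiting}};             %% free slot: go ahead
handle_call(acquire, From, {N, Waiting}) ->
    {noreply, {N, queue:in(From, Waiting)}}.   %% no slot: queue the caller

handle_cast(release, {N, Waiting}) ->
    case queue:out(Waiting) of
        {{value, From}, Rest} ->
            gen_server:reply(From, ok),        %% wake the next waiting caller
            {noreply, {N, Rest}};
        {empty, Waiting} ->
            {noreply, {N - 1, Waiting}}
    end.
```

A caller would then wrap the external call as api_regulator:run(fun() -> call_third_party(Req) end), where call_third_party/1 stands in for whatever the real API call is.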
56:43
And you need to do that with all external APIs, and certainly with inbound systems. Back in the day, we never had to do this: if you go back 15 or 20 years, what you would do is stress test your system, and the virtual machine itself would handle back
57:01
pressure. So if you were running at 100% CPU, you would start getting a graceful degradation: you'd still handle the same number of requests, but at the cost of latency. Today, the problems have shifted from the virtual machine to the underlying operating system and to your network. That's really where the bottlenecks
57:23
we're seeing today are. And finally, there's monitoring and preemptive support, which is the last chapter, which I will not go into. But just to give you a taste of the types of systems we're developing: when we're working with Erlang, we work on everything from the Raspberry Pi, and clusters of Raspberry Pis or Parallella
57:43
boards, all the way up to supercomputers and everything in between. And this is just a blog post about how WhatsApp is using concurrency. They were the first ones to start fine-tuning the VM and FreeBSD to handle
58:00
about 2 million simultaneously open TCP/IP connections, and they managed to do that in 2012. Their goal, before being acquired, was to reduce operational overheads and the cost of hardware as much as possible, so they wanted to squeeze every single CPU cycle out of every machine.
58:23
By optimizing FreeBSD and optimizing the Erlang VM, they hit 2 million simultaneous TCP/IP connections. So at any one time, each machine, each node, could handle 2 million users actually sending or receiving messages. And that was a few years ago.
58:40
Today, many of you will have heard of Elixir and Phoenix. Without having to fine-tune the VM or the underlying operating system, and with a completely generic framework, keep in mind the WhatsApp code was highly optimized, they hit 2 million WebSocket connections on a single machine. And they couldn't go
59:05
further, not because of the Erlang virtual machine or the hardware itself: neither WhatsApp nor the Phoenix web framework being tested was running at full memory utilization or CPU capacity. The problems here were in the
59:21
underlying operating system and the network. So I/O and network capacity were the big problems; TCP/IP and Ethernet were not good enough for it. So there is ongoing research work happening now, looking at getting Erlang to run on bare metal.
59:41
There is a PhD project ongoing right now where you bypass the whole OS layer, and there's also work looking at adding InfiniBand to the distribution and adding other types of protocols. So, just to wrap up.
01:00:00
That's how we do it, and that's the secret sauce, and it's a secret sauce which you should apply to any programming language: it's the programming model itself. The way we do it is: split up your functionality into small, manageable servers, standalone nodes, microservices. This allows you to optimize
01:00:21
and do capacity-based deployment based on the particular needs of each node. Decide what distributed architectural patterns you're going to use: do you want to go with the Dynamo approach, peer-to-peer, service orchestration, or a fully meshed network, or a mixture of them? And once you're done,
01:00:41
decide what network protocols you want your nodes, node families, and clusters to use, and when. A node family is a group of nodes of the same type; node families together form a cluster, and your system will consist of many clusters. And fourth, define your node interfaces,
01:01:02
your state, and your data model. And when doing so, try to optimize, reduce duplication, and standardize. Next step, pick your retry strategy for every single node. And at this point, you might want to go back
01:01:22
and iterate over the design decisions you made in the first five steps. Then pick your data sharing strategies, which is incredibly important. If you're using a database such as Mnesia or Riak, you can actually decide how to share your data.
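With Mnesia, for instance, the sharing strategy is largely a matter of where you place table replicas. A minimal, hypothetical sketch follows; the node, module and table names are invented, and Mnesia is assumed to be started with a schema on the relevant nodes.

```erlang
%% Hypothetical sketch: picking a sharing strategy per table in Mnesia simply
%% by deciding where the replicas live. Node and table names are invented.
-module(schema_setup).
-export([create_tables/0]).

create_tables() ->
    %% Share everything: replicate the session table on every connected node.
    {atomic, ok} = mnesia:create_table(session,
        [{attributes, [id, data]},
         {ram_copies, [node() | nodes()]}]),

    %% Share nothing: keep the shopping cart only on the local node.
    {atomic, ok} = mnesia:create_table(cart,
        [{attributes, [id, items]},
         {ram_copies, [node()]}]),

    %% Share something: replicate orders on a subset of the node family only.
    {atomic, ok} = mnesia:create_table(orders,
        [{attributes, [id, details]},
         {disc_copies, ['backend1@host', 'backend2@host']}]),
    ok.
```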
01:01:42
And you can try all the different data sharing strategies, share everything, share nothing, share something, just by configuring the distribution of the tables, so it doesn't have to be in the business logic. You iterate over all the steps again, and then you look at your cluster blueprint. What you do is you try to understand:
01:02:01
if I want to scale elastically, for every X front-end nodes, how many new back-end nodes do I need to add? Because you could add processing capacity, you could add more web servers, but these in turn could go and sink your logic nodes. So you need a cluster blueprint,
01:02:20
and you're trying to understand the ratio for scaling up, but also, even more importantly, for scaling down: when you're at low capacity, shut down your cores, shut down your machines. Identify where to apply back pressure and load regulation, so as to make sure that if you have a huge surge, you reject requests but continue serving, in a predictable manner, the
01:02:42
requests which are going through the system. And last but not least, and that's a subject for a completely different talk, look at your operations and maintenance approach. What you need, and what's really important, is ensuring that you have monitoring with full visibility over what's going on. In terms of preemptive support, if you want five nines availability, you need to address issues before they escalate.
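As a tiny, hypothetical example of the kind of per-VM metric you can pull from the Erlang runtime itself, here is a sketch that reports total memory use and any processes with long message queues. The module name and the 10,000-message threshold are invented for illustration.

```erlang
%% Hypothetical sketch of per-VM health checks: overall memory usage and any
%% processes whose message queues are growing suspiciously long.
-module(vm_health).
-export([memory_mb/0, long_queues/0]).

%% Total memory allocated by the VM, in megabytes.
memory_mb() ->
    erlang:memory(total) div (1024 * 1024).

%% Processes with more than 10,000 queued messages (example threshold).
long_queues() ->
    [{Pid, Len} || Pid <- erlang:processes(),
                   {message_queue_len, Len} <-
                       [erlang:process_info(Pid, message_queue_len)],
                   Len > 10000].
```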
01:03:02
So you need to know exactly what's going on. We tend to monitor about 100 to 150 metrics, about 20 different alarms and about 20 different notifications, just generic, non-business-specific ones,
01:03:20
on a per-VM basis. That will show you if you're running low on memory, if you've got long message queues or whatnot, and it will allow you to go in and address these issues before the system terminates. Okay, any questions? Okay, shameless plug: the book I'm writing. You can find it on BitTorrent,
01:03:43
and it's still in early release, so it's not out yet. The last four chapters describe what I've been talking about. And if you like it, no one should be paying full price for an O'Reilly book, so use the discount code AufD, which will give you 40% off the printed copy and 50% off the early release.
01:04:01
And I shouldn't be telling you this either, but it works on all other O'Reilly titles as well. All right, thank you so much for your time. If you've got any questions, I think we're out of time now, so feel free to come up and ask; I'll be around for the rest of the day. Thanks.