Principles of distributed systems
Formal Metadata
Title: Principles of distributed systems
Number of Parts: 133
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/49592 (DOI)
NDC London 2016, 126 / 133
Transcript: English (auto-generated)
00:06
So, good afternoon, and welcome to this talk. I hope you're having a good time; I certainly am. I was at another presentation just now, and it was a really good presentation, but it was also quite hot in the room, so
00:22
I almost fell asleep. I hope I don't bore you to sleep, but we'll see. I'm Dennis van der Stelt, as you can see. There is some info on how you can contact me if you wish to. I'm a software engineer at Particular Software, and Particular Software makes NServiceBus.
00:47
What I want to talk about is distributed systems: first, talk a little bit about what they are and some of the principles of distributed systems, then go on with how
01:01
transactions are handled, or how you deal with transactions in distributed systems, and I want to talk a bit about the CAP theorem. The goal of this session is to make you aware of some of the principles of distributed systems that I wish I had been made aware of back when I started developing systems
01:24
like that. It always reminds me of Martin Fowler. He wrote a few good books, and in one of them he said his first law of distributed object design is: don't distribute your objects.
01:43
Distributed systems aren't about distributing your objects, but the law still applies in a way, because it's really hard. There is a lot of new stuff you have to think about, and hopefully I can introduce you to some of it.
02:01
So what is a distributed system? I like to have a look at Wikipedia for a quick definition of what a distributed system is, and this is what Wikipedia says. What I find particularly interesting is it is networked computers, and they communicate by
02:21
passing messages, because I'm a big fan of writing a system based on messaging, so instead of making RPC-style calls to other systems, I normally put a message on
02:41
the queue, and another system reads it. And there are, as I said, a lot of difficulties, which, for example, I experience even in real life, because at Particular we work distributed, and it looks something like this. Unfortunately my laptop is pretty power hungry, so I only last for 30 minutes
03:05
on the beach or so, but with all my colleagues working distributed, you have the same problems as you would normally have with a distributed system. For example, there might be a huge latency in messages arriving because of time zones.
03:25
Other people aren't always on. They might be asleep. They might do something completely different. So these problems, they have been documented for quite a while, and they're known as the
03:45
eight fallacies of distributed computing. Who has heard of these eight fallacies before? Only a few. Okay. Maybe it's a little bit too hot, but these are the eight fallacies of distributed computing. They were, how do you say this, created or invented or made up by Peter Deutsch
04:06
back in 1994 at Sun Microsystems. The eighth one was added later. And these are all fallacies, so something you should take into account, because this
04:21
is what happens when you don't. And for example, latency, I just talked about it, it's an interesting issue. Ingo Rammer, he's a German guy, he wrote a blog post once about chatty interfaces versus chunky interfaces, where due to latency, you might be better off with
04:43
an interface that accepts really big messages instead of continuously chatting over the network, perhaps to the other side of the world. Because maybe you've tried pinging the other side of the world, like in the United
05:00
States for example, and it takes on average 250 milliseconds to get there and get back with a reply, and that's just a ping command or something. So when you continuously start talking to the other side of the world within your system, each message takes up a latency of 250 milliseconds, which can add up to quite
05:24
a lot, whereas a big message can make use of the bandwidth available. And bandwidth keeps on increasing; we get better lines and gigabit lines, and
05:42
I don't know what, but latency will probably never be solved unless you prove Einstein wrong and find something faster than light. But bandwidth, on the other hand, is also not infinite.
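To put some rough, purely illustrative numbers on that: fetching 50 records one at a time over a 250 millisecond round trip costs about 50 × 250 ms = 12.5 seconds in latency alone, while one chunky request for all 50 records pays that 250 ms roughly once and lets the rest of the transfer ride on whatever bandwidth is available.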
06:01
Probably every one of you has a network or an internet connection at home, and I don't know about you, but I have problems from time to time and I have no idea why they exist; sometimes my router just quits or, for example, the network gets
06:23
really, really bad: you're streaming a movie from Netflix for half an hour and then suddenly it keeps on having problems. And there was also this campsite where I was once; they had very good Wi-Fi when everyone
06:42
was at the pool, but when it started raining and everyone had to get inside, everyone would start using the Wi-Fi, and then suddenly the bandwidth dropped badly, or rather, the bandwidth remained the same but my share of it dropped, which meant that I sometimes couldn't
07:05
even browse the internet, whereas I could have conference calls or stream movies when everyone was at the pool and I was by myself behind my laptop. Anyway, these are some of the fallacies of distributed computing.
07:24
There's just been a nice book released by a colleague of mine and it's about these fallacies. It's quite a funny book and it reads very well, and in the end, just to remind you,
07:43
I'll provide you a link so you can download the book if you want to. Another fallacy is that the network is reliable. I don't know about you, but some time ago there was an earthquake in Taiwan, I think
08:04
it was a couple of years ago, and an undersea cable just broke, with the result that a lot of ISPs, and whoever registered domain names, had
08:21
to renew the domains; because the undersea cable broke, their internet was completely gone and the domain names couldn't be renewed, so they became available again to the entire world, and some really important domain names were claimed by others, and I have no
08:42
idea what happened then, but clearly the network isn't reliable. Another example could be an internet provider with a router and a backup router: the main router just cracked, it broke, and, of course, the backup router
09:00
picked up and started processing everything. The only thing was that the routing tables weren't updated properly, which also meant the death of the internet for a lot of people. So this is all stuff we have to take into account when we develop our software and distributed systems.
09:22
A good example of this is store and forward with MSMQ. Does anyone use messaging at all? And how many of you use MSMQ or have used MSMQ? Okay, a few.
09:41
Well, the way MSMQ works is when you send a message, it doesn't send it directly to the other application or whatnot, not even to the other server. The first thing it does is it stores the message locally, and then your application
10:01
can continue to work, and MSMQ itself tries to connect with the other node. If it's available, it sends over the message, and, of course, if it's not available, the message just remains in the queue on node A until the connection is back up again. Once it has transactionally arrived on node B,
10:25
your application can then pull the message from the queue, and this all happens transactionally, or can happen transactionally; it depends on how you set up MSMQ and your queues, but it's very, very safe.
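As a rough sketch of what that looks like in code (the queue path and message body are made up; System.Messaging is the MSMQ API in .NET):

```csharp
using System.Messaging;

// Send to a transactional MSMQ queue. MSMQ stores the message locally first
// and forwards it to the destination node whenever a connection is available.
var queue = new MessageQueue(@".\private$\customers");
using (var tx = new MessageQueueTransaction())
{
    tx.Begin();
    queue.Send("CustomerBecamePreferred", tx);
    tx.Commit();   // after this, MSMQ owns delivery; the application moves on
}
```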
10:42
However, if your message arrives at your system or your service at node B, that's not the only thing we do with the message. Normally, we store it in SQL Server or Oracle or whatever kind of database or data
11:00
store you're using. And for ages we've been used to opening up a transaction to SQL Server and making sure everything is transactional, so it's consistent and your database isn't
11:22
messed up or anything. But if you already have a transaction open with MSMQ, then what if one of the two fails? So you can use, if you want to and if you set it up properly, distributed transactions. So how do these work?
11:41
Here's a small piece of code. You're using a TransactionScope. This is a default .NET class that handles transactions for you, and you can see that I open up a message queue, the customers queue, then I read from the queue,
12:00
I have the message, then I open up a SQL connection, update the customers table, and I execute the query, and both fall into the same transaction.
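The code on the slide isn't in this transcript, but it looked roughly like this (a sketch; the queue path, connection string and SQL statement are illustrative):

```csharp
using System.Data.SqlClient;
using System.Messaging;
using System.Transactions;

using (var scope = new TransactionScope())
{
    var queue = new MessageQueue(@".\private$\customers");
    Message message = queue.Receive(MessageQueueTransactionType.Automatic);

    using (var connection = new SqlConnection("Server=.;Database=Shop;Integrated Security=true"))
    {
        connection.Open();   // enlists in the ambient transaction
        var command = new SqlCommand(
            "UPDATE Customers SET Status = 'Preferred' WHERE Id = 42", connection);
        command.ExecuteNonQuery();
    }

    scope.Complete();   // both resource managers commit; the DTC coordinates them
}
```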
12:23
But that doesn't happen automatically, because the transaction scope actually has no knowledge of how to create a connection to SQL Server, and it also has no knowledge of how to create a connection to MSMQ. Both have resource managers, and that's what the transaction scope talks to. But even then, the transaction scope isn't smart enough to handle both,
12:40
and that's where, on the Windows platform, the Microsoft Distributed Transaction Coordinator comes into play. Does anyone use the Distributed Transaction Coordinator? Or distributed transactions? Yeah, not too many. That's because a lot of the time, this is how people react when you say you use
13:03
the Distributed Transaction Coordinator, and that's for a reason. To explain this, I once learned about the two generals story, which kind of makes it clear. There's a village in a valley somewhere, and it's
13:22
surrounded by mountains. Yeah, and there are two generals, and they want to attack the village. But it's really, really important that they attack at the same time. But they can't decide on it beforehand because they have to have an overview of the terrain and whatnot.
13:40
So they both decide, you'll go east, and I'll go west on top of the mountain. And there's one general in command, and he decides on the time, and he needs to notify the general on the other mountain. So what does he do? He sends a messenger on a horse, and he goes to the other side of the mountain.
14:06
And the messenger tells the general on the other side of the mountain that they'll have to start at 8 o'clock in the morning or something. However, the messenger, it might be captured.
14:20
So he needs to know if the messenger arrived on the other side. And the only way to let the first general know is if the second general replies with another messenger and says, I received your message. But then again, how does the second general know if the messenger ever arrived at the first?
14:41
It's very simple. He needs to send another messenger. Well, you get the idea. This can go on infinite, and you'll never know for sure. And that's how two-phase commit works with the distributed transaction coordinator. It first asks every resource manager, are you ready to commit?
15:00
And they all reply, yes, sure. Then the coordinator says, well, go commit. And then they acknowledge, I'm done with committing. But somewhere in between, when the coordinator says, OK, everyone's ready,
15:21
go commit. Some might never respond. And then it's too late to roll back because every resource manager that did respond and did receive the follow-up message, they have all committed their data, and there's no way back. This doesn't happen often, but there's no way of telling.
15:42
And most of the pain with distributed transactions is when you have a lot of resource managers, and especially when there's, for example, a lot of latency, asking all the resource managers, are you ready? And when one resource manager or multiple
16:01
don't respond for a few seconds, it takes time and time and time. And the worst part is that most of the time, distributed transactions have the highest isolation level. That's serializable. And serializable means that even other clients can't read data
16:26
that you're accessing. So they all have to wait. And if there's a lot of latency with all the resource managers, it can take multiple seconds before the locks on the database or whatever are released by all the resource managers.
16:44
And well, that's not good, especially if you have a system with high-volume message processing all the time, and everyone starts taking huge locks and the others start waiting. And it can take an insane amount of time,
17:02
which is definitely bad. However, what I just showed you was this application. And when you want to get a message out of the queue and you want to update the database and make sure that everything works correctly,
17:22
being, well, 100% sure is, because of the two generals story, not possible. But the closest you can get to being sure that everything succeeds is by using distributed transactions. Because, for example, when everything's committed into SQL Server and you want to tell your messaging system, I'm done with it,
17:43
you want to tell MSMQ it's OK, the message can be deleted, and something goes wrong there, then the message gets processed again. And we'll talk about some scenarios for how to deal with that as well. But distributed transactions make it simpler.
18:01
The only thing you have to do is make sure the entire transaction is really, really short. And with this kind of system, where everything is under your control, it is very much possible. For example, getting the message from MSMQ.
18:23
MSMQ is local on your system, which isn't a problem. And SQL Server, if that's your own SQL Server and you have control over all the SQL commands and statements you throw at it, then it's not that big of a problem. Because in the past, I've built systems
18:43
where we were able to get really good throughput on messages and didn't have to worry about serious locking issues at all. Of course, it's not just with distributed transactions, it's with any kind of transaction: you have to take care of your SQL,
19:02
you have to take care of your indexes and whatnot, or else a regular transaction can be as bad as a distributed transaction. So for example, this is a message handler that we create with NServiceBus or MassTransit
19:23
or whatnot. And here you can see that a message is coming in. And I have a DbContext, a database context from Entity Framework. And I tell the context, add this claim, save changes,
19:43
and I publish a new message. And I've got another example right here. This is real code. Here you can see the message coming in. This is the constructor of the handler.
20:01
Here I've got the database context coming in and the bus again. So here I create a simple new entity, an empty entity, and I use AutoMapper. I'm not sure if you're familiar with that. That's an open source library that
20:20
can map from one object to another. It just copies all the properties and it has some additional features that are really good. So first I map the message to the entity, and then I save it. And then I map it again, but now against another message.
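The handler being described probably looked something like this (a sketch in the NServiceBus v5-style API of the time; the message, entity and context types are made up):

```csharp
public class SubmitClaimHandler : IHandleMessages<SubmitClaim>
{
    readonly ClaimsDbContext db;
    readonly IBus bus;

    public SubmitClaimHandler(ClaimsDbContext db, IBus bus)
    {
        this.db = db;
        this.bus = bus;
    }

    public void Handle(SubmitClaim message)
    {
        // AutoMapper copies the properties from the message onto a new entity.
        var claim = Mapper.Map<SubmitClaim, Claim>(message);
        db.Claims.Add(claim);
        db.SaveChanges();

        // Map again, this time to an event, and publish it for other handlers.
        var claimSubmitted = Mapper.Map<Claim, ClaimSubmitted>(claim);
        bus.Publish(claimSubmitted);
    }
}
```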
20:43
And then I publish the message. And why do I publish the message? For example, when I want to send an email. We talked about transactions and whatnot. SMTP, for example, doesn't have any transactions. So instead of adding that to this handler, this message
21:04
handler, I delegate it to another message handler. Because if something goes wrong there, of course you can code defensively against it, but if something goes wrong, then the transaction has to be rolled back and whatnot.
21:22
Whereas if I delegate it to a different handler, and it goes wrong there, it can just retry the email over and over again until it has access or a connection to the SMTP server, and it can actually send the email then.
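A sketch of such a delegated email handler (System.Net.Mail; the message type and the addresses are made up). If sending fails, the exception bubbles up and the messaging infrastructure simply retries the message later:

```csharp
using System.Net.Mail;

public class SendWelcomeEmailHandler : IHandleMessages<SendWelcomeEmail>
{
    public void Handle(SendWelcomeEmail message)
    {
        using (var client = new SmtpClient("smtp.example.local"))
        {
            var mail = new MailMessage("noreply@example.local", message.To)
            {
                Subject = "Welcome!",
                Body = message.Body
            };
            client.Send(mail);   // throws if the SMTP server is unreachable -> message is retried
        }
    }
}
```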
22:00
So that's another option to take the SMTP out
22:03
of the transaction and send it to a different handler. And of course, publishing something, in the example we've been talking about so far using MSMQ, is just putting another message onto the queue on the local machine.
22:21
So there's no communication with another resource manager on another server or whatnot. It's just your local computer keeping the transaction really, really short. So MSDTC, or distributed transactions in general, it's not like they're really good or they're really bad.
22:43
It's just that, all things considered, they're very usable. You can trust your software way more than when you don't have transactions at all, and I'll also be talking about having no transactions available.
23:04
But MSDTC, or distributed transactions in general, aren't evil by default. So we've been talking about SQL Server, a relational database. There are also, of course, databases or data stores that aren't transactional at all.
23:23
And that's where NoSQL databases came in, like document databases and whatnot, and they were built in the shadow of the CAP theorem. So what is the CAP theorem? There's a lot of discussion going on on the internet
23:44
about whether or not CAP is actually a theorem, because you have a theory, which means it might be true. And a theorem is kind of like, well, it's been mathematically proved that it's true. But apparently, some others go into discussion of that as well.
24:05
The idea behind it is that of consistency, availability, and partition tolerance, you can only pick two. You can't have all three. It's impossible. So for example, when you have a centralized system,
24:23
like the fat clients that we used to build in the past, you automatically get availability and consistency because you are not working with a cluster or partitions or whatnot. But what if you do have to be partition tolerant?
24:46
Which other option would you choose? Would you drop availability or would you drop consistency? Well, I'd say let's try to match the business perspective.
25:01
For example, I don't know if you buy stuff on the web from time to time. This is a great picture I got from the Best Buy website. It's a big web shop in the US. And I don't know if you guys got it here in the UK. But on Black Friday, and Black Friday
25:22
is kind of an important day for shops, they were expecting a blizzard or something, or they got a blizzard, and their site went offline. And in the Netherlands, we have a nice variation of that.
25:42
We put a guard in front of the website. When it's really, really busy at the Bijenkorf, a Dutch shop, you get to see the guard and you can't get in until it's your turn to get in. So what it says here in Dutch is: I'm
26:03
calculating how many people are in front of you. And I'm calculating how long it will take before you can get in. And it's even too busy to do that. So the system definitely gave up availability for consistency
26:21
because it's completely down. And a real problem happened to me a few weeks ago when I wanted to order The Force Awakens tickets. Has anyone seen The Force Awakens? Oh, that's too few. But everyone who raised their hand immediately
26:41
got a smile on their face. So that's good. But the website for ordering tickets was offline, which was, of course, not too good for me. And all this is just money down the drain. So giving up availability, like saying, when we have an issue, when there is a partition,
27:03
the first thing we sacrifice is availability, so we shut the website down, we shut the shop down, and we just don't continue. Well, that's just money down the drain. But if we can't sacrifice or give up availability, then we need to drop consistency.
27:22
Well, luckily for us, it's not about dropping or sacrificing consistency entirely, because, of course, some smart people have thought of different ways to achieve this. And one of them is BASE. The BA says: I'm basically available.
27:43
This means that the system does guarantee availability, in terms of the CAP theorem. And that's where eventual consistency comes into play. Soft state is where the system is in a certain state.
28:03
But without user interaction, it can still change. And that's for example when, I don't know, your shop closes at 5 PM or something, which on the web is impossible, of course. But imagine that your web shop gets closed.
28:22
There might still be messages in the queue which change state. Or some other process might trigger a timeout on which another message can be sent,
28:41
changing the state of your system. So hopefully you get what soft state is now. And then eventual consistency. It means that in time, the system will have consistent data.
29:01
This is a bit harder, but I'll try to explain it with an example about beer. Who likes beer here? Almost as many as liked The Force Awakens. But tonight there will be a party at NDC.
29:23
And I hope everyone's going. So they ordered a lot of beer. And I used a brewer from the Netherlands because of the next picture. But there was a lot of beer ordered. And so a truckload. And here you can see it's coming or on its way.
29:42
And then suddenly it had a problem. And this actually happened. The entire truckload of beer was on the street. So now this beer can't be delivered. But in their data system, in the data center, in their system, in their databases, it's sad.
30:03
No, seriously, it was paid. It was delivered. Sorry, it was on its way in a truck. And it's just this little check box that said delivered, which can be checked. So this is, of course, in the business, impossible.
30:23
Ways will always be found to deliver the beer anyway. And of course, we engineers always think of our own solutions. But the idea is that the business always
30:42
finds a way to compensate for these kinds of situations. And a lot of times in software, we just develop the happy flow, and we have to think with the business about what to do if something doesn't go right.
31:01
For example, when you do messaging in a shop like amazon.com and books are ordered, they might use the CAP theorem and eventual consistency and whatnot.
31:23
But at some point they start sending messages. And Oprah Winfrey says, everyone has to buy this book. So now everyone goes online, goes to amazon.com, buys the same book, and it's all eventually consistent. So they don't synchronously query the database:
31:42
how many books do I have in stock, this user buys one copy of the book, so the inventory goes one down, and I have to bill it and whatnot. A lot of tables would get updated, which places a lot of locks on the database, and then they'd be reliant on this database performing.
32:03
So amazon.com uses messaging. But now messages are in the queue, and the system is processing over and over again, creating orders for every single customer. But they had like 1,000 books, and the books were ordered like 10,000 times, I don't know,
32:24
just some random numbers. So what does the system do? Does it continuously throw exceptions 9,000 times so that, I don't know, nothing happens because there's an error, and someone else will refill the inventory
32:43
and put all the messages back in the right queue? No, they have other rules and other processes in place for this. They might send you an email: your book is out of stock, I know you ordered it, and perhaps you paid for it. Perhaps you want a different book, or you want to wait for the book, or you
33:01
want a voucher for, I don't know, anything else at amazon.com. But there are different processes. And that's kind of how eventual consistency works with messaging. You find ways to deal with the problems that messages arriving
33:22
a bit later create. And of course, there's a lot of scenarios as well where you want to immediately tell something to the customer about what happened, but you can't because the message hasn't been processed yet, and there are all kinds of scenarios for that, which we can talk about after the presentation
33:43
because they are out of scope. But the inventor of the CAP theorem, he said eventual consistency is about compensating for mistakes instead of preventing them altogether.
34:00
And eventual consistency is a hard thing to grasp in your mind. And a lot of people ask questions about what do I say to the customer when they hit the Submit button and nothing is processed yet. But where a lot of people make a big issue out
34:21
of eventual consistency and how to deal with it, a lot of people are already using it, but just differently. For example, caching, they don't think for a second about introducing caching, whereas caching is probably
34:41
always out of, not always, but it's easily out of sync with the reality in your database. So you're presenting data to your users that might be stale, just as if you were using messaging and eventual consistency and whatnot, but with caching, it's no problem.
35:02
So keep that in mind when you want to build systems using eventual consistency. So we talked about transactions and when messages are in the queue, you can get them
35:21
out transactionally. But sometimes your queuing mechanism isn't transactional at all. And maybe you don't want to use distributed transactions. But for example, RabbitMQ or Azure Service Bus or Azure Storage Queue or whatnot, they don't support transactions at all.
35:41
How do you deal with that? Well, then you can build idempotent services. It's easiest explained with, for example, elevator buttons. The way idempotent services work is that
36:00
they handle each message, or repeated messages is a better word, the same way, just as in an elevator. When you call an elevator, you can push the button one time, and pressing it multiple times doesn't make a difference.
36:21
Some people do think it makes a difference and keep pushing it as if it comes quicker or something. But you should build your services idempotent as much as possible. So for example, you might remove an item from your order:
36:40
a message arrives, a remove-item-from-order command or something, and you delete the order line in the database. But if, for some reason, the message comes in a second time, there's no problem, because the order line in the database
37:03
is already gone. You just deal with that exception, and the message will be completed as well. And when can you have duplicate messages coming in? When you use, for example, Azure Service Bus.
37:21
Azure Service Bus has a lease on messages. So when you pull a message from Azure Service Bus, you get a specific amount of time to process the message. And when you're done, you're telling Azure Service Bus, I'm done with it.
37:40
But if the lease expires, for example because your system crashed, the message is put back in the queue, and it can be processed by another service or another instance of your service. So that's an easy way to get duplicate messages.
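So an idempotent handler for the remove-item example might look roughly like this (the types and the DbContext are illustrative): a duplicate delivery finds nothing to delete and simply succeeds.

```csharp
public class RemoveItemFromOrderHandler : IHandleMessages<RemoveItemFromOrder>
{
    readonly OrdersDbContext db;

    public RemoveItemFromOrderHandler(OrdersDbContext db)
    {
        this.db = db;
    }

    public void Handle(RemoveItemFromOrder message)
    {
        var line = db.OrderLines.Find(message.OrderId, message.ProductId);
        if (line == null)
        {
            return;   // already removed by an earlier delivery of this message: nothing to do
        }

        db.OrderLines.Remove(line);
        db.SaveChanges();
    }
}
```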
38:02
And you should definitely take care with processing messages that take longer than the lease on your queue, because then the message is put back in the queue, and another instance or another thread of your service
38:21
starts handling the message, and that can lead to way more problems. So, idempotent services. Let's have a look at one. For example, here the sender is putting a message onto a queue, and it gets an ID.
38:40
And the ID is pretty small, but it doesn't matter. It's a GUID. You don't have to remember anything. So the message is in the queue, and the green service pulls it from the queue and starts processing it. Then we just had a look at a handler.
39:02
The handler starts processing the message, updates the database and whatnot. And then just before it wants to tell the queue, I'm done with it, it crashes. So the message stays in the queue, and it gets processed again.
39:20
Now with idempotent handlers or idempotent services, there's no problem, because it can be picked up again and there will be no unwanted state change in the database. But if you're not idempotent, multiple things can happen. You can start processing the message over and over and over again. But, for example, another problem is
39:42
when you tell the queuing system, I'm done with the message, but your transaction fails. And then all the business data that was in the message and needed to be in your system is just gone. A good way to handle this
40:00
is to use the outbox pattern, and there are many names for it. The first thing you do when you retrieve a message from the queue is put it in the database. And the reason for that is that you put it in the database and immediately tell the queuing system,
40:20
I'm done with it, and you commit the transaction. So there's the least amount of, how do you say this, chance that this might fail, because no custom code from any handler is being executed.
40:43
And then when it's in the table, two things can happen. When the message comes in again, you can check in the table: hey, the message is already there. So you immediately de-duplicate your messages. And when you pick up the message from the database
41:02
and start changing your tables, it's no longer a distributed transaction. It's just a single transaction on your database. So the tables have to be in the same database, of course, or else you'll escalate to a distributed transaction again.
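A rough sketch of that receive side (this is not NServiceBus's actual outbox implementation; the table names, columns and connection string are made up): the incoming message is stored and de-duplicated first, and the business changes later run in a single local transaction against the same database.

```csharp
using System;
using System.Data.SqlClient;

public class InboxStore
{
    const string ConnectionString = "Server=.;Database=Shop;Integrated Security=true";

    // Step 1: store the raw message, ignoring duplicates, then acknowledge the queue.
    public void StoreIncoming(Guid messageId, string body, Action acknowledgeQueue)
    {
        using (var connection = new SqlConnection(ConnectionString))
        {
            connection.Open();
            var insert = new SqlCommand(
                @"IF NOT EXISTS (SELECT 1 FROM IncomingMessages WHERE MessageId = @id)
                      INSERT INTO IncomingMessages (MessageId, Body, Processed) VALUES (@id, @body, 0)",
                connection);
            insert.Parameters.AddWithValue("@id", messageId);
            insert.Parameters.AddWithValue("@body", body);
            insert.ExecuteNonQuery();
        }
        acknowledgeQueue();   // tell the queuing system we're done with the message
    }

    // Step 2: later, apply the business changes and mark the message as processed
    // in one local transaction; no distributed transaction is needed.
    public void ProcessStored(Guid messageId)
    {
        using (var connection = new SqlConnection(ConnectionString))
        {
            connection.Open();
            using (var tx = connection.BeginTransaction())
            {
                // ... update the business tables here, using the same connection and tx ...

                var done = new SqlCommand(
                    "UPDATE IncomingMessages SET Processed = 1 WHERE MessageId = @id", connection, tx);
                done.Parameters.AddWithValue("@id", messageId);
                done.ExecuteNonQuery();

                tx.Commit();
            }
        }
    }
}
```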
41:24
But now what I did regularly was I published messages. Does anyone use publish subscribe with messaging? Oh, quite a few, good. So I used to publish events.
41:42
And the problem, of course, again, can occur with publishing messages. I get the message from the table, update the tables, publish the message, and something fails. So for example, the message is never published, or the message is published.
42:01
And if the message is never published, I have a problem, because there's business data in the message and some logic behind it that needs to be executed in another handler. That's a problem. But there can be an even bigger problem, and that's when the message is actually
42:21
published onto the queue with a new ID, a new GUID, and the transaction fails, and the original message stays in the blue table on top. Because then, again, this message will be picked up.
42:45
Everything was rolled back, so that's OK inside the database. So the database will get into the proper state. But then when your software publishes a new message, it creates a new ID for your message.
43:01
So now there are two messages which are actually duplicate messages, but they are created with different identifiers. So the other side, the orange event subscriber, has no way of telling that these messages belong
43:22
to each other. You can build idempotent services all you want, but this isn't a problem that can be solved with idempotent services. So when you publish a message, it's pretty logical: the first thing you do is store it in another table.
43:43
You store it in a table so that when the transaction is completely committed, another process looks at this table and sees: hey, I need to publish some messages. It then picks them up, just as it did the other way around.
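And the publishing side can follow the same idea (again just a sketch, not a real library's implementation; names are made up): outgoing events are written to a table inside the same business transaction, and a dispatcher publishes whatever it finds there after commit, reusing the stored message id so subscribers can detect duplicates.

```csharp
using System;
using System.Data.SqlClient;

public static class OutgoingMessages
{
    // Called from the handler, on the SAME connection and transaction as the
    // business changes, so the event is stored if and only if the changes commit.
    public static void Store(SqlConnection connection, SqlTransaction tx,
                             Guid messageId, string eventType, string body)
    {
        var insert = new SqlCommand(
            "INSERT INTO OutgoingMessages (MessageId, EventType, Body, Dispatched) VALUES (@id, @type, @body, 0)",
            connection, tx);
        insert.Parameters.AddWithValue("@id", messageId);
        insert.Parameters.AddWithValue("@type", eventType);
        insert.Parameters.AddWithValue("@body", body);
        insert.ExecuteNonQuery();
    }

    // A separate dispatcher runs after the commit: it reads undispatched rows,
    // publishes each one with its stored MessageId (so a crash halfway through
    // only ever produces duplicates with the SAME id), and marks the row as dispatched.
}
```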
44:03
It picks up the message from the table, publishes it, and there's, again, very little possibility that something strange might happen there. But now you've kind of solved the transactionality without building idempotent services,
44:23
with the outbox pattern. This is extremely hard to implement, but it's a good exercise, all right? OK, so we've looked at distributed system principles.
44:40
We had a look at some of the fallacies of distributed computing. We talked a bit about distributed transactions and why they are not that evil, as a lot of people think. They can be very evil, but if you're in control, there's no harm in checking out distributed transactions.
45:04
And we talked about the CAP theorem, where we have to choose which of the three to give up, and about eventual consistency with messaging. And we talked a little bit about idempotence and the outbox pattern. So, as I promised, the free e-book.
45:25
And if you'd like, we can have some time for a Q&A. Does anyone have questions?
45:42
Everyone's too busy copying or photographing the URL. There's a question. Could you come to the microphone? So in the eight fallacies, there's one that says the topology never changes.
46:03
What is it about, and isn't it covered by the other fallacies? Because we usually try to use the network in a way that abstracts the topology away from us, so we don't even know what the topology is in the first place. Sorry, the last part was? Sorry?
46:20
The last part was? As developers, we usually don't care about the topology of the network, and we usually use it in a way where the topology is abstracted away from us. Yeah, definitely with the recent cloud stuff.
46:40
And for example, I'm not too familiar yet with all the containers like Docker and everything. But definitely there, things will probably change. I've never thought about it like that, but it's definitely something that
47:02
reduces a lot of those problems, but of course introduces a lot of other problems. Because, yeah, how do you say this, the transactionality, or rather the lack of transactions there,
47:24
is something that's probably causing a great deal of pain. The topology doesn't change. Did that answer your question or not?
47:46
What does it mean? Well, I don't understand the question fully, but maybe if you can come to me afterwards, maybe then we can go a bit deeper into it and have a more lengthy discussion.
48:03
Anyone else have a question? No? OK. Well, I hope everyone got the URL.
48:23
Otherwise, I want to thank you for your attention, and I hope you enjoy the party and the rest of NDC.