Commands, Queries, and Consistency

Formal Metadata

Title
Commands, Queries, and Consistency
Number of Parts
110
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Abstract
As developers build larger and more complex systems supporting many users collaborating on growing data-sets in parallel, many are turning to patterns like Command/Query Responsibility Segregation (CQRS). Unfortunately, the baggage of building N-Tier style business logic continues to weigh on their modeling efforts, often resulting in domain models that don’t handle consistency correctly in the face of race conditions. Join Udi for a new perspective on CQRS using a new twist on the saga pattern.
Transcript: English (auto-generated)
Good morning, welcome. Hope you had an interesting morning so far. And my intention is to mess with your minds a little bit this morning. So if you've had a little bit of an easy time, people talk to you about languages, technologies,
and frameworks, and that kind of stuff, our focus today is going to be a lot more on the things that most people don't tend to talk about very much in software. The runtime behavior behind the code that you write, the way the technologies that you take for granted actually do behave that may end up causing you problems. And the problems are the kinds of things
that are really hard to find and really hard to address. So based on the title of this talk, Commands, Queries, and Consistency, you can probably guess that it's going to relate a little bit to CQRS. So who's heard of CQRS before, Command Query Responsibility Segregation? OK, that's just about everybody. Good. Who's already using CQRS on some kind of project?
OK, got maybe 50%. Now, whether you're using CQRS or not, what I tend to see as I travel around the world and work with clients is that people end up making the same kinds of mistakes repeatedly.
Or more accurately, it's the same symptoms; the way that people make the mistakes is very different each time. So most of it has to do with business logic. So with regards to business logic, I think it's fair to say that all of us in our systems have some kind of complicated business logic. But I want to talk about a specific kind, the kind that involves both commands and queries.
So who, as a part of their business logic, sometimes has to look up, to query, some additional data? Yeah, OK, so that's just about 90% of you. And it turns out that in that area, where you have your most complicated business logic, which often gives the business the greatest value,
that's where we actually have consistency problems that are hiding. And this is irrespective of whether you're using CQRS or not. Even if you're using a fully synchronous end to end architecture, talking to a fully transactional relational database, these things can bite you.
So a lot of times, everybody's heard about this eventual consistency, and yeah, you can go use Mongo, it's web scale, and all those kinds of things. But a lot of organizations say, well, we'd rather keep it safe and stay with the things that we know. And what I'm here to tell you is that even when you're using the safe architectural choices of doing everything
synchronous, even when you're using the safe technology choices, like using SQL Server or Oracle, even then sometimes you could end up with an inconsistent system. And I want to tell you why and how that ends up playing out. Now first of all, I want to talk a little bit around the background, which type of world are we in now.
So the always on perpetually working lots of users in parallel type of system has become more and more ubiquitous. Users expect to be able to access their data whenever they want, wherever they want, and it's not just each user by themselves.
When you have a situation where you have users that are only operating on their own data, life is actually pretty simple. I call those types of systems multi-single-user systems. They're kind of like a multi-user system, but each user is kind of in their own little set of data, and that's it. It's when users are able to touch each other's data
that things start to get even more interesting. Now unfortunately, when we look at not just the technologies that we're dealing with, but the programming practices and the paradigms, a lot of them are based on that same object-oriented thinking. Well, you write objects, and you persist these objects in some kind of database, and it works,
or at least it works on your machine pretty well. Then you put it in production, and some kind of problems happen. Usually the way that we address consistency concerns when we have multiple users operating on the same set of data is using things like optimistic concurrency. Who's heard of optimistic concurrency? Yes, just about everybody. Good. So we've got our traditional first-one-wins and last-one-wins
concurrency, where first-one-wins means the first user gets their changes done and the later one has to redo their changes. Sometimes this gets even more interesting when we get down to a single user and themselves, where the user is able to do things relatively quickly. So let's jump right into code.
The most traditional, innocuous type of domain object that you've probably seen: the order object. Everybody, when giving some sort of presentation, always reverts to retail, and here we've got an order object. Now this order object is implementing some, I wouldn't call this complicated business logic,
but we got a new business requirement that says, as a part of processing a new order, we want to decide under which conditions a customer is going to be getting a discount. Now if the customer that is submitting this order did more than 250 in the past week, whether that's dollars, euros, pounds, Norwegian kroner, Swedish kronor, Danish kroner,
or whatever your currency of choice is, then they get a discount. And if not, they don't. Your average developer looks at the requirement, implements this code, doesn't really think twice, checks that it works OK, maybe writes a unit test or two, commits, and off it goes to production.
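The slide code is not part of this transcript. A minimal sketch of the kind of order logic described above might look like the following, assuming an in-memory `OrderRepository` as a stand-in for the database and a threshold of 250; all class and function names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DISCOUNT_THRESHOLD = 250  # assumed from the "more than 250" requirement


@dataclass
class Order:
    customer_id: str
    total: int
    placed_at: datetime
    discounted: bool = False


class OrderRepository:
    """Hypothetical in-memory stand-in for the orders table."""

    def __init__(self):
        self.orders = []

    def total_since(self, customer_id, since):
        return sum(o.total for o in self.orders
                   if o.customer_id == customer_id and o.placed_at >= since)

    def save(self, order):
        self.orders.append(order)


def process_order(repo, order):
    # Query: look back over the customer's last seven days of orders.
    week_ago = order.placed_at - timedelta(days=7)
    past_week_total = repo.total_since(order.customer_id, week_ago)
    # Command: decide on the discount, then store the new order.
    order.discounted = past_week_total > DISCOUNT_THRESHOLD
    repo.save(order)
    return order.discounted
```

Note that the query and the command sit in the same method, which is exactly the combination the talk goes on to examine.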
Now the question is, all right, so what's so bad about this code? Nice, clean, object-oriented, domain-driven. We can make it test-driven too. We've got all the little best-practice check boxes ticked. The area where things become a little bit tricky is what happens if it's, I don't want to say two users,
but the customer, in essence, is an account, and we can have multiple users from the same account buying stuff on the same account. So you have two users at the same time, both of them are submitting an order. All right, so imagine two threads going through that logic at the same time, where
the last week of orders was $200. Each of them is submitting a new $100 order. Both threads go to the database, take a look at all of the existing orders, and see, oh, no, we've only got $200 worth of orders.
Both of them return false, right? Whereas if these users had just made their requests a couple of seconds apart, then one of them would have gotten a discount.
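The interleaving just described can be reproduced deterministically. The sketch below is an illustration only: it uses a plain list in place of the database and a `threading.Barrier` to force both threads to finish their query before either one writes. The function names and the threshold of 250 are assumptions.

```python
import threading

DISCOUNT_THRESHOLD = 250  # assumed threshold


def submit_concurrently(existing_total=200, amount=100):
    orders = [existing_total]          # stand-in for last week's orders
    decisions = {}
    both_have_queried = threading.Barrier(2)

    def submit(user):
        past_week_total = sum(orders)  # query
        both_have_queried.wait()       # force the unlucky interleaving
        decisions[user] = past_week_total > DISCOUNT_THRESHOLD
        orders.append(amount)          # command

    threads = [threading.Thread(target=submit, args=(user,))
               for user in ("user_a", "user_b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions, sum(orders)


def submit_sequentially(existing_total=200, amount=100):
    orders = [existing_total]
    decisions = []
    for _ in range(2):
        decisions.append(sum(orders) > DISCOUNT_THRESHOLD)
        orders.append(amount)
    return decisions
```

Run concurrently, both users are refused the discount even though the account ends up with 400 in orders; run a moment apart, the second user gets it.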
Now, that's not a very nice thing to do when you think about it. I mean, ultimately, we are penalizing our users for this type of behavior, for purchasing too quickly. Now, before I jump into sort of the bigger solutions, I want to start with how we ended up here to begin with.
So this idea about processing these types of things in real time, before that, we had the good old solutions, the ones that, you know, any time you needed to do any kind of data crunching, what did you do? You wrote a batch job, something that would run, you got your nightly batch, somebody comes along,
yanks a lever, and the machine starts humming. And off it goes to the database, and calculates, calculates, calculates, and does a whole bunch of stuff. Here's code that might look something like that. So we've got some customer object that has a method that's invoked in the nightly batch, and what it's doing is fairly simple.
It's just going and looking up these types of orders, and then saving some sort of flag on the entity, saying, all right, this customer now should be given a discount. The problem with that is that we have this sort of reset problem. Say, well, a week, how exactly are we counting that?
And then in terms of the order logic, the order logic is simpler. Now we just check that GiveDiscount flag. Sorry, it's not this.GiveDiscount, it's customer.GiveDiscount. So this is how we used to approach these types of logic. Any time you needed to do a big historical type of data crunching, you did that offline,
you stored the results in some sort of Boolean values or other rolled-up aggregate values, and then in the real-time environment, you just check those values. The problem with this, and it was a real business problem, is that once again, if we had two users coming to us, in this case, it's not clicking the button at the same time.
In this case, it's actually clicking the button on the same day. If we have two customers or two users from the same account coming to our site, both of them submitting an order, then ultimately we're saying, no, sorry, if you wanna get a discount, come back tomorrow.
You realize how ridiculous that sounds, telling a customer, no, if you make your purchase today, you don't get a discount, but if you come back tomorrow, I'll give you a discount. So it's understandable why businesses all over the world are moving away from these types of batch jobs. It doesn't work out that well, and that leads us back to this type of logic.
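The batch approach described above can be sketched as follows. This is a simplified illustration, not the slide code: the `Customer` class, its `give_discount` flag, and the threshold of 250 are assumed names and values.

```python
from dataclasses import dataclass, field

DISCOUNT_THRESHOLD = 250  # assumed threshold


@dataclass
class Customer:
    orders_last_week: list = field(default_factory=list)
    give_discount: bool = False

    def nightly_batch(self):
        # Offline data crunching: roll the history up into a flag.
        self.give_discount = sum(self.orders_last_week) > DISCOUNT_THRESHOLD

    def place_order(self, amount):
        # Real-time path: only checks the rolled-up flag.
        discounted = self.give_discount
        self.orders_last_week.append(amount)
        return discounted
```

The flag only changes when the batch runs, so a second order placed on the same day sees yesterday's answer, which is the "come back tomorrow" problem.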
Now this shortens down the problem, at least with regards to the discounts. The bigger issue is that, once again, we have a consistency problem. Now just framed another way, we've got a race condition. We've got the ability for two users to be invoking some logic
that when that logic happens at the same time, it ends up making a wrong decision. I wanna talk a little bit about why that happens, because a lot of developers, they've been lulled into this sense of complacency with transactions. In other words, we know all this code over here
is running in a transaction. Our belief is, assuming we're using a traditional transactional database, that all the code that's in there is going to come up with the correct results. The thing is that at a database level, and DBAs are good at thinking about these types of things,
when they start talking about consistency and isolation levels, they pull it out and start thinking, okay, so we've got this transaction that's touching this entity and that transaction touching those entities. When you have multiple transactions that can run at the same time, reading entities and writing entities, then the question of isolation comes up.
So in this case, we have a very simple scenario. We have transactions one and three that are operating, each of them, on a specific entity. One of them is on entity A, the other one on entity B. Then we've got our third transaction, the one in the middle, that is reading some values as a part of its complicated business logic.
It's reading the value of A, reading the value of B, doing some sort of calculation based on those values, and using that to write the value of entity C. Now, what happens from a database perspective, and I'm talking about a transactional database here, none of that eventually consistent garbage, the stuff that you assume gives you the ACID guarantees.
The "I don't need to think about it" guarantees is another way of presenting ACID. So the problem that we have here is that our transaction number two over here gets the value of entity A as it was before transaction number one started. It gets the value of entity B as it was before transaction number three started, uses those stale values in order to come up
with a value that should be in entity C, and then writes that down. From the database perspective, this is a perfectly reasonable behavior. Your assumption is when I'm in transaction number two, the end state of entity C will be consistent
with the states of entity A and B, because I was in a transaction. The thing is that databases by default don't guarantee that, and when I say by default, I mean the vast majority of them: we're talking about SQL Server, Oracle, Postgres. Not unless you specifically tell the database,
I want you to check versions, but not just the version of the entities that I'm writing, because that stuff databases know how to do. I want you to also check the values of the entities that I'm reading. It's known as multi-version concurrency checking.
I want you to make my transaction fail if the value of entity A or the value of entity B changed during my transaction. Why? Because I require that entities A, B, and C be consistent with each other. Now, occasionally developers run into this setting.
So anybody here using NHibernate? Yes? For those of you using NHibernate, are you familiar with an MDCC flag, the multi-version concurrency checking flag in NHibernate? No? Maybe one of you? Now, it's a flag that occasionally a developer, when spelunking around NHibernate, will find. This is, in a sense, telling NHibernate,
hey, NHibernate, I want you to check not just the entities that I'm writing, but the entities that I've read as well for their versions. And if they're different, fail my transaction. Occasionally, one developer will say, hey, I wonder what would happen if we turn this on. They turn it on, and all of a sudden, a whole bunch of their transactions start to fail.
You start getting a lot of exceptions in the log. Then it's, oh, no, no, no, turn that off again, without realizing that that's feedback saying, hey, look, you actually have concurrency problems in your system. But, like most developers, we don't like exceptions, and we'd much rather pretend that they're not there
than actually deal with them head on. And that's the problem when dealing with parallelism. It can be quite tricky. It's an optical illusion, these transactions. It looks like they're not parallel, but they are. So just so that your eyes and head don't explode while I'm talking, I'm gonna shut that off, all right?
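The three-transaction scenario, and the effect of also checking the versions of entities that were only read, can be sketched with a toy versioned store. This is a simplified model written for illustration; it is not how any particular database or NHibernate implements the check, and `VersionedStore`, `Transaction`, and `ConflictError` are hypothetical names.

```python
class ConflictError(Exception):
    pass


class VersionedStore:
    def __init__(self, **values):
        # key -> (value, version)
        self.rows = {key: (value, 0) for key, value in values.items()}

    def value(self, key):
        return self.rows[key][0]


class Transaction:
    def __init__(self, store, check_reads):
        self.store = store
        self.check_reads = check_reads
        self.read_versions = {}
        self.writes = {}

    def read(self, key):
        value, version = self.store.rows[key]
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        if self.check_reads:
            # Fail if anything we read has changed since we read it.
            for key, seen in self.read_versions.items():
                if self.store.rows[key][1] != seen:
                    raise ConflictError(key)
        for key, value in self.writes.items():
            _, version = self.store.rows[key]
            self.store.rows[key] = (value, version + 1)


def scenario(check_reads):
    store = VersionedStore(a=1, b=2, c=0)
    t2 = Transaction(store, check_reads)
    a, b = t2.read("a"), t2.read("b")        # transaction 2 reads A and B
    t1 = Transaction(store, check_reads)
    t1.write("a", 10)
    t1.commit()                              # transaction 1 changes A
    t3 = Transaction(store, check_reads)
    t3.write("b", 20)
    t3.commit()                              # transaction 3 changes B
    t2.write("c", a + b)                     # C is computed from stale values
    t2.commit()
    return store.value("c")
```

With `check_reads=False` the scenario commits C = 3 while A + B is now 30; with `check_reads=True` the commit of transaction two raises `ConflictError`, which is the kind of exception that shows up in the log when the check is switched on.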
When dealing with parallelism, we need to start thinking through these scenarios. We need to start dealing with the fact that we're not just building multi-single-user systems anymore. And that's the challenge with the bigger domain models that we build: when we go to build domain models that have to support multiple users operating in parallel
on a common set of data, the traditional domain model approach doesn't always work so well. Those models are not necessarily safe for parallel execution. So that's a problem. When people say, I'm going to be doing
domain-driven design, and then they create a whole bunch of entities and one-to-many relationships over here and many-to-many relationships over there, the belief is I'm doing domain-driven design. And I hear this all the time, both in the discussion groups, and I see it also with customers, say, well, customer's my aggregate root, and order's my aggregate root,
and product is my aggregate root, and they've got aggregate roots all over the place. But the interesting thing about domain models, at least as defined in the literature, is that an aggregate root is a consistency boundary. Remember hearing that? An aggregate root is a consistency boundary? Yeah. It's an important thing. That's actually the main thing about an aggregate root. However, when we look at what we just saw,
that if you have transactions that can touch multiple entities, and these transactions are running at the same time, you might not actually have a consistency boundary, in which case you might not really have aggregate roots. Aggregate roots are not a given.
You have to work hard and understand the behavior of your domain in order to come up with a domain model that will have real aggregate roots. And that's a problem, not only for your traditional n-tier types of systems, also for the people doing CQRS. A lot of times when they're going to do CQRS,
they're still applying these same styles of domain models, and that causes a problem. And the issue is that mistakes in this area can really be quite costly. The reason is it's not just the bug. It's the fact that you don't know
how many things it's influencing. Is this influencing million dollar orders, or is this influencing thousand dollar orders? Are we giving customers too much of a discount, too little of a discount? You really don't want to fudge around when it comes to consistency. So this is worse than a system losing data.
This is a system garbaging up its data. It's getting it into an inconsistent state. And the problem is that, well, most testing doesn't uncover it. I mean, when you look at it, if your users say, hey look, I think the system made a wrong decision, and we start debugging it, it can be really hard to find those bugs, right?
I mean, how many tests? You need to start doing parallel testing. And you need to find some way to assert the right state. This is hard stuff. So this is one of the areas when dealing with these types of things, we really need to start designing in the consistency up front. I'm all for the agile mantra
of staying away from big design up front. But there are certain principles that you kind of got to get right. You got to have a good strong foundation. Otherwise, we end up with systems that are defective by design. This is a wonderful website, by the way. Defectivebydesign.org. You know, all sorts of things that kind of makes you wonder whoever designed them,
what they were thinking. Unfortunately, a lot of our traditional domain models in a parallel collaborative type of world are defective by design. Thing is, we can do better. We can do a whole lot better, but it requires us to think about software and think about requirements in a slightly different way.
So to sum up this new way of thinking, think about it like Back to the Future, okay? Those wonderful old movies that you saw when you were younger that were really cool. Has anybody ever re-watched one of the Back to the Future movies? Didn't it really suck?
Like the second one? You're like, I can't believe I really liked that. So always be careful with revisiting the nostalgia. Sometimes it's better to leave it as a pleasant distant memory. But the hinge in dealing with software and parallel type of software is exactly that.
It's that concept of time. It's rethinking the arrow of time and how we program with it. So the traditional requirement that we said was, all right, we've got this stream of orders that are coming into our system. And whenever we get an order, we look back in time, seven days, or whatever the requirement is, sum up the total value of orders in that period of time.
And if the value is greater than 200 and some odd dollars, then they get a discount. And we mentioned the problem was if we have two things happening at the same time, we could end up with an inconsistent value. We can achieve the same result if, instead of looking backwards in time,
we look forwards in time, which seems a little bit unusual. How can you predict the future? Thing is, you can't predict the future, but you can fold the future into your programming now. And let me explain how you do that logically. So when we get an order, ultimately what we're interested in
is the last seven day total. Instead of calculating that at the time of the order, we can turn that around and say, actually, let's introduce that concept into our domain itself. A customer has a seven day running total of orders,
because ultimately that's what the business is telling us. It's talking about the last seven days. What we're doing is we're just making that explicit. So when we get a $100 order, what we're saying is, all right, the seven day running total is $100, but that needs to be decremented seven days in the future.
In essence, we want to send data to ourselves seven days from now saying, you know, you need to decrement the seven day running total seven days from now. And as we get this stream of orders that are coming in, we say, okay, now we increment it by $200 now, and we throw information into the future
and say decrement it by another $100 then. And as we keep on going, aggregate the data that we need so that we have a current running total that's telling us, okay, now it's $300, now it's $400, et cetera. And as we catch up to ourselves in time,
then we need to have that kind of wake up call saying, hey, look, it's been seven days since order number one. Say, oh, okay, then what I need to do is to decrement it down to $200. Okay, it's been seven days since order number two. All right, let's decrement that one down as well.
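The arithmetic of the seven-day running total can be sketched like this. It is an in-memory illustration only: the timeouts live in a heap, so nothing here is durable, and the `RunningTotal` class and the threshold of 250 are assumptions.

```python
import heapq
from datetime import datetime, timedelta

DISCOUNT_THRESHOLD = 250  # assumed threshold


class RunningTotal:
    def __init__(self, window=timedelta(days=7)):
        self.window = window
        self.total = 0
        self._timeouts = []  # min-heap of (due_time, amount)

    def advance_to(self, now):
        # Wake-up calls: undo each order once its seven days are up.
        while self._timeouts and self._timeouts[0][0] <= now:
            _, amount = heapq.heappop(self._timeouts)
            self.total -= amount

    def order_placed(self, amount, now):
        self.advance_to(now)
        discounted = self.total > DISCOUNT_THRESHOLD  # no historical query
        self.total += amount
        # "Send data to ourselves" seven days into the future.
        heapq.heappush(self._timeouts, (now + self.window, amount))
        return discounted
```

For example, four $100 orders on consecutive days bring the total to 400, and the fourth one is the first to see a total above the threshold; seven days after the first order the total drops back to 300.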
The difference here is that, like I said, instead of looking back in time, we look forward in time. So as we're modeling this, as new orders are coming in, we're doing that same type of behavior over and over and over again so that when we finally arrive at a specific order, we're not actually doing a query,
we're not doing a historical look up, but rather we've folded the history into our current state. But we've done that in a highly consistent way. The reason that we've done it in a highly consistent way is that what we've done is we've in essence created, let's call it an upside down batch job.
So remember the code that we had with our batch job, where we said, over here we're gonna have a customer that has some sort of GiveDiscount equals true value. We rolled up some state and we put it in the customer. We're doing the same thing here.
We're rolling up some state and putting it in the customer. The important thing is that when we're dealing with a customer or something like that, what we can say is well now we have a specific entity that has specific data. This is one row in the database. Databases can guarantee us full transactional consistency
at the level of a single row. In other words, if we have two things that are happening at the same time, one of them wants to submit a new order and another one is dealing with a timeout, then ultimately that bit is going to lock on that single row.
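To illustrate why funnelling everything through one row helps, here is a sketch that uses a `threading.Lock` as a stand-in for the database's row lock. That substitution, the `CustomerRow` class, and the threshold of 250 are assumptions made for the example.

```python
import threading

DISCOUNT_THRESHOLD = 250  # assumed threshold


class CustomerRow:
    def __init__(self, running_total):
        self.running_total = running_total
        self._row_lock = threading.Lock()  # stand-in for the row lock

    def order_placed(self, amount):
        with self._row_lock:
            discounted = self.running_total > DISCOUNT_THRESHOLD
            self.running_total += amount
            return discounted

    def timeout_elapsed(self, amount):
        with self._row_lock:
            self.running_total -= amount


def race(row, amount=100):
    decisions = []
    start_together = threading.Barrier(2)

    def submit():
        start_together.wait()
        decisions.append(row.order_placed(amount))

    threads = [threading.Thread(target=submit) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(decisions)
```

Starting from a running total of 200, two simultaneous $100 orders are serialized on the lock, so exactly one of them sees 300 and gets the discount, whichever thread arrives first.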
That in essence is the trick of an aggregate root: to design your logic in such a way that all transactions are touching just a single row, a single entity. However, in terms of programming, this requires something that we don't really have. We need a way to program time.
And you know, System.Threading.Timer? Not such a good idea, is it? The problem with those types of in-memory timers is that if our machine crashes, well, it won't remember that it needs to decrement the value in seven days. So we need a highly reliable, durable,
transactional way of dealing with time so that no matter what happens in our system, we won't lose that information. Once we have all of this kind of stuff, then we'll be in a situation where we can say all right, now we can build real aggregate roots. So I'm gonna switch over to code now
to talk about how we actually build these types of things. It's not gonna look like your average domain model, let me tell you that. It's not gonna have one to many relationships. What this is going to be building on is a pattern that in Service Bus we call the saga.
But this comes from the saga ideas, ultimately taking what we were talking about here, saying what we have over here is a long-running process. We need a transactionally consistent, stateful, long-running process that can be managed by time.
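The point about in-memory timers forgetting their work after a crash can be sketched as follows. This is an illustration only, assuming a SQLite file as the durable store; the table layout and the helper names `open_store`, `request_timeout` and `take_due` are hypothetical and are not how any particular service bus stores timeouts.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) the hypothetical timeout table in a SQLite file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS timeouts ("
        "due_at REAL NOT NULL, customer_id TEXT NOT NULL, amount REAL NOT NULL)"
    )
    return conn


def request_timeout(conn, due_at, customer_id, amount):
    # The timeout is committed to disk, so it outlives this process.
    with conn:
        conn.execute(
            "INSERT INTO timeouts VALUES (?, ?, ?)", (due_at, customer_id, amount)
        )


def take_due(conn, now):
    """Return and delete all timeouts due at or before `now`."""
    with conn:
        rows = conn.execute(
            "SELECT due_at, customer_id, amount FROM timeouts "
            "WHERE due_at <= ? ORDER BY due_at",
            (now,),
        ).fetchall()
        conn.execute("DELETE FROM timeouts WHERE due_at <= ?", (now,))
    return rows


path = os.path.join(tempfile.mkdtemp(), "timeouts.db")

conn = open_store(path)
request_timeout(conn, due_at=7.0, customer_id="c1", amount=300.0)
conn.close()  # simulate the machine going down

conn = open_store(path)  # "restart": the pending timeout is still there
due = take_due(conn, now=7.0)
remaining = take_due(conn, now=100.0)
conn.close()
```

Because the request is a committed row rather than a timer object in memory, the decrement is still waiting after the simulated restart.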
The saga pattern was discovered and documented by the database community as they were dealing with what's known as long-lived transactions. So they were trying to address the same kind of issues, but more from a data-up perspective. We're now looking at it from a domain-driven design perspective, which is behavior-down.
So it's not a by-the-book saga definition, but it follows a lot of those same principles. And within NServiceBus we try to make it easier to program these types of things because like I said, number one, well, time isn't something that we have a very good way of programming. Either in .NET or in Java or in Ruby or in Python,
it requires a little bit of infrastructure and also requires us to deal with the issue of, well, the consistency. What happens if I have a timeout being processed at the same time as a new order is coming in? We need to make sure that those locks hold up,
and that's why they have to go on the same entity. All right? So when dealing with this, I'm not gonna spend too much time talking about the infrastructure. A lot of the ways that sagas behave from an infrastructure perspective are documented and you can see it online. I want to talk more about the behavioral modeling side of it, okay?
So what I'm gonna do, I'm gonna switch over to code, like I said. But to make this a little bit more interesting, what we're gonna do is, so here we've got sort of the basic level of just a simple project. We're gonna start writing this
in almost a test-driven fashion. It's not going to be entirely test-driven because I require just a little bit of stuff to set up so you can see how this is gonna play out. So we're gonna start off with the message, the thing that's actually triggering our behavior. We said, well, we've got orders that are coming in.
These orders have a total. And ultimately, we need to make sure that all of the order totals, as they are correlated by customer, that we calculate the seven-day running total, okay? So I'm gonna set up a couple of messages so that we actually have something to deal with over here. So here we have an order accepted message.
And this message has a customer ID so that, as I mentioned before, we want to make sure that we aggregate this stuff based on customer ID. Each order will also have an order ID so that we can clearly identify it. And it has a total, which, forgive me,
I'll just use a double because decimals are a little bit ugly. I have to do lots of casts later on, all right? So we have double, and we have the order total. Now, in dealing with this type of saga, ultimately, when we're going to model behavior,
and we look at this behavior, the only one we had over here, let me zoom back up to the picture over, there it is. Oh my God, it disappeared on me. There it is. That when dealing with, we're saying we have some behavior on customer
and we have some behavior on order. And we actually need to pull that behavior together. The total value that we're dealing with needs to be stored somewhere. So instead of talking about this as an order saga or a customer saga or naming it as a regular entity,
I'm going to call this what the business people call it. This is a discount policy. It's the policy that we use to decide which customers get a discount and how much, okay? Now, for this discount policy, we're going to have some discount policy data.
Now, in NServiceBus, we have this interface called IContainSagaData. And let me squeeze that over a little bit over there. Okay, so the IContainSagaData has some required fields. It's not particularly problematic.
So why is it complaining about this? Discount, of course, discount policy data, there we go. And I don't want to talk too much about this type of information, but ultimately, an ID is something that we need to clearly identify this quote unquote entity. And the discount policy actually contains the behavior.
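The pieces described so far, the order accepted message and the discount policy data, might look roughly like this. It is a Python sketch rather than the C# being typed in the talk; `PolicyStore` and its `for_message` method are hypothetical stand-ins for whatever the infrastructure does to find the data belonging to a customer.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderAccepted:
    customer_id: str  # orders are aggregated per customer
    order_id: str     # identifies the individual order
    total: float      # a double in the talk; real money would usually be a decimal


@dataclass
class DiscountPolicyData:
    customer_id: str
    # Identity of this policy instance, the "quote unquote entity".
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class PolicyStore:
    """Hypothetical in-memory lookup: one policy-data record per customer."""

    def __init__(self):
        self._by_customer = {}

    def for_message(self, message):
        # All messages with the same customer_id land on the same record.
        if message.customer_id not in self._by_customer:
            self._by_customer[message.customer_id] = DiscountPolicyData(
                message.customer_id
            )
        return self._by_customer[message.customer_id]


store = PolicyStore()
first = store.for_message(OrderAccepted("c1", "o1", 300.0))
second = store.for_message(OrderAccepted("c1", "o2", 100.0))
other = store.for_message(OrderAccepted("c2", "o3", 50.0))
```

Two orders for the same customer resolve to the same record, which is the single entity that all the later locking relies on.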
So I'm just going to set up that saga representation saying that this is holding the discount policy data. What we have over here is a situation where we're saying, we have a stream of messages, these order accepted events that are coming in and our saga wants to handle them. So if this is the first order
that we've received for a customer, then we're going to actually be kicking off a new discount policy for this customer. We say, IAmStartedByMessages of, we called that, OrderAccepted. Okay, we're saying over here, this discount policy for a specific customer instance, what we want to do is we want to take the order total
and increment the seven day running total of our customer. So we can say over here, let's create a public double seven day running total. And when an order comes in, we say, okay, the data's seven day running total
plus equals message dot total. And what we'd like to say is, and seven days from now, decrement it back down. Right, we want to take this message that we have right here, I say, you know what, play that to me seven days from now.
So this is one of the things that we've introduced within NServiceBus as a messaging framework. It's easy for us to hold onto data in various queues. Unlike with HTTP, HTTP doesn't give you a good way to put something on the shelf and get it back later.
So you kind of have to, okay, we'll put it in the database and we'll have something polling against the database and it's quite messy. Messaging systems allow us to model these types of behaviors easier. So one of the things that we can do over here, is say request a timeout passing in a, well, when do you actually want the timeout? So here I say, well, I want a TimeSpan.FromDays(7)
and what I'd like to say is, the data that I want to get back is ultimately the message that I have right now. So I can take the message that I have over here and say, you know what, send me back that message in seven days.
Where when the seven days are up, well, then I want to decrement the value back down. But as I mentioned, you know, this is sort of the basic stuff of setting it up. I'm not going to implement the logic of the handling of the timeouts just yet because I want to do that via tests.
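The handler as it stands at this point in the talk, increment the total and ask for the same message back in seven days, can be sketched like this. The `request_timeout` method and the `requested_timeouts` list are hypothetical stand-ins for the messaging infrastructure, written in Python for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class OrderAccepted:
    customer_id: str
    order_id: str
    total: float


class DiscountPolicy:
    def __init__(self):
        self.seven_day_running_total = 0.0
        self.requested_timeouts = []  # stand-in for the messaging infrastructure

    def request_timeout(self, delay, state):
        # Hypothetical helper: a real bus would store this durably and
        # deliver `state` back to the saga once `delay` has elapsed.
        self.requested_timeouts.append((delay, state))

    def handle(self, message):
        self.seven_day_running_total += message.total
        # "Play this message back to me seven days from now."
        self.request_timeout(timedelta(days=7), message)


policy = DiscountPolicy()
order = OrderAccepted("c1", "o1", 300.0)
policy.handle(order)
```

Passing the original message as the timeout state means the later decrement needs no lookup: the amount to subtract arrives with the timeout.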
But I want to set up that extra little bit of infrastructure that says, and my discount policy needs to handle timeouts of the type of message order accepted, okay? So over here what we're saying is, when I get a message back due to a time period being elapsed,
now I'll do the rest of my logic. Now, let's take some of the scenarios that we had in this picture and start writing some tests for it. Because as we know, when going to write domain models that have complicated business logic, and I gotta tell you, once you start programming with time, you're gonna want to be able to have some decent tests.
Because if you don't have automated tests, imagine what your manual testing is going to be like. Somebody comes into the office Monday morning, presses a button, says, all right, see you next Monday, and is waiting until actually the seven-day timeout goes by.
This is, in essence, the problem of doing full-blown integration tests when you actually have a clock that needs to be dealt with. By having a message-driven time model, we can simulate the passage of time with a message. So let's start writing some, I'm not sure I want to call them tests,
because we're more looking at this as sort of a scenario. Given that you got message number one, message number two, message number three, then this is what should happen. Now, one of the things that we said, and I want to go back to our original code requirements over here, is that we need to actually give the customer a discount.
So this discount policy ultimately needs to decide what message it's going to be sending out. So I just want to add that final bit over here, and Roy will forgive me for not doing everything test-driven, but we'll add that command over here that says,
let's call it, give discount, or sorry, discount order, or let's call it, okay, and we have, that's actually not a discount, right? It's talking about processing an order that has a discount. So we have a process order message
that can have a public, let's call it, double discount percent, okay? And that's ultimately what we want to be checking, that when a message is emitted of a process order that it has either a 10% discount or a 0% discount.
All right, so let's start writing a test here. I'm not yet sure what to call it, because ultimately this is a scenario that is going to be playing out over time. So one of the things that we created within NServiceBus was a nice way to do this type of message-driven, time-driven unit testing, because it wasn't really easy to do with just regular NUnit, xUnit, JUnit, whatever.
So we start off by initializing our test. So this is the Test.Initialize, and hopefully it'll identify that for me. Okay, so we've got, and I know what the problem is.
It's that method, let's call it scenario, confuses it. All right, so we've got NServiceBus.Testing.Test.Initialize. Now I can get rid of the beginning part, hopefully. Let's try getting rid of that so we get shorter lines.
Okay, and now I want to test my discount policy saga, and what I'd like to say is that when I give it the first order accepted message, it will emit a process order message
without any discount, okay? So I'm going to expect a send. In other words, this discount policy is going to be getting an order accepted message. I was gonna say, you know what, this guy doesn't get a discount. It should just be a regular process order. So I expect a process order message
that this message, its discount percent is equal to zero, dot When, when the saga receives the order accepted event, okay?
So I can say the saga's invoked here, so we have saga.handle new order accepted, and in here let's pass in an order total so that we can actually take this to the next step. All right, so we have a new order total where the total value equals, let's say that that's 100.
Okay? So we've written our first test, saying if you get an order accepted, you're gonna send out a process order message without any discount whatsoever. Let's run that really quick and see that, well, it's not gonna pass clearly because we haven't actually sent out a message. And there it tells us over here,
expected send invocation, process order not fulfilled. Okay, so we need to do that. Let's go and make the test pass, okay? So inside this logic over here, we say when you get a handle, what we know that you need to do is send out a message. All right, so we just do a Bus.Send of a process order message.
And in this case, I'm just gonna make the, you know, this is the TDD style almost. I'm not gonna try to be too smart. I'm just gonna make it pass, saying that the message's discount percent is zero. Okay? So let's run our test again.
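The shape of this first scenario, expect a send and then trigger the saga, can be approximated with a small hand-rolled harness. This is a Python sketch under stated assumptions: `Scenario`, `expect_send`, `when` and `FakeBus` are hypothetical names chosen to mirror what is being said on stage, not the NServiceBus testing API itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderAccepted:
    customer_id: str
    order_id: str
    total: float


@dataclass(frozen=True)
class ProcessOrder:
    discount_percent: float


class FakeBus:
    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


class DiscountPolicy:
    def __init__(self, bus):
        self.bus = bus

    def handle(self, message):
        # Simplest thing that makes the first scenario pass.
        self.bus.send(ProcessOrder(discount_percent=0.0))


class Scenario:
    """Hypothetical fluent harness: state an expectation, then trigger the saga."""

    def __init__(self, saga_factory):
        self.bus = FakeBus()
        self.saga = saga_factory(self.bus)
        self._expected = None

    def expect_send(self, message_type, check):
        self._expected = (message_type, check)
        return self

    def when(self, action):
        already_sent = len(self.bus.sent)
        action(self.saga)
        new = self.bus.sent[already_sent:]
        message_type, check = self._expected
        if not any(isinstance(m, message_type) and check(m) for m in new):
            raise AssertionError(
                f"expected send of {message_type.__name__} not fulfilled"
            )
        return self


scenario = (
    Scenario(DiscountPolicy)
    .expect_send(ProcessOrder, lambda m: m.discount_percent == 0)
    .when(lambda saga: saga.handle(OrderAccepted("c1", "o1", 100.0)))
)
```

The test only sees messages going in and messages coming out, so it says nothing about how the policy keeps its state.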
And this time, it should pass. Hopefully, fingers crossed. There we go. So far, so good. Now we wanna start saying, well, if I increase the value of this order, in other words, if this was a $300 order, and then I gave you a $100 order later,
within the same week, then you will get a discount. So again, Roy will forgive me for actually just tacking this on because we don't have lots of time to get this done. So what I'm gonna do is I'm gonna start chaining the next order. So here I'm going to, and occasionally I like to do this,
expect send of that same process order message. Only this time, the discount percent should be 10% if the total value of the first order was $300. And now I'm gonna give you a new, let's say that it's not a $300 order, this one's gonna be a $100 order.
Okay? And now we can run this again and see that, all right, now it fails again. Why? Well, the check evaluated false for that expectation. Clearly, we didn't do any type of tests in here to say that, well, if the value,
the seven day running total was greater than whatever, then we needed to do that. So let's quickly do that over here. We'll have a double discount percent equals zero. If Data.SevenDayRunningTotal is greater than, let's just say, or equal to 250, then the discount percentage is 10.
Bus.Send of a process order with the discount percent, and now we'll run our test again. And this time, again, fingers crossed, hopefully it'll work. I'm not that good at this, running the code at the same time. It says process order not called.
What's that? Well, the first order hasn't timed out yet. Well, the problem that we had here was, all right, so we've put in a value that's 300, right? Total value has increased.
We send in another order, and our seven day running total is what? It's very little code. I'm very happy that Andreas, and yes, that was a plant, to have this a little bit interactive, right? The problem, and I wanna be upfront
with you guys about this, writing traditional domain models tends to be easy because we tend to think about them in a very single-threaded fashion. No, we do this, and then we do this, and then we do this. It's very simple, straightforward logic. When we start dealing with time-driven processes, we need to start thinking about, well, what happens if things happen in parallel?
So there are other concerns to deal with, okay? So ultimately, we have a little bit of a bug over here. What is it? Anybody see it? Anybody using NServiceBus today? I'm curious. Yeah, okay, maybe 10% of you.
The discount is applied to the first order. Well, ultimately, what we're saying here is that when we get an order coming in, we increment the value. So what would be our seven day running total? We passed in, remember our test over here, the total is 300, and then we pass in a total of 100.
Right? So ultimately, our running total should already be how much? 400. Let's actually check that out, okay? So debug it. So the first time we run through this,
we see our seven day running total is indeed zero. That's great, we request a timeout, we calculate a percentage, and we see that the seven day running total is now 300. For the first one.
In other words, the problem that we had here is that we incremented our seven day running total too soon. Right, in other words, the failure here is ultimately a failure saying, well, we expected a discount percentage of zero the first time, and that's what we failed on.
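The ordering problem just described can be reproduced in a few lines. This is a Python sketch with hypothetical function names; the 250 threshold and the 10% discount come from the talk. It runs the same two orders, $300 then $100, through both orderings.

```python
THRESHOLD = 250.0  # from the talk: 10% once the running total reaches 250


def discount_for(running_total):
    return 10.0 if running_total >= THRESHOLD else 0.0


def handle_increment_first(running_total, order_total):
    """Ordering from the failing run: the order counts towards its own discount."""
    running_total += order_total
    return running_total, discount_for(running_total)


def handle_discount_first(running_total, order_total):
    """Alternative ordering: decide on previous orders only, then add this one."""
    discount = discount_for(running_total)
    return running_total + order_total, discount


def run(handler, order_totals):
    running_total, discounts = 0.0, []
    for total in order_totals:
        running_total, discount = handler(running_total, total)
        discounts.append(discount)
    return discounts


buggy = run(handle_increment_first, [300.0, 100.0])
fixed = run(handle_discount_first, [300.0, 100.0])
```

With the increment first, the very first $300 order already gets 10%, which is exactly the expectation that failed in the test run.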
Okay? So when I finish running this, let me go back over here, I expected send invocation, and I'll just step through that and we can see the unit test session down here. It's important that we look at the calls that were made and at which point in time there was a failure.
Over here, all it's saying is calls made a single order service dot process order message down here, okay? So in other words, it failed the first time we sent a process order, not on the second one. So always, you will need to be looking at when doing these types of tests,
not just a specific assert, but ultimately which one of these failed. So going back and fixing this, we've done almost everything that we need to do. The difference is that the incrementing of this value has to be done at the end, right? Where we say, request a timeout from this message,
calculate the discount based on the seven day running total, and then, after you've calculated the discount, it's probably better to have this bit done right above this. Now when we go and run our test,
what we'll see is that the first time that it goes through, it says, okay, well, we actually had zero, now the test passed, and on the second one, well, everything was great. Now let's add, the important twist to this is time. We say, well, what happens if time was up
between these two orders? Right, we got the first order for $300. We waited seven days, it timed out, and now we got a $100 order. So this time, we should not actually get the discount. So this will be a,
let's actually create a separate scenario over here. So we'll call this scenario two. It behaves very similarly. However, in addition to this when, what we're going to be doing here, we're saying, and the saga times out. This method over here
is doing a fairly intricate thing behind the scenes. Suffice it to say that what it does is it is watching the clock for you. So in your domain model, in your saga, you're saying, wake me up in seven days, and then wake me up in another seven days, and then wake me up in another seven days, one after the other. What this WhenSagaTimesOut does
is it replays time in the correct order for you. In other words, if you had one bit of code that said, well, wake me up in three days, and another bit of code that said, well, wake me up in seven days, and another bit of code that ran later and said, wake me up in five days, say, well, actually, I'm going to replay you the timeouts in the order of three, five, seven, and not in the order that you open the timeouts.
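One way to picture that replay is a priority queue keyed by due time. The sketch below is in Python, `TimeoutRecorder` is a hypothetical test double, and it simplifies by measuring every delay from the same starting instant; it is not a description of how the real testing library is implemented.

```python
import heapq
from datetime import timedelta


class TimeoutRecorder:
    """Hypothetical test double: records requested timeouts, replays by due time."""

    def __init__(self):
        self._heap = []
        self._sequence = 0  # tie-breaker so equal delays keep request order

    def request_timeout(self, delay, state):
        heapq.heappush(self._heap, (delay, self._sequence, state))
        self._sequence += 1

    def replay(self, on_timeout):
        # Deliver the earliest-due timeout first, regardless of request order.
        while self._heap:
            _, _, state = heapq.heappop(self._heap)
            on_timeout(state)


recorder = TimeoutRecorder()
recorder.request_timeout(timedelta(days=3), "three")
recorder.request_timeout(timedelta(days=7), "seven")
recorder.request_timeout(timedelta(days=5), "five")

delivered = []
recorder.replay(delivered.append)
```

The timeouts were requested as three, seven, five, and they come back as three, five, seven.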
So this makes testing these things quite a bit easier. Now when we say this, okay, when the saga times out, in this case, the discount percentage should again be zero, right? Because ultimately it's been seven days, and then when the next order comes in, the seven day running total should again be zero.
So now we can run this test, and this is our scenario two test, and this one we'll see should fail. That says, oh, no, sorry. On the second invocation, the check evaluated to false.
You see that over there. So, okay, so what we need to do is ultimately that final bit of our logic saying, when time is up, this is when we need to decrement back down that state. So we got the data dot seven day running total minus equals our state's total.
Go back, run all of our tests, make sure that they pass now. Oh, we see that both of the tests pass. So the important thing here is, actually two things.
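Putting the whole thing together, here is a compact Python sketch of the finished policy and both scenarios. The class layout, the `sent` and `pending` lists and the `fire_timeouts` helper are assumptions made so the example runs on its own; the threshold, the discount and the order of operations follow what was just walked through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderAccepted:
    customer_id: str
    order_id: str
    total: float


@dataclass(frozen=True)
class ProcessOrder:
    discount_percent: float


class DiscountPolicy:
    def __init__(self):
        self.seven_day_running_total = 0.0
        self.sent = []      # stand-in for sending messages on the bus
        self.pending = []   # stand-in for requested seven-day timeouts

    def handle(self, message):
        self.pending.append(message)  # "send me this message back in seven days"
        discount = 10.0 if self.seven_day_running_total >= 250.0 else 0.0
        self.sent.append(ProcessOrder(discount))
        self.seven_day_running_total += message.total  # increment last

    def timeout(self, state):
        # Seven days are up for this order: take its total back out.
        self.seven_day_running_total -= state.total

    def fire_timeouts(self):
        # Test helper: simulate the passage of time by delivering timeouts.
        pending, self.pending = self.pending, []
        for state in pending:
            self.timeout(state)


def discounts(saga):
    return [m.discount_percent for m in saga.sent]


# Scenario 1: $300 order, then $100 order within the same week.
same_week = DiscountPolicy()
same_week.handle(OrderAccepted("c1", "o1", 300.0))
same_week.handle(OrderAccepted("c1", "o2", 100.0))

# Scenario 2: $300 order, seven days pass, then $100 order.
week_later = DiscountPolicy()
week_later.handle(OrderAccepted("c1", "o1", 300.0))
week_later.fire_timeouts()
week_later.handle(OrderAccepted("c1", "o2", 100.0))
```

In the second scenario no clock is involved at all: the passage of seven days is just another message delivered to the saga.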
Number one, in dealing with our business logic, the logic itself was not extremely complex. Right? In other words, when we compare the logic that we would have written in a traditional domain model, say, well, it kinda looked about the same length, right?
Go to the repository, look up all of the seven day history of the orders, calculate the sum total if the value is greater than this, then give it this discount. If the value is less than that, then give that discount. So we haven't written a whole lot more logic. Another thing, while traditional domain models
are also testable, who's written unit tests for their domain models? Yes, good. Who's had to change their unit tests for their domain models more than once or twice? Who, if ever, got to the situation where they were wondering, what's the point of writing unit tests for these domain models
if I keep having to rewrite them over and over again? Yeah, some hands going up. Wait, is Uncle Bob in the room? No, okay, Roy, no, no Roy anywhere? All right, yeah, I can see. That's often been a problem, that when we have domain models that change very often, a lot of times our unit tests
end up having to change with them. Unfortunately, because a lot of the traditional domain model unit tests tend to be very coupled to the internals of our domain model structure. When we're dealing with this type of behavioral entity, the way that we test it, and you saw that over here,
is we're talking about very much black box testing. So I don't know how you're structured internally. I don't know that you actually have a property called Data.SevenDayRunningTotal, and I don't care. All I know is if I give you these messages on this side, then you should be putting out those messages on that side.
And as long as the discount policy requirement is the same, then those tests should remain the same. So we have pretty good testability. We have short code, object-oriented, same as before. Main difference is the fact that we have time baked into this,
where if the timeout method is trying to run at the same time as the handle method, then regular optimistic concurrency blocks out one or the other, or causes one to fail. That's what happens when you have a conflict.
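What "optimistic concurrency causes one to fail" means for a single row can be sketched with a version number. This is a Python illustration with hypothetical names (`VersionedRow`, `ConcurrencyConflict`); a real database or persistence layer would do the version check as part of the update itself.

```python
class ConcurrencyConflict(Exception):
    pass


class VersionedRow:
    """Hypothetical single row guarded by a version number."""

    def __init__(self, total=0.0):
        self.total = total
        self.version = 0

    def load(self):
        return self.total, self.version

    def save(self, new_total, expected_version):
        # The write succeeds only if nobody else has written since we loaded.
        if self.version != expected_version:
            raise ConcurrencyConflict("row changed since it was loaded")
        self.total = new_total
        self.version += 1


row = VersionedRow(total=300.0)

# Handle (a new $100 order) and Timeout (the old $300 order expiring)
# both load the same version of the row.
total_seen_by_handle, version_seen_by_handle = row.load()
total_seen_by_timeout, version_seen_by_timeout = row.load()

row.save(total_seen_by_handle + 100.0, version_seen_by_handle)  # first writer wins

try:
    row.save(total_seen_by_timeout - 300.0, version_seen_by_timeout)
    conflict = False
except ConcurrencyConflict:
    conflict = True  # the stale write is rejected instead of overwriting the order
```

Without the version check, the timeout's stale write would have set the total to zero and the $100 order would simply have vanished from the running total.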
So in your domain models, who's had an optimistic concurrency violation exception? Yes? Yeah. What do you do about it? Who logs it? Okay, it's a best practice. Hint, whenever Udi says something's a best practice, it's not a best practice, okay?
So logging is great, but the question is, well, what do you do after that? So for all those of you who are running NServiceBus, the answer is simple. Well, we do nothing, and NServiceBus just retries it for us, and then it'll work the next time. For all those of you who are running traditional domain models without NServiceBus,
when you have an optimistic concurrency violation exception, what do you do, other than logging it? Who retries? Okay, we got about four or five hands going up. What happens if you crash? Well, we don't retry anymore, because we don't remember that we actually got the exception to begin with.
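The difference between retrying in memory and having the message still sitting in a queue can be sketched like this. It is a Python illustration with a hypothetical `process_queue` loop; the queue here is an in-memory `deque`, whereas the whole point in a real system is that the queue is durable, so the message survives a crash between attempts.

```python
from collections import deque


class ConcurrencyConflict(Exception):
    pass


def process_queue(queue, handler, max_attempts=5):
    """Deliver each message until the handler succeeds; failures are requeued."""
    attempts = {}
    dead_letters = []
    while queue:
        message = queue.popleft()
        attempts[message] = attempts.get(message, 0) + 1
        try:
            handler(message)
        except ConcurrencyConflict:
            if attempts[message] < max_attempts:
                queue.append(message)  # still recorded, so it will be retried
            else:
                dead_letters.append(message)  # give up, but do not lose it
    return attempts, dead_letters


handled = []
conflicts_left = {"order-1": 1}  # hypothetical: order-1 conflicts once, then succeeds


def handler(message):
    if conflicts_left.get(message, 0) > 0:
        conflicts_left[message] -= 1
        raise ConcurrencyConflict(message)
    handled.append(message)


attempts, dead_letters = process_queue(deque(["order-1", "order-2"]), handler)
```

The handler itself contains no retry logic; the conflicting order is simply processed again because it never left the queue.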
Kinda sucks, but hey, we didn't like that order anyway. So in dealing with these types of issues, it's consistency, it's reliability, it's all those elements that make up an aggregate root, a real DDD consistency boundary. It's making sure that you have the necessary infrastructure in there,
so that if these two methods, the handle and the timeout, were running at the same time, only one of them goes through. And here it doesn't matter if you're using a relational database or using a document database, because we have modeled our behavior in such a way that it is a single element that we can have the locking behavior on top of.
So this is what ultimately gives us the consistency back again. As you see, dealing with time, it's not, in terms of programming, it's not lots of code. But for a lot of us, the hard part, and I just wanna go back to this picture, because it really is hard.
These different perspectives of time, modeling this is something that is counterintuitive. It's something that we don't have a lot of practice in. So any time that you have requirements, any time you have a business expert
that is coming to you saying, look, what I need you to do is to look at a historical query as a part of your business processing logic. Note that down, but understand that for you to understand the way that you're gonna have to program this and the way that you're gonna have to test this, you should really draw it out on a timeline.
Sometimes it can be relatively simple, like in this case, all we had was a single message type, a stream of order accepted events. In other cases, you may have multiple message types that are coming in. The more complex your logic, the more messages will be going in there. So I don't wanna say always, but really, until you get proficient
with looking at the world from a looking forward in time rather than a backward in time perspective, always draw things out on the timeline. And please, write the tests. With sagas, with these types of logic,
I'd say write the tests first. Sometimes it's too difficult to keep five different scenarios in your head. That is the challenge over here. So instead of saying, like in a regular domain model, okay, well, we'll just implement the logic
for each requirement and test it afterwards. With sagas, what I recommend you do is draw the timeline and say, all right, we got message number one, message number two, timeout message number three. What should be the behavior? Write a test. And I'd say that even more strongly, this shouldn't be test-driven development.
Again, forgive me, Bob. This should be test-first development. What I want you to do as you're doing this, sit down with your business expert because they're the ones that are gonna be able to tell you how time actually relates to your business process. So you sit with your business expert, draw out a timeline, define what is the behavior, write a test for it.
Don't make it pass, otherwise the business expert will stand up and leave, okay? Once you got them, say, all right, let's go through another scenario. What should I do? Write a test. Your business expert should be able to read along this type of thing with you and say, yes, this is an accurate representation of the scenario.
Go through scenario after scenario until the business expert says, that's everything. After you have codified all of the requirements as a set of failing tests, then go and make them pass, okay? So it's test-first development rather than test-driven development. You might be able to think up these scenarios yourself
if it's a domain that you're very familiar with. Still, I'd say, get a business expert in there. Time is something that, especially in the more established domains, people are familiar with, at least on the business side. There is a whole lot of experience on the business side when you're talking about mortgages, banking, insurance.
These guys understand time very deeply. In a lot of the newer domains where the business was computerized from day number one, online gambling is a good example, that's where, when it comes to time, sometimes business experts don't know to actually give you that information.
Don't assume that the requirements you're gonna get are going to be readily expressible like this. And that's the hard part about DDD. That's known as ubiquitous language. Time is a big portion of your ubiquitous language, and unless you can work that into the conversation in a way that everybody speaks the same language, implementing this kind of stuff is gonna be difficult.
So, just wanna wrap up. We've seen lots of good books in software over the years. We have domain-driven design. We've got the patterns of enterprise application architecture. People are familiar with the domain model pattern. In some cases, people are more or less familiar with some of the messaging patterns.
The area where a lot of problems start is just around the fact that people continue to assume that when they're doing CQRS or N-tier type architectures that they can build the same style of domain models as with a single-user system. You can't.
The consistency won't hold up. In the best case, you'll lose data. In the worst case, you'll lose data, and the data that you'll keep will be inconsistent. The problem is it won't show up in testing. Who has tests for their system? Yes? Who actually, as a part of their system testing,
runs tests in parallel to make sure that the data at the end is consistent? We've got one out of maybe 150 people. It's hard to write those tests. It's very difficult to express the way that things are. So A, start writing those tests, start running those tests,
but B, more importantly, start designing your system up front with the idea that you're gonna have to deal with this. So again, from an implementation perspective, NServiceBus has been designed from the ground up to give you this type of consistency. But it's not specifically about NServiceBus.
You could take everything that I just showed you. It's plain old CLR objects, and put them on another infrastructure. You could run it on MassTransit. You could run it on Rhino Service Bus. You could run it on Azure Service Bus. But always check back down to that issue of transactional integrity, consistency, this kind of stuff.
Don't assume that the infrastructure is going to handle it for you until you've really gone down into the bits and bytes. So I hope I've given you a new perspective on the whole command query argument. It's not about dividing up commands and queries. It's about your domain model. It's about modeling your behavior. It's about finding out where the appropriate transactional seams are.
If you get that, then you'll have a high performance, reliable, and consistent system. Thank you very much.