
Beyond Validates Presence of: Ensuring Eventual Consistency


Formal Metadata

Title
Beyond Validates Presence of: Ensuring Eventual Consistency
Part Number
62
Number of Parts
86
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
You've added background jobs. You have calls to external services that perform actions asynchronously. Your data is no longer always in one perfect state; it's in one of tens or hundreds of acceptable states. How can you confidently ensure that your data is valid without validations? In this talk, I'll introduce some data consistency issues you may see in your app when you begin introducing background jobs and external services. You'll learn some patterns for handling failure so your data never gets out of sync, and we'll talk about strategies to detect when something is wrong.
Transcript: English (auto-generated)
I'll let some people trickle in.
Welcome, thank you for coming. My talk today is called Beyond Validates Presence Of. I'm gonna be talking about how you can ensure the validity of your data in a distributed system where you need to support a variety of different views of your data that are all, in theory, valid for a period of time.
So my name is Amy Unger. I started programming as a librarian, and library records are these arcane, complex, painful records, but the good thing about them is that they don't really often change. If a book changes its title, it's because it's reissued, and so a new record comes in.
We don't deal with that much change and alteration within the book data. Obviously, users are a different matter. So when I was first developing Rails applications, I found active record validations amazing. Every time I would implement a new model or start work on a new application,
I would read through the Rails guide for active record validations and find every single one that I could add. It was a beautiful thing because I thought I could make sure that my data was always going to be valid. Well, fast forward through a good number
of consulting projects, some work at Getty Images, and now I work at Heroku, and unfortunately, this story is not quite as simple. And so I wanted to share today some lessons I've learned over the years. First, kind of speaking to my younger self,
why would I let my data go wrong? That would be how me five years ago would be reacting to this: what did you do to your data, and why? Next is prevention. Given that you may have accepted that your data may look different at different times, how can you prevent your data from going into a bad state
if you don't only have one good state? And then finally, detection. You know, if your data's gonna go wrong, you just better know when that's happening. And if you were here for Betsy Hable's talk just before me, a lot of this is gonna sound familiar
or just be a little bit more focused on the distributed side of things. So first, let's talk about causes and how your data can go wrong despite your best intentions. And I'd like to start with reframing that by asking why would you expect your data to be correct?
Five-years-ago me would say: but look, I have all these tools for data correctness. I have database constraints and I have ORM code. I have Active Record validations. They're gonna be in my corner. They're gonna keep me safe. So let's take a quick look at what those would be. For database constraints and indexes, we're looking at ensuring that something is not null.
For instance here, I'm trying to say that any product that we sell to you should probably have a billing record. For the health of our business, it's kinda important that we bill for things that we sell and so this statement would keep us safe
in the sense that before we can actually save a record of something that we have sold to you, we also need to build up a billing record. The corollary there would be an Active Record validation, the inspiration for the title of this talk, where the product validates presence of billing record.
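As a rough sketch of what those two layers might look like (the migration version and the model names are illustrative, not the actual code from the talk):

    # Database-level constraint: a product row cannot be saved without
    # pointing at a billing record.
    class AddBillingRecordToProducts < ActiveRecord::Migration[5.0]
      def change
        add_reference :products, :billing_record, null: false, foreign_key: true
      end
    end

    # Application-level check: the validation that inspired this talk's title.
    class Product < ApplicationRecord
      belongs_to :billing_record
      validates_presence_of :billing_record
    end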
Although after I submitted this talk, I realized that syntax is a little bit dated, so the form on the slide may be the one you recognize (and I clearly need to review some things). So why would this go wrong? Well, first, let's say we get a product requirement: gosh, it's taking too long for us to sell things.
There's so much work going on when a user clicks the "I want that" button. We really wanna speed up that time. So you think: gosh, I've already extracted some of my email mailers. I'm doing all the things I can in the background, but billing only needs to be right for us once a month
at midnight hour, beginning of the month. Until then, we have a little bit of leeway. So why don't we move that into a background job? Well that leads to a kind of sad moment where we have to comment out this validates presence of billing record
because we want to get our product controller to have this particular create method. What we're doing in that create method is we're taking in whatever the user gave us. We're saying hey, all right, we now have a product that we have sold and we're gonna enqueue this job
to create the corresponding billing record. And then we're gonna immediately respond with that product and that's awesome for them. They can start immediately using their Redis, their Postgres, their app, whatever they want and it just leaves us with within a few milliseconds we need to get that billing record created.
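A minimal sketch of what that controller action might look like (the class names, parameters, and job are illustrative, not Heroku's actual code):

    class ProductsController < ApplicationController
      def create
        # Save the product right away so the user can start using it...
        product = Product.create!(product_params)

        # ...and push billing-record creation into the background.
        BillingCreatorJob.perform_later(product.id)

        render json: product, status: :created
      end

      private

      def product_params
        params.require(:product).permit(:name, :plan)
      end
    end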
So it sounds great. Unfortunately what happens if that billing creator job dies? You're in a tough spot for having a product that is not in fact billed for. Then we have another fun complication. Your engineering team thinks gosh, it kind of sucks
that we're doing all of our billing and invoicing in a really legacy Rails app. That does not seem like the right engineering decision. So let's pull out all of our billing and move it into something that can be scaled at a far better pace for that kind of application.
Well now, our job billing creator just gets a little more complicated because when it is initialized, it finds the product, builds up this data and then calls out to this new billing service
and now we have two modes of failure. One is that your job could just fail. But the job could also mostly succeed and your billing service could fail horribly, which leads to our fun discussion of all the ways your network can fail you.
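Sketched out, that job might look something like this (BillingServiceClient stands in for whatever client wraps the new billing service):

    class BillingCreatorJob < ApplicationJob
      def perform(product_id)
        product = Product.find(product_id)

        # Failure mode 1: this job can die before or during this call.
        # Failure mode 2: the billing service itself can fail, or fail partially.
        BillingServiceClient.create_billing_record(
          product_id: product.id,
          plan: product.plan
        )
      end
    end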
So some of these are easier than others. You can't connect? Okay, well, you can probably try again. No harm, no foul. Let's just give it a shot. What happens if it succeeds partially on the downstream service? It doesn't fully complete and you get back an error, and you think, gosh, I'll retry. Well, is it going to immediately error because it's saying, I'm in a terrible state, I refuse to accept anything? Or is it going to say, that looks weird, maybe I'll create a new one? You have another option: the service completes the work, but the network cuts out in such a way that it thinks it's done and you never see that. So do you retry that and risk the fact that maybe you're gonna bill for something twice? And then this final one is kind of a corollary to that: do you know which systems will roll back if they see a client-side timeout error? So with all of these aspects that are critical to designing highly performant systems
that are going to be distributed, I think we have to move to accepting that your data won't always be correct or at least it will be a variety of different ways of correct. It is perfectly fine now for a product
to not have a billing record because all that means is that the billing record is in the process of being created. What we want to be able to express is the fact that eventually we truly expect something to coalesce to most likely one but maybe multiple valid states that we expect it to spend the majority of its life in.
Now, of course, that's not always true. People create products, make or buy things and then decide, whoops, that was exactly the wrong thing to buy right now and then immediately cancel it. So you may not even get to see this thing finally coalesce into something that you might think would be valid. But what if you don't always know what correct is?
So let's move to prevention where it's more about handling those errors. We've stopped really caring about making sure that everything is in a perfect state. Let's just sophisticatedly handle the errors we're seeing.
So we have a number of strategies. The first I'd like to talk about is retry. I mentioned this earlier. If you can't connect, might as well just try again. But this brings into question a couple of issues. First, you want to be aware of whether
the downstream service supports idempotent actions. If it does, you're good, keep on retrying. Even if it succeeds, keep on trying. It's fine. The next is a strategy that if you're doing mostly just background jobs, you can implement some sort of sophisticated locking system.
I haven't done that. It seems a little more work than I would want to do but then again, if you are only doing jobs within one system, that might be the right solution. You can choose, if you don't trust your downstream service
to be idempotent, you get to choose between retrying your creates or your deletes. Please do not retry both. Or have far more confidence than I do that your queuing system will always retrieve things in order. And the reason why you might think
you don't have to choose is because sure, if you put them on a queue, you can get first in, first out really well. But what if, and most of the time with a downstream service, you're gonna wanna be retrying multiple times, right? Why retry just once? What if the service has a 15 minute blip?
Should that require manual intervention? Probably not. You probably wanna say hey, retry this thing like five or 10 times. If it fails on the 10th time, that's fine. But try it five times. Well, so what happens then if your delete call takes far longer to fail than your create?
What that means is that by the second time round, your delete that is being retried for the second time is higher up in the queue than your create. And by the 11th time, I mean, who knows which one is gonna come off first? And if you end up in the unlucky position that your delete call gets pulled off
before your create, then you're left in a situation where someone just wanted to quickly buy something, realized they did something wrong, deleted it, and yet they're still being billed for this add-on item, and nobody is happy. A final thing to mention with retries is that if you are gonna do many, many retries,
do consider implementing exponential backoff and circuit breakers. Don't make things worse for your downstream service, if it's already struggling, by increasing its load.
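Here is a minimal sketch of those ideas in plain Ruby: a capped number of attempts, an idempotency key the downstream service could use to deduplicate repeated creates, and exponential backoff between tries. In practice you would more likely lean on your job queue's retry scheduling than on sleep; the client and its errors here are assumptions for illustration.

    require "securerandom"
    require "timeout"

    def create_billing_record_with_retries(product, max_attempts: 5)
      # Reuse the same key on every attempt so the downstream service can
      # recognize and deduplicate repeated creates.
      idempotency_key = SecureRandom.uuid
      attempts = 0

      begin
        attempts += 1
        BillingServiceClient.create_billing_record(
          product_id: product.id,
          idempotency_key: idempotency_key
        )
      rescue Timeout::Error, Errno::ECONNREFUSED
        raise if attempts >= max_attempts
        sleep(2**attempts)  # back off: 2s, 4s, 8s, ... so we don't pile on load
        retry
      end
    end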
Another strategy you have is rollback, which is a great option if only your code has seen the results of this action. So if your code base and your local database are the only ones that know that this user wants this product, absolutely, roll back. But what about external systems? And the fun thing here is you need to start
considering your job queue as an external system, because once you say, hey, go create this billing record, even if the end result is that that billing record is going to be in the same local database, you can't delete the product.
You can't just have that record magically disappear. So roll forward would say you have a number of options. You can enqueue a deletion job right after your creation job. Once you create something, you can delete it. You can also have cleanup scripts that run,
that detect things that are in a corrupted state and clean them up, hopefully very quickly. But rolling forward is all about accepting that something has gone wrong, but that something existed for just a short period of time, and we can't make that go away because something out there knows about it.
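A roll-forward cleanup script might look roughly like this (the models, columns, and the cancellation job are illustrative stand-ins):

    # Find billing records that are still open even though their product was
    # canceled, and enqueue compensating work instead of deleting rows.
    class OrphanedBillingCleanupJob < ApplicationJob
      def perform
        BillingRecord.where(ended_at: nil)
                     .joins(:product)
                     .where.not(products: { canceled_at: nil })
                     .find_each do |billing_record|
          BillingCancellationJob.perform_later(billing_record.id)
        end
      end
    end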
All right, so you say, okay, this kind of makes sense maybe. What does this look like for my code? First, let's talk about transactions. So transactions will allow you to create views of your database that are only local to you.
So let's say I want to create an app, create a Postgres, create a Redis, I don't know, register like five users for that app, and also call two downstream services with all those. If you wrap that all in a transaction and any exception is thrown and bubbles up out of that transaction, all those records go away.
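Something like this, as a sketch (the models and the downstream clients are made up for illustration):

    ActiveRecord::Base.transaction do
      app = App.create!(name: "shiny-new-app")
      app.addons.create!(kind: "postgres")
      app.addons.create!(kind: "redis")
      5.times { |i| app.collaborators.create!(email: "user#{i}@example.com") }

      # If anything below raises and the exception bubbles out of this block,
      # every row created above is rolled back. The downstream services are
      # NOT covered; only the local records disappear.
      ProvisioningClient.notify(app.id)
      BillingServiceClient.start_billing(app.id)
    end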
Now, your downstream services, you still need to worry about those. But it's a nice tool for making local things disappear. With that in mind, there are a couple things you might want to consider. First is understanding what strategy you're using.
Usually this will be the ORM default. So if you were in Betsy's talk earlier, you saw ActiveRecord::Base.transaction do. That chooses one of four transaction isolation strategies by default. If you're in Postgres, and you read Postgres's documentation, you'll see they choose a sophisticated default. But please understand which one you are using, because it has implications for what things outside of the transaction can see and what they can't.
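If you want something other than the default, Active Record lets you ask for a specific isolation level; whether and how it is honored depends on your database and adapter. A minimal sketch, not a recommendation of serializable for everything:

    def provision!(product_id)
      Product.transaction(isolation: :serializable) do
        product = Product.find(product_id)
        product.update!(state: "provisioned")
      end
    end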
The next thing I'd like to suggest you consider is adding your job queue to your database. Now, if this causes you absolute horror because of the load that you foresee it putting on your database, you are correct.
And this is a little bit like LinkedIn in the days when, rumor had it, they had something like 20 people working on Kafka, telling everybody that they should use Kafka. Heroku has a decent number of very intelligent people working on Postgres.
That being said, if this doesn't totally terrify you, you should definitely absolutely do it because what it means is you do not have to worry about pulling deletes off of the queue. They just disappear. So instead of having that crazy race condition of a delete possibly outrunning a create,
it just never happens. You can write code as if you were able to just go ahead, but then if you have an error, it's as if that job never got enqueued.
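Assuming a database-backed queue adapter that writes to the same database (Delayed Job, for example; that assumption is the whole point), the sketch below shows why this is so pleasant: the pending job is just another row inside the transaction. The suspicious? check is a stand-in for whatever validation you need.

    ActiveRecord::Base.transaction do
      product = Product.create!(name: "redis", plan: "premium")
      BillingCreatorJob.perform_later(product.id)  # inserts a row in the jobs table

      # If we bail out here, the rollback removes the product AND the pending
      # job, so there is no orphaned delete racing an orphaned create.
      raise ActiveRecord::Rollback if product.suspicious?
    end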
Next suggestion is to add timestamps. And I would suggest adding one timestamp to an object for every critical service call. So for a product that you sell, you might want to consider adding billing start time and billing end time. And what you do is you set that field
in the same transaction as you call to the downstream service. If the downstream service fails, it'll raise an error that you choose not to catch which will exit the transaction and result in that timestamp not being set.
Timestamps obviously give you some fun debugging knowledge, and they do help you with additional issues debugging across distributed services. But the nice thing here is that if the timestamp's not set, you know the call never succeeded, and you should be able to retry if you know that it is safe to do so.
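In code, that might look roughly like this (the column name and the client are illustrative):

    class BillingActivator
      def call(product)
        Product.transaction do
          product.update!(billing_started_at: Time.current)

          # Deliberately not rescued: if the service call raises, the
          # transaction rolls back and billing_started_at stays nil, which
          # tells us the call never succeeded and a retry should be safe.
          BillingServiceClient.start_billing(product_id: product.id)
        end
      end
    end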
The next one I want to talk about is code organization. And this is one where I don't have any panacea. And it's really hard, but I want to advocate very strongly that you think about writing your failure code
in the same place as you write your success code. And what I mean by this is if you have a downstream service, let's say you're calling Slack. In the next few slides, I'm gonna talk about creating a new employee. So let's say you're uploading, or Slack, so you're creating a new employee within your company's Slack.
The same place that you are writing that create call, please only a few lines away have the code to do the wind back so that no matter where your call, whether it's further down the line from Slack,
wherever your employee creation fails, the path of the code goes right back through that. And what it helps do is it helps your developers think about failure paths at the same time as they're doing successes. So what would this look like? So let's say we're going to create an employee.
And we have this beautiful app. This is a completely contrived example, so we're gonna have a local database. We're gonna register them in Slack. We have an HR API. We're gonna upload a headshot to S3. Then we have another bunch of jobs, I don't know, maybe getting them all set up in GitHub.
So what happens if, let's say, S3 is down? A lovely thing to be standing up here talking about, right? And I wrote this slide the day before S3 actually went down. So let's say S3 goes down: your employee creator class has a pretty clear path for unwinding this all, right? You call the downstream HR API, you pull the user from Slack, and then you cancel the transaction that would have created the employee. And that's lovely, you can think through that, right?
But this is kind of more like the code we write. And if this does not look like any code you've ever seen, congratulations, this is awesome. You should give a talk. You will get all the job applicants. So do you know what to do to unwind this mess
if it fails right there? I don't. I have absolutely no idea. And sure, I can stare at this long enough and try to figure out what's going on, and I'd probably get close. But if I'm tired, if I haven't spent time
with the Slack API since they've updated it, I'm probably gonna make a mistake. Something I'd like to suggest you consider is the saga pattern, which allows you to create an orchestrator that essentially controls the path that things walk through and keeps all of your rollback or roll-forward code encapsulated in the same spot as the creation code.
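A tiny saga-style orchestrator could look something like this: each step declares its action right next to its compensation, and on failure the completed steps are unwound in reverse order. All of the clients and models here are hypothetical.

    class EmployeeOnboardingSaga
      Step = Struct.new(:run, :undo)

      STEPS = [
        Step.new(->(ctx) { ctx[:employee] = Employee.create!(ctx[:attrs]) },
                 ->(ctx) { ctx[:employee]&.destroy }),
        Step.new(->(ctx) { SlackClient.invite(ctx[:employee].email) },
                 ->(ctx) { SlackClient.remove(ctx[:employee].email) }),
        Step.new(->(ctx) { S3HeadshotUploader.upload(ctx[:employee]) },
                 ->(ctx) { S3HeadshotUploader.delete(ctx[:employee]) })
      ].freeze

      def call(ctx)
        completed = []
        STEPS.each do |step|
          step.run.call(ctx)
          completed << step
        end
      rescue StandardError
        # The failure path lives right here, next to the success path.
        completed.reverse_each { |step| step.undo.call(ctx) }
        raise
      end
    end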
All right. So with that in mind, given that this is obviously hard
and we're gonna mess up, how do we detect when things have gone wrong? So the first thing I wanna talk about is SQL with timestamps. And since we have previously added timestamps saying deleted at, created at, billing started at, and billing ended at,
we actually have some degree of hope of trying to reconcile things across a distributed system. So we may never get to this. We're definitely never gonna get to this. But with a bunch of different small SQL queries, we can get maybe close. So let's say we wanna tackle one small aspect of this.
Shockingly, you all do not want to continue paying for things that you no longer have on Heroku. If you delete an app, we probably shouldn't continue billing for it. So this query may look a little bit complicated,
but what it does is it says: hey, for our billing records and the things we have sold, find all the billing records that are still active that are attached to products that are not active, as in canceled, someone deleted them, where the product was deleted more than 15 minutes ago.
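The shape of that check, sketched as SQL wrapped in a little Ruby so it can be run and alerted on automatically (table names, column names, and the PagerClient are illustrative):

    class ActiveBillingForDeletedProductsCheck
      QUERY = <<~SQL.freeze
        SELECT billing_records.id
        FROM billing_records
        JOIN products ON products.id = billing_records.product_id
        WHERE billing_records.ended_at IS NULL
          AND products.deleted_at IS NOT NULL
          AND products.deleted_at < NOW() - INTERVAL '15 minutes'
      SQL

      def self.run!
        ids = ActiveRecord::Base.connection.select_values(QUERY)
        # Alert if there are ANY of these; how you page yourself is up to you.
        PagerClient.alert!("#{ids.count} active billing records for deleted products") if ids.any?
      end
    end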
What that does is give us 15 minutes to become eventually consistent into a state that we're pretty confident in. I say pretty not because, you know, we wanna continue charging you for stuff, but because if the billing API goes down for longer than 15 minutes, this thing is gonna start yelling at me. And that's a pain for me, but most of the time, I mean, 15 minutes is a pretty darn long time, we're likely gonna be safe. So SQL with timestamps has a lot of benefits.
Some of them are incredibly subjective. The first is absolutely subjective. I am far more confident of my ability to write business logic in really short SQL statements than I am about writing a very large auditing code base.
That SQL statement, to me, is far more readable, and I can maintain more confidence that it will continue to run successfully than I could with the same thing written in Ruby. That's probably gonna be something your team differs on, depending on where you work.
The other nice thing about SQL with timestamps is that you can set them up to run automatically. Betsy was talking about Sidekiq earlier; we have just an app that will run these. We also have drag-and-drop folders to make it easy to write new ones. It shouldn't be hard for someone to think, wow, that record looks weird, let me write a check to see if there are any others like it. So these drag-and-drop folders will take SQL and they'll make sure it runs. Alerting by default also helps, if you have ways of making it really easy and consistent; for us that means wrapping our SQL in Ruby files that say, hey, alert me if there are zero of these, or alert me if there are any of these, the latter being the more common. And then finally, documented remediation plans. As an engineer on call, I have really no interest
in relearning our credit policy. I mean, I'm happy to do it because it means that my mistake gets cleared up, but let's not have to talk to our head of finance every time. He's not gonna be happy.
So some of the challenges here, as you might suspect, are non-SQL stores. And I specifically say non-SQL because you could be shoving structured JSON files into S3; I don't know what you're doing. But yeah: NoSQL, non-SQL, who knows what.
And everything I've talked about so far has been built on the concept of the big, beautiful reporting database. And every large organization I have worked at has one of these. Like, you have so many distributed services and someone has just decided there will be a central one.
I think it's probably a corollary of Conway's law somehow but in any case, what happens if let's say one of these is Redis? For us, we usually try to just do a quick ETL script
and if we need to, get it into Postgres. There's also the fully functioning model of just flipping this on its head. If you don't want to use that big, beautiful reporting database and you are fully confident
that you can write good auditing code, then you open the doors to so many other options. So you can talk directly either to Redis, get a direct Redis connection string, or hit some API that is backed by Redis. You can hit arbitrary APIs and then you can also hit all of your other distributed systems.
For me, the concern is that writing an application that will talk to every single one of your distributed systems seems a lot more bug-prone than just SQL off of one big, massive, giant database. But I've done this. So it really depends on the scenario.
So as I mentioned, some of the challenges are non-SQL data stores, where you pull, transform, and cache. Those are usually the verbs we're using, but it's really just ETL. You can end up writing code in something that's not SQL, which may be the right choice.
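A quick pull-transform-cache script along those lines might look like this (the Redis key layout, the environment variable, and the snapshot model are made up for illustration):

    require "redis"

    class RedisAddonEtl
      def self.run
        redis = Redis.new(url: ENV.fetch("ADDON_REDIS_URL"))

        redis.scan_each(match: "addon:*") do |key|         # pull
          attrs = redis.hgetall(key)
          ReportingAddonSnapshot.create!(                   # cache in Postgres
            addon_id: key.split(":").last,                  # transform
            state: attrs["state"],
            fetched_at: Time.current
          )
        end
      end
    end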
The other challenge that we run into is systems that do not have timestamps. So you can't do anything that says: gosh, I expect this thing to be in flux for five minutes, but after it's been created for five minutes, absolutely start checking it. If you can't get timestamps added,
then I would move to a strategy close to snapshotting: analyze the whole gosh darn thing and write records that say, hey, at this time it was correctly configured; at this time, this thing wasn't correctly configured. But hey, maybe next time it will be.
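Sketched out, that snapshotting approach could be as simple as a scheduled job that walks the system and records what it saw (the external client, the snapshot model, and the correctness check are stand-ins):

    class ConfigurationSnapshotJob < ApplicationJob
      def perform
        ExternalConfigClient.all_items.each do |item|
          ConfigurationSnapshot.create!(
            item_id: item["id"],
            checked_at: Time.current,   # our own timestamp, since the system has none
            correct: correctly_configured?(item)
          )
        end
      end

      private

      def correctly_configured?(item)
        (item["expected_keys"] || []).all? { |key| item.fetch("config", {}).key?(key) }
      end
    end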
And then we threw together some SQL to determine whether things are coalescing. You may wanna, again, do this in code. The SQL was about 60 lines long and included a self-join, and it's a little scary.
The other option in addition to SQL timestamps that I wanna talk about is using event streams. And this may sound somewhat similar to log analysis, which it absolutely is. So if you're doing that, this will be very similar. So let's walk through the process of the events
of buying a thing on Heroku. And so each time we hit one of these events, Heroku will emit an event to a central Kafka, and we can read all of these events from one consumer.
So for this, for buying a product, we'll first see an event that says, hey, someone really wants a Redis, that's cool. We then move into assorted events on, okay, are they authenticated? And hey, is that product available? Are they allowed to install it? And this goes on many, many, many events
are emitted, even for the smallest requests, until we get to the end, which looks roughly like, hey, this Redis cluster is up and available. Billing has started, a user response has been generated either to send them a webhook to say, hey, it's available, or because they were, in fact, waiting in line for us to do all this work.
And you can start to see patterns: if the user is an average, authorized user, we can create that list of what events we should see and in what order, and we can use this to determine whether something was actually successfully created and whether we should expect the data to be in the correct form at the end.
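A consumer-side check might look roughly like this: given all the events we saw for one request, make sure the expected ones appeared in the expected order. The event names and the EventStream reader are illustrative assumptions, not Heroku's actual stream.

    EXPECTED_SEQUENCE = %w[
      addon.requested
      user.authenticated
      addon.provisioned
      billing.started
      response.sent
    ].freeze

    def provisioned_correctly?(request_id)
      seen = EventStream.for_request(request_id).map(&:name)

      # Every expected event must show up, in order; unrelated events may interleave.
      cursor = 0
      seen.each { |name| cursor += 1 if name == EXPECTED_SEQUENCE[cursor] }
      cursor == EXPECTED_SEQUENCE.length
    end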
So, some benefits of event streams. It's a single format. You're not having to negotiate: oh, that thing is backed by Redis; that thing, why are we still on flat files?
Why? It is one place, and you can just register a new consumer to walk a stream, or walk many streams. It has the added benefit of, essentially, black box testing your application.
So again, if this sounds similar to log analysis, where you're trying to determine whether your application is successful based off of, hey, if someone hits the search button, we should probably see some results returned, and we see those, that kind of structure in the log. And therefore, we're gonna validate
that this A/B deployment can slowly be scaled up. It's very similar, just used for a different purpose. I do have concerns about this approach, and we're not using this explicitly for any business-critical auditing right now.
But it's something we've discussed heavily, and it's the direction we want to go in as we refactor things. So I wanted to share some of the concerns I had with going down this road. What do you do if you emit the wrong events? Data on disk is something I have far more confidence in than whether we're continuing to emit the right events. I write typos. Anything that sounds similar, I'm probably gonna write; I'm probably gonna exchange it. I've been known to exchange cash for cats. In my defense, there were cats on my lap. But you might have random failures like that. What if you continue emitting events even when you're not actually doing the work? You know, people make mistakes.
And while it's one thing to scale up an A/B test and say, hey, this canary deployment is great, we're gonna go full out with it, it's one thing to rely on events and log analysis for that. It's another thing to trust the health of your business to the accuracy of your events.
And then finally, and this gets back to do you wanna be writing code that validates code? What if the stream consumer code is wrong? What is your confidence level that your team is going to be able to write really good auditing code? So this is the end of my talk.
But I wanted to leave you with a caveat for what I have been proposing, especially towards the end, which is that everything I've been talking about is a lot of engineering effort.
Especially building the beautiful, big reporting database if it's not there. Building an auditing system that will touch every single component of your distributed system. My time isn't cheap. And the reason why my company has chosen to invest in some of these is because
there are certain things that we just fundamentally cannot get wrong. We've talked a lot about billing because it's a pretty easy thing. It's kind of visceral, us charging you for something that you should not be paying for. That's bad. But this also applies to security concerns.
And for us, those are absolutely business critical. And it's why we're willing to put in this effort. But if you're building something that's a little more lightweight and is not going to take down the business if you get it wrong, maybe consider a lighter weight solution. In any case, so I wanted to say
that I hope I have had something that was relevant for everyone in the room. Whether that's talking about why your data might go wrong, how you might prevent it, and then detecting when mistakes inevitably happen. I wanted to say thank you. I really appreciate you all sitting through this talk.
And I have about five minutes for questions. Nola's trying to clap. We can. Thank you. Thank you.