
Beyond Validates Presence of: Ensuring Eventual Consistency


Automated media analysis

Speech transcript
I'm going to get started and let people trickle in. Welcome, and thank you for coming. My talk today is called "Beyond validates_presence_of". I'm going to be talking about how you can ensure the validity of your data in a distributed system, where you need to support a variety of different views of your data that are all valid for a period of time.

My name is Amy Unger. I started programming as a librarian, and library records are these arcane, complex, painful records. The good thing about them is that they don't really change often. If a book changes its title, it's because it's reissued, and so a new record comes in. We don't deal with that much change and alteration within the book data. Obviously users are a different matter. So when I was first
developing Rails applications, I found Active Record validations amazing. Every time I would implement a new model or start work on a new application, I would read through the Rails guide for Active Record validations and find every single one that I could add. It was a beautiful thing, because I thought I could make sure that my data was always going to be valid.

Fast forward through a good number of consulting projects, some work at Getty Images, and now at Heroku, and unfortunately the story is not quite as simple. So I wanted to share today some of the lessons learned over the years. First, speaking to my younger self: why would I let my data go wrong? That is how me, five years ago, would be reacting to this: "What did you do to your data?" Next is prevention: once you have accepted that your data may look different at different times, how do you prevent your data from going into a bad state if you don't have only one good state? And then finally detection: if your data is going to go wrong, you are better off knowing when that's happening. If you were here for [unclear] talk just before me, a lot of this is going to sound familiar, just a little bit more focused on the distributed side of things.

First let's talk about causes, and how your data can go wrong despite your best intentions. I'd like to start by reframing that and asking: why would you expect your data to be correct? Five-years-ago me would say, "But I have all these tools for data correctness. I have database constraints, I have Active Record validations. [unclear]" So let's take a quick look at what those would be. With database constraints and indexes, we're ensuring that something is not null, for instance. Here I'm trying to say that any product that we sell to you should probably have a billing record. It's kind of important that we bill for things that we sell. And so this statement
would keep us safe, in the sense that before we can actually save a record of something that we have sold you, we also need to build a billing record. The corollary to that would be an Active Record validation, and the inspiration for the title of this talk, where a product validates the presence of a billing record. Although, after I submitted this talk, I realized [unclear], so here is what you may recognize.

So why would this go wrong? Well, first, let's add a product requirement: "Gosh, it's taking too long for us to sell things. There's so much work going on when a user clicks a button. We really want to cut that time." So you think, "Gosh, I've already extracted some of the email mailers, I'm doing all the things I can in the background. But billing only needs to be right once a month, at the midnight hour at the beginning of the month. Until then we have a little leeway. So why don't we move it into a background job?" That leads to a moment where we have to comment out this validation of the presence of a billing record, because we want our products controller to have this particular create method. What we're doing in that create method is taking in whatever the user gave us, saying that we now have a product that we have sold, enqueueing this job to create the corresponding billing record, and immediately responding with that product. That's awesome for them. They can start immediately using their Redis, their Postgres, their app, whatever they want. And that just leaves us with: within a few milliseconds, we need to get that billing record created. That sounds straightforward. But what happens if that billing creator job dies? You're in a tough spot, having a product that is not billed for.

Then we have another complication. Your engineering team thinks, "Gosh, it kind of sucks that we're doing all of our
billing and invoicing in a really legacy Rails app. That does not seem like the right engineering decision. Let's pull out all our billing and put it into something that can be scaled, or that is a better fit for that kind of application." Well, now our billing creator job is a little more complicated, because when it is initialized, it finds the product, builds up this data, and then calls out to this new billing service. And now we have two ways to fail. One is that your job could still fail. But then most of the job could succeed and the new billing service could fail, which leads to a
discussion of all the ways your network can fail you. Some of these are easier than others. You can't connect? OK, well, you can try again, no problem, just give it a shot. But what happens if it succeeds partially on the downstream service and you get back a partial response? Is the service going to immediately say, "I'm in a terrible state, I refuse to accept anything"? Or is it going to say, "[unclear] I'll create a new one"? And then: the downstream service completes the work, but the network cuts out in such a way that it thinks it's done but you don't see that. So do you retry that, and risk the fact that maybe you're going to bill for something twice? And this final one is kind of a corollary to that: [unclear] knowing which systems will roll back if they see [unclear].

So with all of these aspects that are critical to designing highly performant systems that are going to be distributed, I think we have to move to accepting that your data will always be incorrect, or at least that there will be a variety of different kinds of correct. It is perfectly fine now for a product to not have a billing record, because all that means is that the billing record is in the process of being created. What we want is to be able to express the fact that eventually we truly expect something to coalesce to most likely one, but maybe multiple, valid states in which we expect it to spend the majority of its life. Now of course that's not always true. People buy things, decide that was exactly the wrong thing to buy right now, and immediately cancel, so you may not even get to see the state finally coalesce into something that you might think would be valid. But what if you don't always know what correct is?

So let's move to prevention, where it's more about handling those errors. We've stopped really caring about making sure that everything is in the perfect state. Let's just
handle the errors we're seeing in a sophisticated way. We have a number of strategies. The first I'd like to talk about is retry. I mentioned this earlier: if you can't connect, why not just try again? But this brings into question a couple of issues. First, you want to be aware of whether the downstream service supports idempotent actions. If it does, you're good: go ahead and keep retrying, even if the call has already succeeded.

The next is a strategy where, if you're doing this mostly as background jobs, you can implement some sort of sophisticated locking system. I haven't done that. It seems a little more heavyweight than I would want, but then again, if you are only doing jobs within one system, that might be the right solution.

But if you don't trust your downstream service to be idempotent, you get to choose between retrying your creates or retrying your deletes, please not both, unless you have far more confidence than I do that the queueing system will always retrieve things in order. The reason why you might think you don't have to choose is because, sure, if you put them on a queue, you can get first in, first out really well. But most of the time, with a downstream service, you want to be retrying multiple times, right? Why retry just once? If the service has a 15-minute blip, should that require manual intervention? Probably not. You probably want to say, "Retry this thing five or ten times." If it fails on the tenth time, that's fine, but try five times at least. So what happens, then, if your delete call takes far longer to fail than your create? What that means is that the delete that is being retried for the second time is interleaved with your create being retried for [unclear] time, and who knows which one is going to come out first. And if you end up in the unlikely position where your delete call gets pulled off before your create, then you're left in a situation with someone who just wanted to quickly buy something, realized that they did something wrong,
deleted it, and yet they're being billed for this ad infinitum, and nobody [unclear]. The other thing I'd mention is that if you are going to do many, many retries, do consider implementing exponential backoff and circuit breakers. Don't make things worse for your downstream service, if it's already struggling, by increasing its load.

Another strategy you have is rollback, which
is a great option if only your code has seen the results of this action. If your code, and the data in your local database, is the only one that knows that this user wants this product, absolutely roll back. But what about external systems? Something to note here is that you need to start considering your job queue as an external system, because once you say, "Hey, go create this billing record," even if the end result is that the billing record is going to be in the same local database, you can't just delete the product. You can't just have that record magically disappear.

So with roll-forward, you have a number of options. You can enqueue a deletion job right after your creation job: once you create something, you can delete it. You can also have cleanup scripts that run, detect things that are in a corrupted state, and clean them up, hopefully very quickly. But rolling forward is all about accepting that something has gone wrong, but that it's something that existed for just a short period of time, and we can't make it go away, because something out there knows about it.

So what does this look like from a code perspective? Well, the first suggestion is transactions. Transactions allow you to create views of your database that are only local to you. Let's say I want to create [unclear], register like five users for that, and also call out to two downstream services. If you wrap all of that in a transaction, and any exception is thrown and bubbles up out of that transaction, all of those are going to go away. Now, your downstream services, you still need to worry about those, but it's a nice tool for making local things disappear.

With that in mind, there are a couple of things you might want to consider. The first is understanding what strategy you're using. Usually this will be the ORM's default. So if you were in [unclear] talk earlier, you saw ActiveRecord::Base.transaction. What that does is use, by default, [unclear] transactions, if you're on Postgres. If you read Postgres's documentation, they choose a sophisticated default, but please understand which one you're using, because it has implications for what things outside of the transaction can see and [unclear].

The next thing I'd like to suggest you consider is adding your job queue to your database. Now, if this causes you absolute horror because of the load that you foresee putting on your database, you are correct. This is a little bit like if I were from LinkedIn in the days when they had something like 20 people working on Kafka, and I told everybody, "Just use Kafka and you're good." We have a decent number of very intelligent people working on Postgres. That being said, if this doesn't totally terrify you, you should definitely, absolutely do it, because what it means is that you do not have to worry about pulling deletes off of the queue. They just disappear. So instead of having the crazy race condition of a delete possibly outrunning a create, you can write code as if you were just able to go ahead and enqueue things, but then if you have an error, it's as if that job was never enqueued.

The next suggestion is to add timestamps, and I would suggest adding one timestamp to an object for every critical service call. So for a product that you sell, you might want to consider having a billing start time and a billing end time. What you do is set that field in the same transaction in which you call the downstream service. If the downstream service fails, it raises an error that you choose not to catch, which will exit the transaction, and the result is that the timestamp is not set. Timestamps [unclear], and they do help you out with debugging issues across distributed services. But the nice thing here is that if the timestamp is not set, then you know the call never succeeded, and you should be
able to retry, if you know that it is safe to do so.

The last thing I want to talk about is code organization. This is one where I don't have any panacea. This is really hard. But I want to advocate very strongly that you think about writing your failure code in the same place as you write your success code. What I mean by this is: if you have a downstream service, let's say you're calling Slack, and [unclear] creating a new employee, so you're creating a new employee within your company's Slack. In the same place where you are writing that create call, please, only a few lines away, have the code to do the unwind, so that no matter where in your call chain, however far down the line, employee creation fails, the path of the code goes right back through that. It helps your developers think about failure paths at the same time as they're doing successes.

So what would this look like? Let's say we're going to create an
employee, and we have this beautiful [unclear]. This is a completely contrived example. We have a local database, we register them in Slack, we have an HR API, we upload a headshot to S3, and then we have another bunch of jobs, I don't know, maybe getting them all set up in GitHub. So what happens if, let's say, S3 is down? The lovely thing is that [unclear] I wrote this slide the day before S3 went down. So let's say S3 is down. Your employee creator class has a pretty clear path for unwinding this, right? You call the downstream HR API, you pull the user from Slack, and you cancel the transaction that created the employee. That's lovely. You can think through that.

I think this is probably more like reality. And if this does not look like anything you've ever seen, congratulations. Also, you should give a talk, and you will get all the job applicants. So do you know how to unwind this mess if it fails right there? I have absolutely no idea. I'm sure I could stare at it long enough to figure out what's going on, but if I'm tired, or I haven't spent time in the Slack API since they updated it, I'm going to make mistakes.

Something that I would suggest you consider is something called the saga pattern, which allows you to create an orchestrator that essentially controls the path that things walk through, and it keeps all of your rollback or roll-forward code encapsulated in the same spot as the creation code.

So with that in mind, let's say that something has obviously gone really wrong. How do we detect that?
So the first thing I want to talk about is SQL with timestamps. Since we have added these timestamps, say deleted_at, billing_started_at, and billing_ended_at, we actually have some degree of hope of trying to reconcile things across a distributed system. So we may never get to this wonderful
[unclear], but we have a bunch of different small SQL queries that we can run. So let's say we want to audit one small aspect. Shockingly, you all do not want to continue paying for things that you no longer have on Heroku. If you delete them, we should not continue billing for them. So this query may look a little bit complicated, but what it does is say: for billing records and the things we have sold, find all the billing records that are still active but are attached to products that are not active, that is, someone has canceled them, but only those where the product was deleted more than 15 minutes ago. What that does is give us 15 minutes to become eventually consistent, to get into a state that we're pretty confident of. [unclear] not because we want to continue charging for stuff, but because if billing goes down for longer than 15 minutes, this alert will start yelling, [unclear]. And 15 minutes is a pretty darn long time. [unclear] This
SQL with timestamps has a lot of benefits, and many are incredibly subjective. The first is absolutely subjective: I'm far more confident of my ability to write business logic in really short SQL statements than I am about writing a very large auditing [unclear]. That SQL statement, to me, was far more readable, and something I can maintain confidence in that it will continue to run successfully, than I am about writing the same thing in Ruby. But that's probably me, and something that is going to be different for your team depending on where you work.

The other nice thing about SQL with timestamps is that you can set the queries up to run automatically. Somebody was talking about Sidekiq earlier; we have jobs that will run these. We also have drag-and-drop folders to make it easy to write new ones. It shouldn't be hard for someone to think, "Wow, that record looks weird, let me write a check to see if there are any others like it." These drop folders will take SQL and make sure it runs, alerting by default. [unclear] making it really easy and consistent. For us that means wrapping SQL in Ruby files that say, "Alert me if there are zero of these," or, "Alert me if there are any of these."

Another recommendation: thoroughly documented remediation plans. As an engineer on call, I have really no interest in relearning our credit policy. I mean, I'm happy to do it, because it means that my mistake is cleared up, but let's not have me talk to our head of finance every time. He's not going to be happy.

Some of the challenges here: [unclear] non-SQL stores. I specifically say non-SQL because [unclear] you could be shoving structured files into S3 [unclear]. But yes: NoSQL, non-SQL. What I have talked about so far has been built on the concept of the big, beautiful reporting database. Every large organization I have worked at has one of these: you have so many distributed services, and someone has just decided there will be
a central one. I think it's probably a corollary of Conway's law [unclear]. In any case, what happens if [unclear] uses Redis? For us, we'd usually try to just do a quick ETL script and, if we need to, get it into Postgres. There is also the fully functioning app [unclear]. Let's say you don't have that big, beautiful reporting database, and you are fully confident that you can write that audit in code. Then you open the doors to so many other options. You can talk directly to Redis, [unclear] connection strings, you can hit arbitrary APIs, and you can also hit all the other distributed systems. The concern is that writing an application that will talk to every single one of your distributed systems seems a lot more [unclear] than just SQL over one big [unclear] database. [unclear]

So as I mentioned, some of the challenges are non-SQL data sources, where you can coerce or transform them, usually with the tools you're using, but it's really just ETL, and you can end up writing code and not SQL, which may be the right choice. The other challenge that we run into is systems that do not have timestamps, so you can't do anything that says, "Gosh, I expect for five minutes this thing to be in flux, but after five minutes [unclear]." If you can't get timestamps added, then I would move to a strategy of [unclear] analyzing the whole gosh-darn thing: writing records that say, "At this time this thing was correctly configured, at this time this thing was correctly configured," [unclear], and then stringing those together with SQL to determine whether things are coalescing. You may want, again, to do this in code. [unclear]

The other option, in addition to SQL with timestamps, [unclear] is using
event streams. This may sound somewhat similar to log analysis, which it absolutely is. [unclear] So let's walk through the process of the events of buying new things on Heroku. Each time we have one of these events, Heroku will emit an event to a central Kafka, and we can read all of these events from one place. So for buying a product, we'll first see an event that says, "Hey, someone really wants a [unclear]." We then move into certain events: are they authenticated? Is the product available? Are they allowed to install it? And there are many, many, many events that are [unclear]
[unclear] until we get to the end, which looks roughly like: this Redis cluster is now available, billing has started, and the response has been generated, either to send them [unclear] to say, "Hey, it's available," or because they were in fact waiting in line for us to do all this. You can start to see patterns. If the user is an [unclear] user, we can create that list of what events we should see and in what order, and we can use this to determine whether something was actually successfully created and whether we should expect the data to be in the correct form at the end.

Some benefits of event streams: it is a single format, so you're not having to negotiate [unclear]. There's nothing like, "Why are we still on flat files?" It is in one place, and you can just register a new consumer to [unclear] one or many streams. It has the added benefit of essentially black-box testing your applications. Again, this sounds similar to log analysis, where we're trying to determine whether your application is successful based off of: someone hits the search button, results were returned, we see that kind of structure in the log, and therefore we're going to validate that this A/B deployment can slowly be scaled up. It's very similar, just used for a different purpose.

I do have concerns about this approach, and we're not using it explicitly for any business-critical auditing right now, but it's something we've discussed heavily, and it's a direction we want to go in as we refactor things. So I want to share some of the concerns I have with going down this road. What do you do if you emit the wrong data? This is something I have far more confidence in than whether we're continuing to emit the right events. I mistype things that sound similar, [unclear]. I've been known to exchange "catch" for "cats". In my defense, [unclear]. But, you know,
you might have failures like that. But what if you continue emitting events even if you're not actually doing the work? People make mistakes. It's one thing to scale up an A/B test and say, "Hey, this [unclear] is great, let's go full out on it," and it's one thing to rely on event stream analysis for that, but it's another thing to trust your business to the accuracy of your events. And then finally, let's get back to: do you want to be writing code that validates code? What if the stream consumer code is wrong? It's a confidence-level thing: is your team going to be able to write a really good auditor in code?

So this is the end of my talk, but I wanted to leave you with a caveat for what I have been proposing, especially during the auditing part: everything I've been talking about is a lot of engineering effort, especially building the big, beautiful reporting database if it's not there, or building a not-yet-existing system that will touch every single component of a distributed system. [unclear] The reason why my company has chosen to invest in some of these efforts is because there are certain things that we just fundamentally cannot get wrong. We've talked a lot about billing, because it's a pretty easy thing and kind of visceral: being charged for something that you should not be paying for is bad. But this also applies to security concerns, and for us those are absolutely business-critical, and that's why we're willing to put in the effort. But if you're building something that's a little more lightweight and is not going to take down the business if you get it wrong, you may consider a lighter-weight approach.

In any case, I wanted to say that I hope I have had something that was relevant for everyone in the room, whether that's why your data can go wrong, how you might prevent it, or detecting when the mistakes inevitably happen. I want to say thank you. I really appreciate you all
[unclear], and I have about five minutes for questions. [unclear]
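
The audit check described in the talk (billing records that are still active although their product was deleted more than 15 minutes ago) can be sketched in plain Ruby. This is only an illustration under stated assumptions: the talk runs the check as SQL against a reporting database, whereas the `Product` and `BillingRecord` structs, the `deleted_at` and `billing_ended_at` field names, and the `stale_billing_records` helper below are hypothetical in-memory stand-ins.

```ruby
# Hypothetical in-memory stand-ins for rows of the products and billing tables.
Product = Struct.new(:id, :deleted_at)
BillingRecord = Struct.new(:id, :product_id, :billing_ended_at)

# Window (in seconds) the system is given to become eventually consistent.
GRACE_PERIOD = 15 * 60

# Returns billing records that are still active (no billing_ended_at) even
# though their product was deleted more than GRACE_PERIOD seconds before `now`.
def stale_billing_records(products, billing_records, now: Time.now)
  deleted_at_by_product = products.each_with_object({}) do |product, index|
    index[product.id] = product.deleted_at
  end

  billing_records.select do |record|
    deleted_at = deleted_at_by_product[record.product_id]
    record.billing_ended_at.nil? && !deleted_at.nil? && deleted_at < now - GRACE_PERIOD
  end
end

now = Time.now
products = [
  Product.new(1, nil),         # still active
  Product.new(2, now - 3600),  # deleted an hour ago
  Product.new(3, now - 60)     # deleted a minute ago, still inside the window
]
billing_records = [
  BillingRecord.new(10, 1, nil),
  BillingRecord.new(20, 2, nil),        # should have been closed by now
  BillingRecord.new(30, 3, nil),
  BillingRecord.new(40, 2, now - 3000)  # already closed
]

stale = stale_billing_records(products, billing_records, now: now)
puts stale.map(&:id).inspect # prints [20]
```

A scheduled job could run a check like this and alert whenever the result is non-empty. The grace period is what keeps records that are merely in flight from raising false alarms.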

Metadata

Formal metadata

Title Beyond Validates Presence of: Ensuring Eventual Consistency
Series title RailsConf 2017
Part 62
Number of parts 86
Author Unger, Amy
License CC Attribution - ShareAlike 3.0 Unported:
You may use and modify the work or its content for any legal and non-commercial purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license.
DOI 10.5446/31232
Publisher Confreaks, LLC
Year of publication 2017
Language English

Content metadata

Subject area Computer science
Abstract You've added background jobs. You have calls to external services that perform actions asynchronously. Your data is no longer always in one perfect state; it's in one of tens or hundreds of acceptable states. How can you confidently ensure that your data is valid without validations? In this talk, I’ll introduce some data consistency issues you may see in your app when you begin introducing background jobs and external services. You’ll learn some patterns for handling failure so your data never gets out of sync and we’ll talk about strategies to detect when something is wrong.
