Convergent/Divergent
Formal Metadata
Title 
Convergent/Divergent

Title of Series  
Author 

License 
CC Attribution  ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 
Identifiers 

Publisher 

Release Date 
2014

Language 
English

Content Metadata
Subject Area  
Abstract 
Whether you realize it or not, if you've built a rich web application in Ember.js, and you're sending data between clients and a server, you've build a distributed system. This talk will discuss the challenges of building such a system, specifically the challenges related to preserving consistency when dealing with concurrent actors. We will begin with a primer on the various types of consistency, covering topics such as eventual consistency and causal consistency, and then move on to discuss recent industrial and academic research that aims to solve some of these problems without synchronization, specifically discussing operational transformations and convergent and commutative replicated data types.

00:00
Slide rule
Divergence
Process (computing)
Hybrid computer
Divergence
Medizinische Informatik
Videoconferencing
00:43
Group action
Thread (computing)
Distribution (mathematics)
Model theory
Bit error rate
Database
Water vapor
Verteiltes System
Client (computing)
Computer programming
Software bug
Expected value
Medical imaging
Mechanism design
Hypermedia
Singleprecision floatingpoint format
Videoconferencing
Erlang distribution
Partial derivative
Physical system
Identity management
Computer icon
Theory of relativity
Building
Software developer
Coordinate system
Bit
Sequence
Category of being
Message passing
Process (computing)
Programmer (hardware)
Internet service provider
Order (biology)
Buffer solution
Arithmetic progression
Reading (process)
SocketSchnittstelle
Algorithm
Consistency
Serializability
Online help
Mass
Drop (liquid)
Event horizon
Rule of inference
Product (business)
Number
Term (mathematics)
Energy level
Data structure
Computerassisted translation
Form (programming)
Model theory
State of matter
Theory
Client (computing)
Basis <Mathematik>
Total S.A.
Line (geometry)
Parity (mathematics)
Cartesian coordinate system
Causality
Software
Universe (mathematics)
Design by contract
Data center
Window
Building
Digital electronics
State of matter
Multiplication sign
Sheaf (mathematics)
Design by contract
Insertion loss
Mereology
Replication (computing)
Formal language
Web 2.0
Programmer (hardware)
Mathematics
Casting (performing arts)
Strategy game
Synchronization
Analogy
Vulnerability (computing)
Predictability
Wrapper (data mining)
Determinism
Connected space
Computer science
Condition number
Lastteilung
Right angle
Physical system
Asynchronous Transfer Mode
Classical physics
Server (computing)
Functional (mathematics)
Freeware
Service (economics)
Virtual machine
Theory
Twitter
Causality
Readonly memory
Operator (mathematics)
Medizinische Informatik
Message passing
Linear map
Condition number
Operations research
Rule of inference
Addition
Multiplication
Consistency
Projective plane
Database
Cache (computing)
Logic
Synchronization
Object (grammar)
12:22
Commutative property
Complex (psychology)
Group action
Local area network
Database
Verteiltes System
Client (computing)
Faulttolerant system
Mechanism design
Different (Kate Ryan album)
Vector space
Error message
Social class
Physical system
God
Mapping
Electronic mailing list
Infinity
Database transaction
Replication (computing)
Category of being
Web application
Radical (chemistry)
Message passing
Process (computing)
Order (biology)
Boiling point
Arithmetic progression
Point (geometry)
Slide rule
Variety (linguistics)
Mass
Parallel computing
Event horizon
Rule of inference
Number
Frequency
Vector graphics
Energy level
Data structure
Data type
Stapeldatei
Key (cryptography)
Information
State of matter
Content (media)
Cartesian coordinate system
Limit (category theory)
Timestamp
Causality
Word
Software
Universe (mathematics)
Revision control
Lattice (group)
Topological vector space
State of matter
Multiplication sign
Function (mathematics)
Database transaction
Subset
Mathematics
Radical (chemistry)
Diagram
Process (computing)
Data conversion
Determinant
Algebra
Lattice (group)
Algorithm
Data storage device
Proof theory
Vector space
output
Right angle
Data type
Physical system
Fundamental theorem of algebra
Resultant
Reverse engineering
Server (computing)
Functional (mathematics)
Freeware
Idempotent
Web browser
Power (physics)
Revision control
Confluence (abstract rewriting)
Fluid
Causality
Operator (mathematics)
Medizinische Informatik
Condition number
Mobile Web
Operations research
Addition
Quadratic function
Validity (statistics)
Inheritance (objectoriented programming)
Database
Event horizon
Object (grammar)
Communications protocol
Identity management
22:34
State observer
Functional (mathematics)
Greatest element
Computer file
Maxima and minima
Set (mathematics)
Event horizon
Thresholding (image processing)
Computer programming
Element (mathematics)
Bit rate
Natural number
Analogy
Set (mathematics)
Negative number
Diagram
Arrow of time
Medizinische Informatik
Data structure
Lattice (group)
Boolean algebra
Programming language
Stapeldatei
Mapping
State of matter
Maxima and minima
Category of being
Predicate (grammar)
Network topology
Order (biology)
Computer science
Right angle
Internet der Dinge
Reading (process)
25:40
State observer
Presentation of a group
Building
Boolean algebra
View (database)
1 (number)
Set (mathematics)
Verteiltes System
Client (computing)
Open set
Computer programming
Set (mathematics)
Physical system
Exception handling
Vulnerability (computing)
Algorithm
NPhard
Dreizehn
Mapping
Block (periodic table)
Building
Coordinate system
Category of being
Web application
Process (computing)
Numeral (linguistics)
Right angle
Summierbarkeit
Quicksort
Mathematician
Arithmetic progression
Data type
Resultant
Reading (process)
Point (geometry)
Implementation
Consistency
Variety (linguistics)
Mass
Automatic differentiation
Number
Operator (mathematics)
Data structure
Addition
Consistency
Model theory
Database
Cartesian coordinate system
Tuple
Library (computing)
30:57
Freeware
Information
Ferry Corsten
Consistency
Projective plane
Electronic mailing list
Online help
XML
Mereology
Open set
Wiki
Mathematics
Root
Causality
Blog
Synchronization
Authorization
Videoconferencing
Medizinische Informatik
Series (mathematics)
Videoconferencing
Freeware
Reading (process)
Physical system
00:00
and the and this is the exact 1 of the things that we the and by
00:16
Chris this is my talk skull convergent divergent and there's a matter of on the 1st slide maybe by the end of the talk if you don't get it now you'll get it then that's at least if I do a good job of giving a talk and this is like a hybrid of my 2 favorite topics distributed computing end of French cinema so that self styled after a French cinema and jolly good and all that stuff so we can talk about that too if you don't like computers or things like that the day so
00:45
my Why am I here Why am I talking to you why should you listen to me and so I am see make on Twitter you probably see me to eat a lot in spurts and unseen again and I am an engineer Abacha technologies were I work on real and so daytoday underlying programmer and basically working on multi data center replication so how do we get data across like large replicated sites and ensure that we replicated in a consistent fashion and ensure that everything ends up the right way on the remote cluster so and there's a lot of interesting distributed computing challenges there but in addition to that I am against in the Georgia tech and where I do some masses stuff right now and just as fun I like it and you and and also among the podcast helping distributed and you may have a may not have heard of it and where we basically do video podcast from conferences or talk about the problems of distributed during with academic so we'll talk about problems of consistency from the commonality and consensus and things like that so in addition to that I am part of a research group with Basho our Basho others a research group that you call that the existing freak story and it's funded by the EU as a partnership between about 5 European universities and addition with industry partners being basher technologies of Rovio Entertainment and try for what we basically work on is trying is a three year funded project to try to figure out how to do largescale distributed computation without synchronization so 2 synchronizations really hard we'll talk about why it's really hard as part of this talk and we'll talk about the strategies that we try to use to make it so this is possible so 3 parts of the talk is broken into a motivation section while talk about the idea for the torque and we'll go through some distribute systems theory I can be really fast every topic and when you talk about could be an entire talk itself probably more considering that most of them are based on like 10 to 20 academic papers so honored you really fast job and if you wanna know more about it feel free to combine and eventually I'm going to propose a solution for what I think could be used as the basis for something like data something that provides a formally proving mechanism for having data structures that are shared across multiple clients on multiple participants in a distributed system ensure that values are always converge that so the motivation and so I did a lot of Member development about a year and a half ago maybe I don't remember when I originally started the Boston a group which rankorder relatives somewhere out there is the the main guy of now and who he does an incredible job and had Basho we built to average but applications and with like 0 comma decimal 9 or something and both of them we attempted he's ever data and we ran into a ton of problems 1 of these products ships the database actually bundle then there's in Erlang wrapper that provides rolling back and that serves up all the JavaScript and another 1 is attested lies that we use for validating our database in ensuring that we can ship a product when we get into new releases and so we recon 2013 which isn't severances go but is a conference at Basho puts on to talk about distributed computing problems and issues and our product and or competitor's product on cats retirement who's also in the audience somewhere and and he had a conversation at at at a bar about and his ever application use building on top of react and we talked for a long time about how I react provide certain guarantees that the very weak consistency guarantees and and how his application had a lot of closely related events and it was interesting to hear about how you know people result alike manually merging objects were trying to reason about how things are changing and it synchronizes across the replica circuit agonizing between clients and servers so that's kind of the motivation for this talk was that we had a good talk and I thought that there was and some stuff that was valuable to teach people and he demonstrates that the talks so generous have you know like it even when so the introduction of so much of what I'm trying to commit you today that you're building a distributed system and It's interesting has preparing for this talk is the 1st time I've given it and but I talk to a bunch of people the party last night and some people like obviously in building a distributed system and bunch mode people were like really where that actually mean what I have to worry about their so I expect this talk to be received to that you know like either way depending on your knowledge and there's such a diverse in background here that it's you know we'll have to see kind of where we have a few have any questions feel free to come up to me afterward to reply gonna run very close to the time so why do I say that you're building a distributed system so essentially and if you push state to the client and what databases do this if you pushed into the client you're eventually catching values and those values are gonna become stale and it doesn't matter if you know you are pushing stage but I will talk about that but it does matter few pushing updates it doesn't matter if like your polling the server you still dealing with data that is no longer your violating strict consistency there you essentially have a eventually consistent system because not everybody is going to observe those updates immediately and there you have all these problems the cache invalidations state management all the really fun problems in computer science at people like solving all the time right and a right that I mean that from the start so I and regardless of database isolation level I don't have the matter fusing Cassandra does matter fusing bold dB with serializability doesn't matter fusing post press with read committed the matter because that's only 1 part of your system that's only between like media servers in your back and you know how those server what if you have a load balancer and you ordinate reads and writes that so regardless of your database isolation levels soon as you have these clients part of the world that are going online coming offline insulating updates you have a distributed system and again if you're pushing state to the client to web sockets updates like that service and events you still get a deal with the same problems you deal with problems of message delivery of reordering dropped messages and you never failure was some update of our plight and updates on multiple times you have no causality between how those updates coming you can receive 1 before the other and these are
06:46
the kind of issues that I'm talking about drop messages reorder messages race conditions partial failures and cost emerge function I will talk about each of these so drop messages reorder messages this is the real thing this really happens to TCP you is it works really gone when spot on land you get all these other issues so if you're not familiar TCP in cast is a phenomenon where if you send a message and to bunch machines and expect and respond you can basically because buffers and switches to fill up and then you have packet loss than that Packer lost triggers TCP Congestion Control now TCP Congestion Control triggers things like niggling and slow start and that can cause your entire network to help crazy latency spikes were you have a time and then you have no traffic on the network and then a time because the switches are doing exponential backoff again alive lock scenarios you don't know so you know it's really hard to read about when the events will actually get to the people they are trying to push the events to in addition race conditions right yeah to clients and tried at the same value how do you know who went and you know without putting causality in your objection way reasoning about this kind of stuff and partial failures I go to send a request the database a hey increment this counter the database times out TCP times the connection of maybe this which dropped had I know if the server got an update enough and had I know it's safe to resubmit that update if that operation is not item it up so these are like a lot of the problems that we don't normally think about what a very very relevant but custom merge functions and customer deserved problem and this is going to become the subject of the talks follow custom merge functions like I go to get some state against it's made object to the server server that object is about anymore since me back I apply some change automatically based on some custom logic these are not formalized these things in the wrong usually they are wrong the 1st time you have to fix a bug and these things are problematic and you know the classic example customers functions is in Amazon's Dynamo database for the shopping cart they said well if I'm a concurrent adam remove all the headwinds right the customer says do we decided it doesn't appear for the car when they do they deleted again and answered that firmly acceptable that's what we have to do to make our shopping cart the online all the time and so they make a tradeoff right between availability and consists so this is gonna be mostly theory when a walk through a little bit of stuff than when I talk about the research that I do and we'll see why I think it's applicable them data set consistency while media from water to them going fast so what is consistency so consistently model use the Wikipedia definition you might find it interesting but it basically says is a contract between the programmer and the system and you know if you follow these rules you follow this contract to have some predictability out when you read values so when I talk about 3 consistency models the first one is stricter lies ability so lies below 1990 Hurley and Wang define this to reason about how how processes like with multiple threads work 2002 it's formally defined in terms of distributed systems by go with lunch and when essentially is is that you have a total order of all events in your system every event is in this sequence if I write a value everybody reads that value immediately doesn't matter and this is really hard this requires a lot of coordination if your clients that have values in you rights invaded the server and you want everybody you appear that can means that you can't read that value until every single client acknowledges so this is very hard to provide and so we've referred to analyze ability as a single server image you imagine that your programming on a single computer where there's a bunch of windows right it's not distributed it's not spread out so if we weaken that model we have eventual consistency which you hear about all the time this is you know reactors Dynamo this is Cassandra this is languages like balloon which are burning languages that use eventual consistency and lattice structures to assure deterministic programming and ensure consistency formalized levels in this right around the time SOS p paper for Dynamo says if no new updates were made to a given data item eventually I'll get the right value so eventually I'll see some progress in the values but I can guarantee that all see the change immediately so this is 1 of the more weaker forms of consistency that we have and finally we have the idea of causal consistency succumb consistency is read your own right to you may not observe a total all the events in the system the way they actually occurred in in reality will you do observe is that if I write a value and I go to read that value back immediately and guarantee that I will make that read and this is difficult to provide there's 2 common ways 1 is pushing state to the client so you can imagine a number data you can see the analog there is that if I apply update locally and send it to the server the server will eventually observe that update but I observe the up immediately super it preserves this idea of causality right and the other 1 is taking us if I made all my request the same server then they're all the same values from is that like you know servers die all the time and and so the train of we're talking about here when we talk about consistency essentially a safety versus lightness of Lamport says it and this is a good way to think about how your program works so safety says that nothing bad ever happens I'll never be a value that I shouldn't have done I never read a value that was earlier than a value that I observed line is being the property that says eventually I observe the crack values it's a progress property whiteness as eventually something that is going to happen I can make a guarantee of 1 that is but eventually will be there so this is a good way to think about and how we deal with consistency so consensus the house consensus so to provide things like strict consistency you need consensus consensus is the way this works so if you're not familiar with consensus and
12:23
distributed computing fundamental problem is you know we have to agree on a leader but the key here is safely provably safely say this safety property we after you know elect a leader committed transaction to a database observe some values so this is now this is boiled down to 3 things termination agreement validity determination basically says that all the processes that removing all of our clients will basically eventually agree on some value they will terminate the group property says that they all have the same value and the validity properties says that all observed the crew of value that's been proposed this is not very good at a consensus Ogren I always agrees on the number 2 if nobody ever wanted it to agree on the number 2 so these are very important properties so to boil this down we have a bunch of historical literature we can't really get and all of it but the 2 generals problem which basically says that it's proven that you can come to consensus safely between 2 parties that do agree on a value with only 2 parties because it's an impossibility proved because you essentially just get this slide lock of agreeing on agreeing on agreeing and green and yes and we generalize that the Byzantine Generals Problem this says that power in the event of faults especially byzantine faults had I guarantee that I how many people do I need to communicate together to actually agree on a value and in the presence of Sinatra number of faults and and these are all proofs and you know the Byzantine Generals Problem Byzantine fault tolerance the expanded to a whole nother class of errors that we normally don't think about an opera all like I have a bunch of servers on a LAN or I wanted communicating browser tabs and 1 of them sends a fake value 1 of them for to the value of bonds in the value when it was never requested the byzantine Failures or a whole class of errors that deal with like 40 messages omission and lying about the lying about messages and contents things like that so women to algorithms to solve this we're gonna talk about them really but I'm sure you've heard about all these things twophase commit threephase commit axis is 100 version the Paxos if you're interested is about 200 papers and we also have a raft which is basically a simplified Paxos but the general thing here is that these protocols to ensure that everybody agrees on a value need of majority of parties online to make that change and then when they do make that change the parties they're separated the party that might be in another group because the groups can communicate the party that doesn't majority can make progress so we don't want that right we wanna build online applications I can make progress when things are offline so we can rely and mechanisms like this so we can rely on things that require a large amount of consensus especially with the number of parties these algorithms become super complex so finally the thing will talk about is if we can't guarantee that we have this total ordering of events because we shuffle of our rights to some leader with you know high consensus what we need to do is in bed information that defines the causality that expresses the causality between events in our data structures so is
15:13
a bunch of literature here if you're interested this is 1 of the earliest papers Time Clocks and the Ordering of Events of is and this is Leslie Lamport in 1973 and he basically defined that you can use monotonically increasing timestamps counters essentially don't rely on lot wall clock time use counters and then have those counters increment as you send messages between processes and that allows you to find 1 partial ordering between processes that tells you what events happened before the other events so you can reason about how the update should be applied the problem is is that you can prove this relationship which is called happen before you can prove the relationship you can prove the converse but you cannot for the Commerce but you can prove the contrapositive so this isn't good enough this isn't good enough however we can build upon that structure and say will use these monotonically increasing times mobile vector clocks have lists of clocks that represent all of the events in the system and will mark it with the individual actors to every client every participant in the system the problem here is that in this allows us to basically define all the possible orderings of events in the system because there are many possible order this is great we also generalizes diversion vectors so we change the update rules but we use it to reason about versions of data items so in 1991 Bernhard adjourned bows and I think it's safer to the university in Paris and basically said well mathematically proved that if I don't store every single event and I don't store every single actor who did those events that is mathematically proven that there is no way to basically define this causal relationship that represents the ordering of events in the system as their importance and this is an important result because this means that we need to store all the actors but actors coming out right like a browser can to be an actor a mobile phone to be an actor so the final piece of work will talk about before you get to the research is this work done aversion vectors so in 2010 and so this is you see how is getting very new right we're like 73 2 slides ago now a 2010 and so don't basically say well what a limited number of actors I need a way to still reason about events that happen at the same actor had no way of knowing concurrent events at that point so we won't get how this works but basically stories and additional piece of causal information that allows us to reason about the same guy receiving 2 concurrent right how does it know which 1 should happen 1st so all of that leads me into the topic of what are the research were doing is and why we think it's super interesting so Werner talk about your duties that show of hands to people of Syracuse is the to bring good but that it have sold a complex of 103 replicated decides how we call them now and that was the original and the original aim is thus and their convergence and community of replicated data types and so the 2 different data types there is the convergence variety and there's the commuted a variety the what these are basically 2 flavors of CRT TVs won the state based among the operation based data structures that store something register it to be a list to be set to be mapped whenever an incentive structure it's they pay certainties why are these data should we ship them around all the clients we send the data structures to all the clients and then they have a merge function so this mathematically proven merge function will talk about why that's important and that allows them to merge the state of 2 and no matter if I get the updates from these guys before these guys were it does matter the ordering I always get the correct value the 2nd flavor which we're not going to talk about as much today is operation based and operations base sureties rely on a lot of them rely on Kozel delivery and but basically the general idea of an operations base year means that all the operations can lead to guarantee community and possibly for some subset of operations item potent and to guarantee that all the updates eventually end up with the right guy so given that this is a super so we have ton time will talk about the status your duties so what are the mathematical things that you need to know so tonsors talks in the Mathematikon algebra 1 so don't like something like scary and so monotonicity some monotonicity is a is basically the idea of that you have functions where you know as the inputs increase the output increase the a simple function the you know like if you imagine a quadratic function that is not monotonic and very simple principle so the things that we really really why I care about is associated with so if you remember from algebra whenever they teach that in here and you know and as I said is that the Timothy as we have a binary operation represented by the Dutch my god this too much mass fluid period so I guess so so the deuterium a binary operation and now the counter is gone for it I know how much time I have 11 minutes OK so so the infinity basically says that we have a binary operation so that for the dot represents and we say well if I have event and I've about why and batch those updates and then I apply event Z or I by do X. applied to the batch of wines I get the same value right so an example of something that that the example will talk about for associated but is addition right additions associated the matter the order that you add things then you get the same thing the that's property will talk about is community so community again binary operation and we have x and we a Y or Y with the function applied X we end up with the same value again edition is another example this fight for given ironfortified giving and right this is pretty simple and finally will talk about importance which basically is a binary function that takes x and x and again that's right so what why are these things important thing about those things I talked about at the very beginning the talk the problems of distributed getting right so the first one is associated I can receive events in batches that arbitrary should be only handle any batching lamenting that same value because this batch shifting can happen this thing can happen in network you get computers go down replicas take over this is totally possible and community obviously as possible race conditions obviously you know go a lot of race conditions go away if you consider that operation is a community because it doesn't matter if I send an event in this guy and he's in them and to me and we applied in the reverse order anything like that the order doesn't matter right so 3 people sending messages to 1 and I have to care about the race condition between those 2 finally I'm I'm poems we had to begin a message a volatile we get the same thing I mean this is like this is totally possible you should see this is pretty easy to understand so what's the important properties about and assisted to make misery and item so banner Johnson lets says where there's like more discreet map for moving up to and no college level 2 and so about a giant every lattices are basically a partial orders and that have a least upper bound function for performing emerge and I have a diagram so this is don't worry about all the fancy words but essentially if we can preserve these properties of monotonicity and preserve a Conf a a confluence property and we get to make this guarantee that these objects will converge so let's look at an example so
22:36
here's a little diagram that I made earlier today and what we have 3 events of the ANC so imagine that we wanna set so at the bottom we started with the sets be a and C and these arrows represent the various interleaving between these events and so we could say that if I had B I can observe the next Oregon observe a next of IoT I could observe the next I could observe the next 5 see I can observe a or B next and then eventually I'll get that set the a B and C so as this work so this function and this works because the union of sets is commutative of associative an item per right you know and only contain 1 element of that like it has a unique this property on the elements and its community of it doesn't matter the order I add items in the set all eventually get the rate value and its associated it doesn't matter how I batch of these events to do whatever I essentially going to get the same value so this is like that is the performance primer the banner joined several so another example that we have is implementing natural numbers you also can do this with decrementing natural number so if you remember natural 0 greater and you know we have 3 5 and 7 the fight you know 3 5 merge the file value 5 and 7 merciless at NYU and again we merge the value and this is because the merge function for the bandage isomer lattice on natural numbers are increasing natural numbers as max so we wanted to do decrementing that emerge from should be meant is also preserves the same properties finally I will talk about the Boolean lattice civilian lattice basically the lattice of start false progresses Tschira so a trees observed always observes the true so we see that here and obviously the merge function for a boolean lattice that we want a monotonically increase is or so another interesting property that we have about these data structures that we can map 1 to the other so if we have a monotonic function a function that always going to increase we can basically take this increasing natural number lattice and mapping into will let so we could say well 1 thing it's 5 since we know it's never going to get less we can have a predicate that says become true once it hits 5 once it's got less of a greater than or equal to 5 so that a monotonic function of a once it becomes true it stays true and that allows us to like lattices in the lattices which is very interesting thing when wanting to do justice to computing's so sign out on this is that there is very interesting research at UC berkeley on building a programming language based on these data structures so rather than express your programs as imperative this is what I want to do a B and C where you can say is I express relationships between things and as the values basically get thresholds your program basically proceeds so it's a it's written like an a Datalog it's a it's a Datalog with negation and call out and poor old analog litigation called Douglas and it's a very interesting reading you wanna know more about that so
25:25
let your example so I can't show an example of every surety but I can show you an example of the observer removed set and so this is a set of structure that allows me to add an item and then remove it at the last known observation of its edition so imagine we have 2 replicas
25:42
hopefully that's being Imagen we have 2 replicas so we have 2 sets we haven't had set and removes that so that's what's represented the the 2 tuple is basically going to be of value and the actor who added it so I will say 1 is in the set because it's in the ad on both sides so if I remove it I have another set that models the removals so I basically say I've observed a added at once so I'm going to remove it at once I've added to the Remove set on both sides so now if I can currently have the addition of 2 but to get added to this set and so gets under this set with the ID the ID number 2 I said in reverse performs sorry and then we added to that set so now only merge these replicas essentially what we have is this 1 in Turing the ad a ones the removes that so the result here is that a is still this because we observe the addition of 1 we observe the removal at 1 and now we observe the addition of 2 so since we continuously store all of the operations we basically get to do this merge functioning guarantee leakage commercial or a value and obviously the set you know like from a programming API point of view the set you going to get back is going to just have the value and not all the stuff that's not exposed to the so again another example is in the set a is the set its observed with an additional 1 but we basically add so concurrent ads we can currently removed a and add the again the values and of merging correctly we basically have 1 in the add and remove set 2 in the ETS so a is in the set so why do I think any of this is
27:18
important and so consensus is hard and this is really hard it's very expensive the algorithms are very errorprone implement and there are numerous raft implementation sorry Paksas implementations are correct Google wrote a paper about how hard it was to implement just to get working in the Stock consensus is really odd obviously spanners a result of this right and so what what we really want to do in our applications avoid coordination we want to make sure that we don't block on things when we don't have to we want the system to progress independently we want the system to be able to make progress in clients are offline we want the system will make progress when a portion of the clients are online we want all these guarantees so essentially what we're striving to build these applications is a weak consistency application something that's eventually consisting we want we consistency is OK if things are out of date for a while so cave clients have values that are correctly represented because that provide higher availability and allows the system to progress even when all the parties on online but we do want this convergence property that surety tease provide because it's provide a mathematical foundation that allows us to reason very easily about how things are going to end up is very determinist it's very predictable and it's in the informal method the nice it's nice to have things the proven by mathematicians and so you don't have to test them heavily so what's the conclusion I have 3 minutes left and again I'm going to say that you're building a distributed system these are things you should be looking at and is always I talked about is 1 example and there's plenty of other examples were actively working on this in our reactivates all of the work that we're doing is published most of this work is out in the open there's tons of libraries you should look at them as you read about how these things work they're incredibly valuable but the data types that we're putting in reactivity not overly are counter so we're counters that can only be implemented with account as I can be incremented and decremented and we have sets so we already saw an example is that there's an ad only set there's an never removes there and there's a observer move set and finally we have our registers so registers are basically values that don't progress there mutable values and we're believes on the believes to be the from true false false the mass is really interesting and maps really cool and this is what I'm super excited about it so we'll talk about that in a 2nd what am encourage you to do is look at this existing literature out when you you should be looking at some of these papers you should kind of look at how these datasets to use there's tons of examples of presentations on my coworkers have done at least 3 of 4 presentations at a variety of conferences on how Kingsbury's on a presentation on any that the depth and work on proving databases lose data and there's all sorts of stuff specifically at this conference here this is pat back 2014 so this is the conference that's happening in 2 weeks basically exactly to the sum of undiscovered is called principles and practice of eventual consistency and we're gonna talk about our mandate structure here and we have 2 papers as Basho except in there's about 10 other papers the accepted and what's really sad about the map is that we have a map that can compose values in the post your duties to the values of the Napa duties so this is really nice and you can use it to build very nice web applications you can use it to build lots of interesting data structures in databases and they match pretty well do the job Republic literal right leg so these a cool things and they're very you this is how new April 13th 2014 getting is so new it's in the future OK so that is phenomenal and yeah and having a read my presentation that conference yet so I get so eventually get so this is
30:59
the thing free project is the project there's a bunch of papers published we're wiki exit would be part of it is public I think not all of it but there's a ton of information about who the people are involved it's a lot of people on the original series work march appealer prose bacarro new this year but a lot of the original authors of Memorex there were sky and so yeah and if you want any information about anyone help understand papers India bunch where I will talk about computers so and final thing distributed if you're interested in knowing more about these issues is a concept that you don't know what a start you know how to do it I am on my blog of a reading list of recommended papers on how to start understanding distributed systems and we have a bunch of podcasts the videos you get to see people too nice there's academics is not academics there's Peter Bayless rind auslese look like in the distributed and community and we talk about consensus consistency causality what it means philosophy computers Math and French roots so they much at
32:05
the end of the way to the end and then the the the the the the school of the the the ability to understand and the and