We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Distributed locks with Python and Redis

00:00

Formal Metadata

Title
Distributed locks with Python and Redis
Title of Series
Part Number
103
Number of Parts
173
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language
Production PlaceBilbao, Euskadi, Spain

Content Metadata

Subject Area
Genre
Abstract
Sebastian Buczyński - Distributed locks with Python and Redis Traditional methods of coping with concurrent programming problems are well-known and described in literature. Many programming languages, including Python, contain in their standard libraries tools and primitives such as semaphores and can spawn threads or subprocesses. However, in the face of increasing interest in service oriented architecture and building distributed systems, that span across many independent server nodes, emerges a need to adapt traditional solutions, so they can be applied in the new environment. In this talk I will share my experiences gathered during building a modern contact center - highly concurrent system, which requires certain resources to be accessed exclusively by several self-contained components.
Keywords
51
68
Thumbnail
39:40
108
Thumbnail
29:48
Red HatGoogolInformation systemsTabu searchBoom (sailing)Water vaporMultiplication signRoboticsProcess (computing)WordJSONLecture/Conference
Context awarenessMessage passingSoftware maintenanceProduct (business)Parameter (computer programming)Token ringSoftware developerTelecommunicationFocus (optics)Basis <Mathematik>QuicksortHypermediaSystem callJSONXML
GoogolBoom (sailing)Message passingData structureDatabaseLoginString (computer science)Cartesian coordinate systemCASE <Informatik>Key (cryptography)Video gameObservational studyData storage deviceImplementationAreaDuality (mathematics)Group actionLecture/Conference
Metropolitan area networkData storage deviceMessage passingElectronic mailing listMultiplication signBinary codeSemaphore lineMultilateration
GoogolBoom (sailing)CASE <Informatik>Thread (computing)Connected spaceDatabaseObservational studyDatabase transactionRight angleContext awarenessLecture/Conference
Task (computing)Context awarenessEmailTask (computing)1 (number)System callData conversionDependent and independent variablesMessage passingFamilyTap (transformer)Carry (arithmetic)Data miningComputer animationProgram flowchart
GoogolBoom (sailing)Task (computing)Port scannerMusical ensembleSimilarity (geometry)Water vaporTask (computing)Physical systemData structureForcing (mathematics)Client (computing)Lecture/ConferenceJSON
GoogolBoom (sailing)Client (computing)Queue (abstract data type)Task (computing)Different (Kate Ryan album)Lecture/Conference
Port scannerTask (computing)Musical ensembleData managementView (database)Point (geometry)Task (computing)Parallel port
GoogolBoom (sailing)Server (computing)Event horizonTask (computing)String (computer science)System callBinary fileSemaphore lineData conversionMultiplication signEmailThread (computing)Centralizer and normalizerMehrprozessorsystemStandard deviationTask (computing)Library (computing)Semaphore lineLoginRadiusState of matterDifferent (Kate Ryan album)Data managementServer (computing)Lecture/Conference
GoogolBoom (sailing)Maxima and minimaState of matterTask (computing)Set (mathematics)ImplementationString (computer science)Total S.A.LogarithmMereologyNewton's law of universal gravitationSystem callCASE <Informatik>State of matterDatabase transactionThread (computing)Group actionTask (computing)MereologyStaff (military)Phase transitionRight angleStreaming mediaInternational Date LineConnected spaceDatabaseSound effectData managementMultiplicationString (computer science)Control flowMultiplication signWeightCondition numberNoise (electronics)Message passingKey (cryptography)Event-driven programmingSemaphore lineGraphical user interfaceSingle-precision floating-point formatArithmetic meanLecture/ConferenceJSON
GoogolBoom (sailing)Block (periodic table)Curve fittingElectronic mailing listSemaphore lineSystem callTask (computing)Information systemsImplementationMereologyMathematicsMultiplicationScripting languageMultitier architectureComputer iconEuler anglesState of matterPort scannerSemaphore lineSoftwareOffice suiteElement (mathematics)Scripting languageInstance (computer science)CASE <Informatik>WeightControl flowState of matterMultiplication signMereologyVariable (mathematics)Right angleBlock (periodic table)BlogProcedural programmingElectronic mailing listNumberCartesian coordinate systemTask (computing)System callOperator (mathematics)Maxima and minimaExterior algebraBit rateBinary codeClient (computing)LengthWordLoginAtomic numberCondition numberObservational studyPointer (computer programming)Lecture/Conference
GoogolBoom (sailing)State of matterDatabaseMultiplication signWater vaporLecture/Conference
Set (mathematics)Right angleArithmetic meanJSONComputer animation
GoogolBoom (sailing)Condition numberLecture/Conference
GoogolBoom (sailing)ImplementationJSONComputer animationLecture/Conference
GoogolBoom (sailing)ImplementationReduction of orderClient (computing)LoginCodeCASE <Informatik>Social classServer (computing)Right angleMereologyJSONComputer animationLecture/Conference
Semaphore lineForcing (mathematics)Endliche ModelltheorieInstance (computer science)Uniform resource locatorJSONComputer animation
GoogolBoom (sailing)Limit (category theory)Windows RegistryReplication (computing)Instance (computer science)Different (Kate Ryan album)Point (geometry)Distribution (mathematics)Process (computing)Single-precision floating-point formatGene clusterLecture/Conference
Instance (computer science)JSONComputer animation
GoogolBoom (sailing)CASE <Informatik>Goodness of fitPhysical systemLecture/Conference
EmailMultiplication signRight angleJSONComputer animation
GoogolBoom (sailing)Observational studyMultiplication signTask (computing)Pairwise comparisonEndliche ModelltheorieArithmetic meanVideo gameLecture/Conference
Red HatRobotComputer animation
Transcript: English(auto-generated)
So, well, thank you for coming, hola, everyone. Did you notice that there are three talks at the same time with word distributed in name? The EuroPython folks have some sense of humor, for sure. Well, first of all, I would like to introduce myself,
as it was already did, so I'm a software developer and a maintenance guy at Focused Telecom Poland. My main toys are Geovent, Twisted, and Celery. What we are doing at the company is to providing a business-to-business
telecommunication solutions, right? We host teleconferences, we sell web access, but our main product is a contact center. So, nowadays you have multiple ways to communicate with each other. There are social media, there are SMS text messages, well, there are emails, and there are phone calls.
So, a contact center is a way to unify all this, to gain a much better contact with your customer. So, throughout this talk, I will show you the principles of implementing distributed logs
in Redis using Python, and show the usage in a real-life application. So, does anyone know what Redis is? Okay, there are some hands, so I will
quickly refer to that. The Redis is an OSQL database. It's basic key-value storage, but the key can be only a string, but the value can be a more complicated data structure. For the sake of this talk, most important are string and list.
They are pretty much like Python counterparts. And also, Redis has a simple support for a message passing. So, first of all, I will introduce some case study. Then I will show you how to implement correctly
distributed binary semaphore, so-called mutex, and later on, semaphore. And finally, I will give you some tips, and there will be some time for questions. So, first of all, this is a loose definition that I came up with a few days ago. The gido on his talk yesterday
has gave a great example of locking and keeping safe the reference counters. The other case would be that we have a shared database connection between threads in Python, and we would not like to other thread
sending some SQL comments while the other thread was trying to carry out a transaction. So this is basically what lock is. It's to protect a shared resource. So, the case study will concern a contact center.
Hope nobody expected that. So, in the contact center, work people, of course, and we refer to them as agents. So, an agent can answer on originate phone calls,
email messages, text messages, or carry out chat conversations, and this kind of stuff. Well, and my advantage of contact center is that it automates all of this kind of work. So, it keeps the agents busy. So, they are not looking at the floor or sailing.
Okay, but to be this concept more familiar, I have replaced agent with a giant W that stands for a worker, and the E are tasks.
So, you could say that this looks like a task QE. Who doesn't know any task QE solutions, like Celery, RQ, Germans? Are there anyone? Hasn't heard any one of them? No, there are, it's good.
So, basically, you know the concept, and this is the problem we are trying to solve. Yeah, it looks similar, you know the stuff. You can apply a working solution, as it will be great, right? Wrong. There are a couple of things that you have to worry about. First of all, a worker is an autonomous creature.
And it can choose whether to work or not. He can simply log out of the system, and he will not be available anymore. And this is a normal situation. This is not a failure of system or some kind. You cannot force somebody to stay logged in and work,
at least not in Poland. Well, the tasks must be prioritized, and this is a heavily desired feature from our clients. So, yeah, I don't know if any other distributed task queues like Celery support this.
As far as I know, in Celery, you can partially achieve this by routing to different queues, and that's all. So it's also not sufficient. And the last, mostly mind-blowing, is that one worker can handle multiple tasks simultaneously.
And from, I don't mean concurrently, but simultaneously, so in parallel, at least from a point of view of task manager. So this will be viewed that way. Of course, Q1 cannot carry out two phone conversations at the same time. But why not write an email and speak to somebody?
That's fine. So our solution depicted as a whole looks for some centralized task manager, and it's written using Twisted. And the Redis will serve us a logs server, and we'll hold them all.
So the first thing I would like to say is about binary semaphore. It has only two states. It can be locked or unlocked. To lock it, I will refer to this as acquiring, and to unlocking as releasing. This is the same as the, in standard Python library.
Who has background with multi-processing or multi-threading library? Yeah, that's quite a lot. And who was on the Monday talk on this? Yeah, that's quite a few things. So, and it will be stored as a Redis string,
and all calls to acquire it or release will be non-blocking from Python. So the first use case is about keeping state of a worker, whether he's busy or idle. So, meaning that the lock is released,
it's the same as idle, and so on. And the lock will be acquired when a task manager will sign a task, and it will be released afterwards, so it's over. And the second case is about preventing race conditions.
The one I mentioned that a worker would like to log out sometimes, and I don't know, go for a break or something, that he can do it using this mechanism, because it can acquire a lock before a task manager do. Or it will simply do not release it
after finishing its previous task. Okay, so, since this is storing this Redis string, we would use two Redis commands, a set to store a lock, to set the string, and a get to retrieve it.
What we need is atomicity. Every single command of Redis is an atomic one, so no one else can disrupt your data. So, the first draft, we have a Chrome disk, it was something like that. We're getting the log,
checking if we can lock it actually, and then do it. But there's a problem, because between get and set, there is some instruction running, and this as a whole is not atomic,
so someone else might have changed this in the meantime, and this is a problem. And it's not acceptable then. So we introduce some kind of Redis transactions. This is not a real transaction in a sense of SQL databases, but it allows you to execute a bunch of commands
in an atomic way. So, we issue the command watch, and giving the key name. So this will be carefully watched during the transaction. And finally, if we cannot do anything, then we unwatch it. So this connection will be free of side effects later.
The next step will be to introduce the transaction denotions, multi and exec. These actually will denote a transaction, or the things that should be done conditionally,
only if a watched key has not been changed. So I guess that's quite simple stuff. Okay, the important part that task manager sometimes would like to know whether the state changed. Well, when we log the worker, he will find it eventually.
But when we do not, when we are freeing it, it must know when the worker has freed, so we can actually perform some action. Like for example, look for the next task for it.
And to receive the notifications, we issue a subscribe command. But to be more precise, on the Python side, so this is a task manager. As I stated, it's written in Twisted. There is some kind of magic like ready subscriber
will handle the subscribe and actions for them for you. And this is the event driven. You see the very ugly method name message received. Which is written camel case. And by overloading this method, you actually define the behavior for this.
So the second part will be about distributed semaphore. So the semaphore is different from locking that way, that it maintains an internal counter. So it means that it can be released or acquired many times.
And if we define a semaphore with counter two, we can acquire it twice, and also release it twice. But the original counter cannot be exceeded by further calls. And this will be stored as ready's list, and the calls to it will be blocking.
So the use case is that we can handle multiple tasks at the same time. As I stated before, the usual lock is not sufficient, and the use of a bunch of binary semaphores would not be sufficient to.
This is similar to string operations, but now we are using rpush, which stands for write push, so it will append an element to a list. And the rpop, which stands for blocking write pop, so it will pop an element from a list.
And rpush will serve as a release operation, and the blocking write pop as the acquiring operation. So when there's nothing there, and it means that semaphore has been acquired a maximum number of times,
the command will block until there's something there. So in the Python part, we did not change anything, because it reacts only on events. Eventually you can implement your checking, whether the semaphore is free or not.
And there are cases when you might need a state of semaphore after changing it. You could use previously showed a multi-exec block, but that's not always the case. An alternative is to write a simple Lua script. You see that it's very much like Python in this case,
except this word local, so it means we are allocating a local variable, so it will be garbage collected after this block. And this script only releases our log. And return a length that we have left.
Okay, but there's a warning. Multi-exec blocks and Lua scripts all are atomic in Redis. The problem is that blocking invocations in atomic way has not sense this time.
So it would mean that the Redis instance has to be blocked forever to serve this one. So making barepop inside multi-exec will return you nil, so it's none in Python client. And invoking such a Lua script,
Redis will respond with an error. Alternatively, you can use an rpop, so it's non-blocking feature. It will return you either nil if log cannot be acquired, or a value if it has. Some final remarks before you rush
and implement these solutions in your software. Please care about starting conditions. Make sure that the logs are properly instantiated before using them. And study carefully control flow in your application to not introduce too many logs or too less.
And also, please work up a restoring state procedure. When something goes wrong, there might be cases when the state within Redis is not finally consistent. So you might want to fix that some way. For example, we persist a state in SQL database
as some kind of history of activity of given worker, so we can restore a state properly after some kind of failure. So, well, that's all for me. And now it's time for some questions, if you have some, of course. Thank you.
Hello, have you considered using setnx for the log, the setnx command?
You mean setnx, right? Yes, yes. For log? Yes. Yeah, we have read about this, and this is described also on Redis documentation as far as I know. But, well, this is an expiring command as far as I know, yeah?
You can use it as a non-expiring command. You don't need to set an expiring. Yeah, but somehow it didn't fit our needs. I can't remember now why, but no, it wasn't sufficient for this kind of locking. Yeah, but that's an alternative for sure.
Okay. Have you tried to look at other implementations besides the built-in lock implementation in a Redis client? You mean besides Redis, right? Yeah, besides the official Redis client. Well, ideally it would be to write
your own logs server in Python, but the case is that some code base we have, and the most is written in PHP, so it has to be something outside. Ah, I see, okay.
Hi. When you're talking about distributed semaphores and locks, are we talking about, so are the locks in one location, so in one Redis instance, and the workers can be distributed? Or are the locks in semaphores?
Actually distributed? Yeah, that's a good question, and I hope that someone asked that. Well, basically we use a single Redis instance for serving our locks, and due to different limitations, we do not need more resources, but distribution you can achieve
is by either sharding Redis instances, and this is done automatically by using Redis clusters starting from Redis 3.0, or replication, which is also in this kind, but when I prepared this, I meant distribution by,
yeah, the workers are distributed, and these are different kind of resources that might achieve a lock, sorry, acquire. Does this answer your question? Okay, thanks. Any more questions?
Have you tried, have you thought about using Zookeeper? Because I'm not entirely familiar with it,
but my colleagues at work use it, and it's supposedly that, which is synchronizing few distributed instances of anything to only get one resource at a time. Well, we didn't,
and I have read about something like Zookeeper just yesterday on O'Reilly stand, but I will surely consider that, so thanks for a suggestion, but no, I don't have any experience with that. Me too, but it seems like a good idea, so. Yeah, if it would be simpler or better in some case,
why not replace? Yeah, because it's used supposedly to, you know, select the name node in a Hadoop cluster, for example, so this is also a really distributed system, and only, you know, you have to think about it
and synchronize it, so Zookeeper is supposedly for that. Yeah, I will investigate this for sure, and if I came up with something interesting, I'm sure I will give a talk at the next occasion, not this year, I think. Okay. Thank you.
Any more questions? Well, I do have one though. Have you actually ever seen anyone talking on a phone and writing an email at the same time? Yeah, myself. Do you manage to actually make sense? Well, you know, there are some studies
that shows that a human that is carrying out a few simultaneous tasks is completely reduced by its brain capabilities, so there was a comparison in the book I read that he behaves as if he had a brain of a six-year-old child.
So, you know, I don't find personally that multiple tasks at the same time are a super killing feature, but that's what the business wants.
Yeah. Yeah, because they respond with some delay, so yeah. Maybe. So, any more questions? I guess we're good then. Thank you again, Sebastian.