
The tricky truth about parallel execution and modern hardware


Formal Metadata

Title
The tricky truth about parallel execution and modern hardware
Number of Parts
50
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Production Place
Miami Beach, Florida

Content Metadata

Abstract
Concurrency and parallelism in Ruby are becoming more and more important. Machines are becoming multi-core, and parallelization is often the way to speed things up these days. At the hardware level, this parallel world is not always a nice and simple place to live. As Ruby implementations get faster and hardware gets more parallel, these details will matter for you as a Ruby developer too. Want to know what the pitfalls of double-checked locking are? No idea what out-of-order execution means? How CPU cache effects can lead to obscure crashes? What this thing called a memory barrier is? How false sharing can cause performance issues? Come listen if you want to know about the nitty-gritty details that can affect your Ruby application in the future.
Transcript: English (auto-generated)
Hello everybody, I'm going to pronounce my own name so everybody knows how to pronounce
it. It's always a challenge. It's Dirkjan Bussink. I'm used to all kinds of different butchered forms of it, so don't worry about it if you want to ask questions later. So what I want to talk to you about today is what I call the tricky truth about parallel execution and modern hardware.
It's a bit of a broad term and I'm going to talk about all kinds of behavior that might seem crazy but can be interesting and actually matter even to Ruby these days. So it's a bit of a journey for me. It's been a thing that I've touched upon for at least the last few years and
sometimes less, sometimes more, and even today, even looking into stuff for this presentation, there were still things that only then I understood better and got right. So I'm not going to tell you this is easy and that you should all understand everything and all the nuances after this talk. It's something that takes time and something that you have to be interested in. So I want to start the journey with a step along the way, which was a GitHub commit I made, I think it was two years ago, according to the screenshot at least it was. And it was actually a commit to Rubinius, and if you look at it, it's a three-line commit and it actually only changes one line. And this talk is a bit of the story about how I came to, in this case, add this one line in this specific example.
So, one of the things I want to talk about first is a very basic concept, it's called causality. It's basically things happen in a certain order, in this case, in your computer. So let's talk about, you know, reasonably trivial code where basically we say, okay,
we have a number and we have a string and we have a variable a and b and we just set them. You know, so far so good, so far it's pretty straightforward. But we're gonna make it a little bit more complex, we're gonna add some stuff to the mix and that is parallelism, concurrency.
So we're gonna change it a little bit: we now make a and b shared variables. Everybody knows shared mutable state is bad. After this talk you'll hopefully have a little bit better of an idea of what, besides the standard problems with shared mutable state, the other concerns with it are. But nevertheless, we're just using mutable state because we've got a problem that we want to solve in this way. So what we do is we first initialize these variables, then we change them, and we use them on another CPU, or thread in this case. The question is, what can happen here?
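(For reference, a minimal Ruby sketch of the setup being described; the variable names, initial values and the empty string are assumptions reconstructed from the talk, and on MRI the GIL makes the surprising outcome effectively unobservable. The possible outcomes walked through next are noted in the comments.)

    # Shared variables, written by one thread and read, in the opposite order,
    # by another thread.
    a = 0
    b = 0

    writer = Thread.new do
      a = 1
      b = ""        # the string value from the example
    end

    reader = Thread.new do
      x = b         # load b first...
      y = a         # ...then load a
      [x, y]
    end

    writer.join
    x, y = reader.value

    # Reasoning about interleavings only, three outcomes are possible:
    #   ["", 1]  the reader ran after both writes
    #   [0, 0]   the reader ran before both writes
    #   [0, 1]   the reader ran between the two writes
    # The surprising fourth outcome, ["", 0], should be impossible by that
    # reasoning, yet reordering by a compiler or CPU can produce it.
    p [x, y]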
What are the possible orders in which you might see things? So, of course, there's this: we first initialize a and b, then we copy the values to x and y, and at the end x is the empty string and y is, in this case, one. Note that I actually swapped the order in the second case, where we first load b and then load a. Another way this could happen is that we first copy these values, this code gets run first, and then we end up running the other two assignments, and at the end x and y are both zero. There are, of course, more possibilities. There's this one, which says x is zero and y is one at the end. So as you might have noticed, so far that's three possibilities. You can interleave them in different orders and all that kind of stuff, but there are only these three options that should come out of it if you reason about this in a logical sense. But the question is, is that really what can happen on your computer? And there are actually cases where this can end up not being in the right order, with x ending up being an empty string and y ending up zero. The question is, what happened here? Did I not copy this correctly? No, I did, but it's actually a perfectly fine Dutch word. It's the only Dutch slide I have in my presentation, because this is exactly the expression we would use in Dutch.
What? It actually sounds almost the same. So why is this happening? What can do this? What can cause this stuff in your code? Well, there are a few things that can cause it. The first one I want to talk about is compiler optimizations. So if you imagine this code, it could be Ruby, it could be C, Java, whatever, the same principles apply. So there might be something that works on optimizing your code, so it doesn't just emit this code directly and run that at a low level in assembly on your machine. It could be a C compiler, it could be a just-in-time compiler, it could be anything.
And one of the things to think about is, you know we had these two statements, and the question is, how different is this from doing this? You know, does this matter? Does this have an effect that is different on your system?
And that actually all comes down to the perception of what is the same, what is equal? And a lot of systems define what is equal in the sense of what is equal for just a single thread of execution. So in that context, this is perfectly fine.
You can swap these, and it doesn't make the meaning of your program in any way different. But actually, it doesn't work anymore if you consider this in a concurrent scenario. Because what then happens is we see, hey, we suddenly run this in a different order, and we get this example of x being an empty string
and y being zero, even though, if we look at our code and reason about it in the order that we see things, this is not what is supposed to happen. So that's one example, one cause of these problems, in this case your compiler. But of course your compiler is not the only thing
that can do that, because that would be too simple. There's another principle here that could have an effect on this. And I say could because it depends on a lot of factors. For example, let's think about the machine you have,
I think a lot of people have in front of them. Usually those are Intel systems, laptops, still, these days. So what does the Intel manual say about memory and running code and stuff like that? Intel is actually pretty nice. It's fairly conservative. It doesn't do things that you might find confusing, but it does say one thing, and that is that loads may be reordered with earlier stores to different locations. And that's from the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A. It's a long name, but what does it mean? Well, what it means is that if we look at code in a more assembly-style version of those examples we had, we now write this as load and store operations. x and y could be other memory locations, could be registers, but that doesn't matter, it's all about the a's and b's here. So what we see here is, okay, we can load a and load b and store them. And this still has the same outcomes. And if you look at that quote, that line from the manual you saw, there's not really a way, because the only thing the manual allows is that we could swap instructions that store and load, only a store and a load, that use different locations. So if you want to go back to having this example where we end up with x being an empty string and y being zero, there's not really a way to do that here, because in all cases we can see that there's a load and a store, in the lower left corner, that talk about the same memory location. So we cannot swap them around. But of course not all architectures are created equal.
This is where it gets interesting and that is that, like I said, X86, the thing you have in your laptop, it's fairly strict. It doesn't do a lot of things that could be confusing. But if you compare it to what you have in your phone these days, which is like ARM v7 architecture or newer even,
I think they're working on like version eight with 64-bit support, like the new iPhone has that, all that kind of stuff. And even ARM these days has multi-core systems. So on ARM actually it is allowed that we reorder loads among each other. So we can swap loads or we can swap stores and loads
or stores and stores. So if you look at that, you can see we can actually create a new example by swapping some instructions and actually causing the same problem we had before. So this is something that can happen on your phone
or your tablet or whatever. So the question is why? Why do CPUs do this? Well, there are a few things going on. One of them is that this is something they do for efficiency: some operations might be slower, some might take longer. And one of the reasons an instruction can be slower is, for example, the CPU cache. We all know that if you have something that is slow, and in this case it's memory, memory is slow. You know, if you look at Moore's Law and you look at the past, CPUs and memory used to be much more on par. But the speed at which CPUs got faster went up a lot faster than memory. So the disparity between the speed of your CPU and the speed of memory has only become wider and wider over the last years. So how slow is memory?
Well, we all know that if you have something slow that, you know, we gotta add a cache. That's the thing we do as developers. If you're a web developer and your web page is slow, you look at adding caches. If your CPU is slow, you look at adding caches. So what kind of caches do we have? In this case, I'm talking again about
the Intel Core architecture; basically, this is based on a reasonably recent Intel CPU. So we have very descriptive names, of course, because we have different levels of cache. Level one is really close to your CPU. It's very expensive memory, it's not very big, and it's pretty fast. So if there's a memory value that's in there, we can get it in just four CPU cycles, which is reasonably fast. If it's in level two, it's a bit slower, but it's bigger, so we still have that; it takes 10 cycles. L3 is even slower, and it depends a bit, because L3 is usually shared between CPU cores, so there are different timings depending on which CPU core actually manages that piece of the cache. And RAM is really slow. It's like hundreds of cycles, and usually it's even measured in actual time, because RAM doesn't always match up with your CPU, because they have different timings and stuff. But like I said, caching is hard. We have to do all these tricks to keep this stuff in sync, and if you look back at these timings,
this could be a way to say, okay: if your CPU sees that we have loads of two memory locations, and one load is already available in our cache, and the other load is still further away, in another cache or in memory, we might want to start running the load that's far away first, so we don't waste CPU cycles waiting on that slow load later. So we might run stuff in a different order. That is actually one of the reasons that this stuff could be happening. But like I said, caching is hard. So one way we implement this, or not we, that's a bit presumptuous, I don't feel smart enough to work on this kind of stuff, designing CPUs is not my thing, I would love it if I could, but. So one of the things CPUs use is a store buffer. It's a very small place in your CPU where intermediate values are stored that might be used later. So I'm giving you another example.
This example is actually slightly different from the one we saw before, but it shows that even on your laptop, you can have issues with running code on your system and seeing very weird results. So here we say, okay, we set a to one and we set b to a string, and in the end the question is, what comes out? So here we have, again, x is zero and y is one. And we have another case where we do it, of course, in a different order, and we get the string and zero. And then there are the cases where we intermix them. You can intermix them in all these different orders, it doesn't really matter much, you'll usually get an outcome like this. But if you think back to what the Intel manual with the long name said, it said that we can actually reorder loads and stores. So if we go back to this slide, we can see, okay, so what can we do here? We're allowed to reorder stores and loads, so let's do that. So we now say we do the loads first, before we do the stores, and it means we get x is zero and y is zero, which is not a very intuitive answer for your machine to be giving if it's running this code on your CPU. So of course we have a problem now, but what would a problem be without a way of fixing it? There's actually a fairly straightforward way to fix this, and that's a concept called memory barriers.
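(Again as a reference, a minimal Ruby sketch of this store-buffer example; names and values are assumptions. Whether the anomalous outcome is ever observable from Ruby depends entirely on the implementation: MRI's GIL will hide it, and even a parallel implementation would have to compile these down to plain loads and stores for the hardware effect to show.)

    # Each thread stores to its own variable, then loads the other one.
    a = 0
    b = 0
    x = nil
    y = nil

    t1 = Thread.new do
      a = 1
      x = b        # read the variable the other thread writes
    end

    t2 = Thread.new do
      b = "string"
      y = a        # read the variable the other thread writes
    end

    [t1, t2].each(&:join)

    # Any interleaving of the four statements lets at least one load see the
    # new value: possible results are [0, 1], ["string", 0] and ["string", 1].
    # [0, 0] is impossible by interleaving alone, but it is exactly what x86's
    # store buffering permits: each core's store can still sit in its store
    # buffer while the load that follows it already executes.
    p [x, y]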
And basically what a memory barrier is, is that your CPU has a specific instruction that says: I want to make sure that this happens in the order that I want it to happen. So what does a memory barrier look like? Well, this is a low-level concern, so we might need C, but C actually doesn't have this. There are built-in constructs in GCC and other compilers that can do this for you, but there's no native thing in C that has that; C doesn't even say anything about the semantics of this stuff. So you end up doing stuff like this, inline assembly, to try to solve this problem. Well, it's not really elegant. If you're a Ruby programmer, that's not your thing. You know, we all know what Matz talked about, about the hardware abstraction and not wanting to think about it, and well, this is actually an area where, in some cases, you might have to think about it. So on x86 there are three versions. Basically, sfence says don't reorder stores around it; the s stands for stores. I guess everybody knows what the l stands for then: that stands for loads, so lfence says don't reorder loads around it. And mfence is basically both. So mfence says, okay, we're not reordering anything, any code that happens before this with any code after this. So I've been talking about this stuff, and it may look like very artificial, constructed problems that are not a real problem in actual real-world code out there. And that's actually not true. There's actually a pretty often used pattern that is broken in all these sometimes subtle ways that can be very complicated, and that's double-checked locking. So double-checked locking basically builds on the idea
that you have something in your system that is expensive to make. Like, you know, people know the singleton pattern. It's not usually a very good pattern, but maybe there are cases where you have an object in your system, and you don't want to build an expensive object every time because you might need it, I don't know, like 100 times in your Rails request, and you don't want to build
that expensive object 100 times. So you think about it, and you say, okay, that means I have a system that runs concurrently, so I have to think about mutual exclusion, so I'll put a lock around it. But that means that even once your expensive object is there, you still take a lock around every access, and a lock is actually not always that cheap either. So you start thinking, okay, let's avoid taking this lock every time. And you end up with code that looks something like this: we store it, we have a mutex so we can do the synchronization part, we initialize it, and we do this.
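(A sketch of the pattern being described, reconstructed in Ruby; the class, method and attribute names are assumptions, not the actual slide code.)

    class ExpensiveThing
      LOCK = Mutex.new

      attr_reader :bar

      def initialize
        @bar = "pretend this is expensive to compute"  # work we only want to do once
      end

      def self.instance
        unless @instance                 # first check, without taking the lock
          LOCK.synchronize do
            unless @instance             # second check, now that we hold the lock
              # The subtle hazard discussed next: nothing guarantees the write to
              # @instance becomes visible only after #initialize has finished, so
              # another thread could observe an ExpensiveThing whose @bar is
              # still nil.
              @instance = new
            end
          end
        end
        @instance
      end
    end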
So the thing is, we say, okay, we have an instance, and if we don't have it, we take a lock, and if we still don't have it, we make a new object. You can now see where the name double-checked locking comes from, because you see the same unless check twice. This is actually why I wrote it like this, and not with an ||=, because this is a little bit more explicit. And the reason for that is that if you think about it, if two threads are running at the same time, both think, hey, the object has not been initialized yet. They both try to grab the lock. Only one of them succeeds, and the one that succeeds would build the object, and then unlock and return it. Then the other thread sees, oh, this code is now unlocked, and I was waiting for it, so I can run this code. But if it would not check again whether some other thread had already run it and initialized it, it would initialize the object again, and we would build it twice. That might not be a problem, it might be a problem, but it's something we don't want to have in this case. So this solves that problem. But there's actually something far more subtle going on here that you might not think about.
And that is, if you remember what we just talked about, that things might not happen in the order that you think they do. And in this case, maybe the compiler or the CPU you're running this on tries to optimize something for you. And what it ends up doing, and it's perfectly valid in that sense, is to say: okay, we are building a new instance. And you might ask, why did I put that constructor up there? That's actually not just for fun. Because what might happen, because this stuff runs out of order, is that it says, okay, I already assign this object to the instance variable, but I'll finish up the constructor later. So it might happen that another thread already sees, hey, the instance is there, so I don't have to initialize it. But it also sees that bar is still nil, and it might crash or throw an exception or whatever. So what we have to do is find some way to explicitly synchronize this. If you run on x86, it's good enough to insert, in this case, a compiler barrier, which is a way to say: all right, our compiler is not allowed to reorder this. Because this reordering would only happen if we reordered stores, and if you remember, x86 doesn't do that. But if you're on an ARM system, that's not enough. You actually have to use CPU barrier instructions to do this correctly.
If you Google for double-checked locking, Google will auto-suggest things like double-checked locking is broken in programming language x or y, because a lot of languages don't provide explicit mechanisms that allow you to handle this stuff.
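(For comparison, a sketch of the simpler variant that side-steps the reordering hazard by always taking the lock, at the cost the talk mentions of locking on every access; it assumes the implementation's Mutex provides the usual acquire and release guarantees, and reuses the hypothetical names from the sketch above.)

    class ExpensiveThing
      LOCK = Mutex.new

      def self.instance
        # Both the check and the read happen under the lock, so no thread can
        # observe a half-constructed object; the price is a lock acquisition on
        # every call.
        LOCK.synchronize { @instance ||= new }
      end
    end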
So how does this all matter to Ruby? Because people always say, you know, this is not the stuff we want to worry about in Ruby. And for that, I have another little example. It's called false sharing, and it goes back to the CPU cache problem
that we talked about before, the performance of the CPU cache. So imagine we're now doing something different. We have some concurrent data structure, for example, that we're using and doing things with. And we have this magnificent foo class that we need. And it has just a single attribute, in this case a. And we have two threads that are modifying this. We don't really care exactly how they interleave; it's just that we run this stuff in two different threads on our system and it runs on two CPUs, or cores, or whatever. Usually cores, because that's what usually ends up happening. So what we see here is that we actually change the same value in memory. So if we think about this diagram again, we're changing a shared variable between the two, on the same object, in the same place in memory.
So what happens here is that the CPU needs to synchronize that, because if one core changes it, it needs to notify the other that it needs to flush its cache and update the value, because it just changed. And there's a lot of chatter going on about that stuff, because, well, that's the downside of caching: you have to keep it all in sync. And it has to go at least to the level three cache, because that's usually where data ends up being shared between different CPU cores. So if you look at this, it's at least 10 times slower to use that memory address. In total this benchmark is not 10 times slower but somewhat less, because there's other code running, and I have the numbers later, but it's like four times slower if you compare running two threads that are doing this to just running a single thread. But we're smart, we have a way to solve this, because we think of a new data structure, and that allows us to use two different variables that are not the same. And if they're not the same, we don't need to keep the cache synchronized, so we don't suffer from this problem. So in this case, we have the value of a and the value of b, and in one thread we modify a, and in the other one we modify b. And we benchmark this again.
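(A reconstruction in plain Ruby of the benchmark being described; the class and attribute names, the iteration count and the use of the standard benchmark library are assumptions. On MRI the GIL serializes the threads, so the effect really shows up on a parallel implementation such as Rubinius or JRuby.)

    require "benchmark"

    class Foo
      attr_accessor :a, :b

      def initialize
        @a = 0
        @b = 0
      end
    end

    foo = Foo.new
    iterations = 5_000_000

    Benchmark.bm(10) do |bm|
      # Two threads bumping different attributes that very likely sit on the
      # same 64-byte cache line, so the cores still fight over that line.
      bm.report("adjacent") do
        t1 = Thread.new { iterations.times { foo.a += 1 } }
        t2 = Thread.new { iterations.times { foo.b += 1 } }
        [t1, t2].each(&:join)
      end
    end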
And what do we see? This is just as slow as the previous example. It's just as slow as the case where both threads are writing a. And the question is, why? Why is this happening? And that comes down to a principle called cache lines, because your CPU doesn't cache just a single memory location. Now, in Ruby, everything is an object, and in this case even this number is a reference, so on a 64-bit system it's an eight-byte value; we're constantly storing eight-byte values. But the cache line on this x86 machine is 64 bytes. So it means we always synchronize the cache in chunks of eight consecutive entries. So what happens is that, even though we modify different variables, there's a pretty good chance that they're in the same cache line, and we end up thrashing the system
in exactly the same way as modifying only variable a. But of course, for this we have a solution, because we just add more attributes. You know? Let's do k. That's probably not on the same cache line as a. And if you run this, it actually is as fast as it was before. And this is a benchmark I ran on Rubinius, mostly because it creates object layouts where these attributes actually are consecutive in memory.
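(The padded variant, under the same assumptions as the sketch above; the padding attribute names are made up.)

    require "benchmark"

    class PaddedFoo
      attr_accessor :a, :k

      def initialize
        @a = 0
        # Padding: on a 64-bit VM each of these references occupies 8 bytes in
        # the object layout, which pushes @k away from @a, most likely onto a
        # different 64-byte cache line.
        @b = @c = @d = @e = @f = @g = @h = @i = @j = 0
        @k = 0
      end
    end

    foo = PaddedFoo.new
    iterations = 5_000_000

    Benchmark.bm(10) do |bm|
      bm.report("padded") do
        t1 = Thread.new { iterations.times { foo.a += 1 } }
        t2 = Thread.new { iterations.times { foo.k += 1 } }
        [t1, t2].each(&:join)
      end
    end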
So if you run this benchmark, and this is pure Ruby code, these are the numbers I get. They might be different on different runs, because things might end up in different memory locations, but this is the kind of idea of what is happening here, what can happen, and what confusing effects it can cause. So again, it's all just Ruby, and also the other examples that I showed are actually a problem with how Ruby is currently defined. So that raises the question: what is thread-safe code? I was in Paris a few weeks ago at a conference,
and Emily, who also gave the previous talk, talked a bit about threading there, and she said: there's no such thing as thread-safe Ruby code. And I have to agree with that, basically because the only way we currently define thread safety is: oh, this might work on Rubinius, or this might work on JRuby, or this works on MRI. So it's only defined in that context. There is no definition of what it is in Ruby. So what does that mean for the future? I immediately had to think of that when I heard Matz's talk this morning: I don't think we should follow the ostrich strategy. I don't think this is a problem we should ignore. I think the garbage collectors, as he called them, of this problem should think about this, and think about how we solve, or not solve, but how we define semantics,
how we define what it is that Ruby developers can expect from Ruby. Is a Ruby optimizer, like a Ruby JIT, allowed to reorder statements, or is it not? You know, may it change the order of initialization for optimization purposes, or not? At what level do we want to abstract our machine? Are the concerns that I talked about, about your CPU reordering things, semantics that end up in Ruby, or do we say no, that's not something that should be allowed?
And those are questions to think about, because it matters a lot for, in this case, alternative implementations to actually think about that. So that thing is called a memory model, basically. It's an umbrella term for a lot of those aspects of what the Ruby language is allowed to do, what a Ruby implementation is allowed to do, and what it is not allowed to do. So there have been some discussions about this. It also means we need better APIs, with new things and support to do this, because we want to be able to build libraries that, as a Ruby developer, you could just pick up and use. If you don't want to be that garbage collector, you just want to say: I need a thread-safe cache. And there already are APIs, there are gems that provide that, but I think it would be nice to see that in Ruby and have it defined as functionality in Ruby.
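(As an illustration of the kind of API meant here, a sketch using a thread-safe map; the concurrent-ruby gem and its Concurrent::Map class are named as one example of such a gem, not as something the talk specifically refers to.)

    require "concurrent"   # gem install concurrent-ruby

    CACHE = Concurrent::Map.new

    def expensive_lookup(key)
      # compute_if_absent stores the computed value atomically; the block only
      # runs when the key is absent, even with many threads asking at once.
      CACHE.compute_if_absent(key) { "expensive result for #{key}" }
    end

    threads = 4.times.map { Thread.new { expensive_lookup(:answer) } }
    p threads.map(&:value).uniq   # => ["expensive result for answer"]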
So, bringing back this commit from two years ago. I don't know, maybe it's not very readable, but down at the bottom you see we're actually updating a globally shared cache object with a new entry. So the fix was that we needed to synchronize, because otherwise another thread would see a half-initialized cache entry, and it would just crash. And it's one of those bugs where you're looking at the code and staring at it, and just running it, and once in a while it crashes and you're scratching your head. Because the moment you're down in your debugger, everything looks fine, because a few seconds have passed, and a few seconds are an eternity for your CPU; it has already flushed everything and synchronized everything back up. And you're like, why did this happen? So this is a bit of the story about how I went through this, the things I've learned, and I think it's important for the Ruby community to think about what the right level of abstraction is, what the things are here that we want to solve for Rubyists, and what the things are that we don't want to solve. So, that's it.