Building a Better OpenStruct
Formal Metadata
Number of Parts | 67
License | CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/37659 (DOI)
Production Place | Cincinnati
Ruby Conference 2016, 62 / 67
Transcript: English (auto-generated)
00:03
This talk is called Building a Better OpenStruct,
00:20
and that's what we're gonna talk about, but I do have to say it's a little bit scary to come up on this stage to be kind of the sequel to Matz's talk, so I want to also introduce something big to the Ruby community. I call it MTWNSTTWABN. It's short for: Matz's talk was nice, so this talk will also be nice.
00:42
So I'm gonna do my best to also give a nice talk. MTWNSTTWABN. Okay, so let's talk about OpenStruct. Before we can talk about what it means to build a better OpenStruct, we have to know what OpenStruct is, how it works, what the problems with it are,
01:01
and why we would want to build a better one. I like to think of OpenStruct as Ruby's JavaScript object. It's funny: if you look at the initial commit on the Ruby repo, it goes back to '98, so not quite to the beginning of Ruby, but OpenStruct was already there, and it was described as a Python object. But at this point, I like to think of it as a JavaScript object, and you'll see
01:21
what I mean in just a sec. You can require ostruct anywhere in your code; it's just part of Ruby's standard library, so it's always there. And when you initialize an OpenStruct, you can either initialize it raw, just OpenStruct.new, or, what you'll do most of the time, initialize it with a hash. So in this case, we're initializing it with a hash
01:41
with a key of foo and a value of bar, in this case both symbols. Generally, keys have to be either strings or symbols. Once it's initialized, it's not too late; you can still add properties. We actually have two ways to add properties. We can use dot notation, like .baz = 4, or we can use bracket notation;
02:00
in this case we're setting a key of the symbol :something to the string "whatever". Once you have all your information in your OpenStruct, you can start getting properties out. You can use dot notation, and here we can see it doesn't matter when you put the information in; it's gonna be available. And if you try to access a key,
02:21
(you'll see at the bottom we're accessing a key that's not among the attributes you've added), it'll just return nil. And it doesn't matter whether you use dot notation or bracket notation, again with either symbols or strings; it all works fine. So you have an object with arbitrary attributes, and you can use dot notation or bracket notation to add and retrieve attributes.
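The behavior just described can be sketched in a few lines. This is a minimal illustration using only the standard library; the variable name `settings` is an assumption made for the example.

```ruby
require 'ostruct'

# Initialize with a hash; keys may be symbols or strings.
settings = OpenStruct.new(foo: :bar)

# Add attributes after the fact, with dot or bracket notation.
settings.baz = 4
settings[:something] = 'whatever'

# Read them back either way; symbol and string keys are interchangeable.
settings.foo           # :bar
settings[:baz]         # 4
settings['something']  # "whatever"

# An attribute that was never set simply returns nil.
settings.missing       # nil
```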
02:43
That sounds pretty much like a JavaScript object, so that's why I like to think of it that way. Now that's really nice, but why would we use it? Erik Michaels-Ober gave a really great talk at Rails Israel last year, where he listed three common use cases of OpenStruct. The code examples that we're gonna use
03:01
are my code examples, but the list is his, and I have to give him credit for identifying these three cases. The first, and the one that we're really gonna focus on in this talk, is consuming an API. The more straightforward way you might do this is you make a call to an API, you get some JSON back, parse that JSON
03:20
into a Ruby hash, and then pass that hash into an OpenStruct. A more complex pattern that you might follow, and we certainly do this a lot in our apps, is to subclass OpenStruct and then create a new instance of your subclass with that hash. So it comes out to pretty much the same thing,
03:40
but the nice thing about it is that now you can actually put a name to the kind of object you're creating. So let's say you hit the Twitter API, you get back some kind of a hash, you feed that into your OpenStruct subclass, and now you have a tweet object. So it makes it much easier to work within the console to debug stuff, it's much nicer that way.
04:01
Just to flesh it out more: if the API response looks like that, with a couple of keys and a couple of values, and you were just to leave it as a hash, you would have to use bracket notation. But because we put it into an OpenStruct, we can use dot notation. It seems like a pretty subtle distinction, like, okay, why do we really care? But now you're actually starting to think of your information as not just raw data,
04:21
not just hashes with keys and values, but as an object. And if you use that more advanced version that I mentioned, you can do this, which is super cool. Here we have a User class that inherits from OpenStruct, and it has a name method. We can actually start defining our own methods, which interact with the data that was passed in. So in this case, we make an API call to /user/1,
04:43
which for some reason, I guess I'm the first user, is Ariel Caplan. It gives back the first name and the last name, we feed that into a User, and then we can call .name. What's cool is that if we call .first_name, it just gives us a raw value; if we call .name, it gives us a computed property. And from the outside, we don't know which is which.
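The subclassing pattern described here might look roughly like this. It is a sketch under assumptions: the JSON body is a hypothetical stand-in for the real API response, and the attribute names `first_name` and `last_name` are inferred from the description.

```ruby
require 'json'
require 'ostruct'

# Hypothetical response body for GET /user/1.
RESPONSE_BODY = '{"first_name": "Ariel", "last_name": "Caplan"}'

class User < OpenStruct
  # Computed property built from the raw attributes.
  def name
    "#{first_name} #{last_name}"
  end
end

user = User.new(JSON.parse(RESPONSE_BODY))
user.first_name # "Ariel" (raw value from the response)
user.name       # "Ariel Caplan" (computed)
```

From the caller's side, `first_name` and `name` are indistinguishable: both are just methods on a User.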
05:00
And that's basically the point of objects, right? Abstraction and encapsulation. And that's a really nice way to get that from an API call. Okay, so that's use case number one, consuming an API. Common use case number two is a configuration object. Let's say that you made a gem which requires some configuration. You want to have a DSL like this.
05:20
So you call .configure. You get a block, which takes a configuration argument, and then you start setting settings on your configuration. This is all the code that you need to do it. The configure method on your gem just yields your configuration object, which is an OpenStruct. And since OpenStruct has that really nice property where it lets you just set and get arbitrary keys,
05:44
it just works. It's really simple, and it's built in. The third use case is a little complicated, so if it goes over your head, don't worry too much about it, but it's good to know about: a test double. In this case, we have an Order class which takes a payment gateway and some products,
06:01
sums up the total cost of the products, and charges the payment gateway. Well, if you actually feed it a real payment gateway and put in someone's credit card information, your tests are gonna be really expensive. And I'm not talking in terms of hardware; I'm talking in terms of your credit card getting charged every time you run your suite. So you don't want to do that. Instead, what you're gonna do is put in something that quacks like a payment gateway but doesn't actually charge a credit card.
06:23
And it turns out OpenStruct is a really, really easy way to do this, because every key-value pair that you put into your OpenStruct becomes a method and its return value. So in this case, we have an OpenStruct which has a key of charge and a value of the symbol :paid. Every time you call .charge on that OpenStruct, it's just gonna return :paid.
06:42
So it's a really simple, easy-to-use test double. We can inject it into our Order class and assert that something is returned. We can also, if we want for some reason, assert that something is set on the OpenStruct. But again, the basic point is that because the OpenStruct now has the charge method, it's a really easy-to-use test double.
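The second and third use cases can be sketched together. Everything named here (MyGem, its settings, the Order class, and the amount attribute) is hypothetical and only illustrates the shape of the pattern; the slides' actual code is not in the transcript.

```ruby
require 'ostruct'

# Use case two: a hypothetical gem with an OpenStruct-backed configuration DSL.
module MyGem
  def self.configuration
    @configuration ||= OpenStruct.new
  end

  def self.configure
    yield configuration
  end
end

MyGem.configure do |config|
  config.api_key = 'secret'
  config.retries = 3
end

MyGem.configuration.retries # 3

# Use case three: a hypothetical Order that charges an injected gateway.
class Order
  def initialize(gateway, products)
    @gateway = gateway
    @products = products
  end

  def checkout
    @gateway.amount = @products.sum(&:price)
    @gateway.charge
  end
end

# The double quacks like a gateway: .charge returns :paid, and nothing is billed.
fake_gateway = OpenStruct.new(charge: :paid)
products = [OpenStruct.new(price: 10), OpenStruct.new(price: 20)]

Order.new(fake_gateway, products).checkout # :paid
fake_gateway.amount                        # 30
```

One design caveat: a getter on an OpenStruct takes no arguments, so a double built this way only suits calls like `gateway.charge`, not `gateway.charge(total)`. That is why this sketch sets the amount as an attribute first.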
07:02
Okay, so that's why you might wanna use OpenStruct. But how does it work? Let's peel back the surface and look at the code. And before I move on, I have to note, just to be honest, the code has been brutally edited in this presentation. And it's not because I wanna lie to you. It's because this code has a lot of little details, edge cases and error handling,
07:21
in everything that I'm gonna show you in this presentation. That's the way the code works, right? So I've tried to pare it down to just the code that you need in order to very, very quickly understand how these things are working. But it's all open source, and you can all check it out later and find out what I hid from you. So how does it work? Well, under the hood, OpenStruct defines attribute
07:42
setter and getter methods on the object's singleton class. Now, I know that not everyone is necessarily familiar with singleton classes, or you might be a little fuzzy on them, so here's a quick refresher. In Ruby, if you have an object, in this case we'll call it foo, but it could be any object of any class, you can define a method that exists just for that object.
08:02
So in this case, we define a bar method just for the foo object, and we can call it. And if we call foo.singleton_methods, we'll see that the singleton method bar has been defined on foo. The reason we talk about a singleton class is that technically that method doesn't live on the object itself. It lives in the secret class that every Ruby object has
08:21
that exists just for that one object, because methods live in classes and not in objects. But if that's too much to remember, the basic point is that a singleton method is a method just for one single Ruby object, one single object in your program, and nothing else has access to it. So let's look at what happens when you create a new OpenStruct.
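As a runnable version of the singleton-method refresher above, here is a small sketch. The return value :only_on_foo is an assumption for illustration.

```ruby
foo = Object.new

# Define a method that exists only for this one object.
def foo.bar
  :only_on_foo
end

foo.bar                                      # :only_on_foo
foo.singleton_methods                        # [:bar]

# The method actually lives in foo's singleton class, not in Object.
foo.singleton_class.instance_methods(false)  # [:bar]

# No other object has access to it.
Object.new.respond_to?(:bar)                 # false
```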
08:41
In this case, we pass in a key of foo and a value of bar. We're gonna hit initialize, and this is what initialize looks like. The first thing that we do is create an @table instance variable and set it to a hash. You wanna follow that table very closely, because that's where all of the information inside our OpenStruct is going to be stored.
09:01
Now, assuming that you've passed in a hash as an argument, which in this case we have, we take every key-value pair, coerce the key to a symbol, set that key-value pair inside of our @table hash, and then call new_ostruct_member. I will show you what that is, but you're gonna have to wait just another minute, because I wanna talk about something else first. What happens when you already have an OpenStruct,
09:22
and then you set a new property using dot notation, .baz = 4? The first thing to remember is that in Ruby, that's just syntactic sugar for calling the baz= method with an argument of 4. There is no baz= method, so we're gonna hit method_missing. The first thing it's gonna do is convert that baz= to a string.
09:43
We're gonna say, okay, what is the nature of this method? It's gonna be either a setter or a getter. If it's a setter, like we have right here, we chop off that last character, which is an equals sign, and we're left with just baz. Then we call new_ostruct_member on baz. We get back the symbol :baz,
10:01
and then we set baz to 4 inside of our @table, our internal hash. The other option is that it's a getter, like .baz. Then all we do is pull the value for baz out of our internal table. Now, I said we're gonna talk about new_ostruct_member, and this is where that happens, okay? But remember that it's called during initialize and during method_missing.
10:22
Here's what it looks like. This is kind of where the real magic of OpenStruct happens. The first thing that we do is coerce the name to a symbol. And then, let's say we call this with baz, we ask: do we respond to the baz method already? If so, then we're done. We don't wanna overwrite any methods. But if there's no baz method yet,
10:41
then we're gonna start defining methods. The first thing that we do is define a baz method, which just gets the value for baz out of the table. The next thing that we do is define a baz= method, which sets the value of baz in the table to the argument. So it's kind of like attr_reader and attr_writer, except that instead of what we'd normally do, which is read from and write to an instance variable, we're using that internal @table hash.
11:01
instead, we're using that internal at table hash. What you might have figured out from this is that no two open structs share a set of methods, even if they look exactly the same from the outside. But those methods are being defined every time for each individual open struct object.
11:20
Now, here's the problem. OpenStruct is slow. And you might be wondering how slow OpenStruct is. So, going with my MTWNSTTWABN, or however it was: Matz has animals, so I have animals. Here are a bunch of snails. My snail friends will tell you that it's, depending on your use case, about 10 to 40 times slower than an explicitly defined class.
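A rough way to check this kind of claim on your own machine is the standard library's Benchmark module. This is only a sketch: the Point class and the iteration count are assumptions, and the ratio you see will depend heavily on your Ruby version and workload, so no expected numbers are given.

```ruby
require 'benchmark'
require 'ostruct'

# Hypothetical explicitly defined class to compare against.
class Point
  attr_accessor :x, :y

  def initialize(x:, y:)
    @x = x
    @y = y
  end
end

ITERATIONS = 50_000

Benchmark.bm(12) do |bm|
  bm.report('OpenStruct') do
    ITERATIONS.times do
      struct = OpenStruct.new(x: 1, y: 2)
      struct.x + struct.y
    end
  end

  bm.report('plain class') do
    ITERATIONS.times do
      point = Point.new(x: 1, y: 2)
      point.x + point.y
    end
  end
end
```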
11:42
And that seems really, really bad, right? That's terrible performance. Why is that happening? So I would say that there are two and a half reasons why that happens. Reason number one, and this is the main reason, defining methods is slow. In Ruby, when we define a method, it just takes a long time.
12:01
Number two, as you've seen, we rely on method_missing a good deal. method_missing is also pretty slow; not as slow as defining methods, but it's a contributing factor. And before I tell you the third reason, or the two-and-a-halfth reason, let me do a quick survey. How many of you have heard, just by show of hands, in the last, let's say, year and a half or two years, don't use OpenStruct because it's going
12:22
to invalidate your global method cache? All right, we got a good number of hands here. You've been lied to. It's not true anymore. In Ruby 2.1, this is fixed. It's no longer a problem. And this bothered me because I see a lot of misinformation out there about OpenStruct, about this particular issue.
12:40
It's not a problem anymore in 2.1 and above. But why that is is actually kind of an interesting story, so I wanna delve into it for a second. We have to talk about how method lookup in Ruby works. Let's say we have this object graph. We have an Animal class, which is just a standard class. Dog and Cat inherit from it, and Meowable is a module included in Cat.
13:00
Now we're gonna create a new Cat object, and then we're gonna call to_s on the cat. Well, where does that method exist? Initially, we don't know. We have to figure it out. So Ruby is gonna actually start hunting around in the class, its ancestors, and included modules. We found it in Kernel. So we're gonna mark that: okay, in the future, anytime we call to_s on a Cat instance, we know that it's defined in Kernel.
13:21
We know how we can execute that method, so let's not look it up next time. But what if you reopen the Cat class and define to_s? Well, that information is no longer true. Because of that problem, every time you reopen a class in Ruby, this happens: the global method cache is blown away, because that's the simplest way to say
13:41
we don't know what to rely on anymore. Now, the problem is that every time you use an OpenStruct, you're defining methods, and that happens. So our global method cache got busted every single time we created a new OpenStruct. And it turns out that looking up methods
14:01
is actually really expensive. If you're wondering why, look no further than Active Record. The first time you look up a method in Active Record, you have to look through all those places, all those classes and all those modules, in order to find where that method is. And that method probably defines other methods, for which you then have to go through that same series of events again. So it's really expensive.
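The lookup described above can be observed from plain Ruby. This sketch rebuilds the Animal, Dog, Cat, and Meowable graph from the talk (the method bodies are assumptions) and asks Ruby where to_s is found. It shows only the lookup result, not the cache itself, which is internal to the interpreter.

```ruby
class Animal; end

module Meowable
  def meow
    'meow'
  end
end

class Dog < Animal; end

class Cat < Animal
  include Meowable
end

cat = Cat.new

# Ruby searches the class first, then included modules and ancestors, in order.
Cat.ancestors.first(3)   # [Cat, Meowable, Animal]
cat.method(:to_s).owner  # Kernel

# Reopening the class changes the answer, which is why a cached lookup
# for to_s on Cat can no longer be trusted afterwards.
class Cat
  def to_s
    'a cat'
  end
end

cat.method(:to_s).owner  # Cat
```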
14:22
James Golick was a New York City Rubyist. He unfortunately passed away about two years ago very suddenly in a car accident. And this is one of the big things that he did for the Ruby community, so I like to talk about it because it's part of his legacy. So he did a little bit of profiling. He found that 10% of the time spent
14:42
in his Rails application was just rebuilding the global method cache. It was reset 20 times on a request because of all these method lookups. And open struct was, of course, one big culprit. So he said, look, I could complain about it, I could go on Twitter and rant about it, but why don't I just fix it? So he did. He implemented what's called a hierarchical method cache,
15:01
and we'll see how it works. He actually wrote it for 1.9, but it was not included in Ruby core until 2.1. From 2.1 on, this is no longer a problem. So I'll show you how it works. We effectively have one method cache per class in our Ruby program. So we have an Animal method cache and a Cat method cache,
15:20
and there are links between them, so that if we invalidate Animal, it's possible that some of the information we have about Cat is also incorrect, and we invalidate that too. But let's say I reopen Dog and define a method there. Well, Dog's cache is gone, but Cat's is still fine. There's no reason why defining a method on Dog
15:40
would ever affect cat. So we only invalidate the cache with respect to the classes that would be affected by it. And the same thing is true with OpenStruct now. When you add methods to your OpenStruct, we just invalidate that OpenStruct cache, but everything else in Ruby is totally fine. That's a blog post where you can see all the details.
16:01
It's actually a really interesting read, all kinds of cool stuff about monotonic sequences. Anyway, check it out. But the bottom line is, again, let's bust that myth. With Ruby 2.1 and above, we can absolutely create OpenStructs without slowing down the rest of our application. But there's still that little problem that OpenStruct itself is kind of slow. So, hi everybody.
16:22
Don't worry, the intro was supposed to take this long. It's fine. My name's Ariel Caplan. You can find me online at @amcaplan. I tweet mostly about programming, but increasingly about software teams and what it means to write code. I also have my GitHub profile there, where you can find much of the source code
16:41
for this presentation, and I'll post the slides on amcaplan.ninja after the talk. I really like coming to conferences because I get to meet new people and hear new perspectives and ideas and just geek out. So I really want to meet all of you, and I want you to say hi after the presentation. And just to bribe you, I brought a lot of chocolate that I want to give out.
17:01
And it's not just chocolate, it's actually popping chocolate. So come over and say hi after the presentation and get some sweet treats. Even if you don't like chocolate, I just want to meet all of you. I work for Vitals. We pay people to use high-quality, low-cost healthcare. Basically, we're sending people checks in the mail
17:22
for going to their doctors. So that's kind of a fun business. And getting all this information really takes a lot of number crunching. So we use a system of microservices and external APIs communicating via JSON, which is parsed into Ruby hashes, which are fed into OpenStructs.
17:40
So essentially in our whole system, OpenStruct is mediating the communication between our various services. Now we had some performance problems. I did some profiling and I found, oh, OpenStruct. Yeah, big problem. Now, I know people are afraid of profiling, so I just want to quickly show you what I had to do.
18:02
This slide is basically all that I had to do in order to get profiling information. I use the stackprof gem; I find it to be really nice. What you do is call the run method on StackProf. You give it a mode, which will be either CPU, if you want to test how long things take, or object, if you want to check object allocations because you have memory issues.
18:21
Then you give it an out file, a place to dump the information, and you run the block that you want to profile. That's all it takes. Then you have that file, which you can't really read, but if you use the stackprof command-line executable, give it the name of the file, and pass --text, boom, you have all this information about what's going on in your app and what's taking the most time.
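The profiling steps above might look like this in practice. It is a sketch under assumptions: stackprof is a third-party gem that must be installed separately (the script falls back to running unprofiled if it is missing), and build_structs and the dump path are hypothetical stand-ins for your own code.

```ruby
require 'ostruct'
require 'tmpdir'

begin
  require 'stackprof' # third-party gem: gem install stackprof
rescue LoadError
  warn 'stackprof is not installed; running the workload without profiling'
end

# Hypothetical workload standing in for the code you want to profile.
def build_structs(count)
  Array.new(count) { |i| OpenStruct.new(id: i, name: "user #{i}") }
end

dump_path = File.join(Dir.tmpdir, 'stackprof-cpu.dump')

if defined?(StackProf)
  # mode: :cpu samples where time is spent; mode: :object tracks allocations.
  StackProf.run(mode: :cpu, out: dump_path) { build_structs(20_000) }
  puts "Profile written to #{dump_path}"
else
  build_structs(20_000)
end
```

Once the dump exists, running the stackprof executable on that file with --text prints the per-method breakdown the talk refers to.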
18:42
So, yeah, OpenStruct. It was taking a lot of time. Here you see it was 17% of the time, but this is not even all the information because there's a long tail there. So it ended up being about 20% total of our app time, at least the CPU time, was just spent on OpenStruct. And that was not acceptable.
19:01
But as an aside, congratulations, now you know how to profile. So there you go. So we wanted to keep OpenStruct because it was really nice. It was a really flexible interface. It let us change things upstream and have access to them downstream. It was very pleasant to work with, but it was so slow. And so the question became,
19:20
can we have our cake and eat it too? Can we build a better OpenStruct? Nope. So I'm sorry that you're all bored by this talk. You're gonna get out a little bit early today. Enjoy the chocolate. I'm just playing with you. This talk would not have been accepted, I hope, if the answer was no.
19:40
Yeah, we can absolutely build a better OpenStruct. And I'm gonna tell you four stories, or really four more stories, because what James Golick did in terms of method caching is sort of the zeroth story: it's not OpenStruct-specific, but it is a story of how someone basically fixed stuff because of OpenStruct. But we have four more stories to share with you today. So let's get to it.
20:00
Round one is OpenFastStruct. So I still remember opening up my Ruby Weekly email. This was around the time we were having these issues, and I was trying to figure out what we were gonna do about OpenStruct. And I saw this. OpenFastStruct, a faster OpenStruct. And now it's really exciting. That looks really cool. Let's follow the link, check it out. So it says, yeah, this is the GitHub repo.
20:21
It says it's still slower than a hash, but four times faster than OpenStruct, which is a significant improvement. And it basically works the same. So I said, let's open up the code and see how it works. Now again, it's supposed to be the same interface, so this is gonna look the same, right? Just instead of OpenStruct, it's OpenFastStruct.
20:41
So what happens when we call OpenFastStruct.new and pass in a hash with, in this case we'll keep it simple, a single key-value pair? We hit initialize, which first defines an @members hash, which is actually the same as the @table. I guess Arturo Herrero, who created this, liked the name members better than table.
21:01
Maybe it's more specific. Then we call update with the arguments, with our hash. Update basically takes each key-value pair and calls assign, which converts the key to a symbol and puts the pair into our members hash. So if you do the math and add this all up, it basically comes out to the same thing that OpenStruct is doing, minus new_ostruct_member.
21:20
So we're not defining any methods. Well, if we don't define any methods, what's gonna happen when we start trying to access stuff or assign things? We're gonna hit method_missing. And this method_missing is kind of wacky, so I'll walk you through it. It's trying to deal with not two, but three separate possible situations. The first situation is,
21:40
let's say we're calling method_missing and we're trying to get a value out of our OpenFastStruct. It's gonna call fetch on our @members hash. And fetch is a method which has different behavior depending on what arguments you give it, but if you give it a single argument and a block, as we're doing here, it will first check: is this a key in the hash? If so, return the value.
22:01
If not, run that block as a kind of default result. So first we say, okay, is baz a key in the hash? If so, then we're done. If not, run the block. And the block has to decide: is this a setter or a getter? If it's a setter, meaning the last character of the method name is an equal sign,
22:21
then just take the rest, the beginning of that method name. If it's a baz= method, take the baz part and assign it the first argument, which in this case is gonna be four. What's left is the case where it's a getter but the key doesn't exist in the hash.
22:41
And this is kind of interesting behavior here. What it does is assign to that key, baz, a new instance of OpenFastStruct. So where OpenStruct would just return nil here, it's gonna assign a new OpenFastStruct. So that was an interesting design decision,
23:02
but okay, the main thing to remember is that OpenFastStruct does not pay the upfront cost of defining methods. Like I said, it costs a lot to define a method. And it's really only worth it if you're gonna get that debt repaid. But if you're only gonna access the properties once, then it doesn't really matter.
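The walkthrough above can be sketched roughly as follows. This is a minimal illustration of the described behavior, not the actual OpenFastStruct source; the class name BlackHoleStruct is a hypothetical stand-in, and only the names @members, update, and assign come from the talk.

```ruby
# Hypothetical stand-in for the behavior described in the talk;
# this is NOT the real OpenFastStruct source.
class BlackHoleStruct
  def initialize(args = {})
    @members = {}
    update(args)
  end

  # Copy every key-value pair into @members.
  def update(args)
    args.each { |key, value| assign(key, value) }
  end

  # No accessor methods are ever defined, so every attribute call lands here.
  def method_missing(name, *args)
    # Case 1: the key exists, so fetch returns its value and skips the block.
    @members.fetch(name) do
      if name[-1] == '='
        # Case 2: a setter. Strip the trailing "=" and store the first argument.
        assign(name[0...-1], args.first)
      else
        # Case 3: a getter for an unknown key. Store and return a fresh instance.
        assign(name, self.class.new)
      end
    end
  end

  private

  # Keys are normalized to symbols before being stored.
  def assign(key, value)
    @members[key.to_sym] = value
  end
end

s = BlackHoleStruct.new(baz: 4)
s.baz      # existing key, read straight from @members
s.qux = 5  # setter path
s.missing  # unknown getter: a new BlackHoleStruct rather than nil
```

Because the third case stores the new instance under the missing key, a later call with the same name returns that same object, which is what makes chained assignment possible.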
23:21
method_missing is still gonna be much, much cheaper. So that's the idea of OpenFastStruct, and there was just one little problem: it broke our app, because of the different behavior that I alluded to earlier. But I wanna understand why Arturo Herrera made that design decision. So if we go back to the documentation, it actually says outright there are a few differences
23:41
between OpenStruct and OpenFastStruct. And if you look at the last one, it says it allows infinite chaining of attributes. And there's a link to an example. So the example says that OpenFastStruct is a black hole object, which supports infinite chaining of attributes. And we have an example of that. We create an OpenFastStruct called person. And then we call person.address.number = 4.
24:02
And indeed we can retrieve person.address.number. And that's kinda nifty. So how's it working? Well again, when you create an OpenStruct and you try to call a property that does not exist, you just get nil back. But if it's an OpenFastStruct and you try to get a property that doesn't exist, then it's gonna give you back a new OpenFastStruct.
24:22
So when we call person.address, we get a new OpenFastStruct and then we can call .number = 4 on that. So that's really nice. The problem is, in our app we did a lot of this: let's say our API would sometimes leave out a key if the value would be nil. So when we call .foo on an OpenStruct
24:40
and there's no foo key, then it's just gonna return nil and that's gonna be a falsy value. If it's an OpenFastStruct and we call .foo, it's truthy. So our conditionals were all flipped. Turns out that's not very good for your app working. So my point here is that if you're gonna create a drop-in replacement,
25:02
it should just be a drop-in replacement, right? It should be the same thing but faster. Faster, but also the same thing. And if you break the API, that's gonna create certain barriers to adoption. And for us it was to the point that it would've been easier to monkey patch OpenFastStruct not to do this than it would've been to rewrite our app.
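To make the flipped conditionals concrete, here is a small sketch. The BlackHole class is a hypothetical stand-in for any object that returns a new instance of itself for unknown getters, and the discount key and discount_label helper are invented for illustration.

```ruby
require 'ostruct'

# Hypothetical minimal "black hole": every unknown getter returns a new instance.
class BlackHole
  def method_missing(_name, *_args)
    self.class.new
  end
end

# Hypothetical app code that relies on a missing key being falsy.
def discount_label(response)
  response.discount ? 'discounted' : 'full price'
end

discount_label(OpenStruct.new(price: 10))  # OpenStruct returns nil for the missing key
discount_label(BlackHole.new)              # the black hole object is truthy
```

The same conditional takes opposite branches depending on which object wraps the response, which is the kind of silent behavior change described above.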
25:21
So the question became, can we do better? Can we have something that's closer to the actual OpenStruct API and is still performant? So I thought about it and I came up with an idea and I called it PersistentOpenStruct. I wanna show you how it works on the outside and then we'll dive into how it works on the inside.
25:42
This is specifically designed for the use case that I spoke about, the advanced version of API consumption. So you have your class, which would otherwise inherit from OpenStruct; in this case, you're gonna inherit from PersistentOpenStruct. We're basically just substituting in PersistentOpenStruct. Then you can define methods. In this case we have a speak method which relies on type and sound.
26:03
And then we can, you know, we're off to the races. So we create a new animal with some information. This is our dog. We can call that dot speak method and we see that type and sound are properly used. And we can set properties with dot notation or with bracket notation, it all works fine. But when we look at the instance methods
26:21
that are defined on animal, meaning the methods that are accessible to every instance of the animal class, we start seeing type equals and type, sound equals, ears, nose, tail. We have all of these methods already defined, ready to go for every instance of the animal class. How's that working? Well it turned out it did not take a lot of code to make this happen.
26:41
I literally copied over the OpenStruct code and I changed two lines. And just to make it pretty obvious, I'm gonna underline it. So instead of defining singleton methods, we're basically just reaching out to the class and defining standard instance methods. Otherwise it's working exactly the same. All we're doing is skipping that stage of method definitions.
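The idea of defining accessors on the class rather than on each object can be sketched like this. It is a simplified illustration under my own assumptions, not the gem's source; PersistentStruct and define_accessors are hypothetical names.

```ruby
# Hypothetical sketch of the idea; NOT the PersistentOpenStruct gem's source.
class PersistentStruct
  def initialize(hash = {})
    @table = {}
    hash.each { |key, value| self[key] = value }
  end

  def [](name)
    @table[name.to_sym]
  end

  def []=(name, value)
    name = name.to_sym
    define_accessors(name)
    @table[name] = value
  end

  # Only reached for attributes that no instance of this class has set yet.
  def method_missing(name, *args)
    if name.to_s.end_with?('=')
      self[name.to_s.chomp('=')] = args.first
    else
      @table[name] # unknown getter: nil, as with OpenStruct
    end
  end

  private

  # The key difference: methods go on the class, so they are defined once
  # and shared by every instance, instead of once per object.
  def define_accessors(name)
    klass = self.class
    return if klass.method_defined?(name)

    klass.send(:define_method, name) { @table[name] }
    klass.send(:define_method, "#{name}=") { |value| @table[name] = value }
  end
end

class Animal < PersistentStruct
  def speak
    "The #{type} says #{sound}"
  end
end

dog = Animal.new(type: 'dog', sound: 'woof')
dog.speak
dog.ears = 'droopy'
Animal.instance_methods(false) # includes the accessors created along the way
```

Note that the object itself ends up with no singleton methods; every later Animal finds the accessors already in place.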
27:01
So it's kind of a terrible hack, because you have instances of your class changing the way that the class works in the middle of run time. It might have security implications also, by the way. If you ever had this facing user input, they could start passing in random keys and then eventually you would run out of memory because you're defining all these methods and your server crashes. So never for user input, right?
27:20
This is only for trusted APIs. But used responsibly, which I think we are doing, it made our app 10% faster, which I thought was pretty good for a two-line code change. If you look at just the solutions themselves, so in this benchmark, which I'm not even showing you honestly, but you can look it up on the Persistent OpenStruct repo.
27:41
In the use case that we're looking at, Persistent OpenStruct is about five times faster than a standard OpenStruct. It's still about five times slower than a regular class, but that is a much, much closer margin. And I know what some of you are thinking, like, okay, that's nice, looks cool, but I want to see math. I want to see incontrovertible, absolute proof
28:01
of the supreme superiority of Persistent OpenStruct over OpenStruct, and I will give that to you. I will give you the information that you crave using Big O notation. Let's define our terms. N is gonna be the number of methods that you have to define, which is just two times the number of attributes, right? One getter, one setter apiece. O is gonna be the number of objects that you want to create.
28:21
If you use Persistent OpenStruct, it is an O of N definition, or sorry, O of N operation, because you only have to define all those methods once. It doesn't matter if you create a million objects, you're just defining those methods once. If you use OpenStruct, it's O no, because that's the sound that you make when you realize that OpenStruct is killing your application's performance.
28:44
Okay, so the goal here was, okay, honestly the goal was to make a terrible pun, but there is also a serious point here, okay, which is that that doesn't tell you anything, right? Math is a really useful framework for thinking about problems in the abstract, for considering various solutions, and figuring out where you should put your efforts,
29:02
but we're engineers, we work with real apps, and we have to have answers that are grounded in reality, and that means benchmarks. If you're writing a library or a gem, benchmark it. If you're writing an app, and all of you are presumably writing apps, otherwise I'm not sure what you do for a living, benchmark your apps. Have running benchmarks which are checked into source control
29:22
and that you can follow over time, to make sure you're not having performance regressions. Just benchmark everything. And I know a lot of you are thinking, okay, Ariel, that's really nice advice, but how do I benchmark? That sounds hard. Nope, it's not hard. This is all that you need to know, this one slide. So I like to use the benchmark-ips gem.
29:41
There's a lot of really nice benchmarking tools even in the Ruby standard library, but this gem is just beautiful. It was also written by Evan Phoenix, who was one of the organizers here, so if you have used this in the past, you can go thank him. So what you do is you just call Benchmark.ips and pass it a block which takes an argument of x. I don't know what x is, I just know that you have to call all the methods on it.
30:02
So for every test case, you call x.report. You give it a string that labels your test case and then the block whose performance characteristics you want to check. And at the end, if you want, you can call x.compare!. And then you get this beautiful output. So what it does is it'll run your code for a short warmup period and then it'll run each block that you gave it
30:23
for five seconds, as many times as it can, and it'll tell you how many times was I able to get this done in those five seconds. So in this case, we see that our new code is 1.28 times slower than the old code and it looks like a performance regression, so you might want to check that out. So now you know how to benchmark.
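A benchmark along the lines described could look like the sketch below. The two candidates (an OpenStruct versus a plain Struct named Point) are my own illustrative choices, not the ones on the slide. The run time is shortened with x.config so the example finishes quickly, and since benchmark-ips is a third-party gem, the script falls back to the standard library's Benchmark when the gem is not installed.

```ruby
require 'ostruct'

begin
  require 'benchmark/ips' # third-party gem: gem install benchmark-ips
  HAVE_IPS = true
rescue LoadError
  require 'benchmark'     # standard library fallback
  HAVE_IPS = false
end

# Illustrative candidates (assumptions, not taken from the talk's slide).
Point = Struct.new(:x, :y)
CANDIDATES = {
  'OpenStruct' => -> { OpenStruct.new(x: 1, y: 2).x },
  'Struct'     => -> { Point.new(1, 2).x }
}.freeze

if HAVE_IPS
  Benchmark.ips do |x|
    # Shorter than the defaults so the sketch runs quickly.
    x.config(time: 1, warmup: 0.5)
    CANDIDATES.each { |label, code| x.report(label, &code) }
    x.compare! # prints how much slower each entry is than the fastest
  end
else
  CANDIDATES.each do |label, code|
    seconds = Benchmark.realtime { 10_000.times { code.call } }
    puts format('%-12s %.4fs for 10,000 runs', label, seconds)
  end
end
```

Checking a script like this into the repository and rerunning it after changes is one way to follow the numbers over time.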
30:40
That wasn't too bad, so congratulations. Back to our topic now, PersistentOpenStruct. So it's really good when you're repeatedly creating data objects with the same shape, like in the case of API consumption. This would be terrible, or at least overkill, for any of the other use cases that we mentioned. But in the case of API consumption, it's really, really good.
31:01
Okay, round three, OpenStruct. Turns out, OpenStruct actually still had a few surprises up its sleeve. So Erik Michaels-Ober wakes up one day and tweets out this thing: hey, I just made OpenStruct initialization 10 times faster. Ha ha, cool. So that actually is really cool. So how did he do it?
31:22
How much code did he have to write? Not very much. It was actually negative code, believe it or not. He got rid of the new_ostruct_member call in initialize. So essentially, the idea is, let's hedge our bets. Why are we paying down that upfront cost
31:40
of defining the methods when we don't even know if we're ever gonna use them? Let's say I'm gonna call out to the GitHub API just to get like, I don't know, the name of a repo. It's gonna give me 20 different keys and I'm only gonna use one of them. So why would I bother paying the cost of defining the other 39 methods that I'm not gonna use? So just don't. And as we see that we need the methods, we define them.
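The "define them as we need them" idea can be sketched as follows. This is a hypothetical illustration of deferring accessor definition until first use, not the standard library's implementation; LazyStruct and define_accessors are invented names.

```ruby
# Hypothetical illustration of lazy accessor definition;
# NOT the standard library's OpenStruct source.
class LazyStruct
  def initialize(hash = {})
    # Initialization only fills the table. No methods are defined here.
    @table = hash.each_with_object({}) { |(k, v), t| t[k.to_sym] = v }
  end

  def method_missing(name, *args)
    key = name.to_s.chomp('=').to_sym
    if name.to_s.end_with?('=')
      define_accessors(key)
      @table[key] = args.first
    elsif @table.key?(key)
      # First read of a known key: pay the definition cost now, once.
      define_accessors(key)
      @table[key]
    end
  end

  def respond_to_missing?(name, include_private = false)
    @table.key?(name.to_s.chomp('=').to_sym) || super
  end

  private

  def define_accessors(key)
    define_singleton_method(key) { @table[key] }
    define_singleton_method("#{key}=") { |value| @table[key] = value }
  end
end

repo = LazyStruct.new(name: 'ruby', stars: 1, forks: 2)
repo.singleton_methods # nothing defined at initialization
repo.name              # first access defines name and name= on this object
repo.singleton_methods # only the accessors for the key that was used
```

Keys that are never touched never cost a method definition, which is where the savings come from when a response has many unused keys.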
32:02
This was introduced in Ruby 2.3. It's just part of Ruby now. And it's a really dramatic improvement when some keys are never accessed, because defining methods is the slowest part of all this. So it's nice to skip it. Round four. So I wrote PersistentOpenStruct and I waited about a year
32:21
and I just kept having this nagging feeling. Like, I think I could've done better. You know, I was very bound by the OpenStruct way of thinking and I said, you know, I think if I just try to re-engineer it from scratch, I can make something better. So finally I found some time and I got the motivation and I just did it. And that's where DynamicClass came from.
32:41
This slide is hopefully gonna look very, very familiar. The major difference is that unlike PersistentOpenStruct, here you actually call DynamicClass.new and give it a block. So if you've ever worked with Struct or with DelegateClass, there are some tools in Ruby that work like this. It's a class factory. So when you call .new and you give it a block, it creates a new class for you
33:01
and then it'll run the block in the context of that class. So you can do anything in there that you would do in a standard Ruby class declaration. Okay, now the rest of this is gonna look really similar. You know, we can initialize a new instance, give it all the attributes, and all those instance methods, all those getters and setters, are defined. Okay, so pretty close to PersistentOpenStruct.
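For readers who have not seen the class factory pattern, the standard library's Struct works the same way and makes a convenient reference point. The StructAnimal name and its attributes below are illustrative; unlike the dynamic approach described in the talk, Struct needs its attributes declared up front.

```ruby
# Struct.new is a class factory: it returns a new class, and the block
# is evaluated in the context of that class, so def works as usual.
StructAnimal = Struct.new(:type, :sound) do
  def speak
    "The #{type} says #{sound}"
  end
end

dog = StructAnimal.new('dog', 'woof')
dog.speak
```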
33:22
But the implementation is totally different. The first thing that you have to see is that on the class level, we have two methods. We have an attributes method which just holds onto a set. A set is kind of like an array but you can look up much more quickly whether something's a member or not. That's probably its own talk. It's actually pretty cool.
33:41
So we have our set of attributes that we've defined on our class. And then we have the add_methods method. And what add_methods does is it just calls attr_writer and attr_reader. Okay, just like a regular Ruby class. And of course it first checks whether you've written your own attr_writers and attr_readers, so we're not gonna step on your toes. But otherwise we just call attr_writer and attr_reader.
34:00
And then we mark that attribute as having already been added. Right, we're gonna just add that key to the attributes set. Okay, so keep that in mind. What happens when we create a new animal? Again, Animal is our DynamicClass example, with a key of baz and a value of four. So this is how initialize looks. It's very different from what you've seen before.
34:20
So for every attribute that you're passing in, it actually will send the key= method with an argument of the value. So in this case, it's basically the same as creating a new animal and then calling .baz = 4. Well, there is no baz= method yet, so we're gonna hit method_missing. Here's how method_missing works. That regex is scary, I know.
34:40
But just trust me that it works. It's testing, is this a setter? So if it is a setter, then we're going to call the bracket equals method with our key and our value. So baz and four. Otherwise it must be a getter. So we call the brackets method. So like .baz, we're just gonna call bracket and then baz in the middle of the brackets.
35:02
Here's how those methods look. So the brackets method is pretty straightforward. It's just an instance variable get. We're getting whatever @baz is inside of our instance. And the bracket equals method is where the magic starts to happen. We coerce the key to a symbol. We do an instance variable set. We're setting, in this case, @baz to four. We're setting our key to our value.
35:21
Again, just standard instance variables. And then we say okay, if we got here, it's possible that we have to define a new attribute. So do we already have the attribute defined? And if not, then we just add those methods. That's what we saw before, that add_methods method on the previous slide. And what's cool is that the next time that you create an instance, those methods are already defined. In initialize, when we call baz equals,
35:42
we're just gonna have that baz= method. And it works just like a standard attr_writer. So we no longer are relying on this internal hash table. We're just using instance variables. It is a regular class. It's just dynamic, because on the fly we're defining the attr_writers and attr_readers that we need.
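Putting the pieces from the last few slides together, the mechanism can be sketched as below. This is a reconstruction from the spoken description under my own assumptions, not the gem's source: DynamicBase, DynamicSketch, DynamicAnimal, and the exact method names are hypothetical, and the regex is one plausible way to detect a setter.

```ruby
require 'set'

# Hypothetical reconstruction of the described mechanism; NOT the gem's source.
class DynamicBase
  class << self
    # Per-class set of attribute names that already have accessors.
    def attributes
      @attributes ||= Set.new
    end

    # Plain attr_writer / attr_reader, skipped if a method already exists
    # (for example, one written by hand in the class body).
    def add_methods!(key)
      attr_writer key unless method_defined?("#{key}=")
      attr_reader key unless method_defined?(key)
      attributes << key
    end
  end

  # Send key= for each pair. The first time a key is seen this lands in
  # method_missing; afterwards it hits the generated attr_writer directly.
  def initialize(attrs = {})
    attrs.each { |key, value| public_send("#{key}=", value) }
  end

  def [](key)
    instance_variable_get("@#{key}")
  end

  def []=(key, value)
    key = key.to_sym
    instance_variable_set("@#{key}", value)
    self.class.add_methods!(key) unless self.class.attributes.include?(key)
  end

  def method_missing(name, *args)
    setter = name.to_s.match(/\A(.+)=\z/) # is this a setter call?
    if setter
      self[setter[1]] = args.first
    else
      self[name]
    end
  end
end

# Class factory: build a subclass and evaluate the block in its context.
module DynamicSketch
  def self.new(&block)
    klass = Class.new(DynamicBase)
    klass.class_eval(&block) if block
    klass
  end
end

DynamicAnimal = DynamicSketch.new do
  def speak
    "The #{type} says #{sound}"
  end
end

dog = DynamicAnimal.new(type: 'dog', sound: 'woof')
dog.speak
DynamicAnimal.attributes # the attributes that now have real accessors
```

State lives in ordinary instance variables here, so once the accessors exist, reads and writes take the same path as in a hand-written class.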
36:00
And that's actually a lot faster. It's 40% faster than PersistentOpenStruct. It's eight times faster than OpenStruct. And it's still about three times slower than a regular class. But it's close enough at this point, I think, that you can justify using it in your production apps unless they're really, really performance sensitive. It's good for the same purposes
36:20
as PersistentOpenStruct, but it's faster. And to me, what's really important is that it works the way that PersistentOpenStruct really always should have. When I wrote PersistentOpenStruct, I was very, very focused on the OpenStruct way of doing things. And when I freed myself of that restriction, I realized that there was just a better way to do this and basically keep the API, but have a different underlying way of how it works.
36:43
Okay, let's start to wrap up. PersistentOpenStruct made me happy. It was a really good learning experience, and it also worked really well. DynamicClass made me happier because I think it was just a much cleaner solution. And to me, that's the bottom line. Maybe people are gonna see this talk and start using it in their apps,
37:00
and that would be great. I'm really happy to help out people. But even if that doesn't happen, even if I'm the only one who ever uses this, that's fine, right? I had a really great experience and learned a lot creating these solutions. So what have we learned? I would say five things. One, we kind of had this worshipful attitude
37:21
towards the standard library and the Ruby core. It can be improved. It was written by human beings, and there's always room for improvement. We saw from what Erik Michaels-Ober did, and we saw from what James Golick did, there's room for improvement. Two, optimize for your use case. This is really important in terms of performance work. It doesn't matter how good your solution is
37:40
in every use case. All that matters is your use case. And sometimes a specialized tool might work a lot better than a standardized tool. Think about knives. If you have to cut watermelons and you have to cut grapes you could theoretically do them both with the same kind of standard knife, but it'll probably work a lot better if you get specialized tools for each situation. Three, I can't emphasize this enough,
38:00
you have to benchmark. Okay, don't trust your intuitions. I can tell you my intuitions are wrong all the time, and yours will be too. So make sure that you benchmark to validate your hypotheses. Four, I want to encourage everybody to experiment. Experiments are awesome. You learn tons of things. It's just a lot of fun. And occasionally you could create something
38:22
that's actually really useful. So do more experiments. There's little downside and you always learn something. And here's the last thing. And to me this is the most important point here that I want to stress. I had a problem, I looked at the, well I profiled to zoom in on the problem,
38:41
looked at the source code, looked at the solutions that are out there, thought about it a little bit, came up with an idea, benchmarked to verify that my idea was correct, and then made it into a gem. There's nothing in that whole series of events that any single one of you can't do, okay?
39:01
You will run into problems. You don't have to be the most brilliant programmer to make a difference. You will come in with your own perspectives and your own ideas and your own experiences and your own intuitions, and they will enable you to create solutions that no one else can think of, or to make improvements to existing solutions that no one else would think of. So please don't sell yourselves short.
39:26
You can do it. It only takes one person to invent a really great tool. But if that tool's gonna work for the community, everyone in the community has to be involved. OpenStruct is a really, really nice tool,
39:41
but it took a lot of different solutions coming from different people in order for it to really work for everybody. There's more work to do on OpenStruct, there's more work to do on a lot of things. Be part of the community, and make our community so much richer. Thank you.