How to load 1m lines of Ruby in 5s
Formal Metadata
Number of Parts | 69
License | CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/37772 (DOI)
Transcript: English(auto-generated)
00:13
It's right about start time, so I'm gonna jump in and let people straggle in if they need to. I'm Andrew, I work at Stripe, and I work on
00:22
the developer experience in our core product code base. That's a few million lines of Ruby in a pretty big monorepo with a few big macroservices; for example, the entire Stripe API, where most of the core business logic lives, is one big service. Notably, it's not Rails; it's built on Sinatra
00:41
and a bunch of other libraries we've grafted together over time into our own sort of thing. One of the goals on my team is to keep the iteration loop really tight in development, to make it really fast for someone to edit a bit of code and then test that it works, whether by running a sample request against a service, running a unit test, or whatever that might be.
01:02
And last year we had a bit of a problem. Originally this cartoon was about code compiling, and Ruby doesn't compile, unless you watched the talk about compiling Ruby earlier today. Instead, it took us forever for our code to reload; just actually loading all of the Ruby code
01:20
in our code base took too long and was getting in people's way. The big challenge here was that we have this big, organically grown code base where everything's tightly coupled and a bit of a mess in some ways. Long term, we really want to untangle that: we want clearer separation between modules and clearer definitions of roles and interfaces,
01:40
but in the short term we still need to be able to iterate quickly on this code. We were at a point where touching basically any file meant reloading the entire code base and spending 30-plus seconds twiddling your thumbs before you could see whether, say, fixing a typo actually did the right thing. And that wasn't a happy place for us.
02:01
And a lot of this was just because Ruby is slow at loading code. I generated a few million lines of roughly equivalent code in Ruby, Python, and JavaScript. This is completely synthetic, and I would not say it exactly reflects the real world, but it gives you some sense. Rerunning Python over this code
02:21
was actually the fastest, because Python has an interesting little built-in feature: the first time you run a bit of Python code, the interpreter writes out the compiled bytecode to these .pyc files. So if you're handed a completely new Python code base and have to load the whole thing for the first time, it's gonna be really slow, slower than Ruby. But in development you're usually changing one file
02:40
and then loading everything back up, and Python's great at that. JavaScript's also pretty darn fast. I'm completely making this up, but I'm gonna attribute that to the V8 VM being tuned for this: in a browser, you get handed a bunch of code over the wire that you've never seen before, and you have to run it as quickly as you can. And then Ruby is just really painfully slow at this once your code base gets large enough.
03:02
One thing I wanted to note is that recent versions of Ruby actually let you do the Python .pyc trick. RubyVM::InstructionSequence lets you compile code into YARV bytecode, write that bytecode out to disk, and then read it back in and run it instead. And with the benchmarks I was doing,
03:20
that actually works really well; you save a lot of load time. If you want to check this out more, there's a library called Bootsnap from Shopify, and if you're interested in learning more about instruction sequences, the talk about compiling Ruby I mentioned from earlier today covers that a little bit. In practice for us, instruction sequence compilation didn't get us quite the gains of the synthetic benchmark
03:40
because a lot more was going on when our files loaded than just parsing the code. So instruction sequences didn't solve our problem. What we ended up doing instead was writing an autoloader that took us from reloading all of our code in 35 seconds to reloading it all in about five seconds. The other advantage we got out of this is that we were able to delete
04:01
every require_relative in the code base. You no longer had to explicitly point to the path of a file you were going to need when you were authoring some piece of code. So let's talk about how that works. This is a simplified view of what a service with a couple of calls
04:22
might look like. You've got this API service that lets you either make a charge or refund a charge. Stripe works in payments, so there are gonna be a couple of charge- and refund-related examples in here. What you'll notice is that each of these files has require_relative calls at the top, which means no matter what you change,
04:42
you have to reload all three of these files in order to boot up this app. But presumably, if all you need is to get this thing listening on a socket so you can start making requests to it, maybe all you need to load is api.rb. And if you're testing something that never makes refunds, why are we loading refund.rb at all?
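To make the difference concrete, here is a minimal sketch using Ruby's built-in Module#autoload in place of require_relative. The file names mirror the talk's charge/refund example, but the file contents and the Api, Charge, and Refund classes are made up for illustration. The point is that refund.rb is never read unless something actually references Api::Refund.

```ruby
require "tmpdir"

# Write two toy service files to a temporary directory. Their contents are
# hypothetical stand-ins for charge.rb and refund.rb.
dir = Dir.mktmpdir
File.write(File.join(dir, "charge.rb"), <<~RUBY)
  module Api
    class Charge
      def self.call
        "charged"
      end
    end
  end
RUBY
File.write(File.join(dir, "refund.rb"), <<~RUBY)
  module Api
    class Refund
      def self.call
        "refunded"
      end
    end
  end
RUBY

# Instead of require_relative-ing both files at boot, register where each
# constant lives. Nothing is read from disk yet.
module Api; end
Api.autoload(:Charge, File.join(dir, "charge.rb"))
Api.autoload(:Refund, File.join(dir, "refund.rb"))

# The first reference to Api::Charge requires charge.rb on demand.
charge_result = Api::Charge.call

# Module#autoload? returns the registered path while a constant is still
# unloaded, and nil once it has been loaded.
charge_pending = Api.autoload?(:Charge) # nil: charge.rb was loaded
refund_pending = Api.autoload?(:Refund) # still the path: refund.rb untouched
```

Booting the service this way costs only the files that requests actually touch.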
05:00
Are there ways for us to not load code, instead of trying to figure out how to make it load really, really fast? The solution for us was to use a feature that's actually built into Ruby: an autoloader. Ruby's autoloader, defined on Module as Module#autoload, will let you say,
05:21
hey Ruby, if you're in the sdb namespace here, and you see Charge and you can't find it because Charge isn't defined yet, go require this path, go load this file, and then Charge is gonna exist, everything's gonna be happy, and you can move on with your life. And so what we did is we autogenerated,
05:41
we used a bunch of static analysis to autogenerate stub files like this that tell the Ruby autoloader where to find everything in our code base. We had a build daemon always running in the background while you code, constantly parsing whatever you had recently saved and updating these autoloader stubs, so that Ruby would know where to find code as you needed it,
06:01
so we could just boot the API up, get it listening on a port, and then load stuff as you actually made requests. One of the nice properties of autoloaders in particular is that these stubs only change if you change where something's defined. So if all you're doing is mucking around in some method inside Charge, or editing a test
06:20
or something like that, the stubs don't have to change for your code to work, so you don't have to wait the 30 or so seconds for the build daemon to figure everything out. You can just start running your code, and only if you do something that happens to change a definition site do we have to make you wait. So let's talk a little bit about how we do this,
06:41
and then I'll get progressively deeper into some of the static analysis we did. We've got a chunk of code here on the left: we start with some code that defines some modules and some constant literals. We take that in and use the parser gem to parse it into an AST, which is represented by a bunch of S-expressions here,
07:01
and then we extract a bunch of information out of that. I'll talk more as we go about the types of information we need to get out of this AST, but one of the most important data structures we create is what we call our definition tree: a traversable tree of all the definitions we've found in all the files across the code base,
07:21
and each of its nodes has a bunch of metadata attached that lets us do more useful things. This is basically all you need for autoloader generation, because all you need to know is where stuff is defined, so you can tell Ruby: when you're looking for this thing, go look at this file over here.
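As a rough illustration of that step, the sketch below turns a flat map of "fully qualified constant name => defining file" into the source of one autoload stub for a namespace. The input shape, the render_autoload_stub name, and the file paths are assumptions for illustration; the talk's generator works from its definition tree and emits more elaborate stubs.

```ruby
# Hypothetical input shape: fully qualified constant name => defining file,
# as a static pass over the code base might produce it.
def render_autoload_stub(namespace, definitions)
  prefix = "#{namespace}::"
  # Keep only direct children of the namespace (Api::Charge, not Api::Charge::Receipt).
  children = definitions.select do |name, _path|
    name.start_with?(prefix) && !name.delete_prefix(prefix).include?("::")
  end
  lines = children.sort.map do |name, path|
    "  autoload :#{name.delete_prefix(prefix)}, #{path.inspect}"
  end
  ["module #{namespace}", *lines, "end"].join("\n") + "\n"
end

definitions = {
  "Api::Charge" => "lib/api/charge.rb",
  "Api::Refund" => "lib/api/refund.rb",
  "Api::Charge::Receipt" => "lib/api/charge/receipt.rb",
  "Billing" => "lib/billing.rb",
}

stub = render_autoload_stub("Api", definitions)
puts stub
# module Api
#   autoload :Charge, "lib/api/charge.rb"
#   autoload :Refund, "lib/api/refund.rb"
# end
```

Because the output depends only on where constants are defined, editing a method body leaves the generated stub byte-for-byte identical.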
07:41
But we wanted to do more than that. We didn't just want to know where everything is defined; we also wanted to know every reference to every definition in our code base. We wanted the full dependency graph of how our code is connected. I'll talk later about some of the cool tools you can build with this, but in the context of the autoloader,
08:00
one of the things this gets us is more confidence that we're doing this right. We're talking about rewriting massive swaths of code to not use requires and then deploying that to a production system that people depend on to run payments. They'd be really sad if that didn't work, so we wanted some validation that our static analysis was sane. So the question was,
08:21
how do you figure out that this reference points to this definition, and that reference points to that one? The first thing we tried was to just load all of the code and poke Ruby: hey, in this context, what does this constant resolve to? In this context, what does this constant resolve to? This wasted a few weeks, maybe more, of my life.
08:42
This is really hard to do, especially in a very large code base. Loading Ruby code in arbitrary orders when it wasn't meant to be loaded that way is hard, because people make all sorts of ordering assumptions, and code can have all kinds of require-time side effects. The most fun thing that came out of this is that people would, as they were working away,
09:02
write throwaway scripts in the root directory to do whatever local tasks they wanted. And our build daemon in the background would happily see those as Ruby code and just go run them. So let's really hope a script like that wasn't an rm -r or something, because otherwise we would run it and delete whatever it pointed to. So instead we pursued doing this mostly statically,
09:22
doing this without actually running the code. I mean, we wrote the static analysis in Ruby, but it never ran the code it was analyzing. That meant we had to mimic Ruby's own reference resolution using static analysis instead of running the code through MRI. So I'll talk a bit about both how MRI,
09:41
that is, how Ruby does constant resolution, and also how we mimic that with static analysis. The first concept in Ruby constant resolution is actually one I've struggled with most, especially early on in my experience with Ruby, and it's the concept of nesting: the idea that the way a constant resolves in Ruby
10:03
depends on the lexical context, potentially many hundreds of lines above, in which the enclosing namespaces were opened. So if you look here, at these different references to Value, we've got one here, one here, and one here, and they all resolve to different things
10:21
even though they're all in this Out::Mid::In namespace. The reason is that in Ruby it matters that here Out, Mid, and In are opened on separate lines, here they're opened on two lines, and here they're all on one line. This creates the location-dependent concept of nesting
10:41
that you can actually ask Ruby about at any point with Module.nesting: hey, what's the nesting right here? And it'll tell you. This concept is crucial to reference resolution, because Ruby searches each entry in the nesting; it won't search each piece of a compound name like Out::Mid::In. As I said, I struggled with this a lot when I first came to Ruby, and I still make this mistake sometimes.
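The slide's exact code isn't in the transcript, so the following is an assumed reconstruction of the idea: the same Out::Mid::In namespace opened three different ways, with a Value defined at the top level, in Out, and in Out::Mid. Module.nesting and the bare Value reference give different answers in each context.

```ruby
# RESULTS records, per context, what Module.nesting reports and what a bare
# `Value` resolves to. Constant names are illustrative.
RESULTS = {}
Value = :top

module Out
  Value = :out
  module Mid
    Value = :mid
    module In
    end
  end
end

# Out, Mid, and In each opened on its own line.
module Out
  module Mid
    module In
      RESULTS[:separate_lines] = [Module.nesting, Value]
    end
  end
end

# Two lines: Mid::In opened in one step inside Out.
module Out
  module Mid::In
    RESULTS[:two_lines] = [Module.nesting, Value]
  end
end

# One line: the whole path opened in one step.
module Out::Mid::In
  RESULTS[:one_line] = [Module.nesting, Value]
end

RESULTS.each { |form, (nesting, value)| puts "#{form}: #{nesting.inspect} -> #{value.inspect}" }
# separate_lines: [Out::Mid::In, Out::Mid, Out] -> :mid
# two_lines: [Out::Mid::In, Out] -> :out
# one_line: [Out::Mid::In] -> :top
```

In the one-line form, Out::Mid never appears in the nesting, so its Value is skipped entirely and lookup falls through to the top level.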
11:01
Given our definition tree, we can look at how we use the nesting to resolve a simple reference. We have these various steps of the nesting, Out, Out::Mid, and Out::Mid::In, and we just walk them one by one in the tree. First we take the innermost nesting, Out::Mid::In: is there a Value defined in there?
11:21
Nope, not defined there. Try the next one. We go to Out::Mid: is there a Value defined there? Yep, we found it. Great, we've resolved that reference. Okay, it's not just nesting. The other way that Ruby resolves constants is by ancestry.
11:42
If the nesting doesn't resolve it, Ruby checks the list of ancestors. In this case, because this mixin, Mix, is an ancestor of the class Child, you can resolve Value in the context of Child by walking through that ancestor list.
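A small sketch of ancestry-based lookup, with made-up names Mix, Parent, and Child. It also shows where include puts a module: directly after the including class in the ancestor chain, ahead of the superclass.

```ruby
# Hypothetical names: a mixin that defines Value, included into Child.
module Mix
  Value = :from_mixin
end

class Parent
end

class Child < Parent
  include Mix

  def self.lookup
    Value # not in Child's lexical scope, so Ruby falls back to Child's ancestors
  end
end

p Child.ancestors.take(3) # => [Child, Mix, Parent]
p Child.lookup            # => :from_mixin
p Child::Value            # => :from_mixin (scoped lookup also walks ancestors)
```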
12:02
I should note here that including a module in Ruby inserts it into the ancestor list directly after the class that includes it, rather than appending it to the end. So let's use that to walk through a more complicated example
12:22
and a more realistic example of the reference resolution we might have to do. In this case, we're trying to resolve something with two parts, Child and then Value, and we have to take each of those names individually. The first thing we do is use nesting to resolve the Child part, so we walk each of the nestings.
12:40
There's Out::Other::Child. That doesn't resolve; there's no Child defined under Other. But then we look at the next nesting, which gives Out::Child, and yay, there's a Child defined in there. So we've matched Child, and now we move on to the next step of resolving Value within Child, so now we're looking at the list of ancestors of Child
13:01
to figure out where that resolves. In the same way, we walk them one by one. Child is its own first ancestor, and it doesn't define Value. Its parent class is its next ancestor, and it doesn't define Value either. The mixin is the next ancestor, and it does define Value, so we've resolved the reference to its canonical definition.
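Here is a much-simplified static version of that walk, sketched under stated assumptions. The definition tree is modeled as a set of fully qualified names plus each class's ancestor chain. The chain for Out::Child is written as Child, Parent, Mix to match the order described above, which would mean Mix reaches Child through Parent (for example, Parent includes Mix); that is an assumption about the slide. All helper names are hypothetical, and real Ruby ancestry also includes Object, Kernel, and so on.

```ruby
require "set"

# Simplified definition tree: fully qualified names plus ancestor chains.
DefinitionIndex = Struct.new(:names, :ancestors) do
  def known?(fqn)
    names.include?(fqn)
  end

  # Namespaces without recorded ancestry are treated as their own only ancestor.
  def ancestors_of(fqn)
    ancestors.fetch(fqn, [fqn])
  end
end

def qualify(scope, name)
  scope.empty? ? name : "#{scope}::#{name}"
end

# First segment: each entry in the nesting (innermost first), then the
# ancestors of the innermost scope, then the top level ("").
def resolve_head(index, nesting, name)
  innermost = nesting.first || ""
  scopes = nesting + index.ancestors_of(innermost).drop(1) + [""]
  scopes.map { |scope| qualify(scope, name) }.find { |fqn| index.known?(fqn) }
end

# Later segments: walk the ancestors of whatever was resolved so far.
def resolve_member(index, scope, name)
  index.ancestors_of(scope)
       .map { |ancestor| qualify(ancestor, name) }
       .find { |fqn| index.known?(fqn) }
end

def resolve_reference(index, nesting, reference)
  head, *rest = reference.split("::")
  rest.reduce(resolve_head(index, nesting, head)) do |current, name|
    current && resolve_member(index, current, name)
  end
end

index = DefinitionIndex.new(
  Set["Out", "Out::Other", "Out::Parent", "Out::Mix", "Out::Mix::Value", "Out::Child"],
  { "Out::Child" => ["Out::Child", "Out::Parent", "Out::Mix"] }
)

# A reference to Child::Value written inside `module Out; module Other`.
p resolve_reference(index, ["Out::Other", "Out"], "Child::Value") # => "Out::Mix::Value"
p resolve_reference(index, ["Out::Other", "Out"], "Missing")      # => nil
```

The head of the name is resolved lexically and every later segment by ancestry, which mirrors the two phases in the walk-through.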
13:22
And that covers most of Ruby reference resolution. There are some edge cases, and we'll get into a bit of what we had to do to protect against some of those later, but this covers it pretty well. So, putting the pieces together, this is the thing that's running
13:40
in the background any time someone's editing code. We run through this whole pipeline any time you change a file. And as I said, you can keep working while it runs asynchronously; you don't have to wait for it to finish, because otherwise you'd be waiting 30 seconds all over again. The first thing we do is parse all the files in the code base. This is nice because it's easily cacheable and easily parallelizable.
14:02
We fork off a whole ton of processes, as many as you have CPUs, as long as we can do that in that environment without hanging your laptop. Each of those processes parses its files and passes the results back to the main process, which stores them in a SQLite database
14:20
keyed by the hash of the file that was processed. So we can very easily do incremental parsing, and this step ends up being quite fast incrementally. We then use that data to build the definition tree, the tree representation of all the defined constants in the code base. At this point we can fork off and write the autoloader stubs; we don't have to wait for the full reference resolution to do that.
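A sketch of that parse step, under several assumptions: Ripper from the standard library stands in for the parser gem, an in-memory Hash stands in for the SQLite table, and the worker count is fixed rather than matched to the CPU count. It requires Process.fork, so it won't run on Windows or JRuby. Parse results are cached by a digest of file contents, so only changed files get re-parsed.

```ruby
require "digest"
require "ripper"
require "tmpdir"

# content digest => parse tree (stand-in for the SQLite table in the talk).
PARSE_CACHE = {}

def digest_of(path)
  Digest::SHA256.file(path).hexdigest
end

# Parse every file whose contents aren't cached yet, spreading the work over
# forked worker processes that send results back to the parent over pipes.
def parse_files(paths, workers: 4)
  todo = paths.map { |path| [path, digest_of(path)] }
              .reject { |_path, digest| PARSE_CACHE.key?(digest) }
              .uniq { |_path, digest| digest }
  slice_size = [(todo.size / workers.to_f).ceil, 1].max

  children = todo.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      results = slice.map { |path, digest| [digest, Ripper.sexp(File.read(path))] }
      writer.write(Marshal.dump(results))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close
    [pid, reader]
  end

  children.each do |pid, reader|
    Marshal.load(reader.read).each { |digest, tree| PARSE_CACHE[digest] = tree }
    reader.close
    Process.wait(pid)
  end

  paths.to_h { |path| [path, PARSE_CACHE.fetch(digest_of(path))] }
end

dir = Dir.mktmpdir
files = %w[charge refund].map do |name|
  path = File.join(dir, "#{name}.rb")
  File.write(path, "module Api\n  class #{name.capitalize}\n  end\nend\n")
  path
end

trees = parse_files(files) # forks workers for both files
parse_files(files)         # second call: all cache hits, no forks
p trees.values.map(&:first) # => [:program, :program]
```

Keying the cache by content rather than by path means a file that is touched but not modified costs only a hash computation.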
14:41
We fork off and do that on the side. But the fun stuff happens in the reference resolution track, where we go through iterative cycles of resolving all the references we can. Imagine a case where a class declares its parent using a name that isn't fully qualified. That parent reference itself needs to be resolved
15:02
before we even know what the class's parent is, so that we can then resolve constants within that class. So we go through this iterative loop, going over every unresolved constant in the code base and trying to resolve it again, and again. When we get to the point where we can't resolve any more, it would be really, really nice if at that point
15:20
all the constants in the code base were resolved. They're not. This is the part where I kind of lied: this is not entirely static. We can't statically analyze gems. They could be C extensions, or they could be doing all sorts of crazy metaprogramming to define constants. So instead of trying to figure out
15:40
what gems are doing, we just load them. In the first parsing step, we parse out all the require statements in the code base, and then at this gem resolution stage we just require everything the code base requires and use const_get to poke at the VM: hey, have you heard of this constant? Have you heard of this constant? And then we keep doing this loop until we end up with a complete list
16:01
of resolved references. Again, it would be really, really nice if at this point everything was resolved. The fun thing was that we found some real bugs in our code base: error paths, or paths that weren't taken often, with references that would legitimately never resolve, meaning that code did not work at all. We also found some things that we just had to whitelist and walk away from for the time being.
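The iterative resolution described above can be sketched as a fixed-point loop. This is a hypothetical sketch: resolve_to_fixed_point and the toy App::Parent / App::Child data are made up. A bare Value inside App::Child can only be found after Child's relatively named superclass, Parent, has been resolved, so it takes two passes. Anything left over would go to the gem step, which is approximated here with Object.const_defined? after a require.

```ruby
require "set"

# Keep retrying every unresolved reference until a full pass makes no
# progress. The block tries one reference given everything resolved so far
# and returns its target or nil.
def resolve_to_fixed_point(references)
  resolved = {}
  pending = references.dup
  loop do
    progress = false
    pending.reject! do |ref|
      target = yield(ref, resolved)
      next false if target.nil?
      resolved[ref] = target
      progress = true
    end
    break if !progress || pending.empty?
  end
  [resolved, pending]
end

# Toy input: App::Child's superclass is written as the relative name "Parent",
# and a bare `Value` inside App::Child lives on that superclass.
definitions = Set["App", "App::Parent", "App::Parent::Value", "App::Child"]
parent_ref = ["App", "Parent"]       # [lexical scope, name]
value_ref  = ["App::Child", "Value"]

resolved, leftover = resolve_to_fixed_point([value_ref, parent_ref]) do |(scope, name), known|
  # Scopes to search: the scope itself plus its superclass, once resolved.
  scopes = [scope]
  scopes << known[parent_ref] if scope == "App::Child" && known.key?(parent_ref)
  scopes.map { |s| "#{s}::#{name}" }.find { |fqn| definitions.include?(fqn) }
end

p resolved[value_ref] # => "App::Parent::Value" (found on the second pass)
p leftover            # => []

# Anything still in `leftover` would go to the gem step: require the gems,
# then ask the live VM whether it knows the name.
require "json"
p Object.const_defined?("JSON") # => true
```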
16:20
But for the most part, we found some fun errors. So we've sketched what the autoloader looks like, why it made things faster for us, and the static analysis behind it. I also want to talk a little about what it took to make this safe to actually run in production. Not all code is safe to use with an autoloader,
16:41
especially one that's dynamically auto-generated like this. One of the first things I did when I started hacking on the autoloader was to try to run a couple of the functional tests of our API. They immediately failed, declining charges, because the backend that's supposed to fake out
17:04
"yep, this worked" in tests didn't exist. There was an empty list of backends. The problem was that the backends were all getting registered with this router when their files got required. So there was an assumption baked into the code that these files would get required when the app booted up
17:20
and would have some require-time side effect that had to happen for the app to work properly. That doesn't work with an autoloader, because stuff gets loaded on demand, so there's no require-time side effect until something references the file. And if the only way your code gets accessed is through this registry, it's never gonna get accessed. It's never gonna get registered, because registration depends on the file being loaded, which only happens if it's accessed through the registry. Yep, it goes in a circle.
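A small reproduction of that failure mode. Router, FakeBackend, and the register method are hypothetical names, since the talk doesn't show the real router code. The backend registers itself only as a side effect of its file being required, so under autoloading the registry stays empty until something names the class directly.

```ruby
require "tmpdir"

# Hypothetical router whose backends register themselves as a side effect of
# their file being required.
module Router
  BACKENDS = []

  def self.register(backend)
    BACKENDS << backend
  end
end

dir = Dir.mktmpdir
path = File.join(dir, "fake_backend.rb")
File.write(path, <<~RUBY)
  class FakeBackend
    Router.register(self) # require-time side effect
  end
RUBY

# With an autoloader, nothing requires fake_backend.rb at boot...
Object.autoload(:FakeBackend, path)
backends_at_boot = Router::BACKENDS.dup # => []

# ...so the backend only registers once something names FakeBackend directly,
# which code that only goes through Router::BACKENDS never does.
FakeBackend.name
backends_after_reference = Router::BACKENDS.dup # => [FakeBackend]
```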
17:43
The simple way to solve this is to make engineers explicitly write out a list of all the things they need in this context. Every time they add a new, say, handler (this is just a made-up concept), they'd have to add it to a list of handlers somewhere else that needs to be loaded.
18:00
But the cool thing is, we just did all this static analysis to figure out the full ancestry tree of every class in our code base. Why not use that? We can use that information to generate a list of all the subclasses of this BaseHandler. So you can just say: when I boot this thing up, by default use all of the subclasses of BaseHandler.
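A sketch of generating that list from the ancestry data. The ANCESTRY hash, the Handlers:: names, and the ALL_HANDLERS constant are hypothetical. The generated file names each handler explicitly, so loading it references every handler constant, which in turn triggers the autoloader for each one and keeps the references visible to static analysis.

```ruby
# Hypothetical ancestry index produced by static analysis:
# class name => ancestor chain, self first.
ANCESTRY = {
  "Handlers::BaseHandler" => ["Handlers::BaseHandler"],
  "Handlers::ChargeHandler" => ["Handlers::ChargeHandler", "Handlers::BaseHandler"],
  "Handlers::RefundHandler" => ["Handlers::RefundHandler", "Handlers::BaseHandler"],
  "Util::Money" => ["Util::Money"],
}.freeze

def descendants_of(ancestry, base)
  ancestry.select { |name, chain| name != base && chain.include?(base) }.keys.sort
end

# Render a Ruby file that names every handler explicitly. Loading it
# references each constant, which makes the autoloader load each handler.
def render_handler_list(ancestry, base, const_name)
  entries = descendants_of(ancestry, base).map { |name| "  #{name},\n" }
  "#{const_name} = [\n#{entries.join}].freeze\n"
end

puts render_handler_list(ANCESTRY, "Handlers::BaseHandler", "ALL_HANDLERS")
# ALL_HANDLERS = [
#   Handlers::ChargeHandler,
#   Handlers::RefundHandler,
# ].freeze
```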
18:23
You don't have to type them out every time. To do this, we have a bit of code running in the same build daemon that runs every time you change things and that powers the static analysis and autoloader generation. It's actually been a really powerful tool for us, having this thing that runs in the background
18:41
that can do stuff based on the code people write. We've used it for extracting localization strings when people save .rb files, and for a bunch of other helpful stuff. So at the beginning I showed you a stub and said, this is the autoloader stub we generate.
19:02
That was a lie. This looks a lot more like the autoloader stub we generate, with some added complexity to catch some weird edge cases. The first thing you'll notice is that we actually require gems before we load the file that you've written, the human-authored file.
19:20
We do this to avoid certain dependency cycle issues, especially when people inherit from classes in gems or reference gems in a particular order in their files. There are other ways we could refactor around this, but just requiring the gems was the simplest. We then call a hook with the name of the thing we're about to define,
19:41
before we define it. We use this for things like: if you try to load test code in production, we raise an exception here before we let you define anything, rather than risk any test code running in production. This is a real risk with an autoloader: code loads whenever you reference it, so it's very possible to accidentally reference something
20:00
you did not mean to load in that context. We then predeclare the constant whose behavior will actually be defined later, when we load the file the human engineer wrote. We do this because of certain dependency cycle issues: Ruby gets very sad when it calls the autoloader and, by the time it leaves this file,
20:21
it hasn't defined the constant it was supposed to define. Next we get to the meat of what I was showing before, where we tell Ruby where to go next: if you're inside this context that we're autoloading and you look for Charge, or for Refund, or for these other constants,
20:41
where should you go find them? But again, we wrap that in some custom handling for the case where something is already loaded or partially loaded. Imagine you've already defined part of a Charge class somewhere, or have for some reason brought it into scope. Ruby's never gonna see Charge as missing.
21:00
It's never going to actually load the definition file we want it to load. So before we tell Ruby's autoloader "if this constant is missing, go load this file", we check whether the thing already exists. If it does, we should either just require the file directly, or raise an exception saying there's some partial loading going on here, and this is not good.
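A sketch of a few of those stub steps in plain Ruby. Everything named here is hypothetical: the talk doesn't show its helper names. The "name ends in Test" rule is a stand-in heuristic for detecting test code, and this version raises on partial loads rather than requiring the file directly. The sequence is a pre-definition hook, a predeclared namespace, and then guarded autoload registration.

```ruby
require "tmpdir"

# Hypothetical helpers a generated stub could call.
module StubSupport
  class TestCodeInProduction < StandardError; end
  class PartialLoad < StandardError; end

  # Hook run with the name of a constant before it is defined. The "ends in
  # Test" rule is a made-up heuristic for spotting test code.
  def self.before_define(const_name, production:)
    return unless production && const_name.end_with?("Test")
    raise TestCodeInProduction, "refusing to load #{const_name} in production"
  end

  # Register an autoload only if the constant doesn't already exist. Note that
  # const_defined? is also true for registered-but-unloaded autoloads, which
  # autoload? distinguishes by returning their path.
  def self.autoload_child(namespace, name, path)
    if namespace.const_defined?(name, false) && !namespace.autoload?(name)
      raise PartialLoad, "#{namespace}::#{name} was defined before its autoload was registered"
    end
    namespace.autoload(name, path)
  end
end

dir = Dir.mktmpdir
charge_path = File.join(dir, "charge.rb")
File.write(charge_path, "module Payments\n  class Charge\n  end\nend\n")

# Roughly the order described in the talk: hook, predeclare, register children.
StubSupport.before_define("Payments", production: true)
module Payments; end
StubSupport.autoload_child(Payments, :Charge, charge_path)
```

Calling autoload_child again after Payments::Charge has loaded raises PartialLoad, which is the "something loaded this early" signal described above.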
21:20
And finally, we get to the fun part: actually loading the code you wanted to load in the first place, the code that defines the engineered behavior. Again, this is wrapped in a helper, to detect more problems related to dependency cycles. Here's one of the fun things you can run into. Imagine defining some constant that gets autoloaded,
21:41
and in the process it includes a mixin, and that mixin somehow goes through some weird dependency cycle that calls respond_to? on the thing you were loading. That thing is only partway done loading; half its methods don't exist yet, so respond_to? returns the wrong value. You now have some very weird, hard-to-debug, order-dependent behavior in your code.
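The talk doesn't show how that check hooks in, so this is only a hypothetical illustration of the idea. A guard records which constants are mid-load and fails loudly if anything inspects one of them before its file finishes. Here the inspection happens in a mixin's included hook as a stand-in for the respond_to? cycle; LoadGuard, Auditable, and Charge are made-up names.

```ruby
# Hypothetical guard: remember which constants are mid-load, and fail loudly
# if anything inspects one of them before its file has finished loading.
module LoadGuard
  class OrderDependent < StandardError; end

  IN_PROGRESS = []

  def self.loading(const_name)
    IN_PROGRESS.push(const_name)
    begin
      yield
    ensure
      IN_PROGRESS.pop
    end
  end

  def self.assert_fully_loaded!(const_name)
    return unless IN_PROGRESS.include?(const_name)
    raise OrderDependent, "#{const_name} is still loading; this code depends on load order"
  end
end

# Stand-in for the mixin in the talk: its included hook asks questions about
# the including class while that class is still half-defined.
module Auditable
  def self.included(base)
    LoadGuard.assert_fully_loaded!(base.name)
  end
end

error = nil
begin
  LoadGuard.loading("Charge") do
    class Charge
      include Auditable # runs before the methods below exist

      def refund; end
    end
  end
rescue LoadGuard::OrderDependent => e
  error = e
end
puts error.message # => Charge is still loading; this code depends on load order
```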
22:01
So instead, we hook into that, catch it, and raise an exception saying you have to rewrite this code to not be order-dependent in this way. Finally, there was a bunch of stuff that just does not work well with an autoloader, which we had to ban. The more obvious ones: if you dynamically set constants with const_set,
22:21
or dynamically get constants with const_get, we can't statically analyze them, so those had to go away. ObjectSpace is similarly a way to dynamically access constants, and that pretty much had to go away in our codebase too. We wrote RuboCop cops to ban these things, and spent two months migrating every single use out of the codebase.
22:41
In my opinion, that was time well spent. I think these, and a lot of the other things we had to ban that I'll talk about on the next slide, not only make code hard to statically analyze, but potentially make it harder for humans to work with. Certainly dynamic const_get and const_set foil attempts to use simple tools like grep to look around for things.
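The talk's bans were written as RuboCop cops. The sketch below swaps in a much cruder, token-level check using Ripper from the standard library, so it runs without RuboCop. It flags calls named const_get or const_set and any use of the ObjectSpace constant. A real cop would inspect the AST and report richer locations; the constant lists here are assumptions.

```ruby
require "ripper"

# Token-level approximation of the lint rules (the talk used RuboCop).
BANNED_METHODS = %w[const_get const_set].freeze
BANNED_CONSTANTS = %w[ObjectSpace].freeze

def dynamic_constant_offenses(source)
  Ripper.lex(source).filter_map do |(line, _column), type, token, _state|
    banned = (type == :on_ident && BANNED_METHODS.include?(token)) ||
             (type == :on_const && BANNED_CONSTANTS.include?(token))
    [line, token] if banned
  end
end

source = <<~RUBY
  klass = Object.const_get("Charge")
  Payments.const_set(:Refund, Class.new)
  ObjectSpace.each_object(Class) { |c| c }
  Charge.new
RUBY

p dynamic_constant_offenses(source)
# => [[1, "const_get"], [2, "const_set"], [3, "ObjectSpace"]]
```

Matching on tokens also catches forms like send(:const_get), since the symbol's name lexes as an identifier. That is the kind of indirection grep-based reasoning tends to miss.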
23:05
Some other fun things we had to ban: we banned reopening classes. In our codebase, you're now only allowed to define the behavior for a particular class or module in exactly one place, and there are some subtleties to doing that with modules without making Ruby a pain.
23:20
We'll just focus on classes here, because they're a little more straightforward. The problem with reopening classes is that, again, there can be order dependence. Imagine you loaded this second definition of this helper class first. It's gonna call alias_method here and raise an exception, because the my_helper method hasn't been defined yet. Somebody assumed that the first definition would always be loaded first.
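A minimal reproduction of that ordering problem. The method name my_helper comes from the talk; HelperA and HelperB are hypothetical class names used so the two load orders don't interfere. Loading the reopening before the original definition raises NameError, while the opposite order works.

```ruby
# Load order A: the "reopening" is loaded before the original definition.
error =
  begin
    class HelperA
      alias_method :helper, :my_helper # my_helper doesn't exist yet
    end
    nil
  rescue NameError => e
    e
  end

# Load order B: original definition first, then the reopening. This works.
class HelperB
  def my_helper
    :ok
  end
end

class HelperB
  alias_method :helper, :my_helper
end

p error.class         # => NameError
p HelperB.new.helper  # => :ok
```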
23:43
So we just banned this. You're not allowed to reopen classes in our codebase anymore. And again, in my opinion, this makes our code easier to reason about, because the definition of a thing is in the one place you're looking at. Another really fun one I only found in a couple of places in our codebase, but I threw it in because it threw me for a loop for a while.
24:03
I really would have thought that this would not resolve, because it's Foo::Foo::Bar::VALUE, and I haven't defined that anywhere. But it turns out that this Foo actually goes through Ruby's constant resolution
24:20
and resolves to this Foo. We probably could have done some iterative resolution to figure this out, but it just wasn't worth the complexity, and in the places in our codebase where I found this, the code was doing things the author had not intended. It was a mistake.
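For a sense of what "iterative resolution" involves, here is a simplified, hypothetical sketch of how a static tool might resolve a constant reference against its lexical nesting. It models only lexical scopes plus the top level and deliberately ignores ancestor (inheritance) lookup, one of the additional rules real Ruby applies:

```ruby
# Resolve `reference` (e.g. "Bar::VALUE") roughly the way Ruby's lexical
# lookup resolves its leading segment, given:
#   nesting - enclosing namespaces, innermost first (like Module.nesting)
#   known   - fully qualified names of every constant the tool has indexed
# Simplification: ancestors of the enclosing classes are not consulted.
def resolve_constant(nesting, reference, known)
  head, *rest = reference.split("::")
  # Try Innermost::Head, then Outer::Head, ..., then top-level Head.
  candidates = nesting.map { |scope| "#{scope}::#{head}" } + [head]
  resolved_head = candidates.find { |name| known.include?(name) }
  return nil unless resolved_head

  # Like Ruby, once the head resolves we don't fall back to outer scopes
  # if the rest of the path is missing.
  full_name = ([resolved_head] + rest).join("::")
  known.include?(full_name) ? full_name : nil
end

known = %w[Foo Foo::Bar Foo::Bar::VALUE Foo::Baz]
resolve_constant(%w[Foo::Baz Foo], "Bar::VALUE", known)      # => "Foo::Bar::VALUE"
resolve_constant(%w[Foo::Baz Foo], "Foo::Bar::VALUE", known) # => "Foo::Bar::VALUE"
resolve_constant(%w[Foo::Baz Foo], "Qux", known)             # => nil
```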
24:41
They did not want it to behave this way. Finally, I talked a bit about how we have to be able to track ancestry. We have to know all the ancestors of a given class or module, and that doesn't really work if you define them dynamically. So if, in the included hook of a mixin, you call prepend to add another mixin,
25:03
we can't really trace that, so we had to get rid of it. So what's this good for? We talked about the autoloader: we got much faster boot times, going from about 35 seconds to five seconds. We talked about deleting
25:21
all the require_relative calls in our codebase, letting people just reference things while we handle getting them into scope when they're needed. But there were some other cool things we were able to build once we had this dependency graph of our codebase. One of the more powerful ones was selective test execution. If you look here, we've got two classes defined, and each of those classes has a test defined for it.
25:43
It makes sense that if you change some behavior in either the parent or the child class, you're going to have to run the child's tests, because the behavior of the parent presumably influences the behavior of the child class. But I really hope that the behavior of the child does not influence the behavior of the parent class.
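A minimal sketch of that selection rule, assuming a hypothetical DEPENDS_ON graph (constant to the constants it references) and a hypothetical TESTS_FOR map from constants to test files. A change selects the tests for the changed constants plus everything that transitively depends on them:

```ruby
require "set"

# Hypothetical dependency graph: constant => constants it depends on.
DEPENDS_ON = {
  "Parent" => [],
  "Child"  => ["Parent"], # Child inherits from Parent
}.freeze

# Hypothetical map from constants to the test files that cover them.
TESTS_FOR = {
  "Parent" => ["test/parent_test.rb"],
  "Child"  => ["test/child_test.rb"],
}.freeze

# All constants that are changed or (transitively) depend on a changed constant.
def affected_constants(changed)
  # Invert the graph: constant => constants that depend on it.
  dependents = Hash.new { |hash, key| hash[key] = [] }
  DEPENDS_ON.each { |const, deps| deps.each { |dep| dependents[dep] << const } }

  affected = Set.new
  queue = changed.dup
  until queue.empty?
    const = queue.shift
    next unless affected.add?(const) # skip constants already visited
    queue.concat(dependents[const])
  end
  affected
end

def tests_to_run(changed)
  affected_constants(changed).flat_map { |const| TESTS_FOR.fetch(const, []) }.sort
end

tests_to_run(["Child"])  # => ["test/child_test.rb"]
tests_to_run(["Parent"]) # => ["test/child_test.rb", "test/parent_test.rb"]
```

The graph is walked in reverse: edges point from a class to what it uses, but test selection needs everything that uses the changed class.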
26:03
So if all you do is change the child, you shouldn't have to run the parent's tests at all. We implemented this in our CI framework, and, well, this is old data, but at least at that time, for about 25% of our test runs, we were running less than 5% of what is otherwise
26:21
a highly parallelized test suite that takes multiple CPU-days. So it saves an incredible amount of time and money for developers. This is what happened when we repeatedly tried to plug our dependency graph into Graphviz.
26:40
It's a mess. We call this the Gordian knot, and aside from not making pretty pictures, this entanglement makes the code hard to reason about. It makes it hard for anybody who can't hold millions of lines of code in their head at once, which is to say, I think, everybody,
27:01
to actually make safe and quick changes in the codebase. So we're trying to figure out how to untangle this, and one of the things we can do now with this dependency information is some refactoring: take something out of the strongly connected component, and then write tests that say no one is allowed to reconnect it to the strongly connected component.
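Ruby's standard library already ships Tarjan's strongly connected components algorithm in TSort. A minimal sketch of such a guard test, using a hypothetical four-node graph with made-up class names:

```ruby
require "tsort"

# Hypothetical dependency graph: constant => constants it references.
GRAPH = {
  "Order"    => ["Invoice"],
  "Invoice"  => ["Customer"],
  "Customer" => ["Order"], # closes the cycle Order -> Invoice -> Customer -> Order
  "Money"    => [],
}.freeze

# Strongly connected components with more than one member, i.e. the "knots".
def tangled_components(graph)
  each_node  = ->(&block) { graph.each_key(&block) }
  each_child = ->(node, &block) { graph.fetch(node, []).each(&block) }
  TSort.strongly_connected_components(each_node, each_child).select { |scc| scc.size > 1 }
end

# The kind of assertion a test could make once a constant has been extracted.
def in_cycle?(graph, node)
  tangled_components(graph).any? { |scc| scc.include?(node) }
end

tangled_components(GRAPH).map(&:sort) # => [["Customer", "Invoice", "Order"]]
in_cycle?(GRAPH, "Money")             # => false
```

If someone later made Customer depend on Money while Money depends on Order, in_cycle?(graph, "Money") would become true and the test would fail.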
27:20
We were recently able to do that with one of our main core data models, and we're figuring out how to do more and more of this to slowly pick apart this mess. One of the primitives we've given developers for doing this is the concept of packages. You can now, in our codebase, define a package.yml file in any directory,
27:40
and in that file say that every Ruby file in this directory has to live under this namespace and can only reference things from these imported namespaces. Yes, we're copying a whole bunch of other languages. If you violate that, you're going to get a RuboCop error, either in your editor or on the command line like you see here, where it'll tell you,
28:02
nope, you're not allowed to do that; you have to go and import this. The nice thing about this is that it makes it really easy to see when someone adds a dependency in code review, so you can have the conversation about whether these things should be coupled as part of the review. We also have a tool that lets teams set up watches on files, so teams that own a particular package can say, I want to watch for changes
28:22
to the package.yml file that I own, and help make sure they get involved in the discussion of anything that's going to change this important behavior of the code they own. So what's the future? This was our entrée into static analysis in Ruby. The future for us is type checking.
28:40
We have an ongoing project to gradually type check our Ruby code. We've started adding type annotations to our code that are checked at runtime: they're valid Ruby code that you annotate your methods with, and we make sure that the types you pass in at runtime are valid. We've started writing a static type checker that can read that same syntax, and we've actually already caught
29:00
a couple of real type errors in our codebase with it, but we're very, very early in this work. We've only been seriously poking at this for a few months, so hopefully stay tuned for more on that next year. So, in summary... sorry, I lied. How do you load a million lines of Ruby in five seconds?
29:20
Don't. Rather than solving hard problems, unhave the problem. Don't load code faster; use an autoloader and load less code. Don't run tests faster; run fewer tests. (Actually, there's somebody in the back who works with me on test stuff at Stripe who would also like us to just make the tests faster.) Don't try to reason about millions of lines of code. Just modularize the code
29:40
so that nobody has to reason about the whole thing at once. Thanks. I probably should have preempted that one. Can you repeat the question? Ah, sorry, good point. Are we planning to open source this? We would love to, but I'm worried,
30:01
I think at this point so much of it is tightly coupled to changes we've made in our codebase that it's not especially helpful in its current form, so we don't really have any plans to figure out how to bridge that gap. I'm hoping we can do more with the type checking work
30:20
because hopefully that will be a little more general purpose, though there's certainly an amount of metaprogramming you can do that will foil any attempt at static analysis. Yeah, so I have to apologize that I'm not deeply familiar with Rails, since we don't use it, but I did look at the Rails autoloader when getting into this, so you can yell and scream at me if what I'm saying is completely wrong.
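For contrast, a convention-based autoloader like Rails' can find code without a dependency index, because the constant name determines the file path. A simplified, hypothetical sketch of that mapping, similar in spirit to ActiveSupport's String#underscore (real Rails inflection also handles configured acronyms and irregular cases):

```ruby
# Simplified, hypothetical version of the constant-name -> file-path convention.
def conventional_path(constant_name)
  segments = constant_name.split("::").map do |segment|
    # Split an acronym run from a following word: "HTTPClient" -> "HTTP_Client".
    with_acronyms = segment.gsub(/([A-Z\d]+)([A-Z][a-z])/, '\1_\2')
    # Split lowercase/digit-to-uppercase boundaries: "BarBaz" -> "Bar_Baz".
    with_acronyms.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end
  segments.join("/") + ".rb"
end

conventional_path("Foo::BarBaz") # => "foo/bar_baz.rb"
conventional_path("HTTPClient")  # => "http_client.rb"
```

Anything that breaks the convention, such as several constants defined in one file or a file whose name doesn't match its constant, is invisible to this kind of lookup.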
30:42
My understanding of the Rails autoloader is that it's based on being able to make good assumptions about the names of things. Because Rails values conventions so highly, it's able to make those assumptions, but it's very dependent on the way Rails structures and names constants,
31:00
so it doesn't work in the general case, and, as you were saying, we have a very different case. The nice thing about our approach is that it also generalizes beyond things that live inside a framework like Rails. Great, thanks, everyone. Thank you.