
High performance APIs in Ruby using ActiveRecord and Goliath


Formal Metadata

Title
High performance APIs in Ruby using ActiveRecord and Goliath
Part Number
57
Number of Parts
94
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
We had a real-time API in Rails that needed much lower latency and massive throughput. We wanted to preserve our investment in business logic inside ActiveRecord models while scaling up to 1000X throughput and cutting latency in half. Conventional wisdom would say this was impossible in Ruby, but we succeeded and actually surpassed our expectations. We'll discuss how we did it, using Event-Machine for the reactor pattern, Synchrony to avoid callback hell and to make testing easy, Goliath as the non-blocking web server, and sharding across many cooperative processes.
Transcript: English (auto-generated)
...CTO, and I'm Daniel McCloskey, a software engineer at Invoca. I've spent the last two years or so working on highly performant APIs in Ruby.
"High performance" and "Ruby" typically aren't phrases you hear in the same sentence, but we've had enormous success with the stack we'll show you today. So before we get too far into things, a little background on the API that we had to scale: it's called the RingPool API. The RingPool API allows you to track calls similarly to how you would track a web click. You pass us a unique set of parameters and we give you a unique phone number back. If that phone number gets called, it shows up in reporting and lets you make more intelligent decisions
about where you're spending your marketing budget. Internally, a ring pool is a collection of phone numbers. A customer can have many ring pools; they're usually attached to a particular marketing campaign, and each ring pool contains between 200 and 500 numbers. To protect the validity of call attribution, we reserve a phone number for a preset period of time after each allocation. This requires some architectural consideration, because a customer could request a number against a ring pool that has none available: they're all locked out. So, to ensure that we always have one to give out, each ring pool has an N-plus-first number that we call the overflow. We give it out if we don't have any other numbers available, and we don't track any call attribution data with it.
In some extreme situations, we've seen these overflow numbers handed out several hundred times a second, so there's no way to correlate the resulting calls. We were approached about two years ago by a client who wanted to use the RingPool API but had much higher throughput requirements than a Rails stack was going to be able to support. Our existing API, the old Rails-stack one, could support about five requests a second per connection, with one phone number per request, and the round-trip 90th-percentile response time was about 350 milliseconds. The client requirements were a little bigger. They wanted to be able to handle at least a thousand requests per second per connection.
They wanted at least 40 phone number allocations per request, and they wanted a round-trip total response time, at the 90th percentile, of less than 300 milliseconds. So, some quick back-of-the-napkin math: 5 requests per second at one number each, versus 1,000 requests per second at 40 numbers each; that's an 8,000-fold increase in throughput, just for a single customer. So the requirements were pretty clear. We needed very predictable low latency, and we needed to be elastically scalable as we brought more users onto the network. Additionally, we wanted to leverage our existing API, so we knew that the new system needed to be able to communicate with the old system asynchronously. We also knew that eventually we were going to want to share business logic, all the complex stuff that we had over in our main app, with this new API and other APIs somehow.
And so, bonus points if we could find a solution that was in Ruby. While Ruby brings a lot of awesome to the table, there are some considerations necessary when you start thinking about highly performant applications. The main thing we were worried about, because we're on Matz's interpreter (MRI), was the global interpreter lock, the GIL. For most dynamic languages, this is the elephant in the room. It's worth noting that neither the JRuby nor the Rubinius interpreters implement a GIL, so they don't have this issue; but who here is running Rubinius or JRuby? Okay, so it's still a problem for most of us. In short, the GIL is a lock wrapped around every Ruby statement, and it prevents the process from consuming more than one CPU. It's nice because it simplifies the interpreter's design, since the implementers don't have to think about multi-threading, and it prevents a lot of data consistency issues that can pop up in highly concurrent code. The downside is that since a Ruby process can only use one core, we can't write truly parallelized applications.
In an ideal world, this is what your CPU usage would look like: one application process running, spreading its work evenly across all the available CPUs on the machine. With the GIL, however, you end up totally pegging one core and leaving the rest doing absolutely nothing. So the only way to get around this, in MRI at least, is to run multiple cooperative Ruby processes. So we sharded our API across cores.
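A minimal sketch of that process-per-core sharding (not the actual Invoca code; the helper and port scheme are illustrative):

```ruby
require 'etc'

def run_api_server(port:)           # placeholder for booting one shard
  puts "shard listening on port #{port}"
  sleep                             # stand-in for the reactor loop
end

NUM_SHARDS = Etc.nprocessors        # one shard per available core

pids = NUM_SHARDS.times.map do |shard_id|
  fork { run_api_server(port: 9000 + shard_id) }  # each fork owns one core
end

pids.each { |pid| Process.wait(pid) }
```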
The second issue we needed to address is called the C10K problem: how can we best optimize our application to handle 10,000 simultaneous incoming requests? We could write highly threaded Ruby code, but that's really difficult to get right, or we could avoid that complexity entirely by using the reactor pattern, which is the solution we chose. The gist of it is that whenever an action would block the main thread, referred to as the reactor (like an HTTP request or a database insert), we toss it on the bottom of a queue with an asynchronous callback, pull the next action off the top of the queue, and start working on it. The application then selects for events whose blocking I/O has completed and runs their callbacks serially. This gives us an optimal level of concurrency without having to block that thread. If you've seen Node.js, that's the reactor pattern in JavaScript; in Python, it's the Twisted library; and in Ruby, the reactor-pattern library is EventMachine,
which you may have heard of. This is an example of a call to a remote API using EventMachine. For any action we expect to block, in this case queries to Facebook and Foursquare, we define a callback for when the request succeeds and an errback for when the request fails, one of which will be executed when the action completes or errors.
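A sketch of that callback/errback style, using EventMachine with the em-http-request client (the URL and handling are illustrative):

```ruby
require 'eventmachine'
require 'em-http-request'

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://example.com/api').get

  http.callback do                  # runs if the request succeeds
    puts "status: #{http.response_header.status}"
    EventMachine.stop
  end

  http.errback do                   # runs if it fails; often 80% the same
    puts "failed: #{http.error}"
    EventMachine.stop
  end
end
```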
Unfortunately, in many cases, the callback and errback might be 80% similar. At Invoca, we have another API that runs on EventMachine, and these callbacks make the code really difficult to read, write, and test. Without very careful unit testing, our code was becoming incredibly brittle and nearly unmaintainable. You can end up with a complete mess of spaghetti code. If you've worked in Node.js, this might be very familiar to you. Fortunately, because this is Ruby, there is a third way. EM-Synchrony is a gem that sits on top of EventMachine, and it leverages the fibers introduced in Ruby 1.9 to make writing asynchronous code much easier. Instead of using callbacks,
when a method blocks the reactor, its state is stored on the fiber stack and the reactor goes on to the next event. When the method unblocks, the reactor resumes it from the stack. Synchrony handles all of the complicated stuff in the background and lets us program linearly, without fear of being arbitrarily interrupted. This pattern of application-level scheduling using fibers, also known as coroutines, will be familiar to any of you who have worked in Go; this is essentially a goroutine. So instead of a potential pile of smells and confusion, we ended up with something that looks a lot more like the kind of Ruby code that we enjoy writing and testing.
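The same kind of request written against EM-Synchrony reads linearly. A minimal sketch, assuming the em-synchrony patches for em-http:

```ruby
require 'em-synchrony'
require 'em-synchrony/em-http'

EM.synchrony do
  # The fiber suspends here while the I/O is in flight; no callbacks.
  response = EventMachine::HttpRequest.new('http://example.com/api').get
  puts "status: #{response.response_header.status}"
  EM.stop
end
```

The fiber is parked while the request is outstanding, so the reactor stays free to serve other work.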
We wrap blocking code in fibers, and Synchrony handles the rest. Enter Goliath. Goliath is a Ruby web server that leverages EventMachine, EM-Synchrony, and async Rack, and handles each incoming request in a separate fiber. All asynchronous I/O will transparently suspend and later resume, without us having to do any of the heavy lifting. We define a response method that takes a single argument, through which Goliath hands you the request environment. You do whatever work you need to do and return the result to Goliath, which sends it over the client's HTTP response. This last non-blanked-out line here is the array that you hand back to Goliath.
Position 0 is the status code, in this case 200; position 1 is the headers you want to send back, in this case none; and position 2 is the body of your response. So using EventMachine, Synchrony, and Goliath enabled us to build a really performant system.
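A minimal sketch of a Goliath API returning that triple (endpoint and body are illustrative):

```ruby
require 'goliath'

class HelloAPI < Goliath::API
  def response(env)
    # [status, headers, body]: positions 0, 1, and 2 as described above
    [200, {}, '{"hello":"world"}']
  end
end

# Run with: ruby hello_api.rb -sv -p 9000
```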
that we found to be very effective in synchrony. Thanks Dan. You guys ready to see some code? That's my favorite thing. Not going to laugh at you.
We named all of our conference rooms; the fishbowl is all glass, so it's kind of a fishbowl. So at the bottom level here I've got a very simple web app that just counts. It does nothing but count. Here's the threaded version of the code. If you've written threaded code, you'll recognize that you typically have a mutex to protect your state. You've got a state variable and a counter. Down here is the increment method; you do that inside the mutex. You've got a special case for when you want to shut down and stop. Here's the counter itself, for when we want to retrieve it; you also grab the mutex there, so you read a consistent copy of it. Here's my run method for when I kick this thing off. All it does is create a thread, which runs until it gets stopped. And down here is increment-forever; this is the actual work of the app: it just increments until it has to stop. So there's the simple threaded version.
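A reconstruction of what such a threaded counter looks like (the speakers' real demo is on their GitHub; this sketch is not their exact code):

```ruby
class ThreadedCounter
  def initialize
    @count = 0
    @stop  = false
    @mutex = Mutex.new              # protects the shared state
  end

  def increment
    @mutex.synchronize { @count += 1 }
  end

  def count
    @mutex.synchronize { @count }   # grab the mutex to read a consistent copy
  end

  def stop
    @mutex.synchronize { @stop = true }
  end

  def run
    Thread.new { increment until stopped? }
  end

  private

  def stopped?
    @mutex.synchronize { @stop }
  end
end
```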
Here it is evented; I basically deleted a quarter of the code. There's no need to mess around with mutexes, because you're not going to be asynchronously interrupted, so it's much simpler than the threaded app. The only difference down here is Fiber.new instead of Thread.new, and this little piece actually kicks off the fiber: unlike threads, which run immediately, fibers have to be explicitly resumed. And down here I had to do one other thing. Because this happens to be a CPU-bound application, I did need to periodically yield. It's fairly rare that you'd have a giant chunk of CPU work like this, but it is the downside of writing cooperative code: you have to cooperate. So here I yield every hundred thousand increments.
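And a matching sketch of the fiber-based version, with the cooperative yield (again a reconstruction, here using EM::Synchrony.sleep(0) as the yield point):

```ruby
require 'em-synchrony'

class EventedCounter
  attr_reader :count                # no mutex: fibers aren't preempted

  def initialize
    @count = 0
    @stop  = false
  end

  def stop
    @stop = true
  end

  def run                           # must be called inside a running reactor
    Fiber.new do
      until @stop
        @count += 1
        # Cooperate: let the reactor serve requests every 100,000 counts.
        EM::Synchrony.sleep(0) if @count % 100_000 == 0
      end
    end.resume                      # fibers don't run until resumed
  end
end
```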
And then here is the actual app itself, the little Goliath app. By the way, this is all up on GitHub; I'll give the link at the end. So here's the actual server; it inherits from Goliath::API. It gets set up with a counter object, one of the two counters that can be passed in. It does a little bit of startup here and then runs the counter. And then this is the actual heart of the web server. Every request comes in, these are the paths, and here's your output: the /counter path hands back this little bit of JSON. There's also an alive endpoint, in case we're behind an ELB or other load balancer, so we have a way to report that we're alive. And here's the 404 handling. It's that simple. There's also async Sinatra if you want to do more complex web serving. A little bit of code down here at the bottom looks at the command line to decide which type of counter to run, and then this is the boilerplate that actually gets Goliath running.
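A sketch of that routing, including the counter path, the alive check for the load balancer, and the 404 fallback (paths and body shapes are illustrative):

```ruby
require 'goliath'
require 'json'

class CounterServer < Goliath::API
  def response(env)
    case env['PATH_INFO']
    when '/counter'
      [200, {}, { count: $counter.count }.to_json]  # $counter set at boot
    when '/alive'
      [200, {}, 'yes']              # lets the ELB/HAProxy see we're up
    else
      [404, {}, 'not found']
    end
  end
end
```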
So let's fire this thing up. Alright, in one window here I'm going to run the threaded one, and in another window, the evented one. I'm going to give the threaded one a head start. And here's the evented one. I've got the logs coming out right to the screen here; those are the live logs. So both of those are running now. And down here in this window I'm going to hit both of them. One's running at port 9000 and one's at 9001. They both start at zero, and you can see by the speed that the evented one runs more than twice as fast.
And that's because it's not messing around with the mutexes and it's not being asynchronously interrupted. Also, you might notice the evented counts always happen to be multiples of 100,000, and that's exactly what you'd expect, because it's cooperatively multi-threaded: the only time it's going to have a chance to serve my /counter action here is when the fiber yields. Alright, so that concludes that demo. I think 2x is more than you'd usually see; 1.5x would be a more typical number in the wild. But this is the general, and significant, difference in density.

OK. I want to talk a little bit about some of the software design decisions that we made that really seemed to pay off in building this API server. To start with, EM-Synchrony supports ActiveRecord and many popular stores, including MySQL (the mysql2 driver), Postgres, MongoDB, Redis, Memcached, AMQP, all of those, and there are plenty of third-party drivers as well.
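For example, pointing ActiveRecord at the fiber-aware mysql2 adapter that em-synchrony ships looks roughly like this (connection settings are illustrative):

```ruby
require 'em-synchrony'
require 'em-synchrony/activerecord'

ActiveRecord::Base.establish_connection(
  adapter:  'em_mysql2',            # non-blocking, fiber-aware mysql2
  host:     'localhost',
  database: 'api',
  pool:     10                      # one connection per in-flight fiber
)
```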
The biggest problem we had was dependencies. When we pulled in ActiveRecord code like this, the autoloader in Rails kind of encourages, or at least enables, dependencies between models. In this example, the user's dependency on the account is actually OK; Rails doesn't force the account to load right there. But as soon as I mention email_address, the user pulls in the EmailAddress class, and then that class pulls the network gem right in. You mention a gem, you pull the gem in. So we had to spend a fair amount of time snipping dependencies, and we ultimately went with turning off the autoloader entirely, so that just the models that were actually necessary got pulled into our server.
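The end state is the opposite of autoloading: each shard explicitly requires just what it needs, along the lines of this sketch (paths illustrative):

```ruby
require 'active_record'

# No Rails autoloader: each model is pulled in deliberately.
require_relative 'models/account'
require_relative 'models/user'
```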
You probably know that ActiveRecord is fairly slow: loading objects costs CPU, and validations and callbacks are fairly slow too. TL;DR: use thin models. Models with few dependencies and few validations were the best fit. We also found we needed to encapsulate handling of exceptions. So we have a gem that we put together, open source on GitHub, called exceptional_synchrony, and it's just a layer that we added above EventMachine to make it even safer. It has a couple of shims. The most important, which I'm going to show, tunnels exceptions across callbacks. We've also got a proxy that sits in front of EventMachine just to make events a little safer, and ParallelSync helps with running things in parallel.
So here's the way we do callback exceptions. To start with, this is a straight EventMachine example. We've got a callback and an errback, and you may notice connection.close appears in both places. That duplication is typical for callbacks and errbacks; it's not a natural way to code, and you wind up with some duplication of implementation right there. It's not DRY. One of the things that drew us to Ruby was its elegant exception handling: raise, rescue, and ensure. We really found that works great to keep your error handling separate from your mainline code. So here we can see that Synchrony unified the return path. This is the version that's using Synchrony. We no longer have the errback and callback separated; a single result is handed back from that request, and then the code does an if/else on it. This reminds me of coding in C,
where you kind of hope that your fellow developers remember to check return codes. That's not a great way to build reliable code. Inspired by a piece of Synchrony code, we realized we could tunnel exceptions over the callback; by doing that, there's no need for a separate errback. Here's the ensure_callback method in this gem. It just calls the return-exception helper method down there. It's very simple: it captures either the return value or an exception, whichever happens, and that's what we send across. We call it a success in all cases; it doesn't really matter, because you're going to pop right out. On the other side, where we await it, there's a fairly clumsy long name, map_deferred_result. It takes those results and knows how to check them: if the result is an exception, it raises it again. Put those two together, and you get right back to straight, linear, plain old Ruby code. You can see here now we use ensure so the connection gets closed on the way out. Exactly what we wanted.
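The idea, sketched generically (the real gem is exceptional_synchrony; this is a simplified reconstruction, and the helper bodies here are illustrative):

```ruby
# Wrap the work so either its value or its exception becomes "the result".
def ensure_callback(deferrable)
  result =
    begin
      yield                          # run the real work; it may raise
    rescue Exception => ex
      ex                             # tunnel the exception as the result
    end
  deferrable.succeed(result)         # always "succeed": no separate errback
end

# On the awaiting side, unwrap: re-raise tunneled exceptions.
def map_deferred_result(result)
  raise result if result.is_a?(Exception)
  result
end
```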
One thing I wanted to mention here is testing: we found we can unit test these things, because this is linear code that runs perfectly fine without the reactor machinery spun up. That was never true for our EventMachine projects; unit testing was quite difficult because you always had to start the reactor. And speaking of exceptions: if an exception gets loose when EventMachine or a Synchrony fiber calls into your code, your process will exit. That is not any fun. We learned this the hard way: we had some of our server processes fall over and get restarted by a process manager, seemingly at random. We eventually got to the bottom of it, and the internal exceptions were often for very mundane things, like file permissions not being right, something like that; it blew up and took down the whole process. After having been burned by that, we wound up with just a thin layer above all the EventMachine methods. They look like this: they just rescue and log the errors, and they don't let anything propagate any higher.
These are basically the top of the stack; there's nobody above us at this point. You might notice that this is actually rescuing Exception, which you ordinarily wouldn't do because it's the lowest-level exception class; it will even rescue a ScriptError if you have a typo. But for an always-running API server, that is actually exactly what you want. The EventMachine proxy wraps all of these top-level methods.
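A sketch of what such a thin protective layer looks like (method selection and logging are illustrative, not the gem's exact API):

```ruby
require 'eventmachine'
require 'logger'

LOGGER = Logger.new($stderr)

module SafeEM
  def self.next_tick(&block)
    EM.next_tick { run_safely(&block) }
  end

  def self.add_timer(interval, &block)
    EM.add_timer(interval) { run_safely(&block) }
  end

  def self.run_safely
    yield
  rescue Exception => ex             # Exception on purpose: nobody above us
    LOGGER.error("top-level rescue: #{ex.class}: #{ex.message}")
  end
end
```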
The last piece to talk about in this gem is parallel_sync. Callback code is natural for running in parallel but makes it harder to write serial code. Coroutine libraries like Synchrony reverse that: serial code is easy to write, it's the trivial case, but when you want to run code in parallel you need to take some explicit steps. Here is the heart of our API, scattering requests across the shards. You'll see that it says responses equals, and then it kicks off this parallel block. That adds all the work to be done to talk to the other shards. Then when you hit the end of that block, all of the results have come back, and they're available as an array to be used in the next statement. This really should be the futures/promises pattern; we're working on that, and I'm going to try to bring in some other gems that work the same way.
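Until then, the scatter/gather itself can be written with em-synchrony's Multi; a sketch with illustrative shard URLs:

```ruby
require 'em-synchrony'
require 'em-synchrony/em-http'

EM.synchrony do
  multi = EM::Synchrony::Multi.new
  multi.add :shard1, EventMachine::HttpRequest.new('http://10.0.0.1:9000/numbers').aget
  multi.add :shard2, EventMachine::HttpRequest.new('http://10.0.0.2:9000/numbers').aget

  responses = multi.perform          # parks the fiber until both are done

  responses.responses[:callback].each do |name, http|
    puts "#{name}: #{http.response_header.status}"
  end
  EM.stop
end
```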
I also want to call attention to the last line there: it's a plain old Ruby object where you might expect a hash. We use those all over, and I want to talk about them a little bit. We call them immutable value classes: plain old Ruby objects that you use where you'd typically use a simple string or hash. We found this to be great, inspired by Sandi Metz. You'll see here we have an initializer that sets the object up; in this case it holds the array of responses and also the format. You'll notice a design-by-contract enforcement here: we don't allow any formats except those two. In fact, these days we only use one of them.
I left it in there just to show that it's a great pinch point for contract enforcement. There are no setters or writers; this is read-only. That's the immutable part: if you want to change it, you change a copy. That's the convention. These classes typically know how to serialize themselves; in this case, to_json knows how to serialize it as an array of responses, and we have a matching from_json, which is a factory method there. We found this pattern worked out great. These things are all unit-tested, individual pieces, and composable. Compare this to a hash, which is the typical way this kind of code would be written. A hash does have to_json, but it turns out a hash almost never has exactly what you want in the JSON: as soon as you have some internal bookkeeping that's different from what you want to share with your external clients, a hash won't work.
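A condensed sketch of such an immutable value class (names are illustrative):

```ruby
require 'json'

class Responses
  FORMATS = [:json, :xml].freeze

  attr_reader :responses, :format    # readers only; no writers

  def initialize(responses, format)
    FORMATS.include?(format) or raise ArgumentError, "bad format #{format}"
    @responses = responses.freeze    # the pinch point for the contract
    @format    = format
    freeze                           # want a change? make a copy
  end

  def to_json(*args)
    @responses.to_json(*args)        # exactly the JSON we want to expose
  end

  def self.from_json(json)           # matching factory method
    new(JSON.parse(json), :json)
  end
end
```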
It turned out the bulk of our code wound up being immutable value classes, and that was pretty neat to see. These are Sandi Metz's rules; we actually read about them after we did this project the first time. They nicely direct you toward plain old Ruby objects, and I particularly recommend the immutability. It isn't required by the rules, but it's great for functional programming, and it was also perfect for concurrency, because Ruby is a shared-reference language. We have many requests coming in at the same time, and if they were mutating shared data structures, even though fibers won't be preempted arbitrarily, you'd still have to worry that when you block and come back, the data structure might be different. The simplest answer is: don't change them. I'd be very, very happy if Ruby objects were functional and immutable by default. We found that by using these classes, we naturally decomposed the code into much smaller pieces and avoided big monoliths. The last piece I want to talk about briefly is singletons.
I don't mean the classic class-style singletons; I mean the global-variable singletons. TL;DR: they're evil. They are global variables: you can't control their lifetime, you can't pass constructor parameters, and you can't easily test them. And singletons beget more singletons; they're like a hydra. Once you start in on them, you get more and more of them. So we have a modified singleton pattern that works great for us. This is an example class: it's what keeps our secrets, the credentials we use to talk to third-party APIs and such.
It's similar: it's got an initialize method, but you'll notice it's actually allowed to take a constructor parameter. It's making its dependencies explicit. It turns out that after we wrote the code this way, about a month later we actually hit a case that needed two secrets files, and this code worked great. The singleton pattern would have been dead the moment you needed two. Down here you'll see there's a method to set any particular instance to be the global one, and a class method to get it. So you still have the convenience of getting at things without having to pass them through everywhere, while addressing all the shortcomings I listed.
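A sketch of that modified singleton (class name and storage are illustrative):

```ruby
require 'yaml'

class Secrets
  def initialize(path)               # dependency is explicit
    @data = YAML.load_file(path)
  end

  def [](key)
    @data[key]
  end

  class << self
    attr_writer :default             # set any instance as the global one

    def default
      @default or raise 'Secrets.default not configured'
    end
  end
end

# Secrets.default = Secrets.new('config/secrets.yml')
# api_key = Secrets.default['third_party_api_key']
```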
We also learned some important lessons about architecting large concurrent systems in the cloud, the first of which was about TCP latency. This diagram shows the slow-start handshake that's performed each time a TCP connection is established; it's how TCP feels out the bandwidth of the connection. As you can see from the x-axis, this handshake can take about 20 milliseconds. On the left side here is HTTP 1.0: each time the client wants to send a request, it opens a new HTTP connection,
performs the handshake, sends a single request, gets a single response, and then closes the connection; rinse and repeat. Given our SLA, losing 20 to 30 milliseconds to a handshake on each request was highly undesirable. Fortunately, HTTP 1.1 introduced connection persistence: subsequent HTTP requests are pipelined across the initial TCP connection. The slow-start handshake is performed once, and the connection is kept open for as long as needed. So we used HTTP 1.1 persistent connections to communicate between shards, and we further optimized communication between shards on the same server by using Unix domain sockets instead of TCP connections.
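With em-http-request, reusing one persistent connection looks roughly like this (URL and path are illustrative):

```ruby
require 'em-synchrony'
require 'em-synchrony/em-http'

EM.synchrony do
  conn = EventMachine::HttpRequest.new('http://shard.internal:9000')

  10.times do
    # keepalive: true holds the TCP connection open between requests,
    # so the slow-start handshake is paid only once.
    response = conn.get(path: '/numbers', keepalive: true)
    puts response.response_header.status
  end
  EM.stop
end
```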
One issue that we did run into was with ELBs, Amazon's Elastic Load Balancers: they can only forward requests to a single port per server in rotation. So we had to put an HAProxy instance on each API server in order to round-robin requests to the individual shard processes running on that server. This is what the setup ended up looking like in EC2. The client hits an ELB, which round-robins to the available HAProxy instances, which in turn round-robin to their local API shards. The receiving shard sends requests to other shards for ring pools it doesn't own, then consolidates the responses into a single JSON body and returns that to the client. Ring pools are assigned round-robin to API shards to balance load. The system also allows us to move a more heavily trafficked ring pool from one shard onto another that's experiencing a lower average load. So what happens when a shard falls over or doesn't respond in time? Initially we implemented what we called the watchdog: it would monitor the API shards and react quickly to reassign ring pools if one shard stopped responding.
However, we didn't like the complexity in that design; I mean, who's going to watch the watchdog? So instead we taught each shard the overflow number for every ring pool in the system. If a shard doesn't respond quickly enough to an allocation request, we just hand back the overflow number. We also rate-limit communication between shards: if one shard notices another shard is taking too long to respond, it starts handing back overflow numbers instead of actually hitting the slow shard. Over time, the shard that noticed the slowness allows more and more requests through, instead of handing back overflows all the time.
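The policy itself is simple; here is a plain-Ruby sketch (an evented implementation would use an EM timer rather than Timeout, and all the helpers here are hypothetical):

```ruby
require 'timeout'

def allocate_number(ring_pool, shard, params)
  Timeout.timeout(0.1) do            # give the owning shard 100 ms
    shard.request_allocation(params) # hypothetical cross-shard client call
  end
rescue Timeout::Error
  ring_pool.overflow_number          # hand back the overflow instead
end
```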
So we took these measurements on EC2 using JMeter, which ended up being a fantastic tool. With three JMeter instances slamming our API, we had about a 93-millisecond median, a 90th percentile of 124 milliseconds, and we were able to process 1,400 requests per second. With four JMeter instances slamming the crap out of our API, we had a median of about 102 milliseconds and a 98th percentile of 144 milliseconds, and we were able to process 1,700 requests per second. With five, we were able to process 2,000 requests per second.
We found, awesomely enough, that the bottleneck was the number of requests JMeter was able to hit us with, not our API. We later found that we could have kept the traffic on AWS's private backbone instead of going over the public internet, and that would have cut an additional 10 to 15 milliseconds off each round trip. The final result exceeded the SLA by a fair amount. Fun fact: there are 5.5 billion phone numbers in North America. At this rate, we could allocate every North American phone number in less than 12 hours. So, it was a tremendous success; we were very, very pleased with the results. Some other tools we used: Minitest and FactoryGirl. Our entire test suite runs in less than 10 seconds. We used SimpleCov to make sure we weren't being lazy, and that's 10 seconds with a 99.96% coverage ratio. And again, as mentioned, Apache JMeter for load testing was awesome: you can run it headlessly up in the cloud, it's very easy to consolidate results, and it's very easy to create complex test sets and share those with clients who are also load testing
against our API. We've got a couple of shout-outs. We'd like to really thank the maintainers of EventMachine, and Ilya Grigorik, the maintainer of EM-Synchrony. I personally would like to shout out the #ruby and #ruby-lang IRC channels on Freenode. And a shout-out to Sandi Metz for a lot of the best practices that made this project possible. Anything else? I've got a call to action here: I strongly encourage you to try Goliath as the stack for your next API server. If you have any questions, ping us on Twitter; we'd love to help out. I think it's a really great stack,
and I'd love to see it in a future version of Rails itself. I know Ilya Grigorik was proposing that idea back in 2011; it didn't quite happen, but I think it should have. Now that we've heard DHH mention EventMachine as possibly being part of Rails 5, I would love to see Synchrony in there as well. So I'm going to see if I can help out with that project and try to steer it that way. It would be a great stack together with Thin, for instance, which is already built on EventMachine. We would get a lot more density, and take a little bit of the momentum away from Node.js, which is really good when it comes to density.
Alright. So that's the end.