
Your App Server Config is Wrong



Formal Metadata

Title
Your App Server Config is Wrong

Part Number
74
Number of Parts
86
Author
Nate Berkopec
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
As developers we spend hours optimizing our code, often overlooking a simpler, more efficient way to make quick gains in performance: application server configuration. Come learn about configuration failures that could be slowing down your Rails app. We’ll use tooling on Heroku to identify configuration that causes slower response times, increased timeouts, high server bills and unnecessary restarts. You’ll be surprised by how much value you can deliver to your users by changing a few simple configuration settings.
Transcript: English(auto-generated)
We're doing some stuff at our booth if you haven't come by yet. We're doing a cool thing: if you go to HerokuLove.com, you can vote for your favorite open source project and we're going to donate $500 to that project, and there's a few Ruby trivia questions. I think there's four of them in total. Here's the QR code, but I don't actually expect you to scan that with your phone. We're also doing a thing, so Ruby Heroes is not happening
anymore but I did enjoy the spirit of saying thanks to people in the community that helped you with your journey as a developer and so we have these cool postcards at our booth where you can write thanks and either give it to the person if they're here at RailsConf
or you can just post it up on these whiteboards that we have at our booth and then we will either tweet them or figure out a way to make that public. And right after this talk there is a break. There's going to be a bunch of people from the Rails core and contributor team at our booth doing office hours so if you have questions or want to meet those
folks like Aaron or Eileen or Raphael, Bitbol and other people, you can come do that, get those questions answered. I know a lot of people came by and tried to get shirts and we ran out within like the first 30 minutes, maybe even less, but we will have some more shirts tomorrow so if you do stop by tomorrow we hopefully will have shirts for you. So
with that I'll give this to Nate to take away. Thank you. Thank you. Alright, so this is Heroku's sponsored talk. I don't know if this is on. I'm on here. So I do not work for Heroku. I'm not a Heroku employee. But they were nice
enough to give me this slot. This talk is called Your App Server Config is Wrong. When I talk about application servers, what I'm talking about are things like Puma, Passenger, Unicorn, Thin, WEBrick. These are all application servers.
First, a little bit about who the heck you're listening to right now. I am a skier. I recently moved to Taos, New Mexico just for the skiing, basically. I also am a motorcycle rider. I've ridden my motorcycle cross country three times on
dirt roads. This is my motorcycle taking a nap in the middle of nowhere in Nebraska. I was also on Shark Tank when I was 19. I was on the very first season. That's me on Shark Tank. One of my readers gave me this gift. I enjoy it very much. I'm also a part-time meme lord. I like to make spicy programming memes. I like this
one. Another spicy meme I made. You probably know me, though, not from any of these things, but through my blog. It looks like this. I write about Ruby performance topics like
making Rails applications run faster. I also have a consultancy that I call Speed Shop where I work on people's Ruby applications to make them faster, more performant, use less memory, use fewer resources. I have written a book, a course, about making Rails applications
faster. It's at railsspeed.com. It's called The Complete Guide to Rails Performance. Incorrect app server configuration is probably the most common issue that I see on client applications. It's really easy to kneecap yourself by having an app server config which
isn't optimized. It's easy to overprovision. It's easy to have an app server config which makes you require more dynos, more resources than you actually need. It's very easy to spend a lot of money on Heroku, which is great for them, but it's easy
to scale out of your problems by just cranking that little dyno slider all the way to the right, and now I don't have a performance problem anymore. If you're spending more dollars per month on Heroku than you have requests per minute, you're probably overprovisioned. You don't have to spend $5,000 a month on your 1,000 RPM app. Maybe if you have
some really weird add-on that is something unique to you, maybe then you have to, but that's just kind of a rule of thumb that I've found: I've been able to get client apps to that point, if not lower.
The other thing that can happen with a misconfigured app server is the opposite: you're overusing your resources. You're using too small a dyno for the settings that you have set. Let's talk about some definitions. Container. I use the words container
and dyno interchangeably because that's kind of what a dyno is, right? It's a container in a big AWS instance or whatever they use, and you get some proportion of their
larger server. This is a Heroku talk, so I'm going to be using Heroku terminology. I'm going to say dyno, but a lot of this stuff is not unique to Heroku. I'm just going to be discussing it in Heroku terms. A worker. In Puma — which I'm now a Puma maintainer of, along with Richard — we have workers. I don't know what they call them in Passenger or Unicorn; they might use a different word. Basically, all of the top three modern Ruby application servers use a forking process model. What that means is that they start
your application — railsapp.initialize or whatever — and then they call fork. That process creates copies of itself, and those copies are what we call the workers. So one of the main config settings is how many of these processes we're going to run per dyno. A thread. I guess
we all kind of know what a thread is, but I just want to draw the difference here because it's very important in regular CRuby. The difference between a process and a thread is that processes run independently, so two processes can work on two different requests
at the same time, but two threads cannot run Ruby code at the same time. The other thing is we can do things like start a request, start creating a response in one thread, then maybe we're waiting for a database call to return, and we can release the global
VM lock in Ruby, pick up another request in a different thread, do some work there, and then go back to the original thread. So we can do some limited concurrency in Ruby — it's usually just IO — but in general: one thread, one request.
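To make that concrete, here's a minimal sketch of the two Puma settings this whole talk revolves around: forked worker processes and threads per worker. The env var names match what Heroku's default Puma config tends to use; the fallback numbers are placeholders, not recommendations.

```ruby
# config/puma.rb — a minimal sketch; the numbers are placeholders.

# How many forked worker processes to run per dyno (Heroku sets WEB_CONCURRENCY).
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

# How many threads each worker runs (Heroku sets RAILS_MAX_THREADS).
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count

# Boot the app before forking so workers share memory via copy-on-write.
preload_app!

port ENV.fetch("PORT", 3000)
```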
Here's the overall process, and we're going to go through each step one through five. The first thing we're going to do is determine theoretically how many concurrent workers, how many requests do we need to complete concurrently. Secondly, we're going
to determine how much memory each worker slash process is going to use, then we're going to choose which container size we want to use, so which dyno size we want to use, and how many workers, how many processes we're going to put in each dyno.
We're going to check our connection limits, how many connections we make to the database, make sure we're not going over those limits, and then we're going to deploy and monitor queue depths and queue times, CPU usage, memory usage, how many times our processes restart, and how many times we have timeouts. This is a little hobby horse of mine.
Little's Law is a concept from queuing theory. It's used a lot in factory management, so when they want to know how many packer machines they need on the floor, they use things like Little's Law. It's a very small equation, which is why it's very small on this slide.
This is the fancy Greek-letter version; you can Google Little's Law to get the process engineering version of it. The version that we're going to use here just says that the number of things inside a system at any given time is equal to the rate at which they arrive multiplied by the
time they spend in the system on average, okay? So, translating that into Ruby application server terms: the number of requests we are serving at any given time is, on average, the number of requests we get per second times our average
response time, okay? And dividing the average number of requests in the system by how many workers we actually have gives us an idea of how much we're utilizing the workers that we have. If that was a little confusing, I'm gonna work through an example here in a second. It's
important to know this is just an average. It kind of assumes that our requests arrive at equal intervals — that, like, a request will arrive every 300 milliseconds. That's not the case, of course; we know requests arrive in bunches. They're randomly distributed. So, this is just sort of a starting point and a guideline. So, let's walk through
some numbers here to give an example. I found these numbers in an old Envato presentation from 2013. Envato runs, like, Theme Forest, if you ever use that. It's a big Rails app. So, they say that they receive 115 requests per second, which average a 147 millisecond
response time, and they use 45 workers, 45 processes. I forget which application server they use, actually. So, what we do is multiply the number of requests per second, 115, by the average time it takes to complete one request. I have to keep
my units the same here, right? So this is in seconds, and that is in seconds, and that gives me 16.9. So, on average, Envato is processing about 17 requests at any given point in time. They use 45 workers to do that: 16.9 divided by 45 is about 37%. So, they're using
about 37% of their workers at any given time. So, what I tell people to do is to do this calculation for themselves. You know how many requests you get per minute — that's on the Heroku dashboard. You know your average response times — that's also on the Heroku dashboard. Multiply them
together, and then multiply that again by a factor of five, so you're using about 20% of your theoretical capacity, and that gives you your initial estimate of how many processes you need, okay? Five is just a fudge factor. That's taking into account the fact that your requests don't come in uniformly, one after the other, 200 milliseconds apart or whatever
your number is. If that was all very confusing, I find Heroku's dyno load number on the dashboard to be fairly accurate as a starting point. So, this is at the bottom of the dashboard.
This is impossible to read, so: these numbers here on the left go from zero to eight. The dark blue line is the average load over one minute, and the lighter line is the maximum load for the last minute. Just look at that max number. It looks
like my max load here is about five dynos, so run it at five. Okay, fine. Of course, this only tells you that you need five dynos with whatever your config is at this particular moment, but it's a starting point.
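Here's step one expressed as a quick sketch in Ruby — the Little's Law estimate described above, using the Envato figures from the talk as example inputs. The 5x fudge factor is the rule of thumb from the talk, not a law.

```ruby
# Little's Law: requests in flight = arrival rate * average time in system.
# A rough sketch for estimating how many worker processes you need.

requests_per_second   = 115      # from your metrics dashboard
avg_response_time_sec = 0.147    # 147 ms — keep the units in seconds

requests_in_flight = requests_per_second * avg_response_time_sec   # ~16.9

# Fudge factor of 5 because traffic arrives in bursts, not evenly spaced,
# so you aim to sit around 20% of theoretical capacity.
fudge_factor      = 5
estimated_workers = (requests_in_flight * fudge_factor).ceil        # ~85

current_workers = 45
utilization     = requests_in_flight / current_workers              # ~0.38

puts "In flight: #{requests_in_flight.round(1)}, " \
     "suggested workers: #{estimated_workers}, " \
     "current utilization: #{(utilization * 100).round}%"
```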
What you'll probably find with dyno load — and what most of my clients find — is that this number is a lot lower than the number of dynos they actually use, because their app servers are not configured correctly. We'll get into how to fix that. So, that's step one: estimating our worker count. So, we know how many processes we
need. We need 45 processes to serve our load. So, how do we divide that among containers? Do I want to use a 1X dyno, 2X dyno? Now you have perf dynos, perf M, perf L. What's the right choice? So, I find most people mess up with container sizes because they
have an incorrect mental model of how Ruby uses memory. What most people think application memory graphs should look like in Ruby is like this. They should look like a flat line.
We've been duped! Doped! Bamboozled! We've been smeckledorfed! That's not even a word and I agree with you! That's not true: Ruby memory usage looks like a logarithm. A regular Ruby application's memory usage over time will
look like this. It'll have a pretty steep start-up period — this is when we're requiring code and building out caches like ActiveRecord's statement cache and a bunch of other things like that, creating long-lived objects — and then after a while it'll start to level out. But it never goes flat, and I don't want you to think that it ever will.
This is probably partly why Heroku restarts your dynos every 24 hours: if they just let them run forever, this line would keep creeping up forever. It doesn't mean you have a memory leak. If memory usage isn't flat, that doesn't necessarily
mean you have a leak — I'll talk a little bit more about that in a minute — but you need to be aware that that line will never completely level out. So, you're gonna have to use a little bit less memory than the max of your dyno. You're not gonna be able to run it right up against 100%. You're gonna have
to give it some headroom. A common mistake I see here is to use something like Puma Worker Killer, give it a RAM number, and say: kill my Rails process when it's using more than 300 megabytes. If you set that number too low, your memory graph, instead of looking like that long red line, looks like this purple
stuff. You'll see that it goes up to here, the process kills itself, it goes back down, kills itself again. People see that memory graph and they think, wow, look at that sawtooth pattern, I must have a memory leak. But really what's happening is they're not letting their processes live long enough to get
to that stable point. People sometimes also use Puma Worker Killer as a rolling restart: you can give it, say, a six-hour limit and say restart my processes every six hours. That can produce this kind of memory graph as well.
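If you do reach for puma_worker_killer, here's roughly what its two modes look like, based on the gem's documented config block. The numbers are placeholders — the point of the talk is to set them high and long enough that workers actually reach their memory plateau.

```ruby
# config/puma.rb — sketch of puma_worker_killer usage; numbers are placeholders.
before_fork do
  require "puma_worker_killer"

  PumaWorkerKiller.config do |config|
    config.ram           = 1024   # total MB available to this dyno
    config.frequency     = 10     # check memory every 10 seconds
    config.percent_usage = 0.98   # kill the largest worker above 98% of ram
    # Rolling restarts: keep this long (>= 6 hours) so workers reach their plateau.
    config.rolling_restart_frequency = 12 * 3600
  end
  PumaWorkerKiller.start
end
```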
So what I'm telling you is: let your process run for 24 hours. If you have to tune the number of processes per dyno down to do that, do it — just as a temporary thing, you know, tune WEB_CONCURRENCY down to one and let that process run. You're gonna have to, you know, run more
dynos. But see what it looks like after 24 hours. If it does this — if it eventually starts to level out — that's the real number of how much memory you need per process. So deploy one extra 2x dyno with one worker per dyno, five threads per worker,
and look at the average memory usage after 24 hours. The average app will come out to somewhere between about 256 megabytes and 512. So if that's the number you're getting, you know, that's average. 512 is not great, but that's kind of what happens
with big, old, mature Rails apps: they use a lot of memory and that's what you get. There's really no magical way to reduce that number. I have a RubyConf talk from this year about reducing memory usage, but
there's no magical way to do this. It's a long and hard process. Okay, so that's step two: we determined how much memory we use per process, per worker. So now, how do we decide what size container to put it into?
Your processes should feel nice and comfy in their dyno. You should be sitting at about 80% memory usage
in that dyno. It should be just right: not hitting 100% and starting to swap, but sitting at, you know, two-thirds to four-fifths of the total memory capacity of your dyno.
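Here's a hedged sketch of that sizing arithmetic. The per-process number comes from the 24-hour measurement above, and the 80% target is the rule of thumb from the talk; the values here are illustrative placeholders.

```ruby
# Rough sizing sketch: how many worker processes fit in a dyno at ~80% memory.

dyno_memory_mb        = 1024   # e.g. a standard-2x dyno
per_process_memory_mb = 300    # measured after letting a worker run ~24 hours
target_usage          = 0.8    # leave headroom; never plan for 100%

workers_per_dyno = (dyno_memory_mb * target_usage / per_process_memory_mb).floor
puts "Workers per dyno: #{workers_per_dyno}"   # => 2 in this example
```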
So these are the main dyno types that you're going to use in production — I didn't include hobby and free for obvious reasons. The main difference that most Rails applications are going to care about is the memory, right? You can read the numbers here; I'm not going to read them out to you. Heroku dynos are shared, kind of like a VPS. Although a 1x and a 2x dyno technically have the same count of CPUs,
the 2x dyno gets two times the amount of CPU time, and so on and so forth with Perf-M and Perf-L. So the Perf-M dyno should have, you know, many times —
10 or 12x, I guess — the CPU capacity of a 1x dyno. Although, from what I understand from what Terrence told me — so blame him if this is wrong — it's kind of interesting here.
2x dynos and 1x dynos have access to eight hardware threads. The perfm dyno only has access to two. So that's kind of like an interesting weird difference between perfm and all the other dynos. Although perfm does have more share of that time than 2x. And
the whole reason perf dynos exist is because you do not share CPU time with other people's Heroku apps. So you should get more stable performance from a perf dyno because you don't have someone else's, you know, badly tuned Rails application sitting alongside
it on whatever server is actually backing it and crowding you out of the CPU time. Another interesting thing that I noticed when comparing perf dynos to the 1x and 2x is that
the Perf-M dyno costs $250 a month, which makes it a little bit less cost effective than the other dyno types. Perf-L dynos are just as cost effective, in terms of dollars per compute unit and dollars per RAM gigabyte, as the 1x and 2x dynos; with Perf-M you take a little bit of a hit. And I already talked about 2x dynos having eight CPUs.
Which might mean they can support higher thread counts than like a perfm dyno. We'll get to how to set thread counts in a second. So if you have more than 25 app instances, if based on Little's law you need more than 25 processes, I would recommend using
perfl. The performance dynos do get more stable consistent performance than 1x and 2x because they don't share the server with anybody else. Otherwise try to use 2x. The reason you don't want to use 1x is because you should be aiming to have at least three
workers, three processes per dyno. If you can't fit three workers inside of a 2x dyno you might have to use perfm. The reason that you need three workers per dyno is because of the way Heroku does routing. So requests can be routed to any random dyno in your application.
I'm sorry — any random dyno in your formation, I guess. If you only have one worker per dyno, and Heroku randomly routes a request to that dyno while that worker is already working on someone else's request, it's going to sit there and wait until that request is done. This goes back to an old queuing theory
thing: at a grocery store, instead of having multiple checkout lines like at Walmart or whatever — 10 checkout lines — it's more efficient to have one line and then multiple people at the checkout, the way Whole Foods does it. So the more workers we have per
dyno, the more efficient routing we can get out of Heroku, and generally I've found that if you have at least three workers per dyno you're maximizing your routing performance. If you're struggling to fit three workers in a 2x dyno, you can try reducing thread count. If you have Puma or Passenger Enterprise — if you
have a multi-threaded application server — reducing the thread count to three, if you're running high thread counts, can help. Or you can use jemalloc. Sam Saffron at Discourse has been sort of the pioneer in using jemalloc for production Ruby applications. You can
Google him and read about how to do it yourself. It can sometimes reduce memory usage by 5 or 10% and give you that extra little bit of headroom to squeeze into a 2x dyno. There's a jemalloc buildpack, which I helped to maintain, so you can do this on Heroku; if you search "jemalloc buildpack Heroku", you'll find it and learn
how to use it. Now, if you have a bit of background in application server management, you might think that the maximum number of processes you should run per dyno should be equal to the core count — you shouldn't run nine
processes if you only have eight cores, because in theory we can only run eight processes at one time on an eight-core machine. What I've found in production is that that's not really the case. Applications can really benefit from having worker counts that are three to 4x the number of cores available. So on a Perf-L dyno — I know Product Hunt
is a Rails application — Product Hunt runs 30 to 40 workers on a Perf-L dyno, which is 4x the number of cores available, and they also run some Node processes in the same dyno, so there's tons of stuff competing for the CPU time. But for
whatever reason — I don't know if it's just a lot of waiting on IO — it works. So don't restrict yourself, if you've ever heard that advice before, to processes-must-equal-core-count. It can be three to 4x that number. Keep thread counts at three to five. The
way we set this now is RAILS_MAX_THREADS, right? More than five threads per process tends to just fragment memory too much. It's also really difficult with high thread counts to keep yourself under your connection limit. For Rails to connect to your Postgres
database, for example, each thread needs its own connection to the database, so in general we keep the database connection pool size equal to the number of threads per
process. If you have, like, 20 threads per worker and your connection limit for your database is only 100, it's really easy to outstrip that connection limit really quickly. So I've found that thread counts of three to five offer a really good compromise between
processing requests concurrently, keeping connection limits out of reach, and avoiding memory fragmentation. "How do I know if my app is thread safe?" I get this question all the time, because people are afraid of Puma or afraid of making
their app multi-threaded. What Evan, the maintainer of Puma, recommends people do is just start slow. Just try two threads. If things start breaking, you can change that config back and pretend it never happened. If you use Minitest, you can try minitest/hell — you just require minitest/hell at the top of your test helper (see the sketch below)
and it will run each test in its own thread. If that doesn't break things, you're a god. And at the end of the day, if you're running MRI, it's probably fine. I don't see many people running into actually weird multi-threaded bugs, and if they do,
they kind of know it's their fault. They're like, oh yeah, I probably shouldn't have used that Redis global in this controller, or class-level state like User.current, or class variables. Generally they find it and they're like, oh yeah, that's really obvious, I should have realized that. The other thing I hear is,
oh, but I don't know if my library is thread safe — and it's the same thing. I know as a library author I really pay attention to thread safety and go through my code to make sure that it's thread safe. And because in MRI, any time you execute Ruby code it happens with the GVL, the GIL, around it, it's
actually kind of difficult to run into a threading bug. That's not to say it doesn't happen, and it is annoying, but don't be so afraid of it that you don't even try it.
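Here's what the minitest/hell suggestion above might look like. minitest/hell ships with the minitest gem and forces every test to run in parallel, which is a cheap way to shake out thread-safety problems like shared globals and class-level state.

```ruby
# test/test_helper.rb — a minimal sketch.
require "minitest/autorun"

# minitest/hell ships with minitest and forces all tests to run in parallel,
# which surfaces thread-safety bugs before production does.
require "minitest/hell"
```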
Okay, we've got our container size, we've got our worker count. Now let's make sure that we're not going to run over our connection limits. Things that use connections: ActiveRecord and its DB pool, and you probably have connections between your dynos and Redis, maybe Memcached. I think most of the Memcached add-on providers for Heroku
don't limit connections — I think MemCachier has unlimited connections. Redis To Go used to really heavily limit them, but some of the newer Redis providers don't limit them so much. Your Postgres database is really the main connection pool that needs to be watched, because those
limits are very easy to hit. You change that in database.yml; you need one connection per thread, and that's the default, I think, in the database.yml that gets generated. I already talked about Redis and Memcached. You may need more than one database
connection per thread. If you use things like rack-timeout, which most people do on Heroku because of the 30-second limit, what can happen is that Rack::Timeout raises while we're waiting on a Postgres query to return, and when it raises, that connection
can get lost. So you may need up to double the number of database connections per process than you have threads, if that's a problem for you. You'll know it's a problem if you're getting errors saying ActiveRecord spent too long waiting
for a connection or doesn't have one available. These are the Heroku Postgres plans and how many connections they support; past Standard, for the larger sizes, it's all still limited to 500 connections. If you need more than 500 connections on Heroku Postgres, Heroku provides — I think it's a buildpack, right? — the PgBouncer
buildpack, which you can add to your app and which will pool these connections for you, so you'll be able to share a smaller number of connections per process than you actually have threads. So just do the math to figure out how many dynos would outgrow
your connection limits. As an example: if I have a Perf-L dyno with 20 app workers, and each of those workers has five threads, that's 100 threads and 100 DB connections per dyno. So if I have five dynos, that's 500 connections, and I've hit my Standard-plan Heroku Postgres connection limit.
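A hedged sketch of that arithmetic — the plan limit and per-dyno settings below are the example numbers from the talk; swap in your own.

```ruby
# How many dynos can I run before I hit my Postgres connection limit?

workers_per_dyno   = 20    # Puma workers on a Perf-L dyno
threads_per_worker = 5     # RAILS_MAX_THREADS
pool_per_thread    = 1     # bump to 2 if rack-timeout is losing connections

connections_per_dyno  = workers_per_dyno * threads_per_worker * pool_per_thread
plan_connection_limit = 500   # e.g. Heroku Postgres Standard-tier limit

max_dynos = plan_connection_limit / connections_per_dyno
puts "#{connections_per_dyno} connections per dyno; max #{max_dynos} dynos"
```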
So now we've checked our connection limits, we know the maximum number of dynos that we can scale to before we hit them, and we're ready to deploy. Here are some things to watch after deployment. Watch memory. This is a pretty typical pattern that I see: memory usage is fine and
then blows out when someone hits, like, the CSV export controller. It looks like that, yeah. That swap — the dark purple — that's really bad; you don't want to see that. It means you're using too much memory and you need to back off the number
of processes per dyno. Now, this is not a memory leak, it is a fat action. If you're seeing a curve like this, where it's flat and then some action that someone used, you know, blew
it out to double that number, the only way to really track it down is to install an APM that does memory profiling. New Relic does not do this very well, as much as I love New Relic for everything else. Skylight and Scout are both commercial services that have memory profilers that work in production, and they can tell you: hey, this controller action allocates
18 million objects. And you can say, that's really bad, I'll fix it. An open source alternative is Oink. Oink basically writes to your logs and says, this action did XYZ memory things, and then Oink has a log parser that will give you some
statistics about which controllers allocate how much.
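A sketch of what the Oink setup tends to look like, based on the gem's documented Rack middleware; verify the class name against the oink README, and treat MyApp as a placeholder for your application module.

```ruby
# config/application.rb — sketch of adding Oink's per-request memory logging.
module MyApp
  class Application < Rails::Application
    # Logs memory usage per request; oink's bundled log parser can then
    # summarize which controller actions are the fattest.
    config.middleware.use Oink::Middleware
  end
end
```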
So: if you're running out of memory, scale down WEB_CONCURRENCY — that's the way Heroku has this all set up by default. If you're not using, you know, 75 percent of the available RAM, you can scale up WEB_CONCURRENCY. You can also tweak thread counts; fewer threads will use less memory. You may think that because threads technically share memory, all you need to create an additional thread is just eight megabytes of stack. But the way glibc's malloc works — this is something that changed
in a later glibc release — is that it allocates what are called arenas to each thread when they contend. And at the end of the day, what it really means is that glibc malloc can have really bad memory fragmentation for high thread counts, or very highly multi-threaded
programs. You can control that with the MALLOC_ARENA_MAX environment variable. I can't get into too much detail about this because I'm running out of time, but if you just Google "MALLOC_ARENA_MAX Heroku", Terrence wrote a really good explanation of what it is and how to tune it. This is really only relevant for people that
are running high thread counts, or maybe your Sidekiq processes, which run 25 threads or whatever. Or jemalloc, which I talked about earlier, tends to do a good job of this. This is a client example of tuning MALLOC_ARENA_MAX on a Sidekiq process. They had a Sidekiq process that would balloon from 256 megabytes to a gig over
24 hours. That's really bad. And then right here, he changed MALLOC_ARENA_MAX to 2 and it almost completely stabilized his memory usage. Watch queue times. New Relic will tell you how much time, on average, a request spent queuing — how much time it was not
actually being processed. Less than 10 milliseconds is good; more than that is bad. If you have high queue times, that just means you need more dynos — that's the time when you want to scale up. CPU usage: if your CPU usage is low, you may benefit from a higher thread count. Restarts: if you're using PumaWorkerKiller — if you have to, because you
have a leak and you can't fix it — you need to be watching how often those processes are restarting. What I find is that some people install these automatic killer tools and then they don't know how often they're restarting, and it's restarting
every other request. That's really bad. You're going to really hamper the performance of your application if your processes don't get to live very long. At least six hours between restarts is a good goal, and if PumaWorkerKiller or whatever is killing your processes faster than that, you need to change those settings
or use a bigger dyno. So timeouts. So we all know Heroku has this 30 second timeout where if your application takes longer than 30 seconds to respond, it basically gives up on you and says, I will not return this response anymore. So we have things like rack timeout to fix that. If you have a lot of controller actions which tend to timeout
frequently and you don't have time to fix it, a good band aid is to change to a dyno formation where you're running more workers per dyno. So as an example, I had a client that had some controller actions which took like 10 or 15 seconds to complete.
It was, like, admin stuff. What would happen is a bunch of these requests would come in, one after the other, and they would back up all the other requests behind them. So a dyno would take 15 seconds to do this admin-action thing, and then
a bunch of requests would pile up behind it, so now all of those requests take 15 seconds plus whatever time they would normally take. If you have problems like that, where you have 95th percentile times which are really high, you're going to benefit from having more workers per dyno. And that's because while Heroku will route randomly
to whichever dyno it wants, your application server will not. They all work differently here; Passenger probably has the best model for this, but even Puma will do a better job of routing requests to open processes which don't have any work to do. So this client was running 2X dynos. I put them
on Perf-L dynos and they almost completely got rid of their timeouts and reduced their average response time by, like, 20%. Oh, this is not big enough. You probably have rack-timeout, but Puma also has a setting called worker_timeout, and in Passenger
it's passenger_max_request_time — I don't know what it is in Unicorn. You can actually just kill the process after a request has taken a certain amount of time. In Puma, we do this by default; it's 60 seconds. Passenger
doesn't turn this on by default, you have to turn it on yourself. If you are using Passenger, I do suggest you turn this on, because your requests probably don't need to take a minute, and if they do, you might as well just give up.
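A sketch of the Puma side of that, using the worker_timeout setting named above; the value is a placeholder (60 seconds is what the talk cites as Puma's default), and rack-timeout is configured separately through its own gem.

```ruby
# config/puma.rb — sketch of the worker_timeout setting mentioned above.
# A worker that doesn't check in within this window gets killed and re-forked.
worker_timeout 60

# For Passenger Enterprise, the equivalent named in the talk is
# passenger_max_request_time, set in your web server's Passenger config.
```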
So this is it. That's the process. Those are the steps. This is the slide you probably want to take a picture of. I'm Nate Berkopec on Twitter. I'm going to tweet these slides out as soon as I'm off the stage here. And the website of my blog slash consultancy is speedshop.co. Thank you very much.