Operating Rails in Kubernetes
Formal Metadata
Title: Operating Rails in Kubernetes
Series: RailsConf 2018, talk 34 of 88
License: CC Attribution - ShareAlike 3.0 Unported (CC BY-SA 3.0)
DOI: 10.5446/37326
Production Year: 2018
Production Place: Pittsburgh
Transcript: English (auto-generated)
00:16
My name is Kir, and today I'll talk about running Rails in Kubernetes.
00:22
I work at a company called Shopify, and over the past year or so we've moved hundreds of Rails apps within the company to Kubernetes, as well as our main monolith, which is known as one of the largest
00:41
and oldest Rails apps in the community. We learned quite a bit about running Rails efficiently in Kubernetes, and I decided to make this talk to share some of the things that we learned. So today we'll start from getting a quick intro
01:03
into Kubernetes for those who haven't been exposed to it yet. Then we'll talk about what makes Rails a bit special in running it in orchestrated platforms, like Kubernetes, and then I'll share some of the things
01:20
that helped us to migrate all these apps. First of all, please raise your hand if you ever played with Kubernetes or container orchestration. Oh, it's quite a lot. So in 2018, almost everyone agreed
01:41
that containers are awesome, because they provide this universal interface for any app to run it in basically any environment that you want. But the problem of running and scheduling containers is still there.
02:04
You need to run these containers somewhere. Just as a note, I'm not going to talk about containerizing Rails, because there'll be a great talk tomorrow at 3:30. If you're interested in hearing about containerizing Rails itself, please attend that talk by Daniel,
02:24
and I'll talk about running it in production with orchestrated containers. So you have the container with the app, and you're going to run it somewhere.
02:40
In the static world, where servers are configured with something like Chef, you would have a bigger server that would handle fatter containers that require more memory and CPU. You would have a server with a bit less memory,
03:02
and you would decide to run some other containers there. So all that math is done by humans and assigned by hand. Maybe it's configured with some scripts, but it's still pretty manual. And if we think about this process,
03:21
there are actually quite a few resources that can be wasted, because there would still be some CPU left unused, some memory left over, and nothing really prevents that. The desired state would be that every CPU is used,
03:43
and all the resources are efficiently scheduled, so that we achieve the same results and the same capacity with fewer resources consumed, and save some energy.
04:01
What Kubernetes solves is efficiently scheduling the resources on your servers in a very dynamic way, bin packing the containers that you want to run in the best way. So if we want to define it in just one sentence,
04:23
it's smart container scheduling for better utilization. There are two things here that I want to emphasize. First, scheduling: you no longer have a defined list of servers that you bootstrap.
04:42
It's all scheduled dynamically. If one server crashes, or the power dies, the same unit of work is rescheduled on another machine, and you wouldn't even notice. The second is utilization: making the best use of all the resources
05:03
that you have, which is especially important as you grow, because you would have more servers, more unused CPUs, more unused memory left, which of course you don't want to just sit there.
05:24
Next, I just want to establish some shared vocabulary and talk about the concepts that Kubernetes brings. First, the very basic concept is a pod. A pod is basically a running container,
05:41
one instance of something. So if we run one process of Sidekiq, it would be just one pod. And obviously, one instance of something is not enough to run a whole app or a service. So we come to the next concept, called a deployment, which is a set of pods.
06:05
A typical app would have maybe two deployments, one with web workers, and another with job workers. The number of instances in the deployment, the number of pods, is very dynamic.
06:21
It can be adjusted. You can scale it up, you can scale it down. You can even set up autoscaling. If you ever worked with Heroku, you probably remember the concept of dynos, and the dyno count that you can adjust and scale up. It's the same with a deployment in Kubernetes,
06:42
which you can scale up and down. This all sounds great, but how do you actually describe all these resources? If you used Chef or Capistrano, you probably had a Ruby DSL. And as with any DSL in a dynamic language,
07:03
it comes with good and bad sides. On the good side, it can be very expressive; you can describe lots of things there. But sometimes that comes as a disadvantage too, because you can do basically anything that you can do with Ruby, and sometimes you want a DSL to be as minimal as possible.
07:28
So Kubernetes leverages YAML files as a way to describe resources. You would have a YAML config of maybe 20, 30 lines of a resource.
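(The config on the slide isn't captured in the transcript; what follows is a minimal sketch of what such a resource might look like. The app name, image, and port are made-up placeholders.)

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-rails-app-web
    spec:
      replicas: 3                  # number of pods; scale this up or down
      selector:
        matchLabels:
          app: my-rails-app
      template:
        metadata:
          labels:
            app: my-rails-app
        spec:
          containers:
          - name: web
            image: registry.example.com/my-rails-app:v42   # container built with the app and its assets
            ports:
            - containerPort: 3000                          # port the Rails server listens on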
07:40
This is just an example of a config for a Rails app. Then you would apply that config to a Kubernetes cluster and store that same YAML file in the repo, which I think is a great benefit, because it's just a couple of configs stored in the same repo,
08:00
not in another repo with cookbooks or whatever. At least for me and some of the people I know, this came as a kind of shift in mindset, because we had to move from controlling servers
08:23
when we deploy code and new apps, towards describing configuration. When you deployed resources with Chef or with Capistrano, in the end it was just sequentially applying commands
08:42
by SSH and controlling servers. You would always have an output of exact SSH commands and see what's going on, see what fails, see what commands are stuck, and so on. With Kubernetes it's quite different
09:02
because you just take a YAML file and tell Kubernetes to apply it, and then that is the desired state, which will be rolled out there in a few seconds or in a minute if you applied a very big subset
09:20
of configuration or resources, it could take a bit longer. But at least for me, I had to move on from this concept of controlling servers, exact machines, to describing configuration. If we take controlling servers, it means running commands remotely
09:42
and comparing their output. In contrast, when you describe the configuration, you just push it and then poll for it to apply, which comes with the advantage of being abstracted from physical machines, which is great for things like self-healing.
10:02
If one server goes down, the same work will be rescheduled somewhere else, while if you're controlling servers manually, it's not very resilient to failures. For instance, at Shopify, we have a Capistrano config with more than 100 hosts,
10:20
and every couple of months some host would die, just because there are so many servers. This doesn't self-heal, though it would if the configuration were described with orchestrated containers. And yeah, if we talk about tools and technologies,
10:44
examples of controlling servers are Capistrano and Chef, while in contrast, platforms like Kubernetes and Mesos let you describe the configuration, describe the desired state, and the platform rolls out that state for you.
11:13
So, containers: Kubernetes takes a container and runs it with whatever number of instances you specified,
11:21
and it's very easy to run a plain container, but a Rails app is usually a bit more than just a process. Many Rails apps work as a monolith with many things embedded into them, which makes them sometimes quite special to run as a simple container.
11:42
One thing: if you use Heroku, you're probably familiar with the concept of the 12-factor app, which is a methodology for building software-as-a-service apps that promotes declarative formats and minimizing the difference between production and development,
12:03
and apps that follow the 12-factor manifest are usually easy to scale up and down with no significant changes to the architecture. As you may have guessed, there are 12 factors,
12:20
and we'll go through a couple of them that, I think, can sometimes be forgotten when we work on Rails apps but are nevertheless quite important, especially if you want to run the app in Kubernetes successfully. One of them is disposability and termination,
12:42
which, in other words, is what happens when you want to restart or shut down a process. For something like web requests, it's as easy as waiting for the request timeout. If you know that a request
13:00
will not take longer than 30 seconds, you stop accepting any new requests and just wait for 30 seconds, and then you're safe to shut down the worker without losing any live requests. Same about background jobs. You have to wait for the current jobs to terminate,
13:22
and then you're safe to shut down the process without losing any work that is going on. However, this might be a bit trickier for long-running jobs. This is one example of a very simple job that can become long-running. In this example,
13:42
it iterates over some records in the database and calls a method on each Active Record object. If you have just a few users, this job would complete within seconds, maybe a minute. But as you grow to a size like ours, with millions of records in a table,
14:02
we've had jobs very similar to this example, and it would take them weeks to iterate over all the records and do something with them.
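(The job from the slide isn't in the transcript; a hedged sketch of the kind of job meant here, with assumed model and method names, could be:)

    class TouchAllUsersJob < ApplicationJob
      def perform
        # Iterates over every record; trivial per-record work,
        # but with millions of users the job runs for days or weeks.
        User.find_each do |user|
          user.touch  # harmless to repeat, so an aborted and re-enqueued run is safe
        end
      end
    end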
14:22
So how do we shut down these workers? We must keep in mind that long-running jobs will be aborted and re-enqueued, which in this example means that the job might be aborted in the middle,
14:40
and then it will be run again, which is essentially what Sidekiq does. And here we come to the concept of idempotency: the code that is called there should not produce extra side effects,
15:03
and should be safe to execute more than once. Another aspect of 12-factor apps is concurrency, which allows your app to scale with the process model.
15:20
The manifest has this illustration showing web workers and job workers that you can scale up and down. To be able to scale these workers successfully, they should not share any resources, because if they all had a bottleneck
15:45
of just one shared resource, they would not scale very well. So we've talked a bit about the 12 factors; now some things about Rails to know when deploying it to Kubernetes.
16:02
First is assets. When you use something like Capistrano, it would probably run asset precompilation on every server that you wanted to serve requests from, which is a bit of a waste of resources when you could precompile assets only once
16:21
and then distribute that image to all servers, instead of precompiling them on each server. So the efficient way of doing that is to embed the assets into the container with the app, so that when the app starts,
16:42
it already has all the dependencies, like assets. Another part that can sometimes get a bit messy is database migrations. In the Rails community,
17:01
we're very much used to migrations as a part of deploy, maybe as a hook at the end of deploy. You deploy the code, and then you apply the migrations right away. This step of the deploy process makes the deploy a bit fragile, because what do you do with the code change if the migration failed?
17:21
Do you roll back the code, or do you keep running it? If you rolled it back, you already had the new code in production for like 30 seconds or a minute. It might not be very safe to roll it back. So we try to avoid migrations as a part of deploy,
17:42
and ask developers to write code that is compatible with both the old and the new schema, because in the middle of a rollout you would always have some workers on the old revision and some workers on the new revision. We try to make the migrations asynchronous,
18:04
which helps to establish this contract with developers that the code may run on both versions of the schema. So instead of changing code and applying the migration in the same step, the first step could be to add a migration,
18:21
for instance one that adds a column, and only then would you update the code to interact with the new column, once you're sure that all the schemas have that new column.
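(As an illustration, a hedged sketch with made-up names, not code from the talk: the first PR would contain only the migration.)

    class AddDeactivatedAtToUsers < ActiveRecord::Migration[5.2]
      def change
        add_column :users, :deactivated_at, :datetime
      end
    end
    # Only a later PR, shipped after the column exists everywhere,
    # starts reading and writing users.deactivated_at from application code.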
18:43
Usually these asynchronous migrations are applied within a few minutes after the deploy, which we make a bit easier for developers by announcing it in Slack and giving them a notification when their migration is applied.
19:02
Another part of Rails is secrets. I think none of the modern apps run in isolation; basically every app now interacts with some kind of third-party API, which can be S3 buckets
19:21
or the Facebook API, and all these third parties and APIs require some tokens and API keys which Rails has to be aware of. One approach is secrets in environment variables,
19:43
the approach that Heroku promotes. This is very easy, but as you grow, you would have hundreds of tokens, and you probably don't want to run the app with hundreds of env variables that the app is dependent on.
20:01
You may think about putting secrets right into the container with the app, which is not the most secure approach that you can take, because anyone who gets the container also gets the secrets. Fortunately for us,
20:20
Rails 5.2 ships with the credentials feature, which allows you to put encrypted secrets, credentials, right into the repo, and edit them, and it's fully safe to commit and store them in the repo. All you need to read and change them is the Rails master key.
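(In practice that looks roughly like this; a sketch where the specific credential names are assumptions:)

    # Edit the encrypted file config/credentials.yml.enc:
    #   bin/rails credentials:edit
    #
    # Read a value at runtime:
    Rails.application.credentials.dig(:aws, :secret_access_key)
    #
    # In production, provide only the decryption key, for example via the
    # RAILS_MASTER_KEY environment variable or a mounted config/master.key.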
20:43
And as a result, you run the container with just one environment variable, which is the key to the rest of the secrets. To recap, following the 12 factors makes it easier to run Rails apps in orchestrated environments,
21:04
and being mindful about worker termination also helps. Migrations as a part of deploy, as a hook after deploy, can be fragile and make the rollout process not very safe,
21:21
so asynchronous migrations can help solve that. Credentials that ship with Rails 5.2 make the process of sharing keys a bit easier. At Shopify, we've had hundreds of apps
21:43
running in different environments. Some of them were on Heroku, some were in AWS, some were on physical hardware managed with Chef, and what we wanted for our developers was to stop
22:00
being exposed to all that infrastructure and just have a platform to run Rails apps somewhere. So we decided to invest in something like Kubernetes, which would allow us to scale containers in the best way, and also to utilize the resources in the best way.
22:24
As I said, if we wanted the apps to run in Kubernetes, they had to have their resource specs in YAML, which is a pretty easy format, no more than 20 or 30 lines of YAML,
22:43
but still, we didn't want every developer to learn that YAML declaration. What we did instead was create a bot that opens a PR on GitHub based on the stuff that you use in production.
23:01
If you use Sidekiq, it would generate a YAML config for that unit of work in Kubernetes, and the first item in that PR description would be a checklist that recommends checking whether the config makes sense for this app.
23:23
If that looks good, you just merge, and your app is ready to run. The next step is to apply the config with the kubectl CLI tool, and if you've ever run kubectl apply with a YAML file,
23:40
it returns immediately, because it just lets Kubernetes know about the desired state, and then it takes the system some time to provision all those containers, to find a server that has some CPU available and schedule the work there, and that process is not very visible.
24:04
If you're used to Capistrano, you probably want some kind of progress monitor to see how many of your servers already run that new container, what the progress of the rollout is, and things like that.
24:20
So we've made a gem called Kubernetes Deploy that provides visibility into the changes that are applied to the Kubernetes cluster. This is an open-source project that has been adopted by other companies as well,
24:41
and just like Capistrano, it applies the configuration and tracks the progress of the rollout.
25:07
So robots helped humans to migrate the apps by generating YAML configs. Developers didn't have to write YAML configs anymore, and Kubernetes Deploy brought visibility
25:21
into the rollout progress. Overall, I think the steps that Rails has been taking towards running in the cloud and in container environments, on platforms like Heroku,
25:40
were steps in the right direction that help us now to run Rails in Kubernetes. This is in large part thanks to Heroku, which has been pushing Rails in that direction to make it run smoothly in containers.
26:01
For us, and for many other companies, Kubernetes helps to schedule the work efficiently, save resources, and stop caring about which server some container has to run on. In the end, it's not magic, it's just a technology that helps to schedule the work.
26:21
There are some things that you have to know about Rails and running it in orchestrated platforms to make it run smoothly. Before, it took me hours to set up a new app in production with Chef and Capistrano. I had to find an instance, provision it,
26:44
write some cookbooks, or do something else to set up the environment and all the packages that were needed there to run Rails. Now, with orchestrated containers, it's a matter of just a couple of YAMLs.
27:03
I think it becomes very standardized in terms of getting started with any app. If the app is using Kubernetes, you can just read through the resource specs and see how the deployment is organized,
27:20
which reminds me of what Rails did more than 10 years ago, because before, every app used its own structure, and it took you some time to understand how it worked. Now you can get started with any Rails app within hours, just because you know that all the controllers are in app/controllers,
27:44
and config/routes.rb has all the routes the app has. So Kubernetes brings this abstraction. It collapses this complexity, which is what DHH talked about in the keynote this morning.
28:08
You might have a question: when is it worth getting started with Kubernetes and moving to orchestrated environments?
28:22
I would say that if you want to stop caring about physical machines where something runs, if you want just a platform to run a container, that's a good solution. You can follow me on Twitter.
28:41
If working on the things that I mentioned in this talk, from Rails to the infrastructure and Kubernetes, sounds exciting, please hit me up. And thank you for coming to the talk.
29:01
So, the question is, what's the easiest way to organize asynchronous migrations? One way is to add some checks on pull requests, so that developers ship pull requests separately: one PR with the migration and another PR with the code change,
29:21
because that also makes it easier to revert something if you really want to, and it makes it easier to revert the code without reverting the migration, because you usually don't want to revert the migration. Does that answer the question?
29:44
Yes, how we run it: we have a recurring job that runs every five or ten minutes, checks for any pending migrations, and applies them, and that works through a background job.
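(A rough sketch of what such a recurring job could look like; this is an assumption, not Shopify's actual implementation, and the scheduling itself would come from whatever recurring-job mechanism you use:)

    class ApplyPendingMigrationsJob < ApplicationJob
      def perform
        # Running the migrator when nothing is pending is a no-op,
        # so this is safe to run every few minutes.
        ActiveRecord::Tasks::DatabaseTasks.migrate
      end
    end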
30:00
I have a blog post about that; you can find it through my Twitter. How do we deal with stateful resources? We don't run things like MySQL in Kubernetes yet. With things like Redis, I know it's been a bit painful,
30:21
because Google Cloud or any other provider would diagnose that the server isn't healthy. It would reschedule Redis to another node, and it would be down for the 30 seconds while it's being rescheduled. So it's something that we're actively looking into.
30:42
I would say that that is not as smooth yet, but for stateless things, it's getting better. So the question is, do we use Kubernetes secrets to store credentials?
31:03
Yes, we do, and that Rails master key that I had a slide with, you can put that into Kubernetes secrets, and it just works very, very smoothly. You just mount it. I was surprised that it just worked. So the question is, how do we manage configuration
31:23
for different environments? By environments, you mean like staging and production? We don't have a classic staging; we use feature flags, but something like canary deploys would be interesting to look into.
31:40
Thank you all so much for coming. Thank you.