Docker isn’t just for deployment

Formal Metadata

Title
Docker isn’t just for deployment
Part Number
25
Number of Parts
94
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
Docker has taken the world by storm as a tool for deploying applications, but it can be used for much more than that. We wanted to provide our students with fully functioning cloud development environments, including shells, to make teaching Ruby easier. Learn how we orchestrate the containers running these development environments and manage the underlying cluster all using a pure ruby toolchain and how deep integration between Rails and Docker has allowed us to provide a unique experience for people learning Ruby online.
Transcript: English (auto-generated)
Okay, so from the outside, in terms of how it's used, Docker looks quite a lot like a normal virtual machine.
If we think that a virtual machine allows us to take one host and partition it into multiple smaller hosts, each of those virtual machines can run a different OS, have different packages on them, different dependencies, and we can then run processes in each of those VMs, and those processes don't know that they're running in a VM,
they don't know anything about the processes running on other VMs on the same host, and they don't know anything about processes running on the host itself. And in this respect, you can say Docker's actually quite similar. We can create an image, a Docker image, which defines a base OS that we want to run, a base Linux OS. We can define packages, we can define application code,
we can define configuration changes. We can then create a container based on this image and run a process in it. And this process doesn't know that it's running in a container, it has no knowledge of processes running in other containers on the same host, and it has no knowledge of the other processes running on the host itself.
There are, however, some big differences between Docker and a traditional VM. When we run a traditional VM, we're virtualizing the physical hardware and we're running a complete instance of that OS, including its own kernel. That means that if we imagine we're running an Ubuntu VM, if an Ubuntu VM takes, say, 500 meg of RAM
for its kernel and its base components, that VM will use 500 meg of RAM for the OS, plus whatever RAM we need for the process we're running. Likewise, when we start it up, if Ubuntu normally takes 20 or 30 seconds to start, it will take 20 or 30 seconds to start that VM, plus however long it takes to start our own processes in it.
Docker works really differently. When you run a process in a Docker container, you're actually running that process on the host itself. It's sharing the host's kernel. This means that there is almost no overhead in terms of resources to running a process within a Docker container. There's also, because we're not starting a new kernel,
we're not starting a complete OS, there's almost no overhead in terms of start time. So if we're starting, say, a Unicorn web server in a Docker container, if it normally takes, say, 10 seconds to start that locally, it will take about 10 seconds to start in a Docker container. So we can kind of think of Docker as getting a lot of the benefits of a VM
without the resource overhead, which is a big simplification, but for the purposes of this talk, that's sort of what it looks like from the outside. And because of that, Docker's obviously got a lot of attention for deployment. We can run containers in development (we heard a bit about this in the last talk), and we can then run identical containers in production and be pretty confident that we're going to see
identical development and production behavior. But to me, that's not the most exciting bit about Docker. The most exciting bit to me is that because of the type of interface Docker provides to containers, we can now build features around containerization very, very easily.
So rather than containers just being something we use to deploy existing features, they can actually become part of features, and they can allow us to build new things, in this case in Ruby, that would have been much harder pre-Docker. So, there we go. I'm Ben, by the way.
I'm from a company called Make It With Code, and we teach people Ruby. And we discovered really early on that one of the key reasons beginners who are using Ruby as a first language quit isn't because of the language. It's because they get stuck setting up a development environment. They'll try and install RVM or rbenv,
and they'll run into issues with system Rubies and PATH variables, and they'll give up without ever writing a line of code, which is a real shame. They never get to see how beginner-friendly Ruby as a language actually is. And we wanted to bypass this completely. We wanted to provide a complete browser-based development environment for our students. So we wanted a live terminal,
we wanted a file browser, we wanted a text editor. And we were really lucky, because at the time we were planning on doing this, the open-source Codebox project was announced. Now, Codebox is a Node.js-based application for doing exactly this, for providing a browser-based development environment,
a lot like something like Cloud9, which you may have come across. And so we started off. Each week, groups of about 10 students would join, and we would then use Chef Solo to spin up a new VM for that group of 10. Each student would get a Unix user. We would run an instance of this Node.js app under each user account, and then we had all sorts of logic
so that our front-end proxy would then send traffic for these unique development environments back to each of these Node.js instances. And this worked really well for our students. We saw people getting a lot further with the course, getting a lot further learning Ruby, because they weren't having to worry about how do I install Ruby to begin with.
But it had some fairly big problems on the business side. Particularly, it was still quite manual. We had to provision a new VM for a group, and so people couldn't get started instantly. They had to wait for a group of people to start. And it was a really inefficient use of resources.
We could get about 10 students per 2 gig VM, and most of these students would actually be using it for about 5 to 10 hours a month. But the rest of the time, this Node.js app was still running. It was still using resources, and it was getting really quite expensive quite quickly. It made it impossible for us to offer any sort of free lesson,
sort of try before you buy, because we just couldn't afford to be provisioning these environments for people who weren't definitely paying for the course. And so we started looking at Docker. I'd played with Docker a bit in the past and I was really impressed with it. And I think, like most people, my introduction to Docker was the command line. If you go through the Docker tutorial,
that's how you first learn how to start and run containers. And so our first version, from our Rails app, used the Docker command line. Don't worry too much about the exact details of that command. But basically what happened was a user would sign up to our Rails app,
and we would kick off a Sidekiq job, which would then construct this docker run command, which says what base image, sort of what OS, we're working from. It gave it some details, such as ports that we need access to, and folders from our shared file system that should be mounted into that container. And the Sidekiq job then had Ruby shell out
and execute this via SSH on a node in our Docker cluster. And I imagine anyone who's a little bit familiar with Docker may be laughing at us slightly here for this approach, because it is admittedly ridiculous. Because Docker, of course, has a complete HTTP API, which is amazing. And so anything you can do via the traditional Docker CLI
that everyone gets introduced to, you can do via the API. So as an example here, I hope that's vaguely readable, you can see that to create a container, we can do a simple POST request to an endpoint exposed by the Docker daemon. And in that POST request, we have exactly the same information
that was in that really long run command we just saw. We specify an image that we want to build it from, in our case a custom Codebox image. We specify which volumes we might later want to mount external files and folders into. We specify the ports from the container that we may later want to map to ports on the host.
And finally, we specify a command to be run when you start this container. And that's good. We're no longer having to work in terms of shelling out and using regex to parse terminal responses, which was a fairly flaky process. We're getting nice JSON back, which we can easily manipulate in Ruby
and see when commands worked and when they didn't. But naturally, this is Ruby, and so it gets better: there's a gem for it. I strongly recommend this gem if you want to automate Docker. Here you can see exactly the same process as in the last slide, but I'm passing in a normal Ruby hash to Docker::Container.create,
and assuming that succeeds, I will get a container object back, and I can then perform other actions, such as starting it, stopping it, checking its status, directly on that Ruby object. And this is already much, much better than our sort of original command-line-based approach. There's much less of a switching cost.
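For a concrete sense of what driving Docker through the docker-api gem looks like, here's a rough sketch. The daemon address, image name, command, paths, and port numbers are invented for illustration rather than taken from the talk, and the start-time options reflect the Docker API of that era.

```ruby
require 'docker' # the docker-api gem

# Point the gem at a remote Docker daemon (it defaults to the local socket).
Docker.url = 'tcp://docker-node.internal:2375' # hypothetical node address

# Make sure the image exists on the host, pulling it if necessary.
Docker::Image.create('fromImage' => 'makeitwithcode/codebox:latest') # hypothetical image

# Create a container from it, declaring the ports and volumes we may use later.
container = Docker::Container.create(
  'Image'        => 'makeitwithcode/codebox:latest',
  'Cmd'          => ['/usr/local/bin/start-ide'],   # hypothetical start command
  'Env'          => ['STUDENT_ID=42'],              # config via environment variables
  'ExposedPorts' => { '8000/tcp' => {} },
  'Volumes'      => { '/workspace' => {} }
)

# Start it, binding in host resources. (The Docker API of the time took these at
# start time; newer versions expect them in a 'HostConfig' passed at create time.)
container.start(
  'Binds'        => ['/mnt/gluster/students/42:/workspace'],
  'PortBindings' => { '8000/tcp' => [{ 'HostPort' => '49153' }] }
)

container.json['State']['Running'] # => true once the process is up
# container.stop and container.delete when we're finished with it
```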
We're working in a standard Ruby API. We're not worrying about direct HTTP calls, and we're getting nice Ruby objects back to manipulate. So it's really much more friendly to work with. But it's still not perfect, because Docker's architecture means there are actually three API calls to go from absolutely nothing to a running container.
First, we have to create an image if that image doesn't already exist on the Docker host, and you can kind of think of an image like a class definition. It's defining the OS, the files, the packages, and this is something you might well pull from Docker's official registry
or from a private registry if you're running one. The next API call then creates a container from that image, so it's a little bit like creating an instance of a class. And at this stage, we specify the directories that we might later want to mount externally into that container,
and the ports we might want to expose. And then finally, we make a third API call to actually start that container we've just created. And at this point we specify, for example in our case, that we need to mount a particular directory from our GlusterFS file system into that container at a particular point, and that we want to map the port the Node.js app is running on
to a port on the host so that we can then proxy back to it. And so we're still having to think in terms of Docker's container workflow. We're not really thinking in terms of the business logic of our problem. It still means there are quite high switching costs
to moving between working on the rest of our Rails app and working on the containerized component. And so what's brilliant about this API and this gem is that it's really, really easy to use them to build abstractions that allow us to reason differently about containers.
So we didn't really want to reason in terms of creating images and turning images into containers and then running containers. We wanted to reason in terms of a container should have these properties for this user, and we want to know that container is running irrespective of what may have happened before. And you can think of this quite a lot like
first_or_create in Active Record. We're not concerned about the underlying database driver or the specifics of how you query for a record or how you create it if it doesn't exist. We just want to say a record has these properties, make sure it exists and return it to me. And because we have this API from Docker and this gem, it was really easy to build that abstraction.
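To make that way of thinking concrete, a container description in this style can be just an ordinary Ruby hash built by the user model. The keys and values below are illustrative guesses, not the actual schema used in the talk.

```ruby
# Hypothetical spec a user object might generate to describe its IDE container
# (imagined as a method on the User model).
def ide_container_spec
  {
    name:    "ide-#{id}",
    image:   'makeitwithcode/codebox:latest',
    command: ['/usr/local/bin/start-ide'],
    env:     { 'STUDENT_ID' => id.to_s },
    ports:   { 8000 => 49153 },                               # container port => host port
    volumes: { "/mnt/gluster/students/#{id}" => '/workspace' } # host path => container path
  }
end
```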
We very, very imaginatively called our abstraction dacr, and here you can see we're using a standard Ruby hash to define specific properties that a container should have. And this is very similar to what we're defining in the previous API calls you've seen, things like the base image, the ports that should be mapped,
the volumes that should be mounted, and here we're also defining a few environment variables. It's standard practice in Docker that pretty much any configuration is pulled in from environment variables that you set when you start the container. Once we've got this Ruby hash, which we generate automatically from our user object,
so a user object knows how to generate its hash representing the container for its IDE, we can then just pass that hash to a dacr container deployer and say deploy it, and what this will do is work out, has that container been created? If so, we should start it. If it hasn't been created, then create it and start it.
Or if it's already running and it's already there, just do nothing and return it. And this means that when you're working with containers within the app, you really don't have to think about the architecture or about the traditional Docker workflow. And to me, why this is so exciting and so useful
is that all of our containerized infrastructure is now really just another HTTP API. We can treat it and work with it in exactly the same way we do the GitHub API or the Twitter API. And so, in the same way that when we work with a third-party API we'll normally wrap it in some sort of abstraction
that maps the way that API works to our actual business logic, to what we're actually trying to do, we can do this with infrastructure, which previously was quite difficult to do. And the end result of this for us is that our application is now very, very easy to reason about. Someone who's new and coming to this application
doesn't need to have an in-depth understanding of the terminology of Docker, of the workflow of going from creating an image in a registry to mapping folders to that container to starting that container. They just need to have a reasonable understanding of the abstraction that we've built around it, and they can start working on the application.
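Here's a minimal sketch of what such a deployer could look like on top of the docker-api gem, consuming a spec like the hypothetical one above. The class and method names are invented for illustration; they are not the real gem's API.

```ruby
# Idempotent "make sure this container exists and is running" -- a sketch, not
# the actual implementation from the talk.
class ContainerDeployer
  def deploy(spec)
    container = find(spec[:name]) || create(spec)
    start(container, spec) unless container.json['State']['Running']
    container
  end

  private

  def find(name)
    Docker::Container.get(name)
  rescue Docker::Error::NotFoundError
    nil
  end

  def create(spec)
    Docker::Container.create(
      'name'         => spec[:name],
      'Image'        => spec[:image],
      'Cmd'          => spec[:command],
      'Env'          => spec[:env].map { |k, v| "#{k}=#{v}" },
      'ExposedPorts' => spec[:ports].keys.map { |p| ["#{p}/tcp", {}] }.to_h,
      'Volumes'      => spec[:volumes].values.map { |path| [path, {}] }.to_h
    )
  end

  def start(container, spec)
    container.start(
      'Binds'        => spec[:volumes].map { |host, ctr| "#{host}:#{ctr}" },
      'PortBindings' => spec[:ports].map { |ctr, host| ["#{ctr}/tcp", [{ 'HostPort' => host.to_s }]] }.to_h
    )
  end
end

# e.g. ContainerDeployer.new.deploy(user.ide_container_spec)
```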
The outcome for us of this was incredibly positive. So the process we now have is: a user signs up to this Rails application. The Rails application triggers a Sidekiq job, and that Sidekiq job is responsible for using dacr, which in turn uses the Docker API,
to make sure that these containers are created and started, and then, once that job returns, making sure that our front-end proxies are updated to route the user back to that Node.js app when they try and visit their development environment. Probably the biggest sort of business benefit for us of this
is that because containers are designed to be stopped and started very easily, we now have a cron job, which goes through and checks when was the last time this container was used, when was the last time the user accessed it. If it hasn't been used for, I think it's half an hour or so, we can then stop that container. When a user accesses it again,
we can once again use dacr and the Docker API in the background to detect that the container is no longer running and start it, and then, once it's started again, we route the user and update the proxy so that their traffic is being routed back to this container. And that's allowed us to go from a density
of about 10 users per 2 gig node, which was getting very expensive, to at least 500 users per 2 gig node. I say at least because it's probably much higher, but we haven't tested it higher than that. It's also allowed us to do things like offering free trials because in the same way Heroku can afford to offer free apps because most of them never get used and they get spun down,
we can do exactly the same thing with these IDEs. So someone can sign up and try it. If they don't continue using it, their container is stopped, and that has effectively no ongoing cost for us at all. I said at the start I wasn't going to talk about traditional deployment because I didn't think that was the most exciting thing. And you could sort of argue that I sidestepped that
because I really talked about deploying Node.js apps at runtime. We've used it in quite a few different scenarios, and these are a couple of the other ones that have worked which really have nothing to do with deployment. We had a scenario where we had a proprietary data set
which we couldn't share with third parties, but we needed to allow third parties to write analysis tools in C which we could then build, run this data through, generate summaries of their results, and then provide back to them. And again, we were able to create a very simple abstraction around a Docker container so that we could receive a tarball of their C code,
inject that into a container, build that C code, pipe our data set in on standard in, and then wait for an event to be raised when it has finished, collect the data on standard out, and then post the analysis back. And so there we didn't want to reason in terms of container state. We weren't interested in the ongoing question of whether it was running or not.
All we wanted to know was whether execution had completed. Another scenario where Docker is getting a huge amount of use is language playgrounds. Pretty much whenever a new language comes out, very soon afterwards somebody will create a language playground: somewhere you can run small snippets of that code in the browser
and see the results. The code runs server-side, and the output comes back to you. Great for teaching and for letting people have a go with the language before getting completely set up. And again, this is a great use case for Docker and something that we've used it for a lot, where you can create simple Ruby objects that will deal with receiving code, working out what language it's in,
building a suitable container for that, collecting the output, which is generally just written to standard out, and then providing that back to a Rails app to be returned in an API response. And again, it's all about being able to create these very simple wrappers around Docker and around containers so that you don't have to constantly think in terms of how the containerization process works.
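A rough sketch of that run-to-completion pattern with the docker-api gem might look like this; the image name, timeout, and resource limits are illustrative assumptions, and the create-time options shown are those of the Docker API of the time.

```ruby
require 'docker' # docker-api gem

# Run an untrusted snippet to completion and capture its output -- a sketch.
def run_snippet(code, image: 'ruby:2.2')
  container = Docker::Container.create(
    'Image'           => image,
    'Cmd'             => ['ruby', '-e', code],
    'NetworkDisabled' => true,             # untrusted code: no network access
    'Memory'          => 64 * 1024 * 1024  # and a memory cap
  )
  container.start
  status = container.wait(10)              # block until the process exits (10 s timeout)
  output = container.logs(stdout: true, stderr: true)
  container.delete(force: true)
  [status['StatusCode'], output]
end

# exit_code, output = run_snippet('puts (1..10).reduce(:+)')
```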
So hopefully what I've got across at a very high level in this talk is that because Docker has this HTTP API that is very full-featured, it's really, really easy to create abstractions over containerized infrastructure, and so that means that we can treat and reason about infrastructure in exactly the same way we reason about the APIs
that form the rest of our application. If you want to have a go with this, there are a couple of links; the top one is just a page on my blog that has these slides and a load of resource links. If you're completely new to Docker, I strongly recommend the interactive Docker tutorial. It's a web-based tutorial that will take you through the command line and get you used to the terminology.
The docker-api gem is excellent. If you've been through the Docker tutorial, you should be in a pretty good place to start using that gem directly. And the dacr gem, which is our abstraction, is again open source on GitHub. Feel free to use that; it's just an example of one possible way of creating these abstractions.
As a fun side effect, because dacr works entirely in Ruby hashes, we can also do YAML file-based deployments, much like Fig or Compose can be used to do. So you can define in a YAML file that you have a Rails container, a Postgres container, and a Redis container,
and then have Docker orchestrate that in development or across multiple hosts in production. Thank you very much for listening. Are there any questions?