
To Run an App With Guarantees We Must First Create The Universe

Formal Metadata

Title
To Run an App With Guarantees We Must First Create The Universe
Subtitle
Lessons from managing self-contained application runtimes in production
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
We'll look at patterns and anti-patterns for self-contained, immutable runtime environments for applications using Habitat, with a focus on special cases, integrated testing and advanced hacks.
Transcript: English (auto-generated)
So, my talk today is called "To Run an App With Guarantees We Must First Create the Universe." My name is Blake Irvin. I'm an engineer slash product coach at a company in Berlin called SmartB, which is a sustainability company.
I'll talk a little bit more about that in a second. Those are my contact details; I'll post them again at the end. I was hoping to be able to do more hallway track stuff today, but I have a sick dog at home, so I have to do my talk and run home and take care of my dog. Hopefully my flat's not destroyed. Yeah, so I work for this company called SmartB. We make various software tools for data analysis, and we focus really on tools
for large enterprises and industrial sites, focusing specifically on transparency, efficiency, and sustainability. So our strategic goal or our ethical DNA as a company is reducing human impact on the planet.
Here's an example of one of the tools that we make. So you can kind of see here we've got some like general consumption information and savings and how many kilograms of CO2 we're saving by doing certain actions. And we're basically trying to give people sort of like a feedback loop to help them consume less.
And we are hiring. So if this is the kind of work that you think is interesting or are curious about, please talk to me either directly or via email or something later. What I'm specifically talking about today is things that we do at SmartB with a tool called Habitat.
If there are any Kinvolk people in the room: I think Kinvolk is collaborating with the Habitat folks on Kubernetes support for Habitat. Habitat is like an application lifecycle management tool. It's very difficult to explain Habitat in one sentence, but that was my best effort.
And that's also something I'd love to talk more about later if anybody's curious about how Habitat works in more detail or hasn't tried it. I don't think there's a user group for Habitat in Berlin yet, but I would love to be part of that if there were a few people that wanted to do it. Anyway, so yeah, but it's not specifically a Habitat talk.
We're going to talk specifically about some things we had to do in Habitat, but I'm thinking more about like big picture stuff in general. Yeah, so anyway, the name of the talk is to run an app with guarantees we must first create the universe. And I think a fair question at this point is why am I saying that?
Like, why is it necessary to create the universe? And the answer is because we have to. Anything that happens requires the universe to exist. I'm stealing, or I'm paraphrasing Carl Sagan when he said, if you wish to make an apple pie from scratch, you must first invent the universe. So on the left we have the big bang, and then everything else that happens,
all biological evolution, and then the apple pie at the end. And since it's the autumn, we can contemplate apple pie for a moment, which I'm looking forward to eating lots of, especially if I get to go home to the States for Thanksgiving. Yes, but anyway, I'm talking about this sort of like idea of a pocket universe for the application,
like not the universe itself, but the universe from the point of view of the app. And so when I say universe, I don't obviously mean a real universe. What I mean is the application, all the code I need to run the app, plus all the dependencies the app has. And I'm talking about pocket universes because I don't want interactions
between these different applications. I want to define what those are and control them. Another way to think about this would be in biological terms, because what's a little bit different about Habitat versus things like, I don't know, a BSD jail or a zone or a container, is that it's also explicitly defining the entire application life cycle.
So here, this is a beehive. In like this little cell number one, we have an egg, right? And then in two, we have a slightly older larva, and a much older larva in three. And in four, we have a pupa, so now it's about to transform into a bee, and what's going to hatch out of that is a bee.
That's the whole life cycle, well, not the whole life cycle, but the entire juvenile section of it, and that's all happening inside, in isolation from the rest. And that's important because we want isolation, but one of the questions that a lot of folks that I work with at SmartB asked when I started pushing this idea of heavily isolated applications is,
why does this matter? And for me, as a career-long operations engineer, a lot of this is about safety, right? I remember lots and lots of times, back before my career really started,
when I was using horrible things like Windows 95 at home to play games: I would install a game, and if there was a problem with the game, I had to reinstall the entire rest of the operating system because there was some kind of splash damage happening. That was not safe, right? Splash damage is bad. Just because we make a change to one thing,
one part of the stack that we're running, we don't want bad effects to spill over into other components. I don't know if anybody else has had experiences like that, but I certainly have, and we'll talk a little bit more later about why safety matters for another reason,
which I'll get to in a bit here. There's also the problem of entanglement. Can anybody tell me what film this is from? This should be easy for this crowd, right? I can't hear. Yeah, right. So this is, I think, the first Lord of the Rings movie,
and the dwarf Gimli is about to fall off a cliff into some hole or something, and one of his buddies grabs him by the beard, and the reason that this is a dramatic moment besides being funny, the reason it's dramatic is because you don't know if the beard, which is like the connection between the two people, is going to save this one guy or kill both of them, right?
Because if he falls too far, he's going to pull the other guy with him, and then they're both going to die, and that's the kind of entanglement that can happen between services or applications that run, let's say, on the same system especially, and so we want to really control exactly how those connections look, right? Because we don't want to, we don't want to be in a situation
where one service going down pulls the other service down with it. So in general, one of the things we're trying to do is avoid cascading failures. This is another good example from biology. Has anybody ever built a self-contained terrarium before?
So these are like sealed, right? Like there's no air coming in or out. You have enough microorganisms and like animalia in here to generate carbon dioxide, which the plant will then turn into oxygen and you have this cycle. If I have three of these and I mess up the system here, the other two should survive, right?
So that's another example of isolation being a good thing for us. Or we can go back to thinking about the beehive example, right? I'm not sure offhand whether there are wasps that parasitize honeybees, but almost every insect on earth has some kind of parasitic wasp that preys on it, and the parasitic wasp will come and lay an egg on the egg of the thing
that it's parasitizing and its larva hatches and eats the bigger animal's larva. If that happens here, let's say like I'm a wasp and I come and land here and my baby starts parasitizing that larva, all the rest should be okay in theory, because everything's separated and isolated.
But the more important thing for us at SmartB is our emphasis on scaling down. And some of the safety stuff applies to this as well, as we'll see in a minute. Scaling down matters to us because the entire point of the company, the reason SmartB exists,
is because we are trying to get humanity as a species to scale down. Most of human history, especially since the industrial revolution, has been all about growth, and now we're kind of like pushing the envelope for resources and we're starting to see places where we just don't have any more stuff to use, right? So we need to scale down and start reusing those resources, or avoid using resources that we don't need to.
And that also applies for technical operations. I don't remember exact numbers, but you can definitely see some really interesting studies about the amount of energy that's going to be consumed by data center operations over the next like 10 to 20 years, and it's not super cool if we continue powering these data centers with like fossil fuels, for example.
So ideally what we want to do then is workload consolidation. We want to get as much stuff to happen on a single compute resource as possible, so that we don't have any unused resources that we're still emitting carbon for. But that is traditionally a little bit scary, right? Because we have safety concerns.
If I was doing this in the 90s and somebody told me, put every single one of your services on the same box, I would be like, there's no way I'm doing that. I'm absolutely not doing that. That's crazy. I can't guarantee that my database and my application server can run on the same system safely because they have too many possible interconnections.
And then I won't be able to do like an apt-get update, for example, because if I update OpenSSL and I have one version of OpenSSL for the database and I need a different version for my application server and I do both of those things, or I do that one update, I could break one of those two critical services. But complete isolation makes things much safer,
so we can put a lot of muscle in a very small space. I'm not sure if these are necessarily racehorses, but I go horseback riding every opportunity I get. And one of the things that you realize when you're close to a horse is that they're a very big animal.
Like, they weigh around 1,000 kilos, I think, or somewhere in that range. If one of them falls down on you, it will either paralyze or kill you. So you really, like, safety is a big issue when you're working with horses. And horses can also hurt themselves and other horses very easily just because they're so big. But when people transport them, they want like a safe mechanism
for getting all of this valuable muscle from one place to another, and building these extremely strong, isolated containers, basically, for the horses is a way of moving all of that valuable muscle around without wasting a lot of resources. So instead of having one truck per horse, which would be the easy way to do things,
and which is basically what we did with compute in the 90s, and kind of still do today, to be honest, the alternative is to build strong isolation so that we can pack a lot of resources, a lot of compute or work, onto one set of resources. And this is kind of the classic picture:
for, I would say, half of my career, which is now 15 years or so, this is the way that I thought the world should look for the services I was running: everything perfectly clean and nice, lots of overhead, lots of, you know, breathing room at the top, nothing overutilized,
and everything fairly cold, literally cold, or figuratively: my systems running really cool and not too hot. But that's not actually what we want. What we want is something more like this. This is the famous "this is fine" dog, but actually, this is fine.
If everything's almost on fire, like all my systems are running at 98% capacity, that's actually great, because that means I'm not spending any carbon on anything that is not being used, right? That's actually what we want. One of the big problems that we have with transportation in the United States is that if you go to work in the morning
in the Bay Area, that's where I lived before I worked in Berlin, I would commute to work on my motorcycle and I would, and also in California you're allowed to lane split, which means you can drive in between two lanes of cars on the highway. So I'd be going like 120 in between these other cars and I would pass car, car, car, car on both sides of me
and every single car had a single passenger, right? So they're all burning fossil fuels, but they're only operating at like 25% capacity. That's bad. What we want is every car to be completely full, right? These people, probably for economic reasons,
are much better at utilizing their resources than most of us are, and this is something that we're trying to fix at SmartB for ourselves too, in the way we run services. This might actually not be ideal optimization, right?
There's a good chance that this truck is not operating at the best load-to-carbon-output ratio, but that's something that we can figure out, right? We're smart people, we know how to measure stuff. We can figure out ways to say: I want to put as much stuff as I can on this resource
without increasing my carbon footprint in a bad way. And this is the picture that I like to look at when I think about this stuff. This is the NASA photograph AS08-14-2383, which was taken from the Apollo 8 spacecraft in the 60s, I guess.
And this is, I think, the first time in history that a human held a camera and took a picture of the place that all of the rest of human history had happened, right? So that's like everything that ever happened is right there, more or less. Everything that we care about, everything having to do with human life,
and that was, I think, the beginning of our realization as a species that we are really in a very small container, like we're in a very small limited set of resources and we need to protect those or things are going to get super weird. And that is why I really think that the future is scaled down, not up. At the moment, we're still very much sort of addicted to this,
the sort of high you get from working on all of this tech stuff that's very focused on growth and speed and performance, although technically speaking, performance systems can also be very small, but we can talk about that more later. Anyway, yeah, so this is why I think that the future is scaled down,
but there are downsides to doing this kind of isolation, and these are the nitty-gritty technical details I wanted to touch on very quickly today: some of the places where we've had pain trying to keep things really small and isolated,
and how we tried to solve those. So one of the biggest problems is that if you want to prove that you can run in isolation, you have to start from zero every time. Habitat, which is the tool we use for doing this (other folks might use Docker, and you can actually use Habitat and Docker together), assumes that you always have to start from zero to prove that you really have isolation,
which means that if we are also, say, depending on something upstream: we do a lot of Python stuff, so we do a lot of pip installs, and PyPI is pretty much guaranteeing the contents of the packages that we install, right? They're doing this for us already. It would be stupid, if I was getting these supplies, for me to unpack
and repack every box that drops out of the airplane, right? This is me doing pip install, basically, during our builds. Yeah, so that doesn't make a lot of sense, but that's actually what we're doing whenever we do a build, like a Habitat build. When we build our artifacts, we do a pip install of all the dependencies
that we need, which literally means that we're downloading everything, unpacking everything, and then compressing it again. But it was already compressed and checksummed, so that doesn't really make a lot of sense, right? It's intellectually annoying, but the real tangible pain is that it just makes builds very slow.
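To make that concrete, here's a minimal sketch of what this naive pattern looks like as a Habitat plan.sh; the package names and paths are hypothetical illustrations, not our actual plan:

```bash
# plan.sh -- hedged sketch of the naive approach (all names hypothetical).
# Every build starts from zero: pip downloads, unpacks, and installs every
# dependency again, and Habitat then compresses the whole result into a
# .hart artifact, even though PyPI already shipped it compressed and
# checksummed.
pkg_name=myapp
pkg_origin=myorigin
pkg_version="1.0.0"
pkg_deps=(core/python)

do_build() {
  return 0
}

do_install() {
  # Re-downloads and re-installs everything on every single build.
  pip install --no-cache-dir \
    -r "${PLAN_CONTEXT}/../requirements.txt" \
    --target "${pkg_prefix}/lib"
}
```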
One of our applications is a very big Python monolith that does a lot of scientific computing. It's about half a gig after it's installed on disk, and it's like 300,000 files or something; waiting for half a gig of files to be compressed is a long build. So one of the ways that we chose to fix that was by vendoring, right?
If you've used a package manager for any of the common distributions, you've probably used a vendored thing before. A really common one that I used to use on Ubuntu systems back in the day was, I think, the apt-vendored ImageMagick, because ImageMagick is super hard to build,
but I needed it for a lot of my apps. It's also full of security holes. For us, though, that's not really what we care about so much. We don't do much with ImageMagick, but we do do a lot of machine learning stuff, which means we use TensorFlow. And this version of TensorFlow is about 150 megabytes compressed.
So when you uncompress it, it's like 300 or something. It's a huge, huge, huge package. So uncompressing and then recompressing that when we don't need to was a wasteful part of the cycle we wanted to fix. And we fixed it by doing some tricks to actually unwrap it in the Habitat Builder service,
which is part of the application lifecycle, and then store the thing that we had pulled out of PyPI via pip as a vendored package, but a Habitat package, so that now we can depend on it as a Habitat thing and we don't have to go through the dependency song and dance twice.
Now, I have mixed feelings about showing actual code in talks like this, because I don't think that you can learn much from it, but I can share this with anybody who's interested directly if you contact me later via the internets. But basically what we're doing here is: we have a variable called pkg_version. It's actually a function, but we're treating it also as a variable.
It's part of the build plan. We look at our requirements.txt, which is part of the pip install ecosystem for the main application that we build, and then we get the version and then we update that.
And then the big trick that we do here, depending on what dependency manager you have, you may or may not be able to use this trick. We use a feature of Habitat that allows us to push an environment variable from a thing you depend on into the thing that depends on it. And that allows us to construct a new Python path
with all of the dependencies that we've vendored. And that's how we tell Builder what we care about, or what should trigger a new build of the vendored module. And here we're depending on it. This is very ugly code but very readable, which is a good thing for ops, because you usually don't look at this stuff for months or years, so you really want it to be readable, not elegant.
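The slide code isn't reproduced in this transcript, so here's a hedged sketch of the shape of those two tricks, the dynamic pkg_version function and the pushed environment variable; the origin, module, paths, and grep pattern are illustrative assumptions, not the code from the slide:

```bash
# plan.sh for a vendored pip module -- a sketch, not SmartB's actual code.
pkg_name=tensorflow-vendored
pkg_origin=myorigin
pkg_deps=(core/python)

# pkg_version is a function here, treated like a variable by the build:
# the version is read out of the main app's requirements.txt, so the
# vendored package always tracks what the app actually pins.
pkg_version() {
  grep '^tensorflow==' "${PLAN_CONTEXT}/../../requirements.txt" | cut -d= -f3
}

do_before() {
  # Tell Habitat to resolve the dynamic version defined above.
  update_pkg_version
}

do_build() {
  return 0
}

do_install() {
  # Unpack the module from PyPI exactly once, into this package's prefix.
  pip install "tensorflow==${pkg_version}" --target "${pkg_prefix}/lib"
}

do_setup_environment() {
  # The Habitat trick mentioned above: push an environment variable from
  # a dependency into everything that depends on it, so consumers get a
  # PYTHONPATH that already includes the vendored module.
  push_runtime_env PYTHONPATH "${pkg_prefix}/lib"
}
```

The main application's plan then simply lists myorigin/tensorflow-vendored in its pkg_deps, and the heavyweight pip install disappears from the app's own build.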
The last thing that we do is local caching, and for that we use similar tricks. This is a physical food cache, basically. If you had an emergency cache like this, you wouldn't want to restock it every time you used it.
You would leave as much as you could behind between uses, and that's what we've been trying to do, because caching really just means reuse. So I wrote a very, very small package that uses the same environment-variable-pushing trick to configure the npm, Go,
or pip build environments to store the things that those dependency managers cache in the loopback-mounted caching location that's part of Habitat anyway. This could certainly be improved; I've also been thinking about additionally pushing the cache to S3 or some object store somewhere else.
But this basically means that if I depend on bixu/cacher in my build's dependencies, that's going to automatically, at least for local builds, local development work, put all the stuff that I want to cache into a permanent location instead of an ephemeral one, because every time I do the build, Habitat is going to tear everything down and start from zero, and the stuff that we're caching avoids that.
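The package is tiny; here's a hedged sketch of the idea (the names and cache paths are assumptions, though PIP_CACHE_DIR, npm_config_cache, and GOCACHE are the real variables pip, npm, and Go honor):

```bash
# plan.sh for a build-cache helper package -- a sketch of the idea only.
pkg_name=cacher
pkg_origin=myorigin
pkg_version="0.1.0"

do_build() {
  return 0
}

do_install() {
  return 0
}

do_setup_environment() {
  # /hab/cache survives studio teardowns, so point the dependency
  # managers' caches there instead of at an ephemeral build directory.
  set_buildtime_env PIP_CACHE_DIR "/hab/cache/pip"
  set_buildtime_env npm_config_cache "/hab/cache/npm"
  set_buildtime_env GOCACHE "/hab/cache/go-build"
}
```

Any plan that lists myorigin/cacher in its pkg_build_deps then reuses those caches across local builds.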
I don't know if we have time for questions, but I'll put my contact information up there again, because I think the time is short. And that's it.
Yeah, so you're talking about
vendoring for Python packages. Is that the primary solution you use for supporting Python dependencies? Or do you use, like, wheel builds or anything like that? Yeah, so if we were doing wheels, we'd kind of be in the same situation, right? The wheel is kind of a guaranteed thing, and then if we have to download the wheel from somewhere,
even if it's our own thing, it's just doubling the work, because we already have a mechanism for taking some bits and putting them into an archive, and in this case a Habitat archive is the lowest common denominator, which has a checksum and guarantees. So we did try the wheels thing for a while, but we didn't see huge performance wins by doing that,
because we still had to go through the unzip, expand, compress, repackage cycle again. Also because... no, I think that was pretty much it, yeah. It was mainly just a speed-of-light problem: just going to the Internet
and doing all these TLS handshakes every time we grab a new module just turned out to be really slow. So the more we can avoid that, the better. And in Python's case, it's especially difficult when you're using scientific computing stuff, because TensorFlow is huge, Cython is huge, SciPy is huge; these are massive modules. I think in the case of npm,
where you have an absolutely insane number of modules, you probably have more trouble not on the compression and decompression side but on the TLS side, just the network transfer. We have one minute left for questions. You can definitely ask me stuff online later, and I'll do my best to respond as quickly as I can.
Yeah, so when you say vendoring, is that checked into your source repository,
is that how you use Habitat? No, no. Actually, what happens is that the build service in Habitat is looking at our repository, and that stuff that I kind of had to skim over real quickly, that TOML definition: it's Builder looking, every time there's a git push, at the master branch to see if any of the things that match those globbing expressions have changed,
and if they have, then it goes through the process of actually looking at the requirements file, figuring out what the version of the module is, building a vendored module, and then pushing the vendored thing back into the Builder package repository.
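That TOML definition is essentially a .bldr.toml file at the root of the repository. Here's a hedged sketch of the shape of such a file; the package name, plan path, and globs are assumptions for illustration, not our real configuration:

```toml
# .bldr.toml -- sketch only; real plan paths and globs will differ.
# Builder rebuilds a package whenever a push to master touches a path
# matching one of its globs.
[tensorflow-vendored]
plan_path = "habitat/vendor/tensorflow"
paths = [
  "requirements.txt",
  "habitat/vendor/tensorflow/*",
]
```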
There are a lot of moving parts; that's why I decided to put my contact stuff back up on the screen, because there are a lot of little details that are not clear and that I couldn't explain efficiently in 30 minutes, so I'm happy to talk about it later online or something. Anything else? All right, I think that's it then. Thanks.