
Razor - Provision like a Boss


Formal Metadata

Title: Razor - Provision like a Boss
Number of Parts: 199
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Abstract
Razor is a flexible open-source provisioning tool that makes it easy to control how machines are built based on rules and policies. It maintains an inventory of nodes and their hardware characteristics, gathered by booting each node into a discovery image. Discovery information, together with user-defined policies, is used to make installation decisions. Razor can install a wide variety of operating systems, from common Linux flavors like Debian, Ubuntu, CentOS, and RHEL, to operating systems known for their resistance to automated installation, like ESXi and Windows. Beyond installation, Razor strives to be a tool for managing a machine's lifecycle, including power control via IPMI and easy integration with external systems. Razor is an opinionated tool that focuses narrowly on provisioning, but makes it easy to hand off a node after installation to a configuration management system like Puppet to perform more complicated setup tasks and for ongoing maintenance. This talk will give an overview of Razor's capabilities and provide some hands-on examples of its use; it will also give examples of how Razor and Puppet can be used to address common provisioning problems, like building an OpenStack cloud.
Transcript (English, auto-generated)
Alright, so next up we have David Lutterkort who, as I see in his slide notes here, is behind Augeas, which is an incredible tool, even if its grammar rules are incredibly difficult. Anyway, he'll be talking about Razor, the provisioning tool.
Thanks, Walter, and thanks everybody for coming. As he said, I'm David. I write provisioning software. I've been with Puppet Labs since May of last year, so about nine months now.
Even though I'm pretty new to Puppet Labs, I've been in the Puppet community for much longer. I ran across Puppet pretty early on, in late 2005 or so, did a bunch of stuff with it, pushed it into Fedora, and I've been around it one way or another for a long time.
One of the things that came out of my exposure to configuration management and Puppet (I come at this with the background of a developer) was Augeas. If you still modify your config files with sed, stop right now and check out Augeas.
This talk isn't about Augeas, though; it's about Razor, it's about provisioning. Provisioning is one of those words that means a lot of things to a lot of people, kind of like configuration or systems management. For the purposes of this talk, what I mean when I talk about provisioning is this sort of situation.
You have a lot of machines, and you need to get them to do something useful. Hopefully your machines aren't just sitting in the backyard; hopefully somebody racks them and cables them up so they're ready to go. Traditionally, Puppet has had a first-mile problem when it comes to getting going, because Puppet really only starts once you have enough stuff on your machine that you can run an agent on there. Razor is a tool to close that first-mile gap. It's of course not the only tool that does that. There's a ton of tools out there to help you with PXE provisioning: a ton of open-source tools, each of the big management packages has some provisioning functionality, and of course I'm sure everybody in here has their favorite Perl script, the 1000-line script that does everything anybody might ever want to do with PXE provisioning.
But if you look at them, they all fall into two piles: the ones that do too little, and the ones that do too much. The tools that do too little are the ones that just stop once they've installed all the packages; they end once you've run through a kickstart file and gotten your install. That's not enough, because you're not just installing for the hell of it; at some point you need to manage that machine too. The other pile, the tools that do too much, realize that that's a problem and that you need something more fine-grained than just laying packages down, so they grow all this configuration management functionality. But that's of course the wrong place for configuration management functionality, because your provisioning tool is only involved the very first time you build your machine, while you need configuration management on an ongoing basis. Razor tries to be the Goldilocks of provisioning tools: don't do too little, don't do too much, do just the right amount.
The way it does that is by making it very easy, once the system has been built, to hand it off to a configuration management system for further maintenance. The philosophy behind Razor is that you install just the bare minimum of whatever operating system you're installing, enroll it with Puppet or Chef or some other configuration management tool, and then do the actual personalization of the system with that. Since there are so many variations of provisioning tools out there, I made up a little user survey to get a better idea. Unfortunately I didn't have time to talk to any users, so I just made up the answers too.
So we've done a lot of software engineering research there. To give a little prehistory about why Razor came about, how it came about, and why it does what it does: it was started by two guys at EMC, Nick Weaver and Tom McSweeney, who are now at VMware. They launched it in the spring of 2012 at EMC World, and then in the fall of that year, at PuppetConf, they announced that they would move maintenance of Razor over to Puppet Labs, because it was a really good fit with Puppet, but also because they felt they didn't have the time and resources to really push it forward. What's happened since then is that over the last six months or so, starting in the early summer, we took a look at where the initial codebase was and at the lessons learned from people using it. One of the lessons was that it was really hard to get the initial codebase installed and running, and of course to maintain it, and so we decided to rewrite the whole thing. My talk is about the rewrite. At this point, the initial codebase is legacy. If you have an installation of it, great, but nobody should be installing that codebase anymore; use the rewritten codebase.
So, one of the things that makes Razor unique is that it deviates from the general approach of PXE provisioning tools, which try to make you look at your machines as pets.
Pets are these things that you know intimately well; you have a personal relationship with them and you really care about them. When somebody comes and says, hey, build me a web server, a database server, whatnot, you go down to the data center, look at your most favorite, most beloved servers, and pick the one that is going to do what you need it to do. Then you go back to the office, enter the MAC address into your provisioning tool, and hopefully you've got a machine at some point.
Razor instead takes inspiration from how people use clouds and tries to move that a little bit into the bare-metal world. In the PXE provisioning world, once you look at it, machines are more like cattle: things that are largely interchangeable. They have different characteristics, but within each group they're pretty much the same. Just like with cattle, you have dairy cattle and cows that you raise for meat, maybe for breeding or for showing off at some show, but within each group, all the milk cows are pretty much interchangeable.
The way Razor does that is that when it first encounters a system, it boots it into what Razor calls the microkernel. That's really just a small Linux image that Razor puts on the machine; it runs Facter and sends the facts back to the Razor server. Because of that, the Razor server has an inventory of the hardware you have. Later on, Razor decides what should go on a machine based on policy that you've set up. In your policy, you talk about what should happen with machines that have this much RAM and this many cores and whatnot, and based on these policies and rules, Razor then decides: oh, this should get RHEL, or this is a node that should get ESXi installed.
As I said, with the rewrite we changed a few things around.
One of those things is that we now use Postgres as the database, just because Postgres is awesome. But the database is not a huge concern with Razor: we literally store tens of kilobytes of data per node, so you can do the math on how many nodes you would need before the database gets to a respectable size. We also use Sinatra. The server is written in Ruby, and we use a Ruby web framework called Sinatra. If you haven't encountered Sinatra, you can think of it as Rails after a very, very serious diet. It's a really nice framework for writing a web service. The one thing that's probably a little unusual (so far this is a pretty standard web stack) is that we use TorqueBox, which is kind of a plug-in for JBoss that turns JBoss into a Ruby app server. I don't know how many of you have deployed Ruby apps, but before you know it, you have a simple application that consists of like ten daemons: web workers and some background workers and something that sends email. It's a nightmare to manage, right? Because now you get to babysit ten different things and monitor them and all that. The nice thing about TorqueBox is that it lets me as a developer do all these things, but it does them in one process, so as an admin you're just watching this one process.
That's scary enough. The one thing that's missing from here, since I've been talking about PXE provisioning so much, is the rest of what you need for PXE provisioning: what about DHCP, what about TFTP? There, Razor also deviates from a lot of the PXE provisioning tools, which tend to naturally branch out into managing all of that for you. Razor does not do that. We don't really care what you use for DHCP or TFTP, whatever it is you have. All we need you to do is put two files onto your TFTP server and set up PXE boot the usual way. One of those two files is the iPXE firmware, and the other is a little script for iPXE that basically tells nodes, once they come up, to go and talk to this other server over here, the Razor server. The genius of iPXE is that it gets you out of the TFTP malaise where you can't really do anything; it lets you do all the booting over HTTP. Now we can write a web server with interesting behavior that does useful things just to boot machines. Once you've put those two files in place, you don't ever have to touch them again. Everything happens on the server.
In terms of topology, Razor really has two APIs. One is the public API; you can think of that as the management API. That's what you use to tell the server what the policies and rules are. On the other side there's a private API, which nodes use to talk to the server while they're getting installed or while they're booting. The private API really only comes into play for you if you decide to write your own custom installer, because then you need to know how to get files from the server and how to tell the server to log something. The public API has proper authentication around it: we use basic HTTP authentication, and underneath we use a library called Shiro that makes it really easy to plug into LDAP or a bunch of other things. So the public API is pretty well secured. The private API, just by its nature, you can't really secure, because when a node comes up and says, hey, I'm a machine that looks like this, we just have to believe that node. So you have to secure that network by physical means: put it on a VLAN, or completely segregate it from the rest of the network. For people who do PXE provisioning, that's nothing new. How many of you actually have to manage physical machines and do PXE provisioning?
When I started doing this, I would never have thought that this would get that many hands, because everybody's talking about cloud. But it's still a real problem to do provisioning. Yes, that too, yeah. I'll talk about that at the very end a little bit.
Okay, so the public API is a fairly garden-variety REST API. The one wrinkle is that while it's usually easy to read things over REST, modifying things gets really awkward, so to change things we have commands. You issue a command to create a policy or modify a policy instead of doing weird gymnastics with representations of REST objects.
The objects you need on your server are these. The policy is the most important thing; it ties everything else together. You need a repository, or multiple repositories: that's the bits that you eventually want to install on the machine. You can either point the Razor server at an existing repository, like a yum repo or an APT repo you have sitting somewhere, or you can hand it an ISO and have it imported on the Razor server itself. That's what people usually do for Windows and ESXi installations: they just import an ISO into the server.
A broker is Razor's lingo for the thing that does the handoff to the configuration management system at the end. So there's a Puppet broker and a Puppet Enterprise broker; somebody in the community wrote a Chef broker. We don't ship it, but somebody in the community also wrote a broker that just sends a message on an AMQP bus for their internal infrastructure, so with that kind of setup you can do much more than just hand off to a configuration management system. A broker, at the end of the day, is a fancy word for a shell script; it's really not much more.
Tags are named rules, essentially. The way Razor works is that when a node comes in, Razor goes through all the tags it has and the rules associated with them, and checks whether those rules match that node. Your rule might say: you must have more than 8 cores and 16 gigs of RAM, and then you tag the node as a medium-big machine. A policy also carries tags, and once the tags on the policy and the tags on the node match, the policy matches and gets applied to that node.
And tasks, at the end, are the actual things that do the installation, the kickstart scripts. We actually went through a bunch of naming gymnastics, because we initially called them installers, but we wanted these things to do more than just installation; after a few detours we eventually settled on tasks. And to write an installer, or a task, once you have the installation automated, so once you have a kickstart script and maybe a post-install shell script together, getting that onto the Razor server is a matter of writing five or six lines of metadata describing what those files are.
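To make that concrete, here is a hedged sketch of driving the public API from a script. The command names (create-tag, create-policy) match razor-server's command style as described in the talk, but the URL, port, credentials, field names, and rule operators below are illustrative assumptions; consult the documentation for your version before relying on them.

```python
# Hedged sketch: create a tag with a rule on facts, then a policy that uses it.
# Base URL, port, and several field names are assumptions; authentication is
# omitted for brevity (the public API normally sits behind HTTP basic auth).
import json
import urllib.request

RAZOR = "http://razor.example.com:8150/api/commands"   # assumed base URL

def command(name, body):
    req = urllib.request.Request(
        f"{RAZOR}/{name}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    return json.load(urllib.request.urlopen(req))

# A tag whose rule matches nodes with at least 8 cores and roughly 16 GB RAM.
command("create-tag", {
    "name": "medium-big",
    "rule": ["and",
             [">=", ["num", ["fact", "processorcount"]], 8],
             [">=", ["num", ["fact", "memorysize_mb"]], 16000]],
})

# A policy tying a repo, task, broker, and tag together for matching nodes.
command("create-policy", {
    "name": "centos-medium-big",
    "repo": "centos-6.5",          # assumed repo created earlier
    "task": "centos",
    "broker": "puppet",
    "tags": ["medium-big"],
    "hostname": "node${id}.example.com",
    "root_password": "secret",
    "enabled": True,
})
```

The point of the command style is visible here: each change is a single named POST with a JSON body, instead of PUT or PATCH gymnastics against REST resources.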
Out of the box we have installers for the things on the right. We have an installer for ESXi; that was one of the initial use cases that Nick and Tom had for Razor. They wanted to deploy ESXi automatically, and doing that by hand is a real joy, if you've ever done it. We also have installers, of course, for the various Linux flavors: RHEL, CentOS, Debian, Ubuntu. And then the thing I'm really excited about, which I didn't think we would get this quickly, is that we now also have a Windows 8 installer. I don't know how many of you install Windows on a regular basis; it's fun. By all accounts (I haven't tried it myself) it actually works. So you can use Razor to provision pretty much all the systems you usually encounter in your data center.
The installer itself is kind of a linear process. You say: the first time we boot with this installer, do this, and that's usually a download of some kernel and initrd that is the actual installer; the second time we boot, we do something else; and the third time, and so on, until eventually you're done installing and the node is just set to boot locally. From then on, you have the machine in production. You could, for example, write your own installer whose very first step is some configuration of a RAID card: boot into some special image that lets you set up the RAID controller with whatever tools you use, and after that boot into the real operating system install. A pretty easy thing to do.
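Conceptually, a task is just that numbered boot sequence plus its templates. The sketch below is not Razor's on-disk format (which is a small metadata file plus templates); it only illustrates the idea that the Nth boot does X and everything after the listed steps boots locally.

```python
# Conceptual sketch of a task's boot sequence: each PXE boot of the node asks
# the server what to do next, and after the listed steps it boots locally.
# This mirrors the idea described in the talk, not Razor's actual file format.

BOOT_SEQ = {
    1: "boot_install",    # load kernel/initrd and kick off the installer
    2: "boot_finish",     # e.g. a second stage, firmware or RAID setup, ...
    "default": "boot_local",
}

def boot_action(boot_count: int) -> str:
    """Return the action for the node's Nth boot under this task."""
    return BOOT_SEQ.get(boot_count, BOOT_SEQ["default"])

assert boot_action(1) == "boot_install"
assert boot_action(5) == "boot_local"
```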
So, everything in Razor is about nodes. Those are the machines we really do all this for, because we need to put something on them. From Razor's point of view, a node largely consists of four things. We have a little bit of hardware information that iPXE sends us, MAC addresses and the serial number, I think; you don't get very much out of iPXE, because these firmwares are pretty restricted in what they can tell you. We have facts, which right now are a standard run of Facter, in particular the block devices, how much memory, how many cores, and stuff like that. Then a fairly recent and really interesting addition is metadata: you can associate a bunch of key-value pairs with a node. What makes this really interesting is that you can do that both through the API, so you can make a call and set some metadata key, and from the installer: the installer can call back and set metadata, and of course it can also read it and make decisions based on certain metadata keys. So if you're totally crazy, you could push whatever partitioning you want to have on a machine into metadata on its node, and then the installer can pull it out and lay down your custom partitioning scheme.
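Setting such metadata from outside the node is a single command against the public API; razor-server has a modify-node-metadata command for this, though the node name, URL, and exact field names below are assumptions for illustration.

```python
# Hedged sketch: attach a key/value pair to a node's metadata, which an
# installer template could later read to decide, say, the partition layout.
# Node name, base URL, and field names are assumptions for illustration.
import json
import urllib.request

RAZOR = "http://razor.example.com:8150/api/commands"   # assumed base URL

body = {
    "node": "node17",
    "update": {"partition_layout": "raid10-lvm"},
}
req = urllib.request.Request(
    f"{RAZOR}/modify-node-metadata",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```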
The last thing is state, which tells you whether the machine is installed; right now that's about the only thing in there, but we will add more flags about what the node is doing. Those four things are all accessible when you write rules, so the decisions about what gets provisioned can be based on any of these pieces of data, which gives you a good amount of flexibility. Another recent addition that's not on the slide is that we also added IPMI support, so you can now use Razor to enforce power state: say this thing should be off, keep it off, and check that it's off every so often. You can reboot machines and turn them on or off. Right now it's kind of simple; we support what ipmitool does, but we want to add support for other remote power management mechanisms.
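Underneath, that power control is the kind of thing ipmitool already does against a BMC. The snippet below just shells out to ipmitool directly to show the "enforce power state" idea; the BMC address and credentials are placeholders, and it assumes ipmitool is installed.

```python
# Hedged sketch: check a node's power state over IPMI and power it off if it
# is supposed to be off, roughly the "enforce power state" idea from the talk.
# BMC address and credentials are placeholders; requires ipmitool on PATH.
import subprocess

def ipmi(bmc, user, password, *args):
    cmd = ["ipmitool", "-I", "lanplus", "-H", bmc, "-U", user, "-P", password, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def enforce_off(bmc, user, password):
    status = ipmi(bmc, user, password, "chassis", "power", "status")
    if "off" not in status.lower():
        ipmi(bmc, user, password, "chassis", "power", "off")

enforce_off("bmc-node17.example.com", "admin", "secret")
```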
Just a few examples of what Razor can do for you. Of course, you can build machines with it and hand them to Puppet or other configuration management systems like Chef. The initial use case was building ESXi nodes and registering them with vCenter; there are actually Puppet modules that help you do that, and Razor is well integrated with them. One of the things I find really cool is that you can use it to provision OpenStack, because at the end of the day you just use Razor to lay down the basic operating system and get the Puppet agent going on that machine, and then you use the OpenStack Puppet modules to actually turn the machine into a Nova compute node or Swift storage or what have you.
And then, something that we're taking baby steps towards, but which I think is where Razor will go in the longer term, is managing the lifecycle of your machines. One of the really important differences between Puppet's notion of what a node is and Razor's notion of what a node is is that Puppet thinks of a node as something that lives as long as the operating system is on there. If you take a machine and reinstall it, Puppet will think that's a brand new node it has never seen before. Razor, on the other hand, really follows the machine, the hardware itself, and is not confused by reinstalls.
So you can use Razor for these more complex lifecycles. One thing that people seem to do quite a bit is, when you decommission a box, do a secure wipe before you install something else on it, and you can actually trigger that just by setting metadata flags and writing rules in a somewhat clever way. Something where we need to do more to make it really smooth is updating the BIOS. Once you know that the BIOS on some machines needs to be updated, it would be really cool if you could tell Razor to do that on the next scheduled reboot, so you don't need a specific reboot just for the BIOS update. You might have a reboot-every-two-weeks policy, and when a machine reboots, Razor would make sure that it first boots the BIOS updater image and, once that's done, goes back to booting locally and running whatever it was running before.
I've got a few pointers here, and I think we have a couple of minutes for questions. If you have questions after the talk, like tomorrow or so, we have a mailing list, an IRC channel, and so on.
The thing that's kind of awkward there is the flipping back and forth between doing the BIOS update and normal operation. The problem is that right now, once you've run an installer, Razor considers the node installed, and that keeps it from going through the policy table again. So we need a way to mark some of these tasks as non-destructive. Installers are destructive; you would never want to apply one to a node that is already installed. But some of these things, like a BIOS update, are non-destructive, and you would want to distinguish between the two and allow applying the non-destructive ones even to nodes that are already installed.
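As a purely conceptual sketch of that missing distinction (this is not something Razor implements today; the answer above describes it as future work):

```python
# Conceptual sketch only: how a "destructive" flag on tasks could gate which
# policies are allowed to bind to an already-installed node. Razor does not
# have this yet; the talk describes it as something that is still needed.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    destructive: bool   # installers: True; BIOS updates and the like: False

@dataclass
class Node:
    name: str
    installed: bool

def may_apply(task: Task, node: Node) -> bool:
    """An installed node only accepts non-destructive tasks."""
    return not node.installed or not task.destructive

print(may_apply(Task("centos", True), Node("node17", installed=True)))        # False
print(may_apply(Task("bios-update", False), Node("node17", installed=True)))  # True
```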
TPM brings back flashbacks to a previous life; I don't know if I want to go there. And I don't know how much people actually want to use TPMs for that, or how much they're actually using them for anything.
So, the initial implementation had a state machine, and then when we actually looked at the installers that people had written for the initial codebase, they were all linear processes. There was no real use of the state machinery; it just made things very complicated. So one of the decisions we took with the rewrite was that installers are just linear steps: you do step one, you do step two, you do step three, but there's no branching and looping and all that. Some of that will come back with the additional data we have about a node, and with things like what I just said about distinguishing between destructive and non-destructive tasks and allowing non-destructive tasks against installed nodes. Behind that there is probably, implicitly, a notion of the lifecycle of the machine, but it's not really exposed, because I think that's just too hard for people to really make use of.
So, can it revoke certificates and pull stuff out of the Puppet master? Out of the box we don't have anything for that, but it's a matter of writing a shell script that actually does that and hooking it into an installer, so it would be a pretty easy exercise.
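For illustration, the kind of hook that answer describes might boil down to something like the following, run on (or against) the Puppet master when a node gets reinstalled. The certname handling is an assumption, and the use of the era-appropriate `puppet cert clean` CLI is my choice, not something the talk prescribes.

```python
# Hedged sketch: when Razor (re)installs a node, clean its old certificate on
# the Puppet master so the fresh agent can request a new one. Assumes this runs
# on the master and that the `puppet cert clean` CLI of that era is available.
import subprocess
import sys

def clean_puppet_cert(certname: str) -> None:
    subprocess.run(["puppet", "cert", "clean", certname], check=True)

if __name__ == "__main__":
    clean_puppet_cert(sys.argv[1])   # e.g. node17.example.com
```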
I've heard about Ubuntu's Metal as a Service, but I haven't really looked at how they do things.
My understanding is that they push much more into a cloud-like mode of operation. I think Razor tries to be very careful to strike a balance between being fairly hardware-centric, fairly close to the way people are used to managing physical machines, and having the kind of cloud-like features. There's a whole gamut of these uses, and I don't think you'll make everybody happy by giving them a metal-as-a-service cloud tool. So because of that, I would expect there are quite a few philosophical differences between MAAS and Razor.
Generally, I would say: do whatever you can with Puppet, because Puppet is the thing that worries about the ongoing maintenance and configuration changes of your boxes, and put only the things you absolutely have to do outside of Puppet into Razor. If it's something you need to do to get the operating system onto the machine, you do it with Razor; everything else you do with Puppet. Yeah, some platforms let you do that in the running system, and others need the tool in the image. If you can do it in Puppet, you don't have to do it in Razor.
For starters, I hate the word microkernel, because it's really just a small Linux image. Another thing we changed from the legacy codebase is that we moved the microkernel to Fedora; right now we're using Fedora 19. The rationale behind that move is that we at Puppet Labs can't be in the business of hardware support. I mean, there are companies that do that, and they're way bigger than Puppet Labs. So the idea is that eventually we'll move to an enterprise Linux microkernel, so that if you have a support agreement with one of the enterprise Linux vendors, you can go and talk to them and get help with your microkernel when it doesn't like your network card. Right now it's Fedora 19.
I actually just noticed that OpenOffice very helpfully made these things pretty much illegible. The first one, the server repo, is puppetlabs/razor-server, and that's also where all the documentation lives, on the wiki for that git repo. I try to keep everything there; the other repos are kind of offshoots, but all the documentation and most of the information is on that repo. Any more questions? Okay, thank you.