
The Quest for the Next Evolution of Infrastructure Management


Formal Metadata

Title
The Quest for the Next Evolution of Infrastructure Management
Number of Parts
133
Author
Paul Stack
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
The age-old task of racking and stacking in a physical data centre is becoming rarer as more companies embrace the public cloud. Having the ability to choose between providers such as AWS, Azure, DigitalOcean and Google Cloud Platform makes creating infrastructure easy. It is better to spend time developing better services for our customers than managing infrastructure. During this talk, Paul will demonstrate how building a scalable infrastructure on AWS becomes easy with Terraform. The talk will demonstrate how using configuration management, pre-baked AMIs and auto-scaling groups gives developers the ability to launch their own infrastructure when needed. The demos will include the ability to launch instances and databases and to manage user access. By the end of the talk, Paul will have demonstrated that the creation of infrastructure now becomes part of the development lifecycle, and that the old ways of system administration are fast giving way to infrastructure engineering. Paul will also demonstrate that the creation of new 'environments' is just a change of parameters in our infrastructure code.
Transcript (English, auto-generated)
Thank you for coming to the talk. This is the talk about the quest for the next evolution of infrastructure management.
This is essentially a lot of practices I've learned about building infrastructure in the cloud over the past few years. My name is Paul Stack. My Twitter handle and my email address are here. You can send me an email. It's hit and miss whether I'll email you back because I'm very bad at email. I will definitely tweet you back though, because I'm a Twitter addict.
About me, I'm an infrastructure engineer. I'm a reformed ASP.NET and C Sharp developer. I apologize to all C Sharp and ASP.NET developers. I don't enjoy working in that side of the world anymore. I enjoy much more about building infrastructure. I'm a DevOps extremist. I absolutely love everything about DevOps and about what it has enabled within organizations
that I've been part of. And I actually believe in the values that it brings to companies. Note here, I am not a DevOps engineer. There is no such thing as a DevOps engineer, in my opinion. I'm a conference junkie. Although this is only my first conference of the year, there are already many planned. This is the anti-Docker talk.
If you expected to come here and listen to Docker, I am sorry. It is not going to happen. This is actually going to be building instances in the cloud, AWS, Azure, and DigitalOcean, and showing how simple it can be without the need for Docker.
Docker has its use cases, in my opinion. I don't like it. So in 2012, Martin Fowler coined the term Snowflake Servers. So this is very much an idea in old-style organizations, that your physical infrastructure was created, and it was long-lived.
So it was there forever. Let's think of a Windows machine, and ten sysadmins working on a team manually RDPing into boxes. Now, they would manually log in, and they would probably install a new package that somebody from the development team has asked for, and that would be okay.
You cannot guarantee across your server farm that all of your servers are the same. This, as you can imagine, brings problems. They were called Snowflake Servers because it's exactly like a snowflake: every snowflake is different, and in this case, more than likely, every server is different. This leads us to creating infrastructure that we treat as pets.
We like to keep hold of that pet. We like to nurture that pet, and we'll let it live as long as we can, in as much pain as we can, because that's the nasty sysadmins that we are. And then we evolved, and we started moving into configuration management tools.
Anyone use configuration management tools? Puppet, Chef, Ansible, CF Engine? Wow. Does anybody in here actually manage infrastructure? Does anybody in here have an interest in managing infrastructure?
Or is this just a nice, quiet room? So configuration management tools came around, I want to say, just before 2000. CFEngine was the daddy of all the tools, and everything since seems to have stemmed from it, getting more and more abstract in how you express things: for example, Puppet uses its own markup language, Chef uses Ruby, and Ansible and SaltStack use a combination of YAML and Python-based stuff. But these tools were fantastic, because they would allow you to describe the desired configuration of a system, and then they would maintain the state of that system.
Each of them would do it in slightly different ways; with Puppet, all your machines would be continually checking in to a Puppet server, asking what the state should be, and if anything had changed, they would make the changes to bring it back to the known state. And this developed into a much better way of working, because we could then start to say that the servers were Phoenix-like.
Does anyone remember Phoenix Nights? Awesome. So Phoenix servers was a term actually coined by Cornelius Siestma, who works at ThoughtWorks and first spoke about it on an internal mailing list, and then Martin Fowler wrote a very good blog post in 2012, and what he basically said is that at
this point, you could destroy a server and you could bring it back up, and by using a configuration management tool, you would know the state of which that server was at, and you could be a lot more confident. Now, if we didn't have this, we would get into the situation where we were deploying
our applications into production, and we don't know how our application would work on a long-running server, or, alternatively, we would, every patch Tuesday in a Windows environment, our sysadmins would happily be applying our patches, and then on Wednesday
morning when we deployed a new application, it would break, because one of our pieces of code was incompatible with one of the patches that had been rolled out. This has happened in loads of companies I've worked in. This will continue happening in companies, because we have this idea of long-running servers. So we moved from long-running servers to configuration management, and to being
able to configure the state of a server and bring it back up. But there is a new way of thinking. In 2013, Chad Fowler, who worked for 6Wunderkinder, which made the tool Wunderlist, which is
the to-do app bought by Microsoft last year, he wrote a blog post saying, trash your servers and burn your code, immutable infrastructure and disposable components. Now, this is starting to get a little bit far-fetched for a lot of people, especially in the Windows world, because servers take a lot longer to come up, but it's catching
on in a very big way. We do this right now at my current start-up, and effectively what we do is when we release a new application, we do not release over the top of a current VM. We will spin up a brand-new VM, and then we will take that new VM and we will put
it in the load balancer and we will take an old machine out. So we are not changing the state of any production systems, we are releasing already pre-tested AMIs, which we will look at in a little bit, which actually lets us understand what is going on. It is a little bit better.
It was decided that maybe immutable infrastructure was not the right term for it because those functional programmers may get a little bit disappointed by using that term, so it has since been decided that maybe it should be called disposable infrastructure, because then we can actually think of it in terms of we bring it up in order that we can destroy
it at any point, and we are happy with that. And this leads us to thinking of our servers as cattle, not pets. I know it's still cruel to kill cattle, but we are much more at ease with that happening than we are with our pets being killed.
So let's look at some demos. For this demo, I'm actually going to use a tool called Terraform. Terraform is an open source tool, and it will allow me to start spinning up instances very easily. Let's go and have a look at AWS.
So by default, with Terraform, you have to declare a provider. There are lots of provider types supported out of the box with Terraform: AWS, Azure Service Management, Azure Resource Manager, DigitalOcean, and some others I can't remember the names of, and it also supports Google Compute Engine, et cetera.
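A provider declaration of the kind being described might look roughly like the following sketch, in the HCL syntax of the era. The region is only an example, and the credentials are deliberately left out so Terraform reads them from environment variables instead, as Paul does here:

```hcl
# Sketch of an AWS provider block; region is illustrative.
# Access and secret keys are omitted on purpose: Terraform
# falls back to AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
# in the environment.
provider "aws" {
  region = "eu-west-1"
}
```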
All you basically need to do to point Terraform at your AWS environment is give it a region, an access key, and a secret key. I have those preconfigured in my environment, because I don't want to be giving away my key details. So terraform plan will allow me to take the current state of the infrastructure that's
specified on my computer in code, and it will test that against what is currently in my AWS account, and it will show me a list of everything that needs to change. I've actually started configuring out quite a large-scale environment here, but we're
going to go through it step by step. We're going to create an internet gateway and a VPC, a virtual private cloud, our own little space on the internet that we can actually start to launch our instances in. Inside a VPC, we're going to have three public subnets.
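The VPC, internet gateway, and public subnets just listed might be sketched like this; resource names and CIDR ranges are illustrative, not taken from the actual demo repo:

```hcl
# VPC: our own little private address space.
resource "aws_vpc" "ndc_demo" {
  cidr_block = "10.0.0.0/16"
  tags {
    Name = "ndc-demo"
  }
}

# Internet gateway so the public subnets can reach the outside world.
resource "aws_internet_gateway" "gw" {
  vpc_id = "${aws_vpc.ndc_demo.id}"
}

# One of the three public subnets (the front-facing ones for ELBs).
resource "aws_subnet" "public_1" {
  vpc_id     = "${aws_vpc.ndc_demo.id}"
  cidr_block = "10.0.1.0/24"
  tags {
    Name = "primary-public-subnet"
  }
}
```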
There is a much better way of doing this and laying this out, but we're going to go and create this right now, just so that we can actually see this in action. So those three public subnets is where we will put things like ELBs or anything that's front-facing. We try to be more secure and keep our instances and our databases and everything
in private subnets, which we'll look at. From there, we're going to generate a key pair, an SSH key pair, so that we can SSH into the instances. We're going to create some security groups. So this is like an internal security group where all nodes can talk to each other on
all ports; you would not do that in production at all. You would secure it and be a little bit better with it. We're going to create a NAT security group so that we can SSH into a NAT instance, or bastion box, and from there SSH into our instances inside our VPC, because otherwise we can't SSH in at all; they're all in the private
subnets. We're going to create a couple of web servers, just so that I can actually show you that the whole thing comes together and it brings actually instances up and it adds them to a load balancer. And the last one, we're going to accept connections from the load balancer externally.
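The key pair and the deliberately loose internal security group described above might be sketched as follows. The names and the key path are placeholders, and, as noted in the talk, you would not open all ports like this in production:

```hcl
# SSH key pair so Terraform (and we) can SSH into the instances.
resource "aws_key_pair" "demo" {
  key_name   = "ndc-demo"
  public_key = "${file("~/.ssh/id_rsa.pub")}"
}

# Internal security group: every node can talk to every other
# node on every port. Demo only; lock this down for real use.
resource "aws_security_group" "internal" {
  name   = "internal-all"
  vpc_id = "${aws_vpc.ndc_demo.id}"

  ingress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    self      = true
  }
}
```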
So let me just comment some of this out.
Cool. So I've just wrapped things in make targets just to make them easy for me. So what we're actually going to do right now is we're going to create a VPC, we're going to create three public subnets and we're going to create a route table and some
associations and an SSH key. Now if I go to the AWS portal, we will see that there's currently only one VPC there, which is the default VPC, but we're going to add another, hopefully.
So to run this, I just have to go terraform apply, and it will go off and start creating the infrastructure. There we go. So at that point, it's added ten new resources, which are the routes, it's added the VPC, it's added the subnets, it's added the SSH key, and I can actually go into the
AWS portal, if I refresh, I'll actually see another one right there called NDC demo. And then from that, I can see that there are actually now three public subnets in place and it's been very quick. It's instant. But if I want to change that, and I actually want to change the name or a tag of a subnet,
and if I replan, because there's differences in my machine compared to what's up on my cloud provider, it's actually going to show me what the difference is and what it's actually going to do at this point.
It's going to change that primary public subnet, the tag's name from one thing to the other, and then if we apply that, we can go to the portal in a second, and we'll
see that it's added the tag. Really simple, really easy. Now, the disposability of the infrastructure comes that we can destroy it and bring it
back up in exactly the same way. We have it declared exactly correct. So if I just say terraform destroy, and it will always ask you to confirm that you want to destroy, you don't want to be randomly destroying your production infrastructure, and I hit yes, it's actually going to start destroying everything
in the system. When it's complete, which it'll finish just in a second, I can see that the three subnets are gone. So immediately, we should be starting to have some light bulb moments and thinking,
hey, maybe I can spin up a test instance or a test environment, test my code, not having a long-running CI or staging environment, and then when I'm finished, tear it back down. People would do this with Docker containers, and they'd spin up some Docker containers, and when they're finished, they'd spin it back down, but they've
still got the underlying infrastructure on which it runs on. So let's introduce some more pieces to the puzzle. Security groups. Let's put those in, and let's actually launch an instance at this point.
This will take a little bit longer, because it's actually going to wait for an instance to come up. Terraform itself will SSH into that instance using the SSH key that I provided and told it where it lives, and it will run some configuration inside that instance. Now, we'll talk about best practices a bit later, but you have this as part of an instance,
you can add provisioners, and this is a remote-exec provisioner: it will SSH in, and it will just run as many commands as you need it to run. If this was a Windows instance, you would have to use something like WinRM instead, but that's okay; you can configure that easily enough. And going back into the AWS portal, or the AWS web page, we can see that our subnets
are back, and if I switch across the EC2, you should actually see at this point that
an instance is being created. We have one running instance, which is initialising, because it's a brand new instance in this system, and it's called NDC VPC NAT. It has the security groups that we want it to have, and it's pretty fast to come up.
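The remote-exec provisioner just described might look like this sketch; the AMI ID, the SSH user and the key path are all placeholders:

```hcl
resource "aws_instance" "nat" {
  ami           = "ami-xxxxxxxx"   # placeholder AMI
  instance_type = "t2.micro"
  key_name      = "${aws_key_pair.demo.key_name}"

  # How Terraform connects before running the provisioner.
  connection {
    user        = "ubuntu"
    private_key = "${file("~/.ssh/id_rsa")}"
  }

  # Commands run over SSH once the instance is reachable;
  # the SSH connection closes when they finish.
  provisioner "remote-exec" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get install -y curl",
    ]
  }
}
```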
It's trying to connect in via SSH right now, and when it connects in via SSH, it will run a couple of commands, and then it will shut itself down. Not the box, it will shut the SSH connection down. Make sense so far? This is quite simplistic stuff that we're doing here. Usually you would have to go to the portal, and we would have to go something like launch
instance, and then we will choose the correct, we will choose that one, and then we will choose a T2 micro, and then we will configure the instance details. Let's put it in the NDC demo, and already I'm fed up. If you want to do that for one machine, that's okay.
You want to do that for 100 machines, you're probably not really going to do that. That's going to be painful. It's going to be really painful. Just see that it actually connected right here, and once it's connected, the creation is complete. It knows exactly what's happening, and if I did a Terraform plan, it knows that the
state of the infrastructure on my machine is exactly the state of the infrastructure remotely. Hello. The question is, are these Terraform scripts available online? Yes, they are; they're in a GitHub repo. I'm going to show you the equivalent in Azure.
The Azure provider is not fully complete. It's not complete in comparison to the AWS one, but it will still show you how to use it in Azure, and I will also show you another cloud provider called DigitalOcean. We'll just finish this demo first, and then we'll move on. So we can see that the infrastructure is up to date, and that we know that that's
the case. Now, let's add another piece in. Private subnets. This is where you can start to build a dependency graph, because I need a NAT instance up and running in my environment to route traffic out to the Internet.
So I can use an attribute in my subnet resource called depends_on, and it will wait until that NAT instance is up before it starts to create these subnets. We're just going to create three more subnets, and we're going to create another route table, and some route associations to make sure that the subnets can get out.
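The depends_on part might be sketched as follows, with illustrative resource names:

```hcl
# This subnet is not created until the NAT instance exists,
# so outbound routing works from the moment it appears.
resource "aws_subnet" "private_1" {
  vpc_id     = "${aws_vpc.ndc_demo.id}"
  cidr_block = "10.0.11.0/24"
  depends_on = ["aws_instance.nat"]
}
```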
And then at that point, we're actually going to create three more instances. I would usually use the splat syntax for this, but I want to show you that three separate instances come up, and from there, it's going to install Ruby. It's going to download from Dropbox a small package that James and I wrote last year
which just basically shows the characteristics of the machine, like the IP address, the machine name, et cetera. It'll actually unpackage it, and it'll install, and it'll start it up. And it'll do it on all three. And then we can output all the IP addresses at the bottom here so that we can see what
the IP address is, what comes back from Terraform, whether they actually are correct in the AWS console just to prove that it's all actually hooked up. Any questions so far?
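The outputs mentioned above are just output blocks, one per instance in this demo (names illustrative):

```hcl
# Echoed back at the end of `terraform apply`, so the value can
# be checked against the AWS console.
output "web_1_private_ip" {
  value = "${aws_instance.web_1.private_ip}"
}
```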
So the question is: right now we're scripting it out one by one; is there a way to do it more succinctly, or in another way? Yes, there is, and I'll show you that way. This is just to show you that I can create three actual machines and that all three machines will add themselves in. So you can introduce what's called count, and it will create 100 of those machines
of that machine type, and it will go off. I actually think there may be a limit on the AWS API and how many you can spin up at once, but I can't remember off the top of my head what that limit is.
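The count approach being described might look like this sketch; once count is set, the splat syntax mentioned earlier (aws_instance.web.*.private_ip) addresses all the copies at once:

```hcl
# One resource block stamped out 100 times.
resource "aws_instance" "web" {
  count         = 100
  ami           = "ami-xxxxxxxx"   # placeholder AMI
  instance_type = "t2.micro"
  tags {
    Name = "web-${count.index}"
  }
}
```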
I think somebody said there's maybe some concurrency issues when it gets to 200. I don't know. I'll check into that, and if I find that's the case, I'll tweet it out. Maybe not in this.
And it'll go straight up, and it'll create 100 in as many batches as it needs to or all at once. Now, the interesting thing about this at this point is that it's not doing this serially.
So you'll see it actually is doing private two first, then private one, then private three. Because of the dependency graph which it draws for you, it understands in what order it can actually bring things up and then concurrently apply.
And I'll show you the graph in one second when it's just finished installing these boxes. But creation complete, and there we go. Now, if I make graph, and if I make show, we can see an actual graph of how our
system is laid out. So the provider is AWS, and the key pair and the VPC depend on the provider, and then the security groups and the subnets depend on the VPC, and the instances depend on the security groups and the key pair, and you can start to build up the knowledge
of how your system looks. And because it was actually able to understand that there were three instances, it knew that it could actually create some instances around the same time. So they're in the same level in the graph. All their dependencies lower are happening. The concurrency model takes over, and it can actually bring three up at the same time.
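That dependency graph falls straight out of interpolation: a resource that references another resource's attribute gets an edge in the graph. A minimal sketch of one such edge:

```hcl
# The internet gateway references the VPC's ID, so Terraform knows
# it cannot create the gateway until the VPC exists.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_internet_gateway" "main" {
  vpc_id = "${aws_vpc.main.id}"
}
```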
And it's actually SSH'd in, and we can see that there are some IPs in terraform. But we can't get to those IPs, because they're hidden from us. They're not actually on the internet, they're not public. So let's actually add in a load balancer.
Load balancers sit in the public subnets, because you want them accessed from outside, and you want people to be able to come in, hit your load balancer, and your load balancer will write your traffic to the correct nodes. And what we do is we can actually say, here are the subnets that the load balancer is allowed to talk to. Here are the instances that are currently in the load balancer.
We listen on port 8080, and we route traffic to port 8080. Or you can listen on 8080 and route traffic to 80, or listen on 80 and route traffic to 8080, whatever you want, it's not a problem. And then we want to give it some health checks to make sure that the actual application is ready to go before it actually shows up in the load balancer
and we can route traffic to it. So at this point, we say, I need two successful health checks, that is, a target actually has to return 200 twice, before I add it into the load balancer. And then what we're going to do at the end is we're going to output the DNS name of the ELB so that we can actually hit it in the browser.
I'm just going to change the interval here so that they go in the load balancer faster. Any questions while that's running?
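A minimal classic ELB along those lines might look like this. The subnet variable and the `aws_instance.private` reference are assumptions based on the resources described earlier:

```hcl
# Illustrative classic load balancer: public subnets, the three
# instances, a listener on 8080, and health checks before traffic.
resource "aws_elb" "web" {
  name      = "web-elb"
  subnets   = ["${var.public_subnet_ids}"]
  instances = ["${aws_instance.private.*.id}"]

  listener {
    lb_port           = 8080
    lb_protocol       = "http"
    instance_port     = 8080
    instance_protocol = "http"
  }

  health_check {
    healthy_threshold   = 2   # two 200s before a node takes traffic
    unhealthy_threshold = 2
    timeout             = 3
    target              = "HTTP:8080/"
    interval            = 5   # lowered so nodes register faster
  }
}

output "elb_dns_name" {
  value = "${aws_elb.web.dns_name}"
}
```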
So to summarize, the question is, if there are two sets of Terraform scripts pointing to the same environment, will they interfere with the instances in each other's scripts?
The answer, as far as I believe, is no, because Terraform is aware of the state of the resources in which it's being managed. Okay, so we've got... Let me just apply this, and then I'll come back and show you that, okay?
Sorry, I tried to lower the interval on the ELB too much. I was trying to be too cheeky. Here we go. So we got the load balancer up, and we actually have a DNS address. I'll come back and address your question in one second, I promise. And we can take that, put it in the browser.
We'll not see anything right now just because they haven't actually gone into the load balancer yet. And we can go and look at our load balancer. And we can see it actually has three instances assigned to it. They're currently in the instance registration phase because it's actually running the health checks to see if they're ready to accept traffic.
And that'll take like 20 seconds or so. So if I go back and I show you... So it has a state file, okay? And the state file knows about all the resources that that specific Terraform configuration will run and manage.
So if it's not in this configuration, it won't touch it. Just needed a check there. The nice thing is, is that you can actually import current AWS infrastructure into this Terraform way of thinking.
So there is... A guy has actually created a Ruby gem called terraforming, okay? And if I say terraforming, it gives me a list of things that it will actually go off to AWS and get me, okay?
So let's just say EC2, because I know that there's currently four EC2 instances running. And if I say terraforming, EC2, it will probably crash. Oh no, there we go. So it's actually gone off and got the characteristics of all the nodes currently running in my environment.
So if you want to take your existing infrastructure that's managed in AWS and bring it in to being managed by Terraform, you can do that right now. There's a few resources that are missing, like Redshift and bits and pieces, you know, the lesser-used parts of AWS. But it's all actually here and you can start to use it. Let's go back to the demo.
Does that answer your question? You good with that? Perfect. There we go. So our nodes are in service now. If I actually go back, I refresh, I actually have a node, which is, let's say it's 10.56.91, and that is actually private instance three.
So I've been able to bring up three instances, have the security groups in place, and actually put the routes and the NAT instance in place so that your instances can get outbound access to the internet to do whatever they need to do. And let's say you're connecting to a third party API or you have to go and download
one of your packages from apt-get or yum repositories or something. It actually is allowed to do that. And then lastly, we've taken those, excuse me, put those nodes into the load balancer. And if I just go terraform, terraform destroy, it's all going to go bye-bye.
I go to the instances and you'll start to actually see them shutting down in a second. So for really, this has got a number of use cases. Firstly, just managing your infrastructure. It's in code. You can check this into GitHub or other source control providers are available.
But not only that, as a developer who maybe is not very aware of what happens in infrastructure, you can start to see how your production environments are laid out. And you can send pull requests to your ops team or your infrastructure team
or whatever you call them in order to say, hey, we actually need to open another port here. And you're immediately opening a conversation rather than having to have these silos of people where developers work in one way, ops work in another way, and nobody knows what's happening. You're opening up this communication right now.
But the second use case is for really short-lived environments, load-tested environments. Who load-tests their stuff in here? A couple of people. Okay. Because usually, the reason we can't load-test is A, we've probably not got a production-like environment,
or we don't have production-style data or a production database in place that we can do that. B, we've got no space in our current infrastructure in our office and we can't create another environment. Or C, our ops teams can't be bothered to create one. And as sorry as that sounds, that does happen.
I have been in that position. This is now why I do this. But if I go back and I go to my VPC, I could actually create another VPC.
So I can actually just get Terraform to manage lots of VPCs for me. Again, you wouldn't specify resources that look like this and duplicate them. You would start to actually refactor the code and create modules at this point. So let's have a quick look at a module. It's completely destroyed the entire environment.
26 resources have been destroyed. When we go across to AWS, instance state should all be terminated. There we go. So really fast. So let's have a look at modules.
So for the last two days, or the two pre-conference days, we actually gave a workshop on this and how to get started with all this configuration. But essentially, a module is a predefined template of code
that we can then pass variables into. Now, the module for a NAT box has got a security group for NAT and all the different ports that we believe it needs. It has an IAM role to make sure that it's a little bit more secure
than just allowing everybody in and out of it. It has a role policy, and you can actually go off and have a look at the policies and see what they are and all the different things that it does. And then it has a launch configuration. Technically speaking, I would never launch individual nodes on their own.
Never as individual instances. Everything should go in an autoscale group, and that autoscale group will reference a launch configuration, because then you know exactly the state in which it's able to come up. So this launch configuration is based off an AMI ID that we will pass in, an instance type that we will pass in,
and then it will actually run some user data on launch to predetermine exactly what we want to do in the environment. And then lastly, it's behind an autoscale group, so that if any of our nodes drops out, it comes back. If ever you hear the term elasticity in the cloud,
this is pretty much what people mean by elasticity in the cloud. If you tell AWS that you need three instances of a certain server or a certain type of application, and an autoscale group is in place, if one of those nodes drop out of the mix, AWS will go, hey, your autoscale group says you need three nodes. Let's launch another node.
And it will launch one in its place. And it will continue to try and keep the system at a predetermined level. Now, your health checks around it will determine how fast it comes back up. By default, it's actually 900 seconds. So that's 15 minutes. You don't really want a node out of the loop for 15 minutes,
but you can actually set the check to any interval. We've got it at 30 seconds. And that box, which is really important in the environment, if it drops out, then 30 seconds later a new node will be launched in its place. And then we can actually start to use these modules that look like this.
So we can create a management VPC, passing in all the variables that we need. We can create a management NAT, passing in not only the variables we need, but we can reference the variables that are returned from the management VPC. And we can start to build up some really reusable code templates.
If I really want to, I can take that, and I can actually say management VPC backup. It's pointing to my code location, my path. This can actually be pointing to GitHub, and Terraform will go off and download from GitHub
or any source control you want. And we can actually start to say, hey, this actually sits in a different CIDR block range.
And that will go off and use the same code to create two exact copies of the VPC. This makes managing our infrastructure extremely easy. Extremely easy.
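The module pattern being described is roughly this; the paths, variable names, and CIDR ranges here are hypothetical:

```hcl
# Sketch of reusable modules: same code, different parameters.
module "management_vpc" {
  # source can also point at source control, e.g. a GitHub URL,
  # and Terraform will download it for you.
  source     = "./modules/vpc"
  name       = "management"
  cidr_block = "10.0.0.0/16"
}

module "management_nat" {
  source = "./modules/nat"
  # Reference an output returned from the VPC module.
  vpc_id = "${module.management_vpc.vpc_id}"
}

# The same module source with a different CIDR block gives an
# exact second copy of the VPC.
module "management_vpc_backup" {
  source     = "./modules/vpc"
  name       = "management-backup"
  cidr_block = "10.1.0.0/16"
}
```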
And just to prove that it's not just an AWS thing, any questions?
So I think the question that you're asking is, if you need to create 10 VMs and they all need expanded volume hard drives, how can you do that with Terraform? Okay, let's have a look.
This is actually going into my production infrastructure right now. Here we go. Here's the launch configuration for one of our nodes.
And notice right there, root block device size. That will actually go off and create an EBS volume of the size that you require it to do, and Terraform will understand the EBS ID when it comes back. Because if you tell an instance to be launched with a specific EBS size,
AWS is already aware of the connection between the EBS volume and the instance in which it was launched from. So there's just some other configuration in there that just says, hey, don't just use the 8 gig default root size. Just actually expand that to whatever you want and you can just pass in a variable to say, make it 50 gig.
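A hedged sketch of that launch configuration, with the expanded root volume, wired into an autoscaling group as described earlier. The AMI variable, sizes, and counts are placeholders, not the production values:

```hcl
# Launch configuration with an expanded root EBS volume.
resource "aws_launch_configuration" "web" {
  image_id      = "${var.ami_id}"
  instance_type = "t2.medium"

  root_block_device {
    volume_size = 50      # gigabytes, instead of the 8 GB default
    volume_type = "gp2"
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Autoscaling group that keeps three nodes running; if one drops
# out, AWS launches a replacement in its place.
resource "aws_autoscaling_group" "web" {
  name                 = "web-asg"
  launch_configuration = "${aws_launch_configuration.web.name}"
  min_size             = 3
  max_size             = 3
  vpc_zone_identifier  = ["${var.private_subnet_ids}"]
  health_check_type    = "EC2"
}
```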
The question is, does Azure have the same thing? I don't know, right? I'm currently probably 33% of the way through automating the Azure Resource Manager API. When I get there, you will soon know and it will be in the documentation, I promise you.
I believe so. I believe you can... The way I believe you have to do it with Azure is you actually have to create a storage blob or an actual storage disk. You'll get the ID back from that and then pass that to the instance itself. That's the way I believe that it is, but you couldn't quote me on that.
Make sense? Exactly. And the dependencies are... If we go back to the original and if I show you the dependencies... Let's just get rid of that. We have syntax to say, hey, the VPC we know will have an ID,
but then the Internet gateway actually needs to know what the VPC ID is and it will actually do a lookup at the time in which it's created. This is why it creates that dependency graph because it says, hey, I'm aware that I need to create the VPC first because there are other resources that depend on it
because of the ID sharing and at that point it will pass that computer ID into the Internet gateway and that's what hooks it up. You would do exactly the same thing. You would say something like resource, Azure RM, storage disk,
test. This is completely the wrong resource, but I'm just showing you how you would do it. You would say 10 gig. Actually, it's Windows. Let's make it 100 gig. And then you would say something like resource, Azure RM, instance, test,
and you would say something like storage equals dollar Azure RM, storage disk dot test dot ID.
And that will intrinsically build the link between those two resources. Make sense? Yeah, so that goes into Splat syntax at that point, which I can show you examples of that. I want to just keep this... I don't want to confuse everyone and start showing them like counts and lengths and splits and lists and stuff like that because once you start to get to know this product
and start to get to know this way of scripting, that intrinsically comes. It really does. But yes, you can. You have a syntax that you can iterate over a list and pull out specific values at specific indexes in that point. Right.
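Written out as a file, that deliberately-wrong Azure sketch from a moment ago would look something like this. As stressed above, these are not the real resource types; it only illustrates how referencing one resource's ID from another builds the dependency:

```hcl
# Hypothetical resource names, exactly as in the talk. The point is
# the interpolation: the instance references the disk's ID, so
# Terraform creates the disk first and passes the computed ID in.
resource "azurerm_storage_disk" "test" {
  size = 100   # gigabytes
}

resource "azurerm_instance" "test" {
  storage = "${azurerm_storage_disk.test.id}"
}
```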
So speaking of Azure, let's do Azure. Again, we have an Azure provider. You would pass it subscription ID, client ID, client secret, and tenant ID. These are a bit of a pain to actually generate. But once they're generated, you're good. If you have somebody who knows the Azure portal,
they'll generate those for you. Probably. With much crying and moaning. And let's start with our resource group. Let me just make sure I currently don't have any...
I want to make sure that the infrastructure's not up. I just tested the demo a little while ago. Look at this. It's much slower. Not pointing any fingers. Still hasn't combined... Oh, there we go. Okay, cool. So it has no resources right now. Good.
And let's go and we'll take out the virtual network for now. And we'll take out the subnets for now. And the first thing I actually want to do is everything in the Azure portal, from what I believe, correct me if I'm wrong, Azure users, you need a resource group for everything.
Everything must sit inside a resource group and everything is there. So that tends to be the first thing that we create. So let's go and we'll go make plan. And we'll go off and we'll make a cup of tea. This is my chance to slag off this API. I've been killing myself for the last two weeks on this.
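The provider block and resource group being created in this demo would be along these lines; the credential variables and names are placeholders:

```hcl
# Azure Resource Manager provider: all four credentials have to be
# generated in the Azure portal first.
provider "azurerm" {
  subscription_id = "${var.subscription_id}"
  client_id       = "${var.client_id}"
  client_secret   = "${var.client_secret}"
  tenant_id       = "${var.tenant_id}"
}

# Everything in Azure Resource Manager sits inside a resource group,
# so it tends to be the first thing created.
resource "azurerm_resource_group" "test" {
  name     = "test-resource-group"
  location = "West US"
}
```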
We have some time. Does anyone want to dance or sing? Okay, so it's going to go and create... I should have actually just applied. Damn it. It's actually going to go and create a resource group in US West, West US is how they would say it, but it normalizes the string
because that's the slug name in comparison to what the Azure API takes. And let's apply it. And we should at this point be able to potentially see something happening.
Maybe, maybe, maybe. Are you timing it? There we go. So we actually have a test resource group. The fact that this works is I'm very happy, but this is my code. Well, it's actually this first one says code.
And then from there we can say, right, we actually want to take a virtual network and we want to take that virtual network and we want to put it in that resource group. And then we want to take some private subnets, again, using that syntax for now rather than using the count syntax,
and we want to take those three subnets, attach them to the virtual network and put them in that virtual resource group. Make apply and the word should stop for a second. The Wi-Fi in here doesn't really help at the minute,
to be honest. I'm only getting, like, two meg. So any questions while that's running? The question is, does the Terraform Azure provider support websites
and web apps and all those pieces of the resource manager? Not yet. Not yet. Give it a couple of weeks and it'll have a lot more coverage than it currently does. You can spin up VMs. You know what? I can show you the list that I currently support.
Source. GitHub. I should call it Terra.
It's all written in Go. So we depend on their APIs being correct. I'll come back and show you them in a sec. It's just a little bit slow at the minute. Load up all my resources. And we can see from this point, this resource has a virtual network,
and the virtual network, as we scroll through the lovely portal, has some subnets. And we can continue scrolling, and those subnets have the correct CIDR block ranges or address range, as they call it in Azure.
Really simple, really easy, but you're not having to go and do it by hand. And I swear I tried this because I needed to understand how this API worked. Go and add... it's just a bit slow there. And what are we going to call it? Test resource group one.
Oh, no, no, no, no. That was a different one. That was a different one. That was actually a different one, I hope. There we go. And from there, I go right. I have to add. What do I have to add at this point? A virtual network.
Let's just search for virtual and go. Is it a virtual network? Yeah, it's a virtual network. And then so on and so forth. Using portals is a lot more difficult. It will take a lot more of your time, and my session has expired. Jesus, Lord.
That is just amazing. That is awesome. So, yeah. The nice thing about this is it can be specified in code, but more importantly, Terraform plan is your friend. Terraform plan will tell you the changes
that your infrastructure has gone under either locally on your machine or upsource, because if you haven't checked out your Terraform configuration in like a week and you go Terraform plan and you see 100 resources are going to change, you probably know at that point you should update.
So it's just a sanity check to make sure that your systems are working. So let's do Terraform destroy. Oh, actually, you can still run that.
You can actually still see the resources and how they're ordered and what depends on what, et cetera. This is really useful as well. Great to help you understand how network infrastructure goes together. Terraform destroy.
Any questions? Good question. So the question is, instances are disposable, databases are not.
What is the best practice to separate that you don't destroy, for example, your databases? See this folder structure right here? One, two, three, four, graph, makefile and the two Terraform files? Inside that folder structure, Terraform will apply and find all the configuration it needs within that folder.
This folder is a completely different configuration. Therefore, you should start to see that maybe applications sit in one, databases sit in another, VPCs sit in another, and you can start to share resources between them. So the Terraform state file, instead of storing it locally,
you can enable remote state, and it will push that state up into a central bucket in S3, for example, or Consul, or I think Artifactory is another one. There's a couple of different backends. And then you can start to reference pieces of that configuration from other modules. I can show you a much more complex example of that, Ali,
and actually how to spin up databases themselves. Question off that? There are certain destructive actions that will happen. So if something actually is marked as force new,
it will tell you. So if there was an instance currently up for the NAT server, okay, and that was currently running, and I changed the AMI ID to a one on the end rather than a seven, the Terraform plan will show me the plan to say, this is a destructive action, and it will force a new resource.
This is why Terraform plan is your friend. It will actually tell you what it's going to destroy before you destroy it. So you get to know that command line pretty fast, and you also get to look out for forces new resource. And it's in big, I know it's not great for colorblind people, but it's in massive red block font
alongside the resource name, just so it stands out there. And then just to show, lastly, that there's another provider. I use DigitalOcean quite a lot, which is a really simple cloud provider. It's just much more fast-moving. They say that they can launch an instance in 55 seconds,
which is quite good. And there's a provider which just takes an API token, and we can create an SSH key, and then we can actually create a DigitalOcean droplet. It's called a droplet, not an instance at this point, or a virtual machine. And it will be named Web1.
It will sit in New York City 2. It will be a 512-meg Ubuntu, and it will have a specific SSH key that goes with it. And if I run the plan, it will tell me at this point
it's actually going to create DigitalOcean droplet and a DigitalOcean SSH key. Make apply. This may fail. I think I have, like, five minutes.
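On file, that droplet configuration is roughly the following; the token variable, key path, and image slug are assumptions:

```hcl
# DigitalOcean provider just takes an API token.
provider "digitalocean" {
  token = "${var.do_token}"
}

resource "digitalocean_ssh_key" "default" {
  name       = "demo-key"
  public_key = "${file("~/.ssh/id_rsa.pub")}"
}

# It's called a droplet, not an instance or a virtual machine.
resource "digitalocean_droplet" "web" {
  name     = "web-1"
  region   = "nyc2"
  size     = "512mb"
  image    = "ubuntu-16-04-x64"
  ssh_keys = ["${digitalocean_ssh_key.default.id}"]
}
```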
Any last questions while this is happening? I believe someone who may or may not be in this room tried to test the Azure API while on a plane, and it made something in the region of 2,000 API calls.
You're using the API that the portal talks to as well. It's a massively chatty API. Massively chatty.
So it's actually created an instance at this point, and if I go to DigitalOcean, you'll see there's a welcome screen, and there's no droplets. And there's still no droplets. There should be a droplet. Give me a droplet.
Excuse me? Should I? Yeah. Yeah, it says it created it. The screen's actually in there. It's waiting to refresh. The Wi-Fi's just gone.
Come on. There is no droplet. Weird. That might be a bug in Terraform. Yeah, I've only got one account. It's a free account. Yeah, I'll have to have a look into that.
But again, it's really simple to show you that that's the state. Now, if I actually make plan, it might actually say that there's nothing to do. Yeah, it says the droplet is available. Weird. And then make destroy. It's going to get rid of it. All of these scripts are all going to be online. You have to supply your own keys, which is up to you.
I've been very careful. I have done it before, and I actually put my AWS keys on the screen while it's being recorded, and I've had to go straight and recycle those. So I've been really careful this time not to. So apply your own. Don't steal mine. And let's just... There's one last thing I want to talk to you about.
I have seven minutes. It's perfect. So when it comes to programming, Mary Poppendieck, who's a very famous agilist, actually said, how long would it take your organization to deploy a change that involved a single line of code, and can you do this on a reliable, repeatable basis?
This is not just for code or applications. This question should also be asked of your infrastructure. Okay? How long would it take your company right now to deploy a new VM because of just a single piece of code changing?
Can it even be done within your company easily? Think about that when you go back to the office. And try and work out what the answer is, and try and reduce it. The key about this type of thing is, if you measure the metric, you can actually make that metric better. But you need to be aware of what that value is.
This is fantastic for disaster recovery. You're continually checking that your environment works. And not only that, but it works to what you want it to be. You know the address ranges.
You know the VPCs. You know the number of instances. You know the launch configs. You know what AMIs everything is actually based off. If for some reason you're in AWS in US West, and their entire data center goes down, all you need to do is you need to go in and change the variable
which says the location of the region that you're in, hit Terraform Apply, and everything will come back up. Everything. In my current infrastructure, I leave the company on Monday morning, and I've left them in a state that, if the disaster happens,
and my beautiful nation of Ireland is ever destroyed, and AWS have lost their data centers there, it will take them eight minutes to spin up their entire infrastructure. That includes databases. That's a pretty good situation to be in. The most famous company that acts in this way is Netflix.
Netflix will actively test what happens in a disaster. They've got lots and lots of tools, but their three main really well-known ones are Chaos Monkey, where it will go in, it will randomly destroy one of the servers, and they'll have things like auto-scaling groups
bringing it back up and making sure that everything works as expected. And then they've got Chaos Gorilla, which will actually simulate the outage of an availability zone in AWS. So, for example, Ireland is split into three availability zones, and you usually want to split your instances across those availability zones
to make sure that you have high availability. But Netflix tests that if we lose EU West 1, the system still continues and recovers as expected. And then they've got the best tool of the lot, called Chaos Kong. And Chaos Kong will actually simulate
that they've lost the entirety of a region and that they can fail over very easily. About four months ago, there were some severe outages in AWS US East 1, and Netflix were not affected. They just failed across to their other region in US West 2, which they now run out of permanently in the US,
and no customer outage happens. Think about that when you go back. This type of tool will help you practice for that disaster. It will happen. You will lose your infrastructure. Some idiot, usually me, will destroy things by mistake, and tools like this will actually allow things to carry on.
And just to prove that I have complete and utter confidence in my infrastructure, right now, before I close, I'm just going to close my laptop in a second. I'm going to go into my production infrastructure, and I'm just going to destroy random nodes. And I'll just leave them, and I know they'll all come back up.
So let's take an admin API demo box. Let's take... Oh, yeah, sorry. Here's a list of all the nodes that are currently running in this region, and I'm going to destroy an admin API, a demo box.
I'm going to destroy a Mobi. I'm going to destroy a random public site. I'm going to destroy a website box. This is actually production. Instance state. Oh, no, not reboot. That's my fat fingers. Terminate.
That's how confident I am in my infrastructure. Using tools like this, this allows me to be very happy with what's going on. Any last questions? I'm actually going to start running web servers in my head
and not need them at all. Question. Next question. The question is, will they run in Docker? Yes, this stuff can run in Docker. Basically, a lot of people use Docker right now, in my opinion, for the wrong reasons.
They use it for deploying applications and packaging applications. That's not what it was designed for. It's very good at that. There are some very valid use cases for Docker. Build environments. Brilliant. It is so good for managing that. And also, development environments. Being able to check out code from GitHub and actually just saying,
Docker build and Docker run, and actually being able to spin up your environment is really good. Just FYI, Vagrant also does that. So the question is,
do I think you have made the right choice by not using Chef and Puppet? OK. Configuration management tools are very good. Had I not used configuration management tools,
I would not know how to create composable infrastructure. I wouldn't understand the benefits of managing my infrastructure. I wouldn't understand the benefits of keeping my infrastructure at a stable level. For me, the progression curve has been: a
56-page Word document, and I swear to God, I worked at a company that had those; to automating it in PowerShell, having a single PowerShell script, but still somebody having to RDP into the box and running the PowerShell script; to configuration management tools, and running a Puppet server, or a collection of Puppet servers across 30,000 nodes, which is a lot, and it worked really well,
really well; to now going, hey, why do I want to change the state of my production infrastructure? I know that when it's running, I don't need to tinker with it, because if I tinker with it, that introduces risk. If it introduces risk, I'm probably going to get woken up in the middle of the night when something goes wrong. I'd rather be
woken up in the middle of the night when an application fails, and then I actually know it's a real failure. So it's an obvious progression for me, but your mileage may vary. Okay. Yes, possibly. Question? Yep. Let's have a look. Of course you can't
see. Sorry. It's actually already started running them again. Autoscaling group.
Everything I deploy is behind autoscaling groups. I have an autoscaling group of three or four or five or two or whatever it is, depending on the application. As soon as one disappears out of the autoscaling group and it fails the health check, Amazon will go, oh, I have to launch another instance in order to keep that happy, which is why I was just happy just deleting nodes, and I'll do that all day long, and every
time I do it, people are looking, going, why are you destroying your production infrastructure? Because I know it comes back in exactly the same way every time, exactly the same way. So you go, it's running, and you'll see some other nodes up here are running, and you can see the launch time is at 4.01, so we
can see exactly what's happening, and if I refresh it again, they'll actually now have names. There you go. And you can see them all. They're all tagged, they're all really easy. Next. Yep. Yes. That's what I would do. Well, actually, I take that back. So the
question is, how would I get the AMI that I've built with my new software to launch through my environment? I have a Terraform script that when I go in to,
so for example, this script, I have a simple make script that says, hey, go and get the latest AMI for this Terraform configuration, and it goes off and populates it there and in any others for any other environment, and the next Terraform run will go, you need to recycle the boxes, and it rolls it out.
Make sense? Okay. I'm going to stop because I could go on talking about this forever, and this gent is ready to come up and start building this environment. Find me. I'm around. Okay. I'll happily talk more about this. The scripts will be online. I will tweet them out, and I'll use the NDC hashtag. Go and try some things. If you need any help with it, message
me. I will more than happily help just because I really enjoy this type of environment. Thank you all very much.