
We can all have nice things: Patterns for Brownfield Automation


Formal Metadata

Title
We can all have nice things: Patterns for Brownfield Automation
Alternative Title
We can all have nice things: Patterns for BrownField Automation
Series Title
Number of Parts
50
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You may use, modify, reproduce, distribute, and make publicly available the work or content in unchanged or modified form for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner specified by them and pass on the work or this content, including in modified form, only under the terms of this license.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
Are you from a large and old IT organization? Do you support legacy applications that were lovingly built by hand in the distant past? Do you want to automate all of the things but feel it’s just not possible because you’re faced with a mountain of technical debt? Or do you think automation is too hard because you simply can’t rebuild your servers because you either don’t know how or because no one will give you new servers? Do you want to have nice things? It’s hard to know where to start a brownfield automation project and how to keep it going once it’s started. Adobe IT Web Platform Services had this problem and still has this problem. We used to build and deploy everything by hand. We had excessive configuration drift. We didn’t exactly know how to rebuild our servers. We would fat finger deployments and cause service outages. We had 19 different environments, all different, and all updates were pushed out by hand. We have a lot of technical debt. We’re better because we’ve tried to automate. We’re not yet completely automated. We don’t do CI or CD. We don’t even do automated tests. But we’re using Chef and our lives are better because of it. We’ve eliminated configuration drift. We’ve made rollout and rollback easier. And yes, we have nice things.
Transcript: English (auto-generated)
Hi. I'm Jeremy. I work at Adobe. When I first submitted this talk, I worked for Web Platform Services in IT as a senior web technologist. I spend most of my time writing cookbooks and automating a couple of important brownfield apps. The idea for this presentation came from my experiences with tech conferences.
I go to conferences like ChefConf and see really interesting presentations about awesome automations, and I think about my own apps and get a little sad. The awesome things I saw at these conferences just didn't seem possible for us in IT. IT is an old, siloed organization, and we were supporting a lot of complex applications.
It just didn't seem like we could possibly match what younger and more nimble companies and groups were doing. I always wondered where the presentation was that really spoke to my problems in IT. I wanted to have nice things like everyone else. I wanted to do continuous integration and continuous delivery
and do test-driven development and have all of the things automated, but it just didn't seem possible. We worked at it. We came up with some solutions that fell far short of that ideal, but it did improve our situation, and it turned out we could have some nice things.
I hadn't heard the term brownfield until earlier this year. In U.S. law, a brownfield is a former industrial site contaminated by hazardous waste. That contamination presents a significant barrier to the redevelopment or reuse of that property. This term works pretty well for technology. When we say brownfield automation,
we're talking about an existing site or application that's old and outdated, that's got a lot of technical debt, and that technical debt presents a significant barrier to automating that application. The term works less well for tech because real-world brownfield sites, typically abandoned industrial sites, aren't in use.
No one's still making widgets in the factory or refining gas at the abandoned refinery. If we knock down the factory and start decontaminating things, no one really cares. No work is being disrupted. In contrast, our Brownfield apps are very much in use. They're getting traffic every day, and they're doing something important.
If they weren't doing something important, we'd just turn them off and run away. We can't turn them off. We can't disrupt business as usual. It's more like we have renters in our Brownfield, and we can't kick them out, turn off their power, or otherwise inconvenience them when doing massive renovations.
The opposite of Brownfield automation is Greenfield automation. It's where you're doing something absolutely new with no technical debt. The adobe.com site is a very popular site. It gets a lot of traffic. It's a top 100 site in the U.S. It does e-commerce. It can't ever go down.
It's also a brownfield site. Adobe has had a website since 1996. Adobe has a lot of products and offers a lot of services. All of those products and services have a section on the website. We've purchased a lot of companies that add more products and services.
All these new companies have to be integrated into the website at some point. Some content is static and stored on disk. Some is served up from Adobe Experience Manager. There's somewhere around 200 applications that use adobe.com as their gateway to the public. All of these reasons are why our Apache configurations are 40,000 lines long.
Nearly half of that is just rewrite rules and redirects. We have 19 environments, including production. Prior to moving our configuration management to Chef, we had 19 different code lines in source control to manage all of those environments.
Changes were applied by hand to all environments by copy-paste operations. This led to fat-finger errors and excessive configuration drift. Configuration drift led to non-prod environments working differently from production, which at times would increase bugs in production or require monthly war rooms
to try to figure out why some non-prod environment wasn't working like prod. It was very painful. Every minute we spent troubleshooting a non-prod environment due to configuration drift was a minute we weren't working on something that would benefit a customer.
It provided no value to the customer, no value to the company, and it's totally avoidable. When you have a brownfield site like this, it's very hard to figure out where to start. If you're in IT and you're siloed, there's going to be a lot of stuff that you don't control, such as server builds, firewalls, load balancers.
This lack of control can really paralyze your thinking. I've had a lot of disagreements with people where they thought there was no point in automating anything if it's not building a new server. If you have a server built and everything is already installed, why would you build a solution that installs what's already installed? That's a very fair question to ask.
You may have some automated deployment scripts or other tools that make things easier, but the existence of those scripts can set you down the wrong path because you think something's already being managed. We had a tool that would sync application configurations out to all the servers and then restart the services once it detected that files had changed. If we're already managing configuration files,
why do we need to build something new to manage the configuration files? That's another very fair question. Our Apache configurations were in source control. If we're already managing our source in source control and everyone knows how to use what we have, why do we want to switch them over to something that no one actually knows how to use?
These kinds of questions are killers. Going down these paths will usually result in you doing absolutely nothing at all. That's where we were for a very long time. The paralysis will also result in you pursuing bad solutions.
For example, we knew configuration drift was killing us. We needed a way to compartmentalize changes as they came in and then merge those changes into various environments on demand. We had a source control system, so obviously the solution was to use that system and find a way to merge those changes around using patch files
or doing a Jenkins build of the configuration file every time. Neither of those solutions ever got off the ground. We didn't solve the problem. It turns out we were thinking about this all wrong. We were accepting the premise that the technical solutions we had met our requirements.
Since we already had these solutions, there was this implicit assumption that it was a requirement to keep using those tools. The solutions met our requirements at some point in the past, but there was never any serious thought given to whether or not our requirements had actually changed.
Our requirements had changed. We just weren't aware of it yet. We had problems. We wanted to solve those problems, but we didn't know how to do it or where to start. Unfortunately, we didn't know what our actual requirements were, and we were asking all the wrong questions. We were jumping to solutions without understanding what it was we were trying to solve.
We needed to try something different. Our project had an IT project manager and all the representative stakeholders. That was typical. That was something we always did. What we did differently in this case is we brought in an Agile coach and learned about Lean operations.
We had a session dedicated to learning how to write user stories. This was very new for us. We were an ops team. Agile and Lean weren't really something that we did. User stories weren't something that we did. I'm still really bad at writing user stories.
We didn't only have one session. We had several. We did exercises where we wrote down things on little pieces of paper and then voted for them with stickers. We built a product tree. We did an A3 problem-solving exercise. Those activities helped us to explore the problem even further.
I was skeptical about a lot of these exercises at first. They seemed very strange to us from an operations point of view. But what we got out of the experience was invaluable. It helped us reframe our requirements to what we needed them to be without being constrained by any solid technical solution.
It let us truly interrogate the problem instead of starting at the wrong place with faulty assumptions. As we started challenging our assumptions, we were able to focus on our real requirements as they existed today, not as they existed several years ago.
Being able to quickly, easily, and safely deploy Apache changes was still a requirement. That didn't change. What did change is that we started thinking about zero configuration drift as a requirement instead of as a nice-to-have. We wanted the ability to compartmentalize changes so we could make them once and deploy them on demand.
We wanted a way to be able to roll back a change quickly and easily. We also wanted a way to be able to build new servers in any number in minutes. We stopped worrying about how we were going to do it and instead focused on what we required.
Most importantly, it helped us to think about automation in the context of a minimum viable product. For that, that meant the least we could do that would create value. Once we achieved that MVP, we iterated on that product to create more value.
If we were trying to automate the adobe.com site in the typical IT waterfall model, it probably would have taken us about 14 months. That's 14 months of no improvements. We would have started by getting new servers, moving on to creating cookbooks to handle the base installation and configuration of those servers,
and then to install the platform, and then to install the application configuration. Somewhere in there we would have put together the automations to sync and test code, then we would have made firewall requests and load balancer requests, and then we would have finally released. Finally, after 14 months, we would have been automated, and that's probably very optimistic.
Such a massive change like that would most likely have had some significant problems that we wouldn't find until really late in the process. This is very much a bottom-up approach. You start with servers, build a foundation, and then keep building on top of that foundation. It's the way we typically do things in IT, and it's how we usually think of these big problems.
It's very much trying to fix a brownfield problem by doing greenfield development. It's a reflection of the idea that if we have something that works, we shouldn't mess with it. If it works, don't touch it. Build something else and then cut over. The MVP approach is maybe the most important thing that we did.
It let us get something of value out the door in a relatively short amount of time. Once we decided on an MVP, we were able to deliver it and train the team in about four months. It's hard to take that greenfield scenario and have an MVP that delivers value in a short time like that.
If you're building the entire stack from the ground up, the least you can do is really build that entire stack. A half-built server provides zero value to you. This is where brownfield automation, even though in many ways it's harder, makes it easier to deliver value sooner. It's not because everything is that much worse and therefore the improvement is greater.
It's because everything already exists. If you can automate elements in isolation and in place, you can slowly creep through that stack and deliver value as you go. We thought of our MVP as the least we could do to deliver value.
Value was very loosely defined to saving us time. There were a number of places we could have started. One option was base platform installation and configuration. Tackling that would save us weeks on capacity uplifts. We didn't have an automated solution for that kind of build.
The fact that we didn't have an automated solution made it seem like the obvious place to start. We had also recently gone through a capacity uplift, so the perceived pain was very, very great. Another option was just handling the Apache configuration files. Doing that wouldn't save us a lot of time on deployments.
We already had an automated solution in place to handle that. But digging deeper and past the superficial dismissal of, oh, we already have that solution, is where the option really became the obvious choice. We had a homegrown tool for deployments and everything was in source control.
The problem was that we had a bad solution. It was further complicated by the fact that we had a pretty good homegrown and purpose-built tool. It would safely deploy the configurations and if anything went wrong, it would stop what it was doing to avoid taking the site down with a bad configuration. But it didn't build configurations on a per-environment basis.
It was just a deployment tool. This meant we had 19 code lines that had greatly diverged with no good way to pull them back together and we were making all of our edits by hand. Our source control system didn't make handling code in this way easy. By taking a hard look at this and being willing to tear it all out, we were able to arrive at a much better solution, which was Chef to handle the configurations and Git and GitHub to handle the source control.
Moving to Chef and having Chef build the configurations at converge time using node attributes gave us the ability to collapse those 19 code lines into a single code line. Git and GitHub gave us the tools we needed to compartmentalize changes and promote them to environments on demand.
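The mechanism can be sketched in plain Ruby with ERB: one template rendered per environment from attribute data. This is only an illustration; in Chef the data would come from node attributes and environment files, and all names and values below are hypothetical.

```ruby
require "erb"

# Hypothetical attribute data -- in Chef this would come from node
# attributes and environment files rather than a hard-coded hash.
node = {
  "environment" => "stage",
  "apache"      => { "server_name" => "stage.example.com", "max_clients" => 64 }
}

# One template serves every environment; only the attributes differ,
# which is what lets 19 diverged code lines collapse into one.
template = ERB.new(<<~CONF)
  # rendered for environment: <%= node["environment"] %>
  ServerName <%= node["apache"]["server_name"] %>
  MaxClients <%= node["apache"]["max_clients"] %>
CONF

puts template.result(binding)
```

Swapping in a different `node` hash yields the config for a different environment from the same single template.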
While we didn't save a ton of time on deployment, we completely eliminated that configuration drift. It saves us and others a ton of time. Getting five people in a room and having them sit packaging something for a few hours, just to discover we didn't apply a change, is something that we can avoid in the future.
That's also a very, very expensive way to spend your day. That time savings isn't all we gained. We also gained the ability to see what was deployed and when it was deployed, a Chef-managed component that could be used later in a larger run list,
and a standard workflow that would apply to any other cookbook for any other application. This last one is extremely compelling. Once people knew how to merge a feature branch in Git and push the result up to Chef, we were in a situation where anyone in our organization immediately knew how to deploy any other Chef-controlled app.
They weren't aware of it at the time, but because we were able to standardize that workflow, the overall impact of learning that one app was far greater than just learning that one app. It didn't mean they knew how to make a change to the app. We still needed the subject matter experts to do things like draft complex rewrite rules or do serious application tuning.
Once that change was written and parked in Git, anyone could pick it up and deploy it. Naming conventions helped us keep track of in-flight changes. The name of the branch indicated the type of change and where to find the corresponding tracking ticket.
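A naming scheme like that can be checked mechanically. The `type/TICKET-description` pattern below is a hypothetical example of such a convention, not the team's actual one:

```ruby
# Hypothetical scheme: <type>/<TRACKING-ID>-<short-description>,
# e.g. "feature/WEB-1234-add-redirects". The branch name tells you the
# kind of change and which tracking ticket to look up.
BRANCH_NAME = %r{\A(?<type>feature|bugfix|hotfix)/(?<ticket>[A-Z]+-\d+)-(?<desc>[\w-]+)\z}

def parse_branch(name)
  m = BRANCH_NAME.match(name)
  raise ArgumentError, "unrecognized branch name: #{name}" unless m
  { type: m[:type], ticket: m[:ticket], desc: m[:desc] }
end

p parse_branch("feature/WEB-1234-add-redirects")
```

A check like this could run as a Git pre-receive hook so that untracked changes never land in the shared repository.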
After four months, we had our Apache configurations under Chef management, and we had a workflow that used Git feature branches and pull requests in GitHub. This didn't happen overnight. It wasn't a couple of sprints and then we were done, but it did make our lives easier in four months.
That's a fraction of the amount of time it would have taken if we were doing the project in the typical way. Two months after that, we had a Jenkins server that would sync changes to all landscape branches within minutes of a change being pushed to production. This eliminated the manual tasks introduced by our Chef GitHub workflow. The Jenkins jobs were created by a Chef cookbook that would create all of the jobs necessary to manage cookbook code in that standard workflow.
This put us in a position where onboarding new cookbooks was fairly easy. We could also modify the standard workflow by editing that cookbook and then rerunning it against the Jenkins server. This made managing hundreds of Jenkins jobs really easy.
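The idea of generating all those jobs from data can be sketched like this. The cookbook names, repository URL, and job fields are invented for illustration; the real cookbook drove the Jenkins job definitions themselves.

```ruby
# Sketch: derive one branch-sync job definition per cookbook from a
# single list, the way a job-creating cookbook would. All names and
# fields here are illustrative.
COOKBOOKS = %w[web-apache web-core web-platform]

def sync_job_for(cookbook)
  {
    name:    "#{cookbook}-branch-sync",
    repo:    "git@github.example.com:web/#{cookbook}.git",
    trigger: "push to the production branch",
    steps:   ["merge production into each landscape branch",
              "push the merged branches back to origin"]
  }
end

jobs = COOKBOOKS.map { |cb| sync_job_for(cb) }
jobs.each { |job| puts job[:name] }
```

Because every job comes from the same function, onboarding a new cookbook is one line in the list, and a workflow change is one edit to the function followed by a rerun against the Jenkins server.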
Next, we worked on core cookbooks and platform cookbooks, as well as standard installation packages and a package repository. In this case, it was RPM and Yum since we were running on Red Hat Linux. By the end of the year, we were rebuilding the adobe.com cluster on new virtual machines by using those Chef cookbooks.
Essentially, we automated from the top down. When we got to the bottom, we rebuilt everything back up on new machines. These changes gave us a way to easily promote changes across multiple environments.
It eliminated the configuration drift, and it also enabled us to build new servers. We ended up saving a lot of time and a lot of work. It used to take a senior team member weeks to build up new servers for capacity uplifts, depending on how many servers we were adding. Now it takes about five to ten minutes. Anyone can do it, and it doesn't matter how many servers we're doing at one time.
People did complain about having to deal with manual merge resolution, but that was really about it. If the worst thing that people can say about your application is that they have to sometimes manually merge things, you're in a pretty good position.
It's a huge change from multi-day war rooms or copying and pasting things. We didn't end up with continuous integration. We didn't end up with continuous deployment. We didn't end up with test-driven development, but we ended up with enough to greatly improve our operational capabilities.
We were able to have some nice things. With brownfield automation, you probably won't be able to get everything that you want. The problem is much larger than just technology. We don't have 19 environments because we want them. We have 19 environments because we need them. Eliminating that need is much bigger than us.
We can't make that decision on our own, but we can automate away the biggest headaches that come with that organizational decision. We learned a lot from this project. Here are some takeaways.
You're mired in a brownfield because you're stuck thinking in old patterns and in the context of your old tools. If you want to make any kind of progress, you must be willing to throw it all out and architect a new solution that meets your requirements as they exist now, not as they existed years ago when your app was first created.
You need to admit you have a problem, be willing to throw out the stuff that kind of works, and you'll be better for it. For example, our brownfield apps usually rely heavily on NFS mounts so we can easily share configuration files or binaries between systems. It's an old way of making manual management of applications a lot easier.
It also adds technical and cross-functional dependencies that can cause application breakages and slow down new server builds. These days it doesn't add any substantial value. Chef can manage the configuration files and packaging systems can handle the binaries.
Chef installing and managing everything locally on the server is faster and more reliable than an NFS mount. It also gives you a lot more flexibility. You should look for solutions that have multiplier effects. Standardized workflows with Chef and Git meant that if someone
understood how to make a change in one application, they knew how to make a change in all applications. Using Git and GitHub gave us great tools for compartmentalizing and promoting code, and it also gave us great tools for tracking the history of that code. Using Yum and RPM meant our cookbooks could be simpler while also providing better tools for dependency management and package verification.
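As a sketch, a Chef recipe that replaces an NFS-mounted setup might look like the following. The package name, paths, and attribute names are illustrative, not our actual cookbook:

```ruby
# Illustrative Chef recipe: install the web server from the Yum/RPM
# repository and manage its config locally instead of via an NFS mount.
package "httpd" do
  action :install            # Yum resolves dependencies and verifies the RPM
end

template "/etc/httpd/conf/httpd.conf" do
  source "httpd.conf.erb"    # rendered from node attributes at converge time
  variables(server_name: node["apache"]["server_name"])
  notifies :reload, "service[httpd]", :delayed
end

service "httpd" do
  action [:enable, :start]
end
```

Everything the server needs ends up on local disk, so a converge on a fresh machine produces a working node with no cross-mounted dependencies.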
Installing from cookbooks and packages is also a great way to get a handle on how your brownfield app is built. I would wager that in many cases your documentation for how to install a brownfield application is not complete. It might have been completed at some point in the past, but most likely it's not complete now.
To build your application from cookbooks and packages, you have to understand everything that goes into making your application work, so it really solves that problem. You can't take shortcuts like rsyncing files from an old server to a new server or using NFS mounts.
Those practices just perpetuate and amplify a brownfield site and duplicate that toxic waste. You need to interrogate the problem. This can be hard to do because you're stuck in the context of what is and not what could be. If you have an agile coach or a leader or scrum master, development manager,
whatever, see if you can pull them in to help guide you in writing stories, building a product tree, going through an A3 problem-solving exercise, any of those things. I still think that some of those practices didn't really map very well to our operations role, but it got us to reframe our problem and led to real solutions.
Bringing in a third party, even if they're not an agile coach or scrum master, to get fresh eyes on the problem is a great way to reframe your understanding of your app. That's one of the reasons I like conferences like ChefConf and I like going to meetups when I have time. Outside perspectives can help you understand your problems in a new way and help you drive forward to a solution instead of sitting there mired in the past.
You might need to argue with a lot of people. You will need to argue with a lot of people and they're not going to think you're doing anything necessary, useful or any good. That's okay, it's frustrating, but you need to remember that if everything was
awesome, there would be no need for the Brownfield project in the first place. There's going to be a lot of arguments over priorities. You're going to encounter a lot of people defending old solutions simply because they already exist or because that's how it's always been done. There's going to be a lot of bike shedding. Training, demo days,
documentation, clear communication, and management support are good ways to deal with this. It won't end those arguments, but hopefully it will minimize them. You should come up with an MVP and then iterate on it. Think of the least you can do to add the most value in the shortest amount of time, and then do that.
It should be something you control, something that doesn't have cross-functional dependencies, and something that causes you great pain and frustration. If you can do something that removes cross-functional dependencies, even better. You should be able to deliver value in about four months; any more than that and you might be doing too much at once.
That being said, it's okay if dates slip. I know management isn't always happy about slipping dates, but this kind of work isn't simple. In any manual process, there are a lot of implicit assumptions that everyone makes. If you don't capture and address those assumptions, you're going to have problems.
We actually had to roll back the product release a couple of times because we missed things. We had assumptions, we didn't capture them, we didn't address them, and we would have failed if we had gone forward. That's okay. We ended up with a better product in the long term, and it actually
worked because we were willing to admit that we missed something and needed more time. You need to be allowed to fail, regroup, and try again. There are no silver bullets or easy solutions, just experimentation and iteration.
So far I've only mentioned testing in order to say we didn't do test-driven development. That doesn't mean we don't test things. You should test as you go. You should find a way to test what you're automating. It's probably a lot easier than you think; it's just not obvious. If you're automating in place, top-down testing with Test Kitchen isn't the
natural choice, because you can't build the full stack, start the application up, and shoot tests at it. But because of the limited scope of the initial automation, you can probably find something very easy to test. In our situation, since we were just managing the Apache configurations, we couldn't
start up an Apache server and perform functional tests against the running app. However, we could diff the Chef-generated files against the files stored in source control. If the files were identical, we knew the automation reproduced the existing configuration. This also served as a way to make everyone a lot more comfortable with the initial rollout of Chef on our Adobe.com servers.
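That diff-style check can be sketched in plain Ruby. This is a minimal illustration, not our actual tooling; the template, the config contents, and the `server_name` variable are invented for the example:

```ruby
# Minimal sketch of the "diff the generated file against source control"
# check. The template and server_name value are hypothetical examples.
require "erb"

# A stand-in for an Apache config template as a cookbook might carry it.
template = <<~ERB
  <VirtualHost *:80>
    ServerName <%= server_name %>
  </VirtualHost>
ERB

server_name = "www.example.com"
rendered = ERB.new(template).result(binding)

# A stand-in for the known-good file checked into source control.
expected = <<~CONF
  <VirtualHost *:80>
    ServerName www.example.com
  </VirtualHost>
CONF

# Byte-identical output means the automation reproduces the existing,
# hand-managed configuration -- an automation functional test.
puts(rendered == expected ? "MATCH" : "DRIFT")
```

In practice you would render with Chef and diff the files on disk; the point is that the pass condition is equality with what already exists, not application behavior.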
This is one advantage that you have with Brownfield. Since you're building something that already exists, your job is to make sure that what's built is identical to what exists. Your functional test isn't an application functional test. It's an automation functional test.
If it looks the same and tastes the same, it will probably work the same. Hopefully it smells a little better. You should also be prepared to pivot or reshuffle priorities after you deliver that MVP. Each time you deliver something new, you may discover that assumptions you've held and plans that you made just no longer apply.
You may discover that something you just delivered, while it solves some major problems, also opens a giant can of worms. Our cookbooks to generate our Jenkins jobs for code management were originally out of scope, because we thought
we could get away with just cloning existing jobs every time we needed to onboard a new cookbook. This worked fine until we came up with some enhancements to the workflow that required us to rebuild most of the jobs, and suddenly cloning jobs really didn't seem like a great idea. By prioritizing the automation of those Jenkins jobs over other deliverables, such as RPM repo segmentation,
we could iterate on our Jenkins workflow and push those changes out to all our cookbooks fairly quickly. We've done that a couple of times. I wouldn't want to do that without the cookbooks, because at this point we have hundreds of Jenkins jobs handling these cookbooks.
RPM repo segmentation, on the other hand, keeping RPMs that weren't approved for production out of visibility to production machines, started out seeming like an extremely important idea.
If your production machines can't see the new code, you can't accidentally deploy the new code. But since we versioned all of our RPMs, you would have had to violate our code control policies to deploy that code accidentally. It turned out we never really had an issue with that.
By prioritizing the Jenkins jobs and being willing to reshuffle those priorities, we were able to deliver value to our team and to the rest of the company much faster than if we had just gone ahead with the original plan.
Brownfield automation is hard. But it really has to be done, because if you don't start removing that toxic waste, it's going to eventually kill you. It means incremental changes in place, slowly but constantly making improvements and decontaminating the site until you have something that's much more manageable and much less toxic.
Eventually, you'll get to a point where you have enough automated that you can rebuild the entire thing as a greenfield app. This will save you time, it will improve your ability to manage your application, and it will open up opportunities for further improvements. We can all have nice things. We just have to be realistic about what we can do and what we should do.
There's my contact information, and we have time for questions. You had the Apache configurations in source control already. Did you move them into
a cookbook, or did you leave them where they were when you started integrating them? The question was, since we already had the configurations in source control, did we leave them in source control, or did we move them into a cookbook? We moved them into a cookbook. All of those configurations now are in a collection of Chef cookbooks.
The cookbooks do parameter replacement on those configuration files as the node converges. We take advantage of environment overrides to make configurations that are special for each environment.
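As a rough sketch of what environment overrides buy you (the attribute names and values here are invented, and Chef's real attribute deep-merge precedence is more involved than this one-level merge):

```ruby
# Hypothetical sketch of environment override attributes: per-environment
# values win over cookbook defaults. Names and values are invented examples.
default_attrs = { "apache" => { "keepalive" => "Off", "max_clients" => 100 } }

env_overrides = {
  "production" => { "apache" => { "max_clients" => 500 } },
  "qa"         => { "apache" => {} },
}

# A one-level deep merge standing in for Chef's attribute precedence:
# override values replace defaults, and untouched defaults survive.
def effective(defaults, overrides)
  defaults.merge(overrides) { |_key, d, o| d.is_a?(Hash) ? d.merge(o) : o }
end

attrs = effective(default_attrs, env_overrides["production"])
puts attrs["apache"]["max_clients"]  # the production override
puts attrs["apache"]["keepalive"]    # the cookbook default
```

The same template is rendered everywhere; only the environment-specific attributes differ, which is what eliminates per-environment drift.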
That also transitioned those configurations out of the old source control and into GitHub. Because we had significant configuration drift, we had to choose one set of configurations as the important starting point.
We chose production. We realized that we really don't care about anything that's not production. Other people care about that, but we support the customers. The customers expect things to work. We're going to start with production. That did cause problems when we went to roll out Chef on the downstream environments, on stage 2 and
the QA environments, because there was a lot of drift in there which was put in to make things work. By converging it with Chef, we got rid of all that drift, and then things broke. That was a good place to be because it meant that we were now consistent. Was there any piece of the infrastructure that was particularly painful to move to automation?
The question was, was there any piece of the infrastructure that was really painful to move?
I think the most painful part isn't the automation itself; starting at the top, working down, and then rebuilding new servers was actually not horribly painful. The most painful part is that we can now build our servers, any number of
them, in 5 to 10 minutes, but we still can't get load balancers configured, NFS exports for the big static-content docroot set up, or firewall requests to outside services handled in that amount of time. Sometimes it does feel
a little pointless that I can build my servers in 5 minutes but still have to wait 3 weeks. How are you highlighting that back to the business? How are you pushing that back up the stack? The question is, how are we pushing that back up the stack?
If you have a pain point like that, where you can build something in minutes but then wait 3 weeks for the rest of it: we talk about that with our management. We bring it up with the various teams, and they're working on solutions, implementing APIs that will let us go out and do those things ourselves.
That's a slow process; they're automating too, dealing with their own brownfield as well. With this application, were you allowed to take advantage of different change
management rules, or were you held to the same change management rules that existed in the enterprise as you moved to higher automation, higher velocity, and higher frequency? The question is, could we take advantage of better change management rules or make our own? That's one of the nice things about dealing with the www site to begin with: rewrite-rule
changes to the external content don't fit the typical change schedule. Marketing wants a press release out now, or a product released now, and it doesn't matter if you're in a financial blackout period, that kind of thing.
We're able to push those changes out without a change control. Larger changes, such as rebuilding the entire site, still went through the typical ITIL RFC model. But small configuration changes we can make at will. We actually took advantage of this to change the way we
accepted changes from other people and pushed them out the other side, because now we had a way to bundle them up as nice little atomic changes and structure them so we could get them out the door on a weekly basis,
instead of someone sending a bunch of tickets and then bugging people about every single one. Because we added more structure here, we were able to create a better change model for ourselves.
But it's not the typical story of "we were doing ITIL RFCs and now we're doing continuous deployment and not talking about ITIL anymore." That's not the story at all. Were there any existing Greenfield automation efforts going on at the same time, and were there
any conflicts? Did you work with each other, or was it just starting in brownfield? The question was, were there any Greenfield projects going on at the time, and did we run into conflicts? No, there weren't really any other Greenfield projects going on.
Prior to this project kicking off, we did have a bit of a Greenfield effort around release automation. That didn't work out all that well, so we abandoned that project, and then we were told to use Chef, which turned out to be a good decision even though a
lot of people were pretty resentful that we were told to use Chef instead of being able to discover it on our own. Could you elaborate a little on how you were using Chef to manage your Jenkins
jobs? I imagine you templatized those; maybe you could identify some of the problems you had with that. One problem I had is that I wrote the cookbooks before I discovered the Jenkins Job DSL plugin, and I already had everything templatized and wasn't going to rewrite it.
But basically what I do is work from templates: I would create a Jenkins job, get it to how I wanted it to be, pull the configuration file out of the Jenkins server, then walk through that file and parameterize it into a template.
In ERB files like that you can actually add loops and code; it's not as clean or readable as elsewhere, but you can do it. After I had the jobs all templatized in the Chef cookbook, we had a data structure that describes where our
Git repos are, what environments they go to, and which branches we need to sync, and that's all in a data bag. So when Chef runs to create those jobs, it renders those templates,
walking through the data bag structure and building hundreds of jobs per cookbook. We ended up with hundreds of jobs rather than fewer because with Jenkins you have a choice: you
can parameterize jobs, so one job is used multiple times, or you can have very simple jobs that each do one thing and have a lot of those. I chose a lot of single-purpose jobs because I really hate digging into job run logs to figure out what a run was acting on. The biggest problem was setting up Chef so it could create those jobs on the Jenkins server in a safe way.
You can turn on anonymous access, anonymous admin rights, and then it can create those jobs, but then anyone can log in and start messing with your bits, and that's not good.
So we had to set up an SSH key, get that into the job, get that authenticated to the Jenkins server, get that user created on the Jenkins server, remember to rotate that key, all those fun things, and then start building the automations to handle that.
You're pushing through the CLI rather than just writing to the... I pushed them out using the Jenkins cookbook from Chef, using the Jenkins job resource, and so yeah, that takes the template and pushes it through the CLI.
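The data-bag-plus-ERB approach described above can be sketched roughly like this. The data bag contents, job names, and template are invented stand-ins, and the real cookbook pushes each rendered config through the Jenkins cookbook's job resource rather than printing it:

```ruby
# Hypothetical sketch of generating one simple Jenkins job per
# (cookbook, environment) pair from a data-bag-like structure.
# Repo names, environments, and the template are invented examples.
require "erb"

# Stand-in for the data bag item describing repos and target environments.
jobs_bag = {
  "apache-config" => { "environments" => %w[dev stage prod] },
  "jenkins-jobs"  => { "environments" => %w[dev prod] },
}

# A tiny stand-in for a Jenkins config.xml template; ERB loops and
# variables let one template describe many single-purpose jobs.
job_template = ERB.new(<<~ERB)
  <project>
    <description>Sync <%= cookbook %> to <%= env %></description>
  </project>
ERB

configs = {}
jobs_bag.each do |cookbook, meta|
  meta["environments"].each do |env|
    # One simple job per cookbook/environment pair, named for what it does,
    # so job run logs make it obvious what each run was acting on.
    configs["#{cookbook}-#{env}"] = job_template.result(binding)
  end
end

puts configs.keys.sort.join("\n")
```

Each rendered config would then be handed to something like the Jenkins cookbook's job resource for creation on the server, which is where the authentication setup described above comes in.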