The process, technology and practice of Continuous Delivery
Formal Metadata
Title: The process, technology and practice of Continuous Delivery
Number of Parts: 110
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/51010 (DOI)
NDC Oslo 2012, 69 / 110
Transcript: English (auto-generated)
00:00
Everybody's feeling wide awake and not as hung over as I feel this morning. Welcome to the second day. This morning I want to talk about the process, technology and practice of continuous delivery. What I mean by that is I want to talk a little bit just in the general. This is a holistic approach to a development process, an agile process,
00:23
and I had the good fortune to be employed about five years ago by a completely greenfield startup. It was about the time when I was working on writing a book on this topic, and so it gave me the opportunity to explore those ideas from a clean sheet of paper.
00:44
I want to use the company that I worked at for that period (I left there recently), a company called LMAX, as an example, and use that to explore the approaches that we took during that engagement, to explain how it all fits together and how the process works
01:02
and the sorts of things that you need to think about. I know that the recommendation for giving presentations is to kind of have one idea that you hope your audience will take away. I've actually got three ideas that I hope that you would take away, so please feel wide awake and I'll tell you what they are so you can kind of look out for them.
01:23
The first one is what I've just described. I want to give you an overview of the process and the approach to it. The second one is really fundamental. It's where this process came from to my mind, and that's about the application of the scientific method to computing.
01:43
It's all about feedback loops and trying to establish, build on what you know to be working, what you know to be good. And the third one is again what I've alluded to. What's really possible when you start with a clean sheet of paper
02:02
and apply these sorts of techniques fairly rigorously. So here's my agenda. I've started doing the context setting. I want to do a little bit of theory just as a lead-in to what we mean when we talk about continuous delivery. And then I'll use this example case and walk you through the process
02:24
that we developed in this engagement and show you some of the supporting tools and techniques that we use to achieve that. I've got about an hour and a half's worth of content to fit into an hour,
02:40
so I'll try and leave questions to the end. If there's anything really urgent, if there's something that's really not clear, wave anyway and I'll stop and take the question, but I want to try and flow through as quickly as we can. So who and what is LMAX? LMAX stands for the London Multi Asset Exchange. It's a retail trading facility and I was engaged by an old friend of mine
03:02
who was the CTO at the point of inception of this. And what it is, it's a financial institution. It's trying to bring retail trading to anybody that can sign up on the Internet and have the same kind of experience that financial institutions have
03:20
when they trade on large exchanges. That had a number of challenges in terms of high-performance computing and all sorts of things, but the reason why I think this is relevant to the presentation that I'm giving today is really just that I want to paint you a picture. This is a fully-fledged enterprise system. This has all of the properties of a full-blown exchange,
03:43
some of the properties of a bank, all of the properties of a web company. So we were doing clearing and trading and matching and financial reconciliation, all of those things. This is a big complicated system. It's not a toy system that's just doing crude things to a database somewhere.
04:02
It's more complicated than that. So what's continuous delivery? So if you read the Agile Manifesto, it's the first statement, and really it's what it's all about. What is software? When is software important? It's nothing until it's in the hands of its users.
04:24
Software development process is all about delivery. It's all about getting software into the hands of its users so that they can do something useful with it, make some money, get some advantage, have some fun, whatever it is. It's kind of the logical extension of continuous integration.
04:43
So continuous integration has been around for quite a long time now. Can I just have a show of hands, people that are familiar with the concepts of continuous integration? So keeping the build working all of the time. It's really the extension of that into more of the process. So particularly at the back end, where in my experience
05:03
as a consultant over the years and in various other engagements, the deployment process was often a source of fairly serious errors. I spent too many late nights sitting up trying to patch stuff together because for some reason we'd screwed it up and the software wasn't deploying correctly on the day when we were trying to deploy it.
05:20
So trying to get rid of that cost and trying to make it smoother and simpler and less error-prone and less nerve-wracking a process to get the software into the hands of our users. Another way of looking at it is kind of a holistic approach to development. Actually, if you look at this as a whole process, it's about all of it.
05:43
It's how the business capture requirements, how that feeds into the development process, how the developers create software that's usable and how they get it into the hands of their users. It's the whole value chain really of the software development process
06:02
that it covers. I'm going to be talking more about the back end of this because that's generally the bit that interests people in this kind of audience and me too because I'm a techie. But it is more than that. There are things at the front end about what's a good way to capture requirements and how that feeds into the process that's also relevant.
06:23
A fundamental idea here is that every time that you make a change to anything that's going to affect your system in production, whether it's the source code, the configuration of the production environment, the tailoring of the configuration of your application for a particular deployment target, whatever it might be, you're giving birth to a release candidate.
06:42
From then on, the rest of the process is about evaluating that release candidate to see whether it's fit to be put into production. I've already made this point. Finished: you're not done until the software is in the hands of the users. Finished is not the point at which you throw it over the wall to the testing team.
07:02
It's not the point at which the testing team throws it over the wall to the operations team. It's when it's in the hands of users and doing real work that you're done. So why? The dream that everybody's looking for is that you want the shortest possible path from a business idea to getting that software into the hands of the customer.
07:24
Now, we all know that the process of software development and requirements collection and all that stuff is complex. So there's a lot of things that we need to do. We need to apply rigor and careful thought into this. But still, our goal should be to minimize the path of getting changes,
07:44
useful software, into the hands of a customer. A very, very useful and important metric is the cycle time. If you can imagine the smallest possible change to your code, application, configuration, whatever it might be, how long would it take you to get that into the hands of users?
08:02
As a consultant, I've worked in organizations where the minimum cycle time was six months. Any change would take six months to get into the hands of users. That's a ridiculous state of affairs. At LMAX, we didn't deploy that frequently. Mostly, we deployed every two weeks, on iteration cycles.
08:22
But our cycle time was two hours. The minimum time that we could get the smallest possible change to our system into the hands of the users, having been fully tested, was two hours. Mostly, software isn't like that.
08:40
And how do we address that? Well, we need to start applying lean thinking. And this is a theme. This is not an accident. I mentioned the importance, for me, of the scientific method. Lean, meaning lean manufacturing and lean production and those sorts of ideas, was
09:01
consciously lifted from the application of the scientific method to industrial and commercial processes. Us doing the same, taking lean thinking and applying that to software development, we're doing the same. We're building on the scientific method. You need to be able to get a theory, design an experiment, carry out the experiment, evaluate the results and iterate.
09:21
Fundamentally, that's what the scientific method's about. Fundamentally, that's what lean is about. Fundamentally, that's what continuous delivery is all about. And most agile methods too, to be honest. It's all about feedback cycles. It's all about trying to find out where we've gone wrong. The kind of assumption is that, as fallible human beings,
09:44
we're very prone to making mistakes. And so the best way that, as fallible human beings, we have found to avoid making mistakes is to make small changes, look at those changes, evaluate them, figure out how we screwed up, fix it and carry on. So it's all about establishing feedback cycles.
10:00
And I like to think of the software development process as kind of layers of onion skins of larger and larger cycle feedback loops. The tiny cycle, if you're used to doing test-driven development, then write a test, see it fail, write the code to make it pass,
10:21
see it pass, refactor, move on. That's kind of the smallest feedback cycle. That's kind of the unit test code in a feedback cycle. Outside of that, though, there's more to it than just the unit test. You need to specify the behavior of the system somehow.
10:41
And unit tests don't always do that. They're very focused on the detail. They tend to be a little bit too close to the solution level to effectively express the business-level intent of the solution sometimes. And so outside of that, you want to be able to do the same kind of thing, but on a slightly grander scale. You want to be able to specify a feature of the system,
11:02
assert that it's delivered effectively, and evaluate that. And then outside of that, the ultimate feedback loop that I was talking about, you want to be able to have an idea in your business, get that into the hands of the users, see how they react and change and modify.
11:20
So this idea of interlocking feedback loops is very important and fundamental to the way that the process works. As I said at the outset, I'm going to be talking principally about that end of the process. I'm going to be talking mostly about build and release because that's mostly what we're focused on. But please don't forget the other things.
11:43
They're part of the process and part of the flow of information and keeping those feedback loops as short as possible and as effective as possible is a continual effort on the part of practitioners of this kind of process. This is a shameless plug for my book, which I'm very proud to mention.
12:02
It won the Jolt Excellence Award last year. I wrote it along with an ex-colleague of mine, Jez Humble. It's available at all good bookshops, including the one downstairs. So the principles of continuous delivery. If you want to achieve this sort of feedback loop,
12:21
you need to create a repeatable, reliable process for releasing software. That means that you can't afford manual works of art where Joe knows precisely how to configure this particular server to get it into the right state, because Joe might be on holiday the day that that server goes down and you need to replace it, or he might leave and get a better paid job somewhere else
12:41
because you've been treating him so badly. To get over that, you need to automate almost everything. One of the practices of continuous delivery is kind of small-scale incremental automation. As I describe the system that we developed at LMAX, it's going to look very big and complicated.
13:01
It's going to look like a complex process. That's from kind of this end of the telescope. Actually, we started off very simple. Our first iteration, our first two-week iteration, we built the smallest possible feature that we could think of, and we delivered it into a production-like environment and people were able to use that feature.
13:21
That's an important fundamental goal. If you want to keep that feedback loop, you've got to include the business in the feedback loop as well as everybody else. You can't afford to build your continuous integration, your continuous development system for 10 weeks before you start delivering value. You're going to make people really nervous if you start doing that kind of thing.
13:43
If you want to automate everything, if you want a repeatable, reliable process, you've got to version control almost everything. Depending on how seriously you take this, it gets to the point where everything... In the ideal world, who would not want to be able to be absolutely certain
14:01
that every last bit and byte in their system as a whole was the one that they intended? Any one of those bits can screw you up. Any one of those bits can fail your system, and you don't know which ones. You want to be certain that it's the one that you intended, the one that you tested, the one that you evaluated fairly thoroughly through the life of this process.
14:27
A very weird thing about software development is that if something's painful, if it's difficult, if it's hard, do it more often, not less often, because that's the way to make it simple. If your releases are horribly fraught, painful process...
14:43
Sidebar. A few years ago, when I was a consultant, I worked for one customer that used to do releases at the weekends, and their release process was horrible. It was complicated. It was split up into various teams who would come in at different times of the day that was scheduled. There was lots of hand-over walls of documentation of how to do things,
15:01
and it took them the entire weekend to do any release, and they made a change, and it took longer than the weekend, and then they were stuck because they couldn't release their software; they had no outage period in which to release their software. So if it hurts, do it more often. If you're releasing once every 12 months, start releasing once every six months.
15:21
If you're releasing once every six months, start releasing once every three months, and so on. There are many successful organizations with big complicated pieces of software that release on every commit. Every time you make any change, the system is evaluated by an automated set of tools, and if it passes all of those evaluations,
15:42
it's automatically deployed into production. That's not a prerequisite of continuous delivery. You can choose when to release, but the philosophy is correct. At any point, if a release candidate passes through all of the evaluation stages in the deployment pipeline, it should be releasable.
16:01
You don't necessarily have to release it, but you should be comfortable and confident that if you want to, you can. Like good processes (I'm going to say it's good, and I'm biased), it has lots of positive side effects,
16:23
a bit like TDD. If you do TDD, well, my experience has been that it drives better design. If you apply these principles, the principles of continuous delivery on a broader context to software development, you end up with better design and better processes to releasing your software.
16:44
That means that it's important to be rigorous about things. It's important not to sweep things under the carpet, so focus on quality. If something looks strange, look into it. Our system at LMAX is a highly asynchronous, highly performant trading system.
17:03
Those sorts of systems are quite hard to test to set up the cases to evaluate and so on. We would have intermittent test failures. We spent a while being rubbish and not paying sufficient attention to them and just ignoring them. Every single time that we dug into them, there was an underlying cause,
17:24
some of which were genuine production problems. They were highlighting real problems in the code. It wasn't just a test artifact. A few of them were just test artifacts, but it's important to focus on quality. If anything looks anomalous, if anything looks wrong, if any code looks a bit more untidy than it should be, fix it.
17:41
Work on it. I was watching Uncle Bob yesterday talking, and he said, you should always leave a code base in a better state than you entered it. I think that's a very important philosophy. Done means released. This is a whole team thing. For this process to work, it's not about the developers doing the right thing.
18:04
It's not about the operations people doing the right thing. It's not about the business doing the right thing. It's not about the testing group doing the right thing. Everybody is involved in a release. Everybody has a role. Everybody needs to work together. Everybody needs the right sort of insight into the process and visibility of the process and so on.
18:26
It's very important. As I said, it doesn't really make sense to do this in a big bang. It's an incremental, gradual improvement to get to a point where it works effectively. I left LMAX about two months ago.
18:41
My bet is that they've made five or ten changes since then to the process, just to refine it, because we were always doing that. That was just the nature of the organization. We would always be looking to improve things on an iteration by iteration basis. Some of the practices of continuous delivery.
19:02
An important one, build binaries once. If you're going to evaluate your release candidates, put them through a battery of automated tests, which is part of the process, you want to be fairly sure that the release candidate that you deploy into the production was the one that you tested. If you recompile to a particular target on deployment,
19:24
maybe there's a different version of the compiler. Maybe you're using different versions of the libraries that you're linking to. It's not the same as the one that you tested. Part of the output of the process is you should build the binaries once and then if those binaries are successful,
19:41
those are the ones that you test in all the subsequent stages and those are the ones, if they're successful, that you release into production. Use precisely the same mechanism to deploy into every environment. If you're deploying to the developer workstation, it should be fundamentally the same mechanisms that deploy the application as into your distributed cloud-based production environment.
20:03
There are differences to those environments, but you need to cope with those differently and I'll talk more about that as we go on. Smoke-tested deployment. This is again about the feedback loops. You don't want to deploy the whole thing and then say, does it work? You want to know that each stage of the deployment works. If you're laying down the operating system, you want to just verify that that's worked
20:22
before you lay down the database, before you lay down the application code, before you lay down the configuration. At each stage, just increase that confidence that everything's going to work well. That has the added benefit that if things don't work well, you know where to start looking and so it shortens your debug cycle. If anything fails, stop the line.
20:41
If, during the process that I'm going to describe, the deployment pipeline, anything fails as the release candidate is flowing through, you throw that release candidate away. You don't go back and try and fix that release candidate. You go back to the head and fix it along with all the other changes and they flow through again. I'll cover that in more detail as we go through.
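To make those last few practices concrete (one deployment mechanism for every environment, a smoke check after each layer, and stopping the line on any failure), here is a minimal sketch of what such a staged deployment routine could look like. It is an illustration with assumed names, not LMAX's actual deployment tooling.

```java
// A minimal sketch, not LMAX's tooling: one deployment routine used for every
// environment, with a smoke check after each layer so a failure stops the line
// at the step that caused it. Environment differences live in the descriptor,
// not in separate scripts.
import java.util.List;
import java.util.function.BooleanSupplier;

public class StagedDeployment {

    /** Hypothetical environment descriptor: same fields for dev, test, staging, production. */
    record Environment(String name, List<String> hosts, String configProfile) {}

    /** One step of the deployment plus the smoke test that proves it worked. */
    record Step(String description, Runnable action, BooleanSupplier smokeTest) {}

    public static void deploy(Environment env, List<Step> steps) {
        for (Step step : steps) {
            System.out.printf("[%s] %s%n", env.name(), step.description());
            step.action().run();
            if (!step.smokeTest().getAsBoolean()) {
                // Stop the line: don't lay down the next layer on a broken one.
                throw new IllegalStateException("Smoke test failed after: " + step.description());
            }
        }
    }

    public static void main(String[] args) {
        Environment dev = new Environment("dev-workstation", List.of("localhost"), "dev");
        // The actions and checks here are placeholders; a real pipeline would
        // install the OS image, schema, binaries and configuration.
        deploy(dev, List.of(
                new Step("lay down operating system image", () -> {}, () -> true),
                new Step("install database schema",          () -> {}, () -> true),
                new Step("install application binaries",     () -> {}, () -> true),
                new Step("apply environment configuration",  () -> {}, () -> true)));
    }
}
```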
21:01
This is a description of the LMAX continuous delivery process. This is fairly typical and this is pretty close to the process that's described in the book. We worked as developer pairs and we would make changes locally and this is kind of a classic continuous integration cycle.
21:20
When the developers were happy and the build was okay, they'd commit their changes to the team's source code repository. There'd be a build management system that was monitoring the source code control system and when it saw new versions added there, it'd pull those down
21:42
and it would run what we call the commit build. The commit build is the same to a 99% level as any continuous integration build. The slight difference is that when it succeeds, it stores the binaries in an artifact repository
22:02
so that we can reuse those and ultimately deploy them into production if this release candidate is successful. Just like any other continuous integration process, if the build fails at the commit point, the developers are kind of sitting there waiting, they're going to fix it there and then. It's very important for the commit build to be as fast as possible.
22:21
Lots of people talk about 90 seconds being about the longest acceptable build. I can see that. It's certainly optimal at that sort of level. On this project, which as I said is a very big project, we built everything monolithically because it made the configuration management problem a bit easier.
22:44
Our build was about five minutes so it's a bit longer than that but it does mean that you need to focus on keeping it that fast. You need to make sure that the tests that you write in the commit build are largely not touching disk, not touching real messaging systems, not deploying to real web servers, anything like that.
23:01
They're just local process things that will run very fast and you can run tens of thousands of tests in that amount of time if you stick by those sorts of rules. However, you also want to be able to catch the vast majority of errors at this stage. The whole process is a branch prediction algorithm,
23:24
and really what we're doing at this point is that we're betting that if it passes this build, it's likely to pass the rest. You want it to give you a high degree of confidence that any release candidate that passes this stage
23:40
is going to be successful for the rest, or at least works. There are some added tests that generally we'll add in at this level. There's usually a simple smoke test of the application at some level that just said that it works correctly. If you're using something like Spring, which has the nasty property of not really knowing whether it works until you exercise it,
24:02
there are some tests that will test the Spring configuration or the Guice wiring or whatever, the dependency injection, to make sure that that's correct as well. And there are some static analysis tests, so we would fail the build if the test coverage went too low. We would fail if we broke any architectural rules
24:21
if we tried to access the database from the web tier or something stupid like that. The test, the build wouldn't pass. If it all works, then these guys move on. They'll start working on something new, and the rest of the stuff that I'm going to describe is happening in the background. It's very important, though, to the process
24:40
that they're keeping an eye, they're aware of what's going on through the rest of the pipeline. As I said, they've kind of made a gamble. They're gambling that the fact that the commit build passed means that this release candidate's going to succeed. But they might lose the gamble, and if they do lose the gamble, it's their job to drop what they're doing and fix the problem, to address the problem that they caused with the commit.
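As an aside on the static analysis checks just mentioned: a commit-stage test that stops the web tier reaching into the database could look roughly like the sketch below. The package names, source layout and JUnit 4 usage are assumptions for illustration, not LMAX's actual check.

```java
// A minimal sketch of the kind of architectural-rule check described above:
// fail the commit build if any source file in the web tier imports the
// database package. Package names and source root are illustrative.
import static org.junit.Assert.assertTrue;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

import org.junit.Test;

public class ArchitecturalRulesTest {

    private static final Path WEB_TIER_SOURCES = Path.of("src/main/java/com/example/web");
    private static final String FORBIDDEN_IMPORT = "import com.example.db.";

    @Test
    public void webTierMustNotTouchTheDatabaseDirectly() throws IOException {
        try (Stream<Path> files = Files.walk(WEB_TIER_SOURCES)) {
            List<Path> offenders = files
                    .filter(p -> p.toString().endsWith(".java"))
                    .filter(ArchitecturalRulesTest::importsDatabasePackage)
                    .toList();
            assertTrue("Web tier classes import the database package: " + offenders,
                    offenders.isEmpty());
        }
    }

    private static boolean importsDatabasePackage(Path sourceFile) {
        try {
            return Files.readAllLines(sourceFile).stream()
                    .anyMatch(line -> line.startsWith(FORBIDDEN_IMPORT));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```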
25:03
One of the reasons for that is because this ends up, as you'll see, being a fairly expensive resource, and it's a whole team resource, and you don't want it tied up and being broken all the time because of changes. But another important thing is the sooner they get to fixing it, the faster they're going to be at fixing it
25:21
because they're going to have the context. They're going to understand what they changed. There's going to be a relatively small number of things that they changed, and so they'll know what's impacted the problem. Worst case, they can take a step back and look at the problem offline. The next stage in the process is acceptance testing.
25:40
I'm a huge fan of TDD in particular and unit testing in general. But I've worked on projects where we only did unit testing, and on the sorts of projects that I was working on, that wasn't enough.
26:01
Unit tests assert that the code does what the programmer thinks the code should do. It doesn't really assert that the software does what the business think it ought to do. Acceptance testing is about that. Another name for this is functional testing. I like the term acceptance testing because it focuses on what we do.
26:21
If it passes this, it means it's functionally correct. It's doing the right things. This is monitoring not the version control system, but it's the artifact repository. This is going to be a resource. These are going to be slower running tests. They're going to be running against the whole system, and they're going to take a while.
26:45
If this was looking at the version control system and taking every build, it would be falling gradually further and further behind. So what it does, it kind of leapfrogs over. Each time the acceptance test environment becomes free, it looks into the artifact repository for the most recent successful commit build,
27:05
and it will evaluate that. On the assumption that all the ones previous to it, all of those changes are in there, so it's evaluating those changes. It can jump over. It does mean that you kind of lose the direct tie to who committed what.
27:20
That's a problem that you have to fix, you have to address, you have to track the collection of people that may have contributed to this failure. Modern build systems are doing that more frequently. When we started doing this, you had to roll your own and do much more of it yourself, but the modern build systems, Hudson, CruiseControl, TeamCity, those sorts of things,
27:44
they're doing this much more effectively now. If the acceptance test deems this release candidate to be good, if all of the tests pass, it tags the release candidate with a tag saying you've passed your acceptance tests.
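A minimal sketch of that leapfrogging and tagging behaviour is below; the repository interface and tag names are stand-ins I've assumed for illustration, not the actual LMAX tooling.

```java
// A minimal sketch of the "leapfrog" scheduling just described: whenever the
// acceptance environment is free, take the newest release candidate that
// passed the commit stage, test it, and record the result as another tag.
import java.util.Optional;

public class AcceptanceStage {

    interface ArtifactRepository {
        /** Newest candidate carrying the given tag, if any. */
        Optional<String> newestWithTag(String tag);
        void addTag(String candidate, String tag);
    }

    private final ArtifactRepository repository;

    public AcceptanceStage(ArtifactRepository repository) {
        this.repository = repository;
    }

    /** Called each time the acceptance test environment becomes free. */
    public void runOnce() {
        // Skip intermediate candidates: their changes are included in the newest one.
        Optional<String> candidate = repository.newestWithTag("commit-build-passed");
        candidate.ifPresent(rc -> {
            boolean passed = deployAndRunAcceptanceTests(rc);
            repository.addTag(rc, passed ? "acceptance-tests-passed"
                                         : "acceptance-tests-failed");
        });
    }

    private boolean deployAndRunAcceptanceTests(String releaseCandidate) {
        // Placeholder: deploy the stored binaries to the acceptance environment
        // with the same mechanism used everywhere else, then run the suite.
        return true;
    }
}
```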
28:04
I'm a huge fan of automated testing, and I've talked about repeatability. That's not all of it, though. I think that there is no place for using people for regression testing. I think that that should be automated. It's a repeatable process. It's deadly dull, and to be honest, people are useless at doing boring, repetitive, technically complicated tasks,
28:24
and there's not much that I can imagine that's more boring, repetitive, and technically complex than manually regression testing software. So automate that. What people are brilliant at is exploration, pattern matching, just touchy-feely things.
28:41
So you want people to interact with the system, and you want them to be able to say the colors don't quite work on that, or this doesn't line up right, or when I press these 15 stupid keys or put 32K of input into my password field, it blows up. You want people to try and do the stupid things. So exploratory testing and kind of usability testing are very important facets.
29:03
So I think having manual testing in the process is an important step for most applications, certainly for the one that we were working on, which had a significant web user interface as part of it. So we have manual testers, and they would pull release candidates
29:21
once they'd passed acceptance tests out of the artifact repository. There was no point in them looking before then, because before then, maybe the system didn't even start up. It would be a waste of their time. So once it's passed acceptance tests, it's free for them to go and have a look. They would pull those down. They'd use the same deployment tools. We put a user interface on top of the deployment tools
29:41
that were used on an automated basis for the acceptance tests. I'm going to go into a bit more detail of those shortly. And they would use those tools to deploy it to a test environment and interact with the system. There was no other human intervention needed; they didn't have to call on somebody else.
30:01
They didn't have to call on an operations person or a developer to help them deploy the application. They chose the version that they wanted, where they wanted to deploy it, and press the button. And it worked. That's an important facet. When you give people tools like that, it changes their relationship to the software and the way in which they do it. We saw all sorts of things where the testers would treat things differently.
30:25
We had one nasty problem and we couldn't figure out where it was. And they did a binary chop of release versions, going between versions to kind of home in on which it was. It took them ages, because there were a lot of versions. This bug had crept in over a long period of time.
30:42
It changes the relationship. Demonstrations. We could demonstrate code within about 40 minutes was the duration of the acceptance test build. Within about 40 minutes, we could demonstrate any feature to anybody, because we could deploy in 30 seconds to any test environment we liked,
31:01
and people could look at it. The business liked this sort of behavior as well. If they were happy that this release candidate was good, they had a feature on the deployment tool with which they could mark the release candidate, and that would add a different tag to the release candidate in the artifact repository. You can see in the artifact repository,
31:22
it's collecting information about the lifecycle of the release candidate. And what we're looking for, by the time we want to make the decision to release into production, is a release candidate that has the full set of tags: it's passed automated unit testing, acceptance testing, manual testing, performance testing, and so on.
31:43
For us, at LMAX, performance was a critical thing. We were a low-latency trading environment. Our turnaround at the edge of our estate, a message coming in and going out again, was on average one millisecond.
32:01
So performance was a critical part of our business. We were dealing with high-frequency traders. So we evaluated performance at two levels. We did component-level performance testing, which is essentially the equivalent of unit testing for performance. For performance-critical aspects of the system, we would write a dedicated performance test
32:22
that just micro-benchmarked that bit of the code. So if we did something stupid and made it go slower, it would just shout at us, and we would go and look, and we could fix it. But we also did whole-system performance tests, where we would start up the whole system in as production-like an environment as we could afford.
32:43
The performance-critical bits of that were identical to our production environment, so it was kind of like a thin slice of our production environment for performance testing. And we would run whole-system tests with load tests and destructive tests and all sorts of things. This was a great environment for doing all sorts of things. If we had a nasty event in production,
33:02
we could replay what happened in production in this environment and see what went on and kind of debug it and those sorts of things. Again, this is pulling release candidates out of the artifact repository and tagging them when it's successful as it goes through. So it's accruing this information about the lifecycle.
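To illustrate the component-level performance tests described above, here is a minimal sketch of a micro-benchmark that shouts when an operation blows its latency budget. The operation, iteration counts and threshold are assumptions for illustration, not LMAX's real tests or numbers.

```java
// A minimal sketch of a component-level performance test: micro-benchmark one
// performance-critical operation and fail loudly if it gets slower than an
// agreed budget. All numbers here are illustrative.
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class OrderMatcherPerformanceTest {

    private static final int WARMUP_ITERATIONS = 100_000;
    private static final int MEASURED_ITERATIONS = 1_000_000;
    private static final double MAX_MEAN_MICROS = 10.0; // budget per operation

    @Test
    public void matchingStaysWithinLatencyBudget() {
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            performanceCriticalOperation();
        }
        long start = System.nanoTime();
        for (int i = 0; i < MEASURED_ITERATIONS; i++) {
            performanceCriticalOperation();
        }
        double meanMicros = (System.nanoTime() - start) / 1_000.0 / MEASURED_ITERATIONS;
        assertTrue("Mean latency " + meanMicros + "us exceeds budget of "
                + MAX_MEAN_MICROS + "us", meanMicros <= MAX_MEAN_MICROS);
    }

    private void performanceCriticalOperation() {
        // Placeholder for the component under test, e.g. matching one order.
    }
}
```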
33:25
For us, we were controlled by the UK's Financial Services Authority, and so part of their regulation is to have a separation between the people that can develop the software and the people that look after it in production.
33:42
So we have a staging environment in which we evaluate the production release, where we could test this. Generally, what we were testing in the staging environment, because we had already tested the functionality and the performance of the system, was really the migration of the system, as it was an intensely stateful system,
34:02
and as part of the deployment process, we had to migrate the production state into the new version of the production system. So we would take an anonymized cut of the production data for the release that we were going to do. We would do the release, evaluate that it all looked sensible as far as we could,
34:20
and if that was good, then we were fairly confident that, as well as everything else that we tested, the migration of the production data was also working. If that all worked, then we would migrate, we would deploy the system to the production environment if that was the candidate that we chose. The stuff that I was talking about, the FSA authorization,
34:40
this was actually secured. It was actually a bit more complicated than this. There was a separate hop where you had to have three authorized people that would say, yes, this release candidate has passed all of its testing, we'll migrate that into the production artifact repository, and then somebody else could release from there.
35:07
Another nice little sidebar, when we started describing this to our regulators, they were very nervous, because it doesn't fit into the normal operations. By the time we finished describing it to them, they were recommending what we were doing
35:20
as best practice to everybody else, because this is actually perfect for regulators. It's hard to convince them, because they're not used to seeing stuff like this. But if you think about it, what's better than a fully audited system? If you think about what's happening here, we've got an automated system, an automated process, from actually further up the value chain
35:40
in terms of requirements capture and so on, that feeds through so we could see who specified a particular requirement, who prioritized it in our story management system. We could track that through, because we tagged each commit with a story number or so. We could track the developers that worked on that story. We could track those commits,
36:02
and we could see that it had passed all of the commit tests, because it made it into the artifact repository. Again, that was tagged. We could see the acceptance test that it passed, the performance test that it passed, who'd looked at it in the manual testing environment, who'd looked at it in the staging environment, who'd authorized it to move into that. We've got a complete audit trail
36:22
from soup to nuts of every change in the system. The other thing that I haven't really talked about here: I've talked mostly about stories, because I'm a developer and that's mostly where my focus sits, but it's not all that this is about. Any change to the system would go through the same process. We used to update the Java version, for example.
36:43
We would update the version of Java on a regular basis. Pretty much when a new version came out, we stayed with the new version. That's simple. We'd just check that into the version control system. We had a manifest arrangement where each build had a manifest of the dependencies that it had. We'd say which version it depended on. It would pull that, it would run it through the deployment pipeline.
37:02
If it failed, we'd know that there was a problem with that version of Java. We did the same with the database, the same with the operating system, the same with the web server, the same with the messaging system. Any change to the system, any change to the configuration of the system would flow through this process and be evaluated in increasingly production-like circumstances until it got to production.
37:27
Brief thoughts, any questions at that point? [Audience question, inaudible.] That's a very hard thing to say. As I said at the outset, we started with kind of the bare bones, the absolute minimum.
37:42
We had essentially a very, very trivially simple commit build and an acceptance test build that ran one test at the end of the first iteration. At the point when I left the project, and I was there for nearly five years, we were still changing it. We were still modifying it, enhancing it.
38:02
This is an investment. It's not something that you do and then forget about. You live with it and you cope with it. Keeping the acceptance test running, keeping the performance test running is expensive. They do break. When you've got a big complex system like this, it's an expensive exercise to go back and fix them sometimes.
38:22
Nobody that worked on that project thought that it was money poorly spent, though. But there's a heavy investment. If I was to guess, I would say that a small single-digit percentage of the total development effort, probably less than that, actually, went into
38:43
the tools and stuff. The acceptance testing was more than that. I would say that we probably spent five to ten percent of the development effort on testing and keeping the tests going. Something in that order. It might be a bit more than that, to be honest. We didn't keep statistics on it. Sorry, there was another question at the back.
39:04
We did those two largely by... There's a good book, Agile Database Refactoring, or something like that. Forgive me, I've forgotten the title, but it's a good book. It's very good. Effectively, what you're doing is trying to do additive changes
39:26
and do deltas. For each change, part of our commit build was a test that would check the data migration changes. We would test that the database looked like we expected.
39:45
We had some kind of markers. If it didn't, it would fail. The way that you fixed that test was by putting a delta patch in that did the migration. We had a test that kept us honest to make sure that we did it. We built up these migrations. At deployment time, we kept a revision level
40:03
of what the last delta patch that we applied was. We applied all of the ones that were later than that and then updated it to the newest one. I recommend the book to you. It's quite good.
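A minimal sketch of that revision-tracked delta scheme might look like the following; the schema_revision table and the JDBC usage are assumptions for illustration rather than the project's actual migration code.

```java
// A minimal sketch of the delta-based migration scheme described: the database
// records the last delta applied, and at deployment time every later delta is
// applied in order and the recorded revision moved forward.
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.SortedMap;

public class DeltaMigrator {

    public void migrate(Connection db, SortedMap<Integer, String> deltasByRevision)
            throws SQLException {
        int current = currentRevision(db);
        // Apply only the deltas newer than what the database already has.
        for (var entry : deltasByRevision.tailMap(current + 1).entrySet()) {
            try (Statement stmt = db.createStatement()) {
                stmt.execute(entry.getValue());                     // the delta script
                stmt.executeUpdate("UPDATE schema_revision SET revision = "
                        + entry.getKey());                          // move the marker
            }
        }
    }

    private int currentRevision(Connection db) throws SQLException {
        try (Statement stmt = db.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT revision FROM schema_revision")) {
            return rs.next() ? rs.getInt(1) : 0;
        }
    }
}
```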
40:22
The whole thing is referred to as the deployment pipeline. That's a term that I coined. The reason I called it a pipeline was not because it's kind of a linear thing. It's because I'm a technology geek. It reminded me of processor pipelining. It's a branch prediction algorithm. What you're doing, as I said, at the commit stage,
40:41
you're betting on the fact that mostly, if it's passed the commit, mostly everything's going to be okay. You can move on and you can work. These guys, they wait their five minutes for the build to pass. They move on, they're working on something new. They're keeping half an eye on what's going through the rest of the process. If the build breaks, they've lost their gamble. They've now got to drop what it is that they're doing,
41:01
go and fix the acceptance test or the performance test that broke the build or revert the change or whatever it takes to make it good, to keep the build working. If it passes, they've won and everybody's happy. For a branch prediction process, that's all very well,
41:22
but I mentioned in passing that the acceptance test build took about 40 minutes. Actually, by the time that I left, there was something in the order of 15,000 acceptance tests, something in the order of 25,000 unit tests. The 15,000 acceptance tests, if you ran them all serially end to end,
41:42
probably would have taken more than 24 hours. I don't know, but lots of time anyway. It was a bit more complicated than that. The way that it actually worked from my slightly simplified picture was that when we got to the acceptance test stage, when a release candidate made it to the artifact repository,
42:02
the acceptance test environment, when it became free, it would look for the newest release candidate and it would deploy that to an acceptance test environment. This was a lightweight copy of the production environment. The more production-like you can afford, the better, but in our case, our production environment was kind of a cluster of 100 servers or something,
42:21
and this was about five or six. We then had a whole bunch of test hosts to farm out work and to parallelize the work. An important aspect of this is kind of the isolation of the test case. I said our system was a trading exchange,
42:41
and if you think about it at its root, the dimensions of containment of a case for trading are really the market that you're trading in and the user's account that you're trading in, at least it was in ours. In most problem domains, there are things like that
43:02
that will give you isolation in a multi-user system. In our case, what we did is that every test kind of started off by first creating a user and creating a market so that it could play in isolation from all of the test cases, just that user's holdings, just that marketplace, we could set it up into exactly the state that we wanted.
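As a minimal sketch of that isolation idea, assuming hypothetical helper methods rather than the real LMAX test API, each test could simply build its own little world before it starts:

```java
// A minimal sketch of per-test isolation: every acceptance test starts by
// creating its own user and its own market, so tests can run in parallel
// against a shared environment without treading on each other.
import java.util.UUID;

public abstract class IsolatedAcceptanceTest {

    protected String testUser;
    protected String testMarket;

    /** Call at the start of every test (e.g. from a JUnit @Before method). */
    protected void createIsolatedWorld() {
        // Unique names keep this test's state invisible to every other test.
        testUser = "user-" + UUID.randomUUID();
        testMarket = "market-" + UUID.randomUUID();
        registerUser(testUser);
        createMarket(testMarket);
    }

    // Placeholders for calls into the system under test via its public APIs.
    protected abstract void registerUser(String userName);
    protected abstract void createMarket(String marketName);
}
```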
43:23
It had the amusing side benefit that one of the really efficient parts of our system was the ability to create users and marketplaces, and we had a number of external third parties that were testing and they said, we think it didn't work, we created this user and it responded too fast.
43:41
It gave us the ability to run these tests in parallel. It meant that the application instance that was running in this environment over here was a little bit different in terms of its profile to the production environment because it tended to have lots and lots of markets,
44:01
but that was something that we thought was acceptable, an acceptable compromise to give us the test isolation that we wanted. And that's an important facet. One of the anti-patterns that I've seen many times in functional testing and one of the reasons why many people say that functional testing is hard is because they try and maintain a consistent data set
44:20
across all the functional tests. It's just not worth the effort. It's best to find a way of isolating each test case. It's a much, much simpler approach to the problem. So our test, we ended up, we got a build grid of about, when I left it was about 35 servers. It was kind of an in-house cloud and we could allocate these differently.
44:43
The host of the acceptance testing would report the results back. It would collate the data back from these servers. We wrote a little application that would farm out work to these servers so that it would run in parallel. And this is a little animation of the application. We called it Romero and it divided the tests up into a series of different groups.
45:03
So these are time travel tests. These are tests that need a dedicated environment. They can't use the shared environment because they're going to change time, so they would affect other tests. These are parallel tests. At this stage, it looks like there are more of these than there are of those. It's just the stage that it happens to be in.
45:20
But most of the tests were parallel tests. Most of the tests you could run in parallel and they're running in that group. And then the sequential tests. These are tests that we would run. Again, they needed a particular version of the environment because they would do things like they would selectively fail bits of the system to test that our system was robust and failover would work and disaster recovery would work and so on.
45:41
And you can kind of see these things. Every now and again, it kind of reconfigures itself as the test profile changes and you'll move tests around. Because when we started, this was a relatively immature discipline, we did a lot of creating our own tools for this stuff.
46:02
So a lot of the stuff that you see, we kind of wrote from scratch ourselves. The situation has moved on. There's a lot of stuff in the open source community and commercial offerings that cover some of this space now. And probably if we were starting this project now, we would have used a bunch of those. But nearly all of the stuff I'm going to talk about, we wrote our own for.
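In the spirit of what was just described, and emphatically not the real Romero, a minimal sketch of partitioning tests into those groups and farming the parallel ones out across a grid could look like this:

```java
// A minimal sketch of the partitioning just described: tests are sorted into
// groups by what they need from the environment, the parallel group is farmed
// out across the build grid, and the groups that mutate the shared environment
// run on their own afterwards.
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TestScheduler {

    enum Group { TIME_TRAVEL, PARALLEL, SEQUENTIAL }

    record TestCase(String name, Group group) {}

    public void run(List<TestCase> tests, int gridHosts) throws InterruptedException {
        Map<Group, List<TestCase>> byGroup = new EnumMap<>(Group.class);
        for (TestCase t : tests) {
            byGroup.computeIfAbsent(t.group(), g -> new java.util.ArrayList<>()).add(t);
        }

        // Most tests are parallel: spread them across the available test hosts.
        ExecutorService grid = Executors.newFixedThreadPool(gridHosts);
        byGroup.getOrDefault(Group.PARALLEL, List.of())
               .forEach(t -> grid.submit(() -> execute(t)));
        grid.shutdown();
        grid.awaitTermination(1, TimeUnit.HOURS);

        // Time-travel and failover-style tests need a dedicated environment,
        // so they run one at a time after the parallel batch.
        byGroup.getOrDefault(Group.TIME_TRAVEL, List.of()).forEach(this::execute);
        byGroup.getOrDefault(Group.SEQUENTIAL, List.of()).forEach(this::execute);
    }

    private void execute(TestCase test) {
        // Placeholder: deploy if needed and run the test, reporting the result
        // back to the host that collates the build's outcome.
    }
}
```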
46:21
Another problem, as I mentioned before, is that you want insight into the build pipeline. You want to be able to see what's going on across the board. And this is what we call big feedback. The top bar is the commit build and you can see who did the last commit, the comments,
46:40
the history of the last three commits over on the side, some links to some useful other tools. There are some subsequent stages. We did some static analysis after the fact and so on. Then we've got the acceptance test build and this is the Romero server, the thing that we just saw, the work allocation process to parallelize the tests. And then we got some branches
47:01
where we were running different versions of the application and reports from those. And then some kind of supplementary projects and some performance testing reports here. You can see there's a failure over there in the history that's just reported. This is what it looked like when things were going well, which was kind of most of the time. When things weren't quite so well, this is what it looked like. This was kind of, we had this,
47:22
we got big monitors at the end of the workbenches where the teams worked and it would be displaying this all of the time as well as some other graphs, some of the important feedback information from the application for production. But it was also available as a web application
47:41
that you could look at from your development system and you could kind of click through. So this was effectively just screen scraping on top of Hudson. And if you actually wanted to go and see why this test failed, you would click on there and it would go through to the Hudson instance and look at the logs, or the Romero instance and look at the logs, or whatever else that you wanted, and it would navigate through.
48:02
Again, the first version of this was really trivial. One of the guys knocked it up one Friday evening, the first version of this, just as a simple highlight. And then over time we kind of evolved it and added features to it as we wanted. Another problem where you've got the acceptance test build
48:24
and there are multiple people that could have contributed to a failure is determining who and why and whether this is an intermittent problem or whether you just introduced something. And we had a woman on our team who, for a brief while, well for a fair while, played the kind of team's conscience
48:41
and she used to nag us to be honest and stay on top of fixing the tests, and she would do this regular analysis of the breakdown and stuff. Her name was Trish, and one of the guys automated her and created AutoTrish, which did an analysis of all this information and kind of crunched it down. So this is kind of build history across here
49:00
and you can see something fairly nasty happened at this point where there's a whole bunch of tests and the tests have flavors so you could kind of say roughly what functional areas and that will kind of give you some hints as to what we're talking about. It's also got a blame thing. You can't really read it but it's got the names of the people that could have been responsible whose commits contributed to the failure at that point.
49:21
It's also got a Trish index, which is: how often has this failed in the past? This one looks maybe a bit suspect, because it failed once there and then passed and then failed, so maybe that's intermittent. There are some others down here. Maybe it wasn't; maybe it was just a breakage there and a breakage there. But you would see patterns like that,
49:41
and this was kind of a useful tool. Generally what would happen is that we'd have the big feedback at the end of the benches. We'd commit and we'd be working away and something would start going wrong. We'd start seeing some red on the board and so we'd drill down and look at AutoTrish and it'd start pointing the finger of blame and we'd go, oh that was me. I'll go and find out what I did wrong and why I screwed it up.
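As a minimal sketch of the kind of intermittency scoring AutoTrish was doing, assuming nothing more than a test's recent pass/fail history (the class and method names here are invented for illustration, not LMAX's actual tooling): a result that keeps flipping looks intermittent, while a solid block of failures looks like a genuine breakage.

```java
import java.util.List;

// Hypothetical "Trish index" sketch: how flaky does this test's history look?
public class IntermittencyIndex {

    /** history is ordered oldest-to-newest; true = passed, false = failed. */
    public static double score(List<Boolean> history) {
        if (history.size() < 2) {
            return 0.0;
        }
        int flips = 0;
        for (int i = 1; i < history.size(); i++) {
            if (!history.get(i).equals(history.get(i - 1))) {
                flips++;  // result changed between consecutive builds
            }
        }
        // Normalise by the maximum possible number of flips.
        return (double) flips / (history.size() - 1);
    }

    public static void main(String[] args) {
        // Failed, passed, failed again: high score, probably intermittent.
        System.out.println(score(List.of(true, false, true, false, true)));
        // One solid block of failures: low score, probably a real breakage.
        System.out.println(score(List.of(true, true, false, false, false)));
    }
}
```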
50:05
Acceptance testing. What's the right language for acceptance testing? Is it FIT? Is it JUnit? JavaScript? It's none of those. It's the domain language, the language of the problem domain. If you're expressing the tests
50:24
that you're trying to assert the behavior of your system with in the language of the problem domain, that's the most durable language you can use. That's not going to change. If the requirement is "when I register my credit card I can subsequently fund my account", that's the requirement. It doesn't actually matter whether that's delivered
50:41
using a web user interface or through named pipes (not that named pipes are used very much anymore). Whatever the underlying technology is, it doesn't matter; that's irrelevant. It's the business-level feature that's important.
51:01
Our system had a number of different ways of interacting with it. There was a proprietary API through which you could talk. Our own user interface talked through that, but we also published it, and you could write your own bots to trade on our system through the LMAX API. We also had a FIX API, which for those in the finance industry is the Financial Information eXchange protocol,
51:20
which is a common protocol for exchanging information. There were a number of other communications gateways to and from our system using a variety of different technologies. If we were to do the straightforward thing and write a bunch of tests that talk
51:43
directly to each of those technical interfaces of the system and then we change one of those technical interfaces to the system, we're going to break a bunch of tests. In order to fix those, if we've implemented that way, we've got to go and visit each one of those test cases and fix each of them.
52:04
Like most problems in software, it's better if you raise the level of abstraction. So if instead we had an interface that represented the interactions that we want to perform, and the underlying guts of how we communicate that intent via the FIX API or any other protocol
52:21
is hidden away from the tests, that gives us a little bit more leverage. In this case, if our interface changes and the tests break, we just fix that point, that intermediate point, and it fixes all of the tests. We've got one place to fix. That's a step forward.
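A minimal sketch of that kind of channel abstraction, with invented names rather than the real LMAX interfaces: the tests depend only on an interface expressing the interactions, and each technical interface of the system gets one adapter behind it, so a change to one gateway means fixing one adapter rather than every test.

```java
// Tests talk to this interface; they never touch a protocol directly.
public interface TradingChannel {
    void login(String user, String password);
    void placeOrder(String instrument, String side, long quantity, String price);
}

// One adapter per technical interface of the system.
class FixChannel implements TradingChannel {
    @Override public void login(String user, String password) {
        // translate the intent into the appropriate FIX session messages
    }
    @Override public void placeOrder(String instrument, String side, long quantity, String price) {
        // build and send the FIX order message
    }
}

class PublicApiChannel implements TradingChannel {
    @Override public void login(String user, String password) {
        // call the proprietary API's login operation
    }
    @Override public void placeOrder(String instrument, String side, long quantity, String price) {
        // call the proprietary API's place-order operation
    }
}
```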
52:40
There's a good pattern called Window Driver. Effectively, you're writing a device driver for each point of interaction, and that's as true of your user interface as of any other protocol of interaction with the system. So we would write a layer of insulation between our tests and the system that abstracted the interactions, the behavior, and we'd try to represent those in semantics that were as close to the problem domain
53:02
as we could get. We actually took that a step further. If you start looking at that, there's a bunch of common things. I mentioned previously that the common important concepts in our application for test isolation were marketplaces and users. We wanted each test to create its own marketplace
53:22
and its own user. But if you had a test that created a marketplace called "US dollar", then the next time you tried to run that test and it created marketplace "US dollar", that would already exist. You need to alias the name.
53:40
So when we created things, the system underneath would create something called "US dollar 1237956243", while the test would talk in terms of "US dollar", and we would have an aliasing layer in between. There are concepts like that that make this easier. What this leads you in the direction of is a domain-specific language for testing.
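A rough sketch of that aliasing idea, with hypothetical names: the test script always says "US dollar", while underneath each run gets a unique name, so tests stay isolated and repeatable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Maps the friendly names used in test scripts to unique per-run names.
public class AliasRegistry {
    private final Map<String, String> aliases = new ConcurrentHashMap<>();
    private final AtomicLong counter = new AtomicLong(System.currentTimeMillis());

    /** Returns the real, unique name for an alias, creating one on first use. */
    public String realNameFor(String alias) {
        return aliases.computeIfAbsent(alias,
                a -> a.replace(" ", "") + counter.incrementAndGet());
    }
}

// Usage inside the test DSL (illustrative): a script-level createMarket("US dollar")
// resolves to something like "USdollar1624381..." underneath, and every later
// reference to "US dollar" in the same test resolves to that same real name.
```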
54:01
This is a really important concept, because it starts to address the problem that lots of people complain about: the difficulty of functional testing. Actually, this approach makes it much easier, and it starts to get at something really valuable: you start to move away from thinking of these things as tests.
54:23
What they really are are executable specifications of the behavior of the system. There is a good book by Gojko Adzic called Specification by Example that covers some of this stuff. I also touched on it briefly in Continuous Delivery, but his book goes into more depth on it.
54:43
This is an example of the DSL we created. Calling it a DSL is a bit fancy, actually, because we ran it in a unit test. Really, it is just Java code that is kind of formulated in a way that makes it readable. If you were somebody that understood the problem domain of trading, this would make sense to you.
55:02
We would identify the channel here, the trading UI in this case, and then the operation that we wanted to perform and some parameters. We used strings as parameters, and we had different versions of the language, different parsers for the DSL. In this one, we are showing the deal ticket for an instrument called "instrument".
55:22
We are placing an order, which is a limit order; the detail of what that means doesn't really matter if you are not familiar with it. It is a bid, which means I want to buy four at the price of 10. That is fairly straightforward. Then we look for a feedback message. It is kind of high level: anybody that understands the business proposition would understand the intent of this test.
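Roughly, a test in that style might read like the sketch below. The method names (showDealTicket, placeOrder, checkFeedbackMessage) and the stub DSL layer are illustrative reconstructions of what the slide described, not the actual LMAX DSL.

```java
import org.junit.Test;

public class PlaceLimitOrderTest {

    private final TradingUiDsl tradingUI = new TradingUiDsl();

    @Test
    public void placesALimitOrderAndSeesConfirmation() {
        tradingUI.showDealTicket("instrument");
        tradingUI.placeOrder("instrument",
                "type: limit", "side: bid", "quantity: 4", "price: 10");
        tradingUI.checkFeedbackMessage("Order placed");
    }

    /** Minimal stand-in for the DSL layer: accepts "name: value" string parameters. */
    static class TradingUiDsl {
        void showDealTicket(String instrument) { /* drive the UI via the window driver */ }
        void placeOrder(String instrument, String... params) { /* translate intent into channel calls */ }
        void checkFeedbackMessage(String expected) { /* poll the UI and fail the test if absent */ }
    }
}
```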
55:43
Our goal wasn't to make these tests writeable by the business. Actually, we did have a few instances where business people wrote these tests, but that wasn't our aim. We still wanted people with the right analytical skills, the technical competence to think about these. We didn't want it to be very codey.
56:00
We tended to frown upon loops and variables in these tests. We wanted them to be essentially just scripts like this, because that made them readable, parsable and so on. That is another version of the test. This is a very similar test, but in this case it is using a different channel, the FIX API.
56:22
As I said, before each of the tests we would create the instrument, which was a synonym for the market in which we were trading, and we would create a user, and then we would run the test. Underneath, there was actually quite a lot more that we could do. Another thing the DSL was doing was defaulting a lot of the parameters: it gave us some standard default behaviors.
56:42
If we wanted to be very precise about the order that we were placing, we could specify more detail. We could say there is a trigger, a point at which we want to do some other behavior if the price in the market reaches this. There is a stop. If my order is going against me, I want to take my profit
57:01
or eliminate my losses, and so on. I could be much more specific, but the DSL would give sensible defaults as well. In the majority of cases where you were placing an order, you were just trying to get a market into a decent shape and you weren't too bothered about the detail, so you could just pass the minimal parameters to the place-order instruction and it would do the right things.
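A small sketch of that sensible-defaults idea, again with invented names: the DSL parses "name: value" string parameters and overlays them on a map of defaults, so a test only spells out the parameters it actually cares about.

```java
import java.util.HashMap;
import java.util.Map;

public class OrderParams {
    // Defaults applied when a test doesn't say otherwise (illustrative values).
    private static final Map<String, String> DEFAULTS = Map.of(
            "type", "limit",
            "side", "bid",
            "timeInForce", "good-till-cancelled");

    /** Parses "name: value" arguments and overlays them on the defaults. */
    public static Map<String, String> parse(String... args) {
        Map<String, String> params = new HashMap<>(DEFAULTS);
        for (String arg : args) {
            String[] parts = arg.split(":", 2);
            params.put(parts[0].trim(), parts[1].trim());
        }
        return params;
    }

    public static void main(String[] args) {
        // Minimal call: just quantity and price; type, side and timeInForce default.
        System.out.println(parse("quantity: 4", "price: 10"));
        // Precise call: override a default and add a stop price.
        System.out.println(parse("quantity: 4", "price: 10", "side: ask", "stopPrice: 9"));
    }
}
```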
57:22
The DSL also hid the complexities of interacting with a fully asynchronous system. It would make things look more synchronous, which is important from a testing point of view because if you're logging in and then you're going to try and place an order, you want to be sure that you've logged in before you start placing the order otherwise the place order is just going to fail
57:41
just because there is a race condition. The test layer, the DSL, would stage the interactions: it would wait for the right results from the system before it moved on to the next stage, if it was doing something that was asynchronous. This is a more complicated version; this is the FIX place order,
58:00
and you could specify quite a lot more parameters in that, and again, a lot more defaults. So at its most basic, if I wanted to place an order through either FIX or the public API, I could just say: place this order on this instrument,
58:22
for this quantity and price. That's the kind of minimum that I would need to do, and that would work, and it would work through any channel. As I was leaving, we were starting work on generalizing the channel. So instead of doing what I was showing earlier, where you were saying "tradingUI" here,
58:40
you had an annotation where you could say: this test runs against the FIX API, the trading API, the proprietary API, and the administration user interface. Then you have the same test case and it would run against each of those channels, because the DSL abstracted the business operation, so that placing an order was the same
59:02
however you were placing it, through whichever channel you were placing it. That's a very powerful idea, and I think there's a lot more to be said about approaches to acceptance testing like this.
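A speculative sketch of what such a channel annotation could look like; this is not the actual LMAX annotation, just an illustration of the idea that one test case, expressed in business terms, can be bound to several channels by the test runner.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Mark a test with the channels it should run against; a custom runner would
// execute the same test once per channel, binding the DSL to a different
// channel adapter each time.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
public @interface RunAgainstChannels {
    Channel[] value();

    enum Channel { TRADING_UI, PUBLIC_API, FIX_API, ADMIN_UI }
}

// Usage (illustrative):
//   @RunAgainstChannels({Channel.TRADING_UI, Channel.FIX_API, Channel.PUBLIC_API})
//   public void placesALimitOrderAndSeesConfirmation() { ... }
// Because "place order" means the same business operation whichever channel
// carries it, the one specification exercises every interface of the system.
```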
59:20
I'm running out of time very quickly, so I want to talk briefly about deployment, because that's another thorny problem. If you want to be able to achieve the sort of things that we've been talking about this morning, you need to be able to automate the deployment process, and that can be complicated. We developed our own deployment tools. Again, we were rolling our own because the tooling in this area was relatively immature at the time; if we were starting now, we'd probably pick up something like Chef or Puppet to do it, but we built our own.
59:40
We called it Scotty, as in "Beam me up, Scotty". Effectively, the way that Scotty worked was that the Scotty server sat on top of the Artifactory repository and provided a state machine on top of it. Each environment that ran a version of the application ran an agent, and the agent would regularly
01:00:00
check in with the Scotty server to see if there was any work for it to do, and the Scotty server would reply with instructions like start, stop, nothing to do right now, or deploy.
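A simplified sketch of that agent/server conversation, assuming a simple polling protocol as described; the types and method names are invented for illustration, not Scotty's real code.

```java
public class DeploymentAgent {

    enum Instruction { START, STOP, DEPLOY, NOTHING_TO_DO }

    // The central server's side of the conversation, as seen by one agent.
    interface ControlServer {
        Instruction nextInstructionFor(String environmentName);
        String artifactVersionFor(String environmentName);
    }

    private final ControlServer server;
    private final String environment;

    DeploymentAgent(ControlServer server, String environment) {
        this.server = server;
        this.environment = environment;
    }

    /** One poll of the central server: ask what to do, then do it. */
    void pollOnce() {
        switch (server.nextInstructionFor(environment)) {
            case DEPLOY -> deploy(server.artifactVersionFor(environment));
            case START -> startApplication();
            case STOP -> stopApplication();
            case NOTHING_TO_DO -> { /* sleep until the next poll */ }
        }
    }

    private void deploy(String version) { /* fetch the binary from the repository and install it */ }
    private void startApplication() { /* invoke the same scripts a developer would use */ }
    private void stopApplication() { /* stop the locally running roles */ }
}
```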
01:00:22
It was actually more complicated than that, because different environments had different needs. Remember, the deployed environment is a hundred-odd machines, but we could deploy the whole system onto one machine with the deployment tool. So a manual tester might just be running it locally, really on one machine, and the QA team would deploy to a single system. For acceptance testing we wanted a bit more sophistication, so we had more
01:00:43
machines; in production we had lots of machines; and in performance test we had a few. The way that we achieved that was that each of the places that hosted the system had a collection of roles, and the roles said which aspects of the system that host was fulfilling. On a QA machine it
01:01:03
was pretty much all of the roles on the one machine; in production it was pretty much one role per node; and in acceptance testing we kind of doubled some up, and the ones that were performance-critical had their own. In the production environment, the ones that were performance-critical had their own machine, and the ones that we didn't care so much about we doubled up for a low footprint.
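An illustrative sketch of that role-mapping idea, with made-up role and host names: the same set of roles is packed onto one machine for QA, and spread out in production with the performance-critical pieces on their own nodes and the rest doubled up.

```java
import java.util.List;
import java.util.Map;

public class RoleMaps {
    // Invented role names; the real system's roles would differ.
    enum Role { MATCHING_ENGINE, MARKET_DATA, FIX_GATEWAY, WEB_UI, REPORTING }

    // QA: everything on one machine.
    static final Map<String, List<Role>> QA = Map.of(
            "qa-box-1", List.of(Role.values()));

    // Production: performance-critical roles get their own node,
    // the less critical ones are doubled up for a low footprint.
    static final Map<String, List<Role>> PRODUCTION = Map.of(
            "prod-01", List.of(Role.MATCHING_ENGINE),
            "prod-02", List.of(Role.MARKET_DATA),
            "prod-03", List.of(Role.FIX_GATEWAY),
            "prod-04", List.of(Role.WEB_UI, Role.REPORTING));
}
```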
01:01:24
Production is actually a bit more complicated than that, because as well as the production system we had a disaster recovery system, which was a cut-down version of production, off-site in another location. Somebody who wanted to release the system would go to the Scotty console.
01:01:43
This is an old version; it got a bit prettier than this later. You get a number of different release candidates that have made it through testing; these are available for release. There's a summary of the approvals each has had, and you can choose an action, so you could just say upgrade to that, and you
01:02:03
just click on the button and it would deploy that system to the target environment. This is actually a console showing the status of the performance test environment. This is one environment, but 15
01:02:21
nodes, and there's a statement of the role that each of the nodes is playing, the health status, the last time that the Scotty agent checked in with the central server, and the current operation the Scotty agent is performing. This is a similar picture but
01:02:42
for a single QA environment, a test environment, so this has got all of the roles on one machine and a summary of all of the health checks from that one machine in one place. The important point to reiterate is that whether we were deploying the application on a
01:03:00
development machine or in production, fundamentally it was the same code that was deploying it. Actually, we didn't tend to use the UI console for deploying to a development machine; we tended to call Ant targets that went through to the scripts that did the deployment. But other than that it was the same: that's all the Scotty agent did, it just called the same Ant targets and they went through to the scripts. That meant that by the time
01:03:25
we released the software into production, we'd rehearsed deploying that version of the software, with that version of the release tools, in that configuration, probably 10 or 15 times before it got to production. So we knew that the deployment system worked as well as knowing that the software worked;
01:03:41
we knew that the configuration worked as well as the software. Everything was version-controlled; everything was asserted. I've overrun by about two minutes. One final point: we were running for about 13 months before we had our first regression bug in production,
01:04:00
live, and that was just a stupid thing. We had other kinds of bugs: things that we missed, things that we didn't think of. But those were failures of intent rather than failures of execution; we almost eliminated failures of execution through this process. I'm very sorry, I've run out of time for questions. I'm more
01:04:20
than happy to take questions offline if anybody's interested. Thank you very much for your time.