
Learn How We Deliver. Continuously.

Formal Metadata

Title: Learn How We Deliver. Continuously.
Number of Parts: 66
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2016

Content Metadata

Abstract
How to keep customers happy by delivering features fast, and bug fixes almost immediately - without breaking stuff. NiteoWeb runs several SaaS projects, serving over 5000 customers. They use several techniques, libraries and services that allow them to make several deploys to production every day. That does not mean that they do not test code before shipping it. Rather, they have a workflow that runs a variety of checks and automatic tests and makes it very fast and easy to test new features in staging environments. And even if they do push buggy code to production, they only push it to a fraction of users to minimize impact. The outcomes are great: happy users, since they get features and fixes fast. And maybe even more importantly, happy developers, since the code is actually being used minutes after being merged rather than being stuck in a bureaucratic deployment workflow. Nejc will describe how this system is set up so you can easily replicate parts or all of it. NiteoWeb relies heavily on SaaS providers such as GitHub, Heroku and Travis, but he will provide alternatives that you can install on your own servers. While he will provide concrete examples and scripts, it's the principles that matter - and those can be applied to any platform, hosted or in-house.
Transcript: English (auto-generated)
What's the most beneficial thing about the conference? Which part is the best? What part of the conference gives you the most information, the most value? Sprints? Party? Yeah, anything else? The hallway track.
You know, when I was preparing for this conference, I thought, I gave a similar talk two years ago, I know what I'm going to talk about. This time I'm going to prepare my talk before I even sit on the plane. I really had that as a goal, and I did it. I prepared really nice slides, nice pictures, good points, and
then, as the plane was speeding down the runway, I thought, oh, such a great idea, and I changed everything during the flight. Obviously. And then, being here for the last two days, especially yesterday evening at the party,
the same thing happened again. I was thinking, you know, the hallway track is the most valuable part of the conference, and does anybody use slides in the hallways? Do you sit down behind a computer, pop up a screen, and start showing graphs and nice fonts and titles?
No, you talk. So I thought, yeah, I threw my digital slides up to the ceiling, said, fuck it, let's do a talk without any slides. Let's make it a hallway track discussion. In that light, if people would like to join me up here (I see at least 10 more chairs available over here),
there are not going to be any slides, so you might as well be close. I need to turn on my mic, I just remembered. Yes. Am I being heard? I am. Okay, cool.
So, yeah, this is, if you don't count the keynote, the first talk of the morning after the party,
so I would like everybody to stand up, to get the blood flowing. This is also my way of getting an idea of who the audience is. Keep standing if you have deployed code to production in the last three months.
Yeah, I was guessing. What about the last month? Still? The last two weeks? Yes? The week before coming to the conference, did you deploy something to production? Still standing? What about this week? Have you deployed something this week?
Today? Cool. Yeah, you can sit down. So, you know, I like to approach things with a mentality I stole from Tim Ferriss:
what would this look like if it were easy? So, let's say that half of the people here did not deploy anything this month. Why? Is code sitting on your computer bringing value to your customers? When it's sitting in Git, do your customers get any good out of it?
No, it's only when you hand the code over to them, or install it on your production servers, that your customers actually get any value from you. And if this is the only thing that actually brings value, why are we not doing it every single hour, or every single day? Why do we wait two weeks, three weeks, maybe even months to get it out to customers?
Normally the answer is that deployments are painful, things go wrong, and we just don't want to do them. And the longer you put off a deployment, the bigger it gets and the harder it is to get started.
So what would this look like if it were easy? Imagine that you did some feature or some bug fix, and at the exact moment you had everything tested and you knew it worked, the customer, or whoever requested the feature, would just magically appear
and you could just show them how it is, and they would say, yes, this is exactly what I need. You press a button, boom, production, done. That would be really cool, because then you can constantly bring value to your customers every day, several times a day.
Another example is when you get a bug report. I've been running my company for 10 years now, and in the past the normal response to a bug report was, yes, thank you, we have created a ticket, and this will be fixed in the next release.
Now imagine if you could reply to the customer, it's fixed. Can you check it out? Is this what you wanted? Wouldn't that be cool? Wouldn't your clients be really happy about it? Because when, for example, you have a fix and an issue tracker and you push the fix, and then in the issue tracker you say, the fix has been committed
and it's waiting for a release, and then it sits waiting for a release for 10 days, the non-technical people go, what? Why are you being such an asshole? If you have fixed it, why am I not getting the fix, you know?
It has been my goal to get value to customers as fast as possible for the last couple of years, and this is kind of my story of how we did it. Also, before we continue, if there are any questions, please shoot immediately. Just, you know, raise your hand and we're going to address them immediately.
Before we go through the story, let's also talk about Newton's first law: the law of inertia. The bigger the system is, the harder it is to move. It's the same with deployments. The longer you wait with a deployment, the bigger it gets, and the harder it's going to be to deploy.
The bugs, if you get any, are going to be more severe. If you do data migrations in really small chunks, they're manageable: they're quick, and they're easy to revert. If you wait and have to do several migrations in one big pile,
that's almost always going to break. And it's going to be hard to test because it's going to run a long time, you're going to have a longer downtime, and it's going to be more difficult to revert. So keeping things small is always beneficial. The other way Newton's first law, the law of inertia, applies is team size.
If you allow your developers to deploy immediately to production, the team size shrinks. Because otherwise, for one piece of code to get deployed, you need a developer, then you need a QA person, and then you need a reviewer. You need all this process, and this process, again, takes time and makes deployments harder.
By optimizing the entire process you keep everything smaller and moving faster. And moving faster also means you can revert faster if things go wrong.
So yeah, we were in the Plone consulting business for many years, and then we decided we were going to start building our own products. We were not going to do customer sites any more; we were actually going to build our own software-as-a-service products and start selling them to potential clients.
At that point, I was the main developer and I was in charge of deployments, so we did them every two or three weeks. Sooner or later, I realized that this is something that could be automated, and just for the fun of it, I hacked up a little script that listens for GitHub webhooks.
Whenever something is pushed to master, you can set GitHub to send a POST request to a certain URL, and I would listen for that POST. Then what I would do is shut down Plone, git pull the latest code and configuration, rerun buildout, and restart Plone. It was really basic. I didn't think much of it until three months later, when the script failed.
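A minimal sketch of what such a webhook listener could look like, using only the Python standard library. The buildout path and the start/stop commands are placeholders, not the actual script from the talk.

```python
# Sketch of a GitHub-webhook deploy listener (illustrative, not the original
# script). Assumes a Plone buildout checked out at BUILDOUT_DIR with the usual
# bin/instance and bin/buildout scripts; both paths are placeholders.
import json
import os
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

BUILDOUT_DIR = "/srv/plone"  # hypothetical checkout location

def run(*cmd, check=True):
    """Run a command inside the buildout directory."""
    subprocess.run(cmd, cwd=BUILDOUT_DIR, check=check)

def redeploy():
    """Stop the site, pull the latest code, re-run buildout, start it again."""
    run(os.path.join(BUILDOUT_DIR, "bin", "instance"), "stop", check=False)
    run("git", "pull")
    run(os.path.join(BUILDOUT_DIR, "bin", "buildout"))
    run(os.path.join(BUILDOUT_DIR, "bin", "instance"), "start")

class Hook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        payload = json.loads(body or b"{}")
        # GitHub push events carry the branch name in the "ref" field.
        if payload.get("ref") == "refs/heads/master":
            redeploy()
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), Hook).serve_forever()
```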
And I realized, I haven't done a single manual deployment in three months. This is really great. That was when I started to realize how great this is. It has been an ongoing process over the last three years, and the setup is now much more sophisticated.
I talked about the process two years ago in Brazil, about how to do it, and that talk is online. So if you need to know about the specific tools we use, you can either listen to that talk or I can explain later. Here I want to talk more about why to even do it, because the thing is, tools are interchangeable.
Everything is open source, and maybe you have a different environment anyway. It's the idea of how to do it that matters. You're smart people; you're going to figure out what tools to use. We all know how to use Google.
It doesn't make sense for me to list tools here. So the way our process looks today is that we treat everything as a problem: a feature request, a bug report, a performance problem, a security problem. Everything is a single problem that lives inside a ticket.
Then a developer, or a designer, or whoever, starts working on a fix. Every problem has a solution; it doesn't have three possible solutions, it has one solution, and that solution is a proposal in the form of a GitHub pull request.
Again, you can use GitLab, whatever. Just remember that you need one solution per problem. And immediately when that solution is proposed, what we do is copy the entire production environment and set it up on our staging infrastructure. We take the production data and we subset it.
What that means is we take random rows from the data and make sure that the data consistency is still there: if one row depends on another row, we take both rows. We also don't take all the data over; we run a logarithmic function over it.
So, for example, if you have ten rows in a table, the resulting table will have three rows. If you have one million rows, the resulting table will have ten thousand rows. This significantly shrinks the production database, but what you run on the staging infrastructure is still very much like the production data.
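The talk doesn't give the exact sampling formula, so the sketch below just picks one sub-linear function that roughly matches those proportions, and shows the idea of pulling in referenced rows so foreign keys still resolve. The two-table schema is made up for illustration.

```python
# Sketch of production-data subsetting for staging. target_size() is one
# possible sub-linear choice (10 rows -> ~4, 1,000,000 rows -> 10,000); the
# talk only says the function is roughly logarithmic. The orders/customers
# schema is hypothetical.
import random

def target_size(n_rows: int) -> int:
    """Sub-linear sample size for a table with n_rows rows."""
    return max(1, int(n_rows ** (2 / 3)))

def subset(orders, customers):
    """Sample orders, then pull in every customer they reference so the
    foreign keys in the subset remain valid."""
    if not orders:
        return [], {}
    sampled = random.sample(orders, min(len(orders), target_size(len(orders))))
    needed = {o["customer_id"]: customers[o["customer_id"]] for o in sampled}
    return sampled, needed
```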
We also look in the code for any database migration steps and run those. Finally, when this staging environment is up, we post the link to it inside the pull request. That allows the reviewer, or whoever reported the problem,
to just click on the URL when the proposed solution comes up and start browsing the new code, running against a subset of production data with all the migration steps already applied. That really tightens the feedback loop from the reporters to the developers or designers.
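As a rough illustration of that last step, this is how the staging URL could be posted back onto the pull request through GitHub's comments API. The repository name, pull-request number and token variable are placeholders.

```python
# Sketch: once the staging app is up, comment its URL on the pull request so
# the reviewer can click straight through. Pull-request comments go through
# the issues endpoint; repo, PR number and GITHUB_TOKEN are placeholders.
import os
import requests

def post_staging_link(repo: str, pr_number: int, staging_url: str) -> None:
    resp = requests.post(
        f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments",
        headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}"},
        json={"body": f"Staging environment ready: {staging_url}"},
    )
    resp.raise_for_status()

# Example (hypothetical values):
# post_staging_link("niteoweb/example", 123, "https://pr-123.staging.example.com")
```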
It also makes things very simple to test. Before, when I was reviewing pull requests, I would sometimes see a simple pull request and think, I guess it's right, I'm not going to check it. Normally I would pull the code locally, start
a local instance with the new code, maybe pull production data, and click around to see if it works. But if it was just five lines of code, maybe some CSS, I would think, yeah, I'm pretty sure this works. You merge it, and it fails. And it's always like that. If everything is already set up
and you just click a link to see it, then you do it every time. So when people ask me, if I continuously push code to production on every merge, won't I get much lower quality? Actually, no, because you give the reviewer a ton of information, a ton of power, to see if everything is right before you merge.
Obviously, we have 100% test coverage. We test for code quality, and we do regression and performance tests, again automatically, inside every pull request. The benefits are that customers get value immediately. We have spoiled them with it.
I know that some of our customers, from projects where we still do consulting, started requesting continuous delivery on every other project, because to them it seemed archaic to wait two weeks for a fix that is already there. Why wait? The other great benefit is for developers.
I have realized that, especially when you're onboarding new team members, if you allow them to push their code to production, so their code is actually used from the beginning, they feel like an actual team member much sooner than they otherwise would.
Whenever we hire a new person, I try to make sure that that person will deploy something to production on the very first day. Not in three months, the very first day. I will prepare a very simple ticket or a few,
maybe just some UI fix or something, and we go through the entire process. It feels rewarding when you are able to do something really productive for the company on the very first day you are there. You set up your development environment, you spend 15 minutes fixing a bug, maybe one hour, and then 20 minutes later,
it's in production and people are using it. Thousands and thousands of people are using it. So keeping the developers happy is, I think, an even bigger benefit than keeping customers happy, because one follows from the other. The second thing that happens with developers is code ownership, and they make sure that the
code is better. Because if you're in a process where you know there will be two reviewers, and then a QA team, and then somebody testing before your code goes to production, you think, I guess that's good enough, three more people will look at it, so it's probably okay.
But if you know that once you submit this, somebody is going to click around, and there's not going to be a huge one-week testing period, that instead your code might be on the servers in the next 20 minutes, you start to be careful what kind of pull requests you submit.
Really careful. Because everybody knows who pushed the latest fix. If you have 100 commits stuck in a single deploy and something goes wrong, everybody says, yeah, it's probably not my code. If you deploy on every commit, it's immediately known
who was the one that caused the production system to go down. People are way more careful in such environments. How am I doing with time? Okay. Any questions yet?
I work for a university, and if I change something in production for one client, another department says, no, we don't want that, and a discussion breaks out.
It's just simple things. Sometimes it's just labels we change, and the other one says, oh, that's not what I wanted. So we have a very complex review process: our customer needs this, we have recommendations for that. From the technical perspective it's very easy. I would
love to do this kind of continuous deployment and continuous integration, but the customer isn't willing to go along with it. Two things you can do. One: if you have internal libraries that the production system depends on, base packages
or whatever that you release internally, you can use continuous delivery for those libraries and deploy them to internal PyPI servers automatically. You can start with that if you use them. The other thing is to start with very small, unimportant projects,
and show that it works, then use that as a good use case and try to move into the more mainstream things. That would be my approach. But I've worked for universities as well; it takes time and effort.
I can just add here that you can let the client decide the requirements first, and once they are finalized, you start working on them. So it's their fault if they're not finalized. It's a blame game, and you don't want to play that after the fact. Sometimes they need to see it; sometimes only at the moment they see it do they realize
what they wish for, so that's a little bit of a problem. But that's exactly what I'm saying. For every change that you make, deploy a staging environment with that change applied, and give whoever your customer is access to it so they can see it.
The problem is that the one who requested the change says, okay, that's fine, but there's another one in another department who says, oh, why did you change that? I didn't request that, I didn't want that. So that's the problem. You have conflicting hierarchies, and that needs to be fixed at the hierarchy level.
Someone needs to be the one who makes the decision that a change is okay, and the others will just have to live with it. Otherwise you can't treat it as one product that is delivered to two branches at the same time.
Yeah, but there's a flip side, and then everybody wants to... Maybe if you did separate staging environments for every small fix, there would be fewer conflicts, because it would be easier for all the people involved to check every single change.
Maybe try that. Yeah. You mentioned the team members having responsibility by rolling out themselves. Can we use this? I'm just wondering if you have an automated process for rolling back
if something goes wrong, or if that's still handled manually, or if you just always apply a new commit to fix it. How do you handle mistakes? Yeah, so most of the time you don't do rollbacks, you just push a fix.
Sometimes it happens, maybe once or twice per year. I'm spoiled by Heroku. We host everything on Heroku, and Heroku, when you're in their higher paying tiers, basically gives you a button. If you do a deploy and something goes wrong... The thing is that with Heroku, until the build is successful,
they will not even deploy it. So if you messed up the migration, or you messed up the buildout or whatever, the code will not even get to the production servers; the old version will still run. If a bug still somehow gets in and something is wrong,
you basically click a button and they revert to the previous version, the code and the database, because they take snapshots every minute. So I'm spoiled by that. I always say that we are here to bring value to customers,
not to do hosting. There are companies that do hosting and do it well, and I don't want to be a hosting company; I want to be a developer. So I make sure that I get the best hosting I can for my money, which in my case is Heroku, but there are plenty of others. If you're not in the business of hosting, don't do hosting.
Sure. Two questions. Team size, current team size, and are you doing Plone on Heroku? Current team size, 10-ish, more or less.
Am I doing Plone on Heroku? Yeah, I mean, there are some tiny sites, but we have our own internal knowledge management system we have built with Plone, and I need to put that somewhere because we have it on a dedicated server that I want to deprecate.
And I'm thinking of moving that over to Heroku too, probably. Yeah, can you just... So when you do one deploy per merge request, are you queuing the merge requests up
to go into your main deployment branch, or how do you handle that? Because I, for example, have 15 developers who all check into master, and it takes four hours to go through all our QA checks, all the security checks, plus, similar to what you described, the staging environment type things,
and people check in in the meantime. Even if I tried to linearize that, it would still go past 20 builds, so we would get a queue that never finishes. Yeah, so we never push into master at all. Nobody pushes into master.
The only way to get code into master is merging a pull request. And there's a check on the pull request that you have rebased your code onto the latest master, so if there are two pull requests in flight and one gets merged, the other one will have to update its code. Some checks will still be valid,
but some checks will have to be repeated. The checks that have to be repeated are the ones that run fast, the tests and so on. The general ones, whether the UI looks good and whether it's fine from the security perspective, those stay valid. Okay, but even those checks take hours for us,
because we literally have that many files to check, so you linearize, basically, your pushes into master. That was the point of the question, how you do that. Okay, so you're small enough that you can still... Yeah, yeah. I mean, small enough.
Microservices? We have a ton of apps that are independent and talk to each other. I mean, I don't want... Our build takes... Whenever it gets over 10 minutes I go berserk, and everybody... We're getting it down to three or four minutes. Well, we, for example, have to verify things in the browser, so just one single simulation for one process that we do
sometimes takes 10 minutes. We have nine simulations for that process, and we have 100 of those processes, so yes. Can't you run them in parallel? Or do you? Well, we do, but... More hardware? I mean, we already run five times as many test machines as we do production ones. At some point you hit the cost-benefit trade-off
where you have to... No, that's a good point. We spend way more on staging and the build servers than we do on production servers, way more. Production servers are peanuts compared to that infrastructure. We became Amazon's best friend within months. Cool.
Any more questions? Okay, so like I was saying to Tom, one of the ways we eased into continuous delivery was to make sure that the internal libs we have, code that we wrote, or maybe just internal releases of public packages, were not released by hand.
So who uses jarn.mkrelease or zest.releaser? By hand? Why? I mean, it's a great package, and I use it, but Travis runs it for me. I have Travis set up for a certain repository so that
if a new tag is pushed, it will recognize, oh hey, there's a tag, and it will bundle up a release and push it to our internal PyPI. Sure, doing it by hand normally only takes you minutes, but it steals focus from you, and you will make mistakes.
Humans make mistakes; computer scripts are normally more reliable, in my experience. So you can start with that. If there are any external packages that you use and keep an internal release of, make sure those are delivered continuously.
Make sure that, for example, every time you push something to master, a new version is generated and pushed to your internal PyPI, or that a release is generated automatically when you push a new tag.
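A sketch of what such a CI release step might look like, not NiteoWeb's actual setup: Travis exposes the tag of a tag build in the TRAVIS_TAG environment variable, and the internal index URL variable here is a placeholder.

```python
# Sketch of a tag-triggered release step a CI job could run: do nothing on
# ordinary builds, and on tag builds package the project and upload it to an
# internal PyPI. TRAVIS_TAG is set by Travis on tag builds; INTERNAL_PYPI_URL
# is a placeholder for your own index.
import glob
import os
import subprocess
import sys

def release() -> None:
    if not os.environ.get("TRAVIS_TAG"):
        print("Not a tag build, skipping release.")
        return
    subprocess.run([sys.executable, "setup.py", "sdist", "bdist_wheel"], check=True)
    subprocess.run(
        ["twine", "upload",
         "--repository-url", os.environ["INTERNAL_PYPI_URL"],
         *glob.glob("dist/*")],
        check=True,
    )

if __name__ == "__main__":
    release()
```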
This will get you into the continuous-delivery story, you will start to see benefits, and you will get the battlefield experience you need to be able to do it. Then with apps, maybe start with a very small Pyramid app on Heroku, because Pyramid is very simple to get up and going,
and just do a side project, or something like a helper app for your big Plone sites, and use Heroku. Heroku has fantastic integration with Git: basically, when you create an app, you just say, I want to deploy from master on every change, and you have continuous delivery like that.
Again, just start playing with it and see how it goes. Yeah, so we're slowly running out of time. We still have time for a few more questions, so if anybody has any more questions, now is the time to shoot, okay?
Yeah, where's the mic? So what you're saying is that's great for small fixes, pushing bug fixes to production immediately. But what about features, particularly massive features that are hard to develop incrementally,
like the whole feature really only makes sense if a certain number of pieces are in place. I'm guessing your answer will be feature flagging, like disabling the feature on the customer side, but then what's the point in pushing it to production
if that code path is never actually hit and the code isn't being used? How do you deal with that? So we do have big features to deploy as well, and yes, we use feature flagging for that. It's not true that the code is not being hit; the code is being hit by 10 or 20% of the people, depending on how you set the criteria.
So feature flagging is, you have, for example, a completely new UI, and you write the code in a way that it's only displayed for a certain percentage of people or a certain group of people. When you push that to production, you actually have two systems, or rather two UIs, running side by side.
And if it's really bad, you can just switch the criteria for who gets to see it, and everybody gets the old UI again; or you can adjust it after a week, and more and more people start getting the new UI. But that is not really about continuous delivery; you can do that with manual deployments as well.
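A minimal sketch of that kind of percentage-based flag, assuming users have stable IDs; the helper name and the percentages are illustrative.

```python
# Sketch of a percentage-based feature flag: a stable hash of the user id
# decides who sees the new UI, so the same user always lands in the same
# bucket, and the rollout percentage can be dialed up or back to zero
# without a new deploy.
import hashlib

def sees_new_ui(user_id: str, rollout_percent: int) -> bool:
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable 0-99 bucket per user
    return bucket < rollout_percent

# Start with 10-20% of users and adjust the percentage as confidence grows:
# new_ui() if sees_new_ui(current_user_id, 20) else old_ui()
```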
So big features, you said big features are hard to develop. Again, the point of continuous delivery is that you don't do manual deployments, because those will break. So even if you have a huge feature, and we do have a lot of them,
the thing is that as you develop, as you propose a solution inside a pull request, all your migration steps, everything, will be run and set up on a staging server. You can see if there are any performance regressions. You actually test all your migration steps
every time you commit inside a pull request. It's not like when you do it manually: you test your migrations, everything is fine, the customer sees it, and then we fix this and this and that, and you also fix that, and you say, oh, fuck, testing the migrations will take me two hours, I guess it's fine. If it's done automatically,
it's done every time you make changes to the pull request. So it's only about automating things that you would do manually, even if it's a huge feature. So that would mean you actually do massive pull requests that just get pushed automatically? Yes. I mean, after all the checks are done
and everything, yeah. We've had a big feature in one of our main systems that we have been developing for over three months. It's still being developed, and on every change, a staging app still goes up. Yeah. Filip? Okay. Where's the mic?
As a developer, I mean, it's really hard to write the unit tests all the time, but sometimes the client needs the fix pretty quickly. So how do you handle those situations? Like, okay, it's a really busy time, and you just apply the code and forget to write the unit test or Robot test.
Write the test first? Write the test first. But there are sometimes real emergency cases; how do you handle those? Writing a test takes time, that's for sure. I always write a test to confirm that the bug is there, and then to confirm that I have removed it.
I mean, not strictly test-first, but I would never push code without writing a test. It's just not going to happen. I've done it a few times, and every time I did it without a test, I broke something. It doesn't make sense. I might as well spend 20 more minutes now than two hours later, when I break the production data and have to do a full restore.
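A small self-contained illustration of the "test that confirms the bug is there" idea: the made-up slugify function stands in for whatever code the bug report touched, and the test encodes the reported behaviour, failing on the buggy code and passing once the fix is in.

```python
# Sketch of a regression test written alongside a bug fix. slugify is a
# made-up stand-in; the assertion encodes the bug report, so it fails before
# the fix and passes afterwards. Run with pytest.
import re

def slugify(title: str) -> str:
    # The fix: collapse punctuation to dashes, then strip dangling dashes.
    cleaned = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return cleaned.strip("-")

def test_slugify_strips_trailing_dashes():
    # Bug report: "My Title!" used to come out as "my-title-".
    assert slugify("My Title!") == "my-title"
```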
We have a similar problem, and we have a second way of deploying hot fixes for that, because if the customer is down, reaction time trumps everything,
and we don't have three hours to wait for our checks to come back, so we do hot deploys where we circumvent our infrastructure. Yeah, you can have that. You can have a second process, that is, but it should not be the normal path, and it still gets checked before a release. So one of the ways of mitigating this
is maybe to look at how Heroku does it. What they do is, when you push code, they build the code and make a zip out of it, and they keep the zip; then whenever you need a new node, they take a fresh server, take that zip, decompress it, and run it.
So even if that node goes down, you have everything built already; you just pop up a new node and take the zip. We do the same with our releases, basically. We do take the time to build the releases. Okay, I think this is it then, and we're on time.
Yay! Thank you!