
Zero downtime deployments: Is it worth the effort?


Formal Metadata

Title
Zero downtime deployments: Is it worth the effort?
Title of Series
Number of Parts
141
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Learn about the advantages and disadvantages of zero downtime deployment strategy, as well as best practices for implementing it in your organization. Learn how to make changes to production systems while keeping users up to date. Don't pass up this chance to optimize your software deployments.
Transcript: English(auto-generated)
All right, so, hello everyone again, and thanks for introducing me. It's a pleasure to be here at EuroPython 23. I cannot really see and meet some of you, as this time I'm online, but thanks everyone for coming, and I hope you will enjoy it.
So let's start with the plan of the presentation. Today we'll talk about five points. We'll start with the benefits of zero downtime deployments, I will present a few strategies for achieving it, I will go through best practices, I will also try to highlight the challenges related to zero downtime deployments, and in the end I will try to answer, or help you answer, the question: is it actually worth the effort?
Right, so yeah, I'm Rafal Novitskiy. So, I enjoy web development. I'm mostly a full-stack web developer, so even though my last couple of years of experience have been front-end based, related to JavaScript and React, I decided recently to go back to some topics related to DevOps. I first started DevOps work in 2015, when Kubernetes was just crawling and everyone was on the theme of microservices, so I was also writing some of them. The only choice of cloud infrastructure provider was AWS; no one really thought about using other cloud providers, or those who did were really a minority.
In the meantime I also founded App Grid, an online startup that helps small and mid-sized companies process their daily custom orders, and it goes in pair with sales and settlements. So basically it's software for business. For now we run it only in Poland, but if you're interested, just give me a shout. I'm also a contributor to the Ulan Labs software company. This is a house full of fintech and blockchain experts, and we provide complete solutions for web development, including audits, product design and custom software development. It's a really nice team, I recommend it to you.
So let's start with the benefits. What are really the benefits of zero downtime deployments? Many of you have already heard about them, and many of you have already used applications that are deployed fluently. The first thing that comes to mind is definitely 100% application uptime. This is especially dictated by some specific industries like fintech and banking, the industries where the customer's interest matters, where the end users really matter. For example, as a user of an exchange platform, I don't want to miss the opportunity to sell or buy in some critical market situation. The same goes for social media, email and office tools: tools that we use so often that they need to be on, and we cannot really stop our work because our office tools are scheduled to be upgraded. With desktop applications it happened quite recently that you would go to work and need to wait for software to install, but this is not the future where we want to live. Probably the Facebook or Google products wouldn't be that popular if they showed prompts like "Oh, the system is unavailable, please try again later", right?
This also builds trust in the relationship with our customers, the end users. Especially when the number of users grows, the chance that no one is using our application in the release window goes down, right? So if you're a small company and you don't have many customers, you can always deploy after office hours, right? But if you have a large number of users, it's a bit harder to schedule.
So what if your deployment process is complicated and long, and you would benefit from doing that deployment under the hood? I think this is a good question. System efficiency also improves. I mean that very often zero downtime deployment is not everything, and applications are built in the mindset of lean software development: not only to deliver as fast as possible, but also to monitor, to measure and, quite importantly, to decide based on that data, right? So how would you do that if you don't have that? The overall benefit is that the efficiency of the complete system improves, and this is what we are also aiming for with zero downtime deployments. So let's talk about a few strategies.
I will elaborate on three of them that I think are the most popular, at least in what I have faced. Starting with blue-green deployments: this strategy involves maintaining two identical production environments, the blue one on the left side and the green one on the right side.
Between our user traffic and the application there is always some kind of load balancer, something that manages our traffic. At the beginning, we have version one of the product, and it's using some set of servers, some stack.
The new version is deployed to the green environment and tested there. Once it's stable, once we decide that it's stable or our automation tools decide that, traffic is redirected from the blue to the green. So now you can see that the traffic is on the new version, making the green environment the new live version.
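The switch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `BlueGreenRouter`, its methods and the `smoke_test` callback are hypothetical names invented for this sketch, and a real setup would reconfigure an actual load balancer rather than a dictionary of handlers.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Handler = Callable[[str], str]


@dataclass
class BlueGreenRouter:
    """Toy load balancer that sends all traffic to one of two environments."""

    environments: Dict[str, Handler]  # "blue" / "green" -> request handler
    live: str = "blue"

    def idle(self) -> str:
        # The environment that currently receives no user traffic.
        return "green" if self.live == "blue" else "blue"

    def handle(self, request: str) -> str:
        # Every user request goes to the live environment only.
        return self.environments[self.live](request)

    def deploy(self, new_handler: Handler, smoke_test: Callable[[Handler], bool]) -> bool:
        # Install the new version on the idle environment and test it there.
        target = self.idle()
        self.environments[target] = new_handler
        if not smoke_test(new_handler):
            return False  # traffic never left the old version
        self.live = target  # the actual switch is a single assignment
        return True

    def rollback(self) -> None:
        # The previous version is still installed on the idle environment.
        self.live = self.idle()
```

The point of the design is that the old version stays installed after the switch, so going back is as cheap as the switch itself.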
What if any issue arises after that? If our deployment doesn't fail during the build, and it doesn't fail on our checks or maybe even some manual tests, then it can always fail later, right? So we should have some kind of tool that allows us to switch back to the blue environment quickly. Blue-green deployment is also known as red-black. And maybe I should talk more about red-black, because this newer term is used quite a lot. I don't know if it's that recent, but it was introduced by Netflix, if I recall correctly, and it's also described in the Kubernetes documentation. But it's all the same, it's just the name, right?
Another strategy is rolling deployments. This involves gradually updating the application instances or services in a sequential manner. Here you can see four states of the application. Again, blue is our live application and green is what we want to upgrade to. We are upgrading gradually, service by service, and instead of deploying the new version to all of the servers, it's rolled out to a subset of the servers. The previous servers remain to serve the live version. I hope that you can see some difficulties there at this point; I will talk about them in a moment. This approach ensures that a portion of the application remains available throughout the deployment process, and by this we reduce the risk of downtime, or we eliminate it.
Another one that is maybe not that popular, but is quite easy to implement, and I've seen it in a number of applications that I was working on, is hot patching. This one involves applying patches or updates to the running application without restarting it, right? So imagine you could just fix a bug without going through the whole deployment process. In bigger companies, that process is sometimes a procedure that involves many people and many small tasks to be done. So imagine if, instead of rolling back the new version, you could easily patch it. An example might be when you have a JavaScript application served on a stream: you can easily swap the bundle, and users get the change almost immediately. So hot patching enables updates to be applied while the application remains live and accessible to users, right? This is also done in a manner that no one really notices that the change has been made. Let's go through some best practices now.
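As a side note on the rolling deployments described above, the batch-by-batch upgrade can be sketched as follows. This is a minimal sketch, not how any particular orchestrator implements it: servers are modelled as plain dictionaries, and `rolling_update`, `is_healthy` and the `in_rotation` key are hypothetical names chosen for the illustration.

```python
from typing import Callable, Dict, List

Server = Dict[str, object]


def rolling_update(
    servers: List[Server],
    new_version: str,
    is_healthy: Callable[[Server], bool],
    batch_size: int = 1,
) -> bool:
    """Upgrade servers batch by batch; stop at the first unhealthy batch."""
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        previous = [server["version"] for server in batch]

        for server in batch:
            server["in_rotation"] = False  # stop sending traffic to this server
            server["version"] = new_version

        if all(is_healthy(server) for server in batch):
            for server in batch:
                server["in_rotation"] = True  # upgraded and serving again
        else:
            # Revert only this batch; untouched servers still run the old version.
            for server, old_version in zip(batch, previous):
                server["version"] = old_version
                server["in_rotation"] = True
            return False
    return True
```

While the loop runs, old and new versions serve traffic side by side, which is exactly where the difficulties mentioned in the talk come from.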
These are not actually best practices about the implementation details of zero downtime deployment, because this presentation is more high level.
So we will talk about a set of practices that make zero downtime deployments make more sense when they are applied together. And honestly, they almost always go together, at least some of them. Starting with canary releases: sometimes people say that this is actually a strategy. I put it here, more in the practices section. So what is a canary release? It involves deploying a new feature or update to a small subset of users or servers before rolling it out to the entire user base. And I think this is the critical part of why it may be considered a strategy. You might have already seen or heard about similar experiments made by Facebook, for example: you have a different UI, a different application, than your friends who are in the same room. By gradually increasing the exposure, developers can monitor the new version's behavior, collect feedback and detect any issues before a full deployment. We can treat that small subset of users as a group of our customers that are actually testing our application, maybe because they have voluntarily accepted that in order to get features earlier. They are on the stack where you deploy the changes first. For example, in App Grid we have something like this. It's a multi-tenant application with a kind of isolated tenancy, and we have a whole stack with a customer that knows it is the very first adopter of the application and gets updates more frequently. Once this works, once the application works not only at the level of CI/CD jobs, manual tests and so on, we have a green light that, to the best of our knowledge, it's working fine and the product can be released. So we first release it to some small subset of the users, and then, after a few days of testing, monitoring and tracking user behavior, we decide to deploy to a bigger range, to other customers.
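One common way to pick that small subset, sketched here as an assumption rather than as the approach used in the talk, is to hash a stable identifier into a bucket from 0 to 99 and compare it with the rollout percentage. The function names and the `salt` value are hypothetical.

```python
import hashlib


def in_canary(user_id: str, percent: int, salt: str = "release-2") -> bool:
    """Deterministically decide whether a user belongs to the canary group."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


def pick_version(user_id: str, percent: int) -> str:
    # Route the request to the new version only for canary users.
    return "canary" if in_canary(user_id, percent) else "stable"
```

Because the bucket depends only on the salt and the identifier, a user keeps the same version between requests, and raising `percent` only ever adds users to the canary group.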
Canary releases are also considered a zero downtime strategy. As I said, for me it's more of an implementation detail. Another practice is using feature flags. Feature flags, I think, are also very close to the deployment process. There is some relation between how you manage the version of the application and feature toggles, or feature flags.
This is something that allows us to run specific features or hide specific features while deploying new code. So the end users don't see the results of the deployment from the beginning, but you can just go to the admin panel and turn on the feature later. This helps you, for example, to merge and partially deploy incomplete features, maybe MVP features or unfinished ones. You can keep working on it, and once it's ready, you can turn it on, or again you can turn it on for some specific group of users. This provides granular control over feature releases and allows for easy rollbacks if necessary. Next, A/B testing. It's primarily used to review the effectiveness of a change and how the market reacts to the change. It can be connected with canary releases or feature flags for a better experience, and the data gathered there can be an important driver of how the application is deployed. This is something that I observe more in e-commerce sectors. They very often care about what the sales process should look like: how fast you can buy the product and how efficiently you can do that.
In this case there are small improvements, like checking different interfaces. I know a company that says you can buy an item in about three clicks, payment included, right? I can buy the product in about four clicks. There is some research showing that if the process of buying an item is long, then you think: okay, it's not something that I really want to do. These boots, this book, this computer is not worth the effort; I will find another shop where I can buy it. So this is most often used in sectors where we really care about the performance of our users.
Implementing automated testing processes and utilizing continuous integration and continuous deployment pipelines are also crucial for the success of zero downtime deployments. You probably want to be sure as soon as possible that the new version is ready to deploy, that the new version is stable. You won't achieve that with just manual testing, by running manual regression tests with all of the changes. You need to automate the building, you need to automate the testing, you need to automate the deployment of the application. By this you actually reduce the risk of human error and allow rapid and reliable deployments. So that's our goal there.
Monitoring is also something that I mentioned a couple of times during this presentation. It's essential to have monitoring systems in place during deployments. Real-time monitoring helps to detect and minimize performance issues or errors promptly. In case of a problem, rollback strategies should be prepared to revert to the previous version quickly and minimize the impact on the user. Monitoring is very important during the deployment itself, but it's also important afterwards, when the deployment is done and maybe you're just turning on a feature flag or providing a different view to some set of users with A/B testing.
So basically zero downtime deployment is not something that you really get for free. There are also some downsides and costs, obviously. Once you decide to do it, you will always need to take care to prevent the problems that may occur. In development, when creating new software, the biggest problem is always maintenance of the code. With zero downtime deployments, you have a bit more to maintain and take care of. So it's a kind of loan that you take out for yourself.
The first thing is orchestration. Container orchestration platforms like Kubernetes provide features for zero downtime deployments; not only Kubernetes itself, but maybe more specifically the tools built on top of it. These platforms allow you to define deployment strategies and perform rolling updates, scaling and health checks. By leveraging the container orchestration capabilities, you can deploy new versions while maintaining service availability. In addition, you can automatically handle failures or rollbacks. Depending on the team, many new products start with Kubernetes.
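The idea of monitoring a fresh deployment and handling failures automatically can be illustrated with a small sliding-window error-rate check. This is a sketch under assumptions, not a feature of Kubernetes or of any monitoring product: the class name, the thresholds and the `should_roll_back` decision rule are all hypothetical.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the outcome of the most recent requests in a sliding window."""

    def __init__(self, window: int = 100, threshold: float = 0.05, min_samples: int = 20):
        self.outcomes = deque(maxlen=window)  # True = success, False = error
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, ok: bool) -> None:
        # Old outcomes fall out of the window automatically.
        self.outcomes.append(ok)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def should_roll_back(self) -> bool:
        # Require a minimum number of samples so one early error is not decisive.
        enough_data = len(self.outcomes) >= self.min_samples
        return enough_data and self.error_rate() > self.threshold
```

A deployment script could call `record` for each response of the new version and trigger the rollback path as soon as `should_roll_back` returns true.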
But if yours doesn't, yeah, the question is why. Keep in mind that the biggest problems of custom solutions are the maintenance and the scale, right? If the scale grows, you need to be prepared for it. That's why the DevOps work and the whole application architecture work is quite difficult: we usually start from small things, a small set of requirements, and then it grows in a way that you would never predict. So it's always good to start with a fresh mind. Another big problem, and maybe it's even the biggest one that puts that loan on our team, is database schema changes. These can be a significant challenge, and actually for this topic there could be a dedicated, completely separate presentation. From what I've seen, there was even one about zero downtime migrations in SaaS applications today, so I encourage you to take a look at it. But just quickly going through the possible strategies that you can use: performing backward-compatible database changes, using database migration tools, or providing some kind of database sharding and replication.
Basically the biggest problem with migrations is still that you cannot really prevent your end users from using a certain table once you decide to rename it or move some of the data. That's why on every deployment, your database needs to be prepared.
So that actually nothing affects the use of the database. Another real challenge is team attitudes. When you have a team that has already built a system like this and you build the project from scratch, you are in a winning position. But migrating an existing system without making cutting-edge decisions, and I want to highlight those cutting-edge decisions, very often rewriting some part of the application from scratch, or the whole application, can make the migration a never-ending story. I'm a witness of many attempts at migrating to the cloud, migrating to microservices and optimizing the deployment process, and I've seen it work only once, when the case was completely rewriting a couple of the services. So it's really hard to rewrite the base of the application, right? Fortunately, at Ulan Labs there are great teams that manage to set up new projects with everything that could be needed for zero downtime deployments. So the question is, is it worth the effort? I can see that we are running out of time.
I will try to finish in the next three to four minutes. Obviously there is no simple answer to that question, or, obviously, it depends. It depends on many factors that are specific to your product or company: what your product does, who your customers are, how valuable it is for your company to have zero downtime deployments, how often you need to deploy, right? These are closely related questions that you need to ask yourself and your company. All of the challenges have a price you need to pay, right? So start with investigating whether it's actually something that you would benefit from. If you are about to rewrite the application infrastructure anyway, if you're at the point where you are building new foundations anyway, just try Kubernetes technologies as something that would help you get started and achieve zero downtime deployments, if they are really needed.
So let's quickly recap what we discussed today. We went through strategies. There are three main strategies: blue-green deployments, which are about deploying a second identical production environment and then swapping the traffic; rolling deployments, where we deploy services in a sequential manner; and hot patching, which is about deploying changes that don't need the service to be restarted, right?
Best practices: we went through canary releases, which means releasing changes to a subset of users, right? Feature flags, or feature toggles, need no deployment to switch: you can deploy, and later you can turn on the feature, the flag. A/B testing is for making better product deployment decisions. CI and automated tests speed up the process and reduce human error, so this is really critical, as is monitoring, which is for safe rollbacks as well as for A/B testing and better decision making. So is it worth the effort?
The answer is hard and not very clear, so you would need to go through the questions that I gave you. Do you really need that? Maybe you can just automate a lot of stuff already, and this will be satisfying for you, and you don't need to care about migrations, for example, and 20 or 30 seconds of deployment outage is fine for your users. The advice is to start small, because there are plenty of best practices in that area that can be followed to build better products, and a better mindset as well. You don't need to do everything now, you know, go to your product and say, "Oh, we are implementing ZDD now, we are getting rid of our infrastructure and rewriting it", because in most companies it would take years, right? Unless you do something pretty small. So thank you, everyone. There's time for questions. You can find me under the noviral username, or you can use this QR code to find me on LinkedIn.
And that's it for today. Thanks a lot. I'm waiting for your questions. Thank you, Rafal. So as always, we have the microphones, if anybody has a question. We have about 10 minutes for questions, so that's plenty, I guess. If anybody has any questions on Discord online, you can put them in the chat and I'll read them out for you. Does it seem that I was pretty clear this time? Okay, sorry, go on, go on.
So I have a question. There is a pattern when you want to do zero downtime, which is to always have a version, a later version, that is backward compatible with the previous one. You mentioned database migrations. When you do this on the database level, is it enough to prevent any issues in production? Or is that too simple, and in real production use cases it's always more complicated than this?
So you're asking about having that pattern with major, minor and patch versioning, right?
Do you want me to rephrase? Yeah, yeah, if you could just shorten the question. So about database migrations, you mentioned it was something hard to achieve with zero downtime. My question is: if you ensure that each release is backward compatible, that your schemas are always backward compatible with the previous one, is that something that you do first? Yeah, so some releases don't actually need to touch the database, and that's fine.
Sometimes you provide a new feature, and ideally, once you want to deploy the new feature, the new database structure, the new database schema, is already there. This is the best approach, because during the deployment we don't actually deploy everything in one go. You don't migrate the schema, maybe run a data migration, and also change the code, all at once, right? This is something that you don't do. Normally you would do exactly that, especially if you use, for example, Django, where the code and its migration sit quite close together.
So you need to change your way of thinking a bit. It's not only about writing a migration that can be reversed, so that if you remove a field, the reverse will just create the field again. It's more that if you create a new field, the current version of the application cannot use that new field that you want to migrate. You need to make that jump.
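A minimal sketch of that two-step jump, using Python's built-in `sqlite3` module instead of Django migrations so that it runs on its own. The `customer` table, the `email` column and the function names are hypothetical; the point is only that the schema change ships first and stays compatible with the code that is already running.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def old_create_customer(conn, name):
    """Version 1 code: knows nothing about the email column."""
    conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))


def migrate_add_email(conn):
    """Deployment 1: schema only. The column is nullable, so version 1 keeps working."""
    conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")


def new_create_customer(conn, name, email):
    """Deployment 2: code that starts using the column added earlier."""
    conn.execute("INSERT INTO customer (name, email) VALUES (?, ?)", (name, email))


old_create_customer(conn, "Ada")
migrate_add_email(conn)
old_create_customer(conn, "Grace")  # old code still runs after the migration
new_create_customer(conn, "Linus", "linus@example.com")
rows = conn.execute("SELECT name, email FROM customer ORDER BY id").fetchall()
```

Rows written by the old code simply have no value in the new column, which is what makes the intermediate state safe.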
So first you create that field, and then in another deployment you use it. The same goes for removing fields and renaming. Actually, renaming a table is almost never worth doing. So you need a way of not using the structure from this deployment: you just need to have the database structure from the previous one, and then deploy the features on top of it.
Thank you. Thanks a lot. Hi, thanks for the talk. Actually, I have two questions. One is kind of similar to the question you just answered. You mentioned the best practices and feature flags, and my question is about data migration. Basically you said that you can use feature flags and roll back the changes that you just deployed. I wonder about the best practices when you actually have to run a migration of the database. If you had a feature flag and then you want to roll back this feature, what's the strategy for that? So again, a feature flag is something that you turn on after the deployment, which means that in this case, when you use feature flags, you could actually deploy the new feature with the flag turned off. Imagine that this is something completely new, something that writes to a completely new table, right? If you don't use feature flags, you would need to first deploy the creation of the schema, that migration, and then in another deployment release the code for it. Or, if you use feature flags, you could start with the feature turned off while the deployment creates the table, runs that schema migration. And after it's done, after some time, just turn the feature on. So this is how feature flags could help.
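The flag-gated release described above could look roughly like this. It is a sketch with hypothetical names (`FeatureFlag`, `checkout_page`, the `early-adopters` group); real projects usually load flag state from a database or a flag service rather than constructing it in code.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    name: str
    enabled: bool = False  # new features ship dark by default
    enabled_for_groups: Set[str] = field(default_factory=set)

    def is_on(self, group: str = "") -> bool:
        # On for everyone, or only for explicitly listed groups.
        return self.enabled or group in self.enabled_for_groups


def checkout_page(flag: FeatureFlag, group: str) -> str:
    # Both code paths are deployed; the flag decides which one runs.
    if flag.is_on(group):
        return "new checkout"
    return "old checkout"
```

Turning the feature off again is then a data change, not a deployment, as long as both code paths stay in the code base.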
Obviously, there are cases with somewhat more complicated scenarios, where you switch to a new field type; then how do you make it backward compatible? If you turn on the switch, maybe some of the data will appear in one field, in one column, and if you turn off the feature, it will appear in a different field. So here as well, you need to write something like a port/adapter pattern, maybe, so that your API reads sometimes from one field and sometimes from the other. That way you don't lose that data and you don't cause unexpected behavior for the user. All right. Thank you. And another question about feature flags.
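A small adapter of the kind mentioned in the answer might look like the sketch below. The column names `price_text` and `price_decimal` and both functions are hypothetical; rows are modelled as dictionaries to keep the example self-contained.

```python
from typing import Optional


def write_price(row: dict, value: float, use_new_field: bool) -> None:
    """Write to the column chosen by the flag and clear the other one."""
    if use_new_field:
        row["price_decimal"], row["price_text"] = value, None
    else:
        row["price_decimal"], row["price_text"] = None, str(value)


def read_price(row: dict) -> Optional[float]:
    """Read from whichever column currently holds the value."""
    if row.get("price_decimal") is not None:
        return float(row["price_decimal"])
    if row.get("price_text") is not None:
        return float(row["price_text"])
    return None
```

Clearing the unused column on every write is a design choice made for this sketch: it keeps the two columns from disagreeing when the flag is switched back and forth.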
What's the best strategy for cleaning up feature flags? Yeah, that's true. This is very often a mess, especially if you don't have a process for creating feature flags and the developers do it on their own. The one thing that worked for the team where I'm working now is that we have a script that manages the feature flags. We force people to write down the name of the feature flag and its description, what it actually does. And this script does more than create flags. We keep the feature flags in the database so that they can be switched from the admin panel, so the script ensures that the things written in that script, in that dictionary or whatever, remain the same in the database. It removes some old feature flags for you and creates the new ones that came with the new deployment. Thank you. Thank you.
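A script of that kind could be sketched as follows. This is an illustration only, not the speaker's actual script: `DECLARED_FLAGS`, `sync_flags` and the flag names are hypothetical, and the database table is modelled as a plain dictionary.

```python
from typing import Dict, List

# Single source of truth in the code base: flag name -> mandatory description.
DECLARED_FLAGS = {
    "new_checkout": "Shows the redesigned checkout page.",
    "bulk_export": "Enables CSV export of all orders.",
}


def sync_flags(declared: Dict[str, str], stored: Dict[str, dict]) -> Dict[str, List[str]]:
    """Make the stored flags match the declared ones; report what changed."""
    created, removed = [], []
    for name, description in declared.items():
        if not description.strip():
            raise ValueError(f"flag {name!r} needs a description")
        if name not in stored:
            stored[name] = {"description": description, "enabled": False}
            created.append(name)
        else:
            # Keep the on/off state set in the admin panel, refresh the docs.
            stored[name]["description"] = description
    for name in list(stored):
        if name not in declared:
            del stored[name]  # flag no longer exists in the code
            removed.append(name)
    return {"created": created, "removed": removed}
```

Running it on every deployment means a flag deleted from the declaration disappears from the admin panel as well, which addresses the clean-up problem raised in the question.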
For everyone's benefit, if you're interested, the way we deal with feature flags in our code is that, well, we don't store them in the database. We basically put a to-do comment next to the use of the feature flag, with the date of the release when this feature flag is expected to be removed and become a permanent part of the code base. Basically, when the build runs, people get notifications and a list of things that need to be removed. Thanks for that. Quite related to this, I think, is the general practice of deprecating pieces of code. I think it's really worth writing it in a way that at the end of the release you have a list, so that your users are notified about what will disappear soon or what kind of migration they need to make to be up to date with your future releases.
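The build-time check described by the audience member can be approximated with a small scanner. The comment convention `# TODO(flag-expires=YYYY-MM-DD): ...` is an assumption made up for this sketch, as is the function name; the talk does not specify the exact format.

```python
import re
from datetime import date
from typing import List, Tuple

# Hypothetical comment convention: "# TODO(flag-expires=YYYY-MM-DD): <note>"
PATTERN = re.compile(r"#\s*TODO\(flag-expires=(\d{4}-\d{2}-\d{2})\):\s*(.+)")


def expired_flag_todos(source: str, today: date) -> List[Tuple[int, str]]:
    """Return (line number, note) for every flag TODO whose date has passed."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = PATTERN.search(line)
        if match and date.fromisoformat(match.group(1)) <= today:
            hits.append((lineno, match.group(2).strip()))
    return hits
```

A build step could run this over each source file and post the resulting list as a notification, or fail the build once a flag is overdue.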
So I think that was it. We have one more minute if anybody has any other quick question. Otherwise, I guess we're done. Thank you very much for your talk. It was nice listening to you and have a great day.
Thanks. Thanks a lot. Have a great day of the conference and the rest of the week. See you.