We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

What we learned from reading 100+ Kubernetes Post-Mortems

Formal Metadata

Title
What we learned from reading 100+ Kubernetes Post-Mortems
Title of Series
Number of Parts
56
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
When building our Kubernetes-native product, we wanted to find the most common sources of failures, anti-patterns and root causes for Kubernetes outages, so we got to work. We rolled up our sleeves and read 100+ Kubernetes post-mortems. This is what we discovered. A smart person learns from their own mistakes, but a truly wise person learns from the mistakes of others. When launching our product, we wanted to learn as much as possible about typical pains in our ecosystem, and did so by reviewing many post-mortems (100+!) to discover the recurring patterns, anti-patterns, and root causes of typical outages in Kubernetes-based systems. In this talk we have aggregated for you the insights we gathered, and in particular will review the most obvious DON’Ts and some less obvious ones, that may help you prevent your next production outage by learning from others’ real world (horror) stories.