Autopsy of an automation disaster

Formal Metadata

Title
Autopsy of an automation disaster
Title of Series
Number of Parts
611
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2017

Content Metadata

Subject Area
Genre
Abstract
You’ve deployed automation, enabled automatic master failover and tested it many times: great, you can now sleep at night without being paged by a failing server. However, when you wake up in the morning, things might not have gone the way you expected. This talk is about such a surprise. Once upon a time, a failure brought down a master. Automation kicked in and fixed things. However, a fancy failure, combined with human errors, an edge-case recovery, and a lack of oversight in the automation, led to a split-brain. This talk will go into detail about the convoluted, but still real-world, sequence of events that led to this disaster. I will cover what could have avoided the split-brain and what could have made it easier to fix.