
Reliability of distributed systems

Formal Metadata

Title
Reliability of distributed systems
Alternative Title
Reliability in distributed systems
Title of Series
Number of Parts
132
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Is your system stable? Do you know what happens if any of your system's dependencies starts failing? Do you even know exactly what each part of your system is doing, or what it did at any time in the past? How fast will you identify the root of the problem if your system goes down at 2 am? The talk focuses on distributed systems (microservices, APIs that communicate with databases, memory, third-party services, etc.), their monitoring, and their failures and recovery, to help you answer the questions above. The first part covers the importance of monitoring such systems on several levels: hardware monitoring, application monitoring, monitoring from outside the system, and detecting malfunctions based on anomalies in the system's data flows. The second part presents several standard techniques for preventing system failure when a dependency suffers an outage, and a technique for recovering from an inconsistent state after an outage. The presentation is helpful and interesting for beginners and intermediate developers. Senior developers and developers working on reliable distributed systems should bear its content in mind and master the techniques shown.