Reliability of distributed systems

EuroPython

Benes, Jiri

Formal Metadata

Title

Reliability of distributed systems

Alternative Title

Reliability in distributed systems

Title of Series

EuroPython 2018

Number of Parts

132

Author

Benes, Jiri

License

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Identifiers

10.5446/44938 (DOI)

Publisher

EuroPython

Release Date

2018

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Is your system stable? Do you know what happens if any of your system's dependencies starts failing? Do you even know exactly what each part of your system is doing, or was doing at any time in the past? How quickly will you identify the root cause if your system goes down at 2 am? The talk focuses on distributed systems (microservices, APIs that communicate with databases, memory, third-party services, etc.), their monitoring, and their failures and recovery, to help you answer the questions above. The first part covers the importance of monitoring such systems on several levels: hardware monitoring, application monitoring, monitoring from outside the system, and detecting malfunctions based on anomalies in the system's data flows. The second part presents several standard techniques for preventing system failure when a dependency has an outage, and a technique for recovering from an inconsistent state after an outage. The content is helpful and interesting for beginners and intermediate developers. Senior developers and developers working on reliable distributed systems should keep the content in mind and master the techniques shown.
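
The abstract does not name the techniques presented in the second part. As an illustration only, one widely used way to keep a system running while a dependency is down is a circuit breaker: after repeated failures, calls to the dependency are refused for a cool-down period instead of being attempted. The sketch below is a minimal version under that assumption; the class name `CircuitBreaker`, the exception `CircuitOpenError`, and the parameters `max_failures`, `reset_after`, and `clock` are all hypothetical and do not come from the talk.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the dependency."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker; not taken from the talk."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast without touching the dependency.
                raise CircuitOpenError("dependency calls are suspended")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open after a failed trial.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to the dependency, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile` function, and serve a fallback (cached data or a degraded response) when `CircuitOpenError` is raised.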