Formal Metadata
Title of Series: PGCon 2015
Number of Parts: 29
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers: 10.5446/19132 (DOI)
Production Place: Ottawa, Canada
Transcript: English (auto-generated)
00:20
My first thought in writing this presentation was to sit here and stare at you for 35 minutes.
00:32
Sadly, that would mean that you all got to go to your computers while I went on staring at you. That doesn't really seem fair. The next thought was to rig all the slides on a slow transition.
00:43
But most jokes get stale after the first one. Here we go. So my name is Samantha, and I work with a company called TurnItIn.com. We have a couple of good-sized databases: a 600 gig one, and a 350 gig one that we split off from the main one last summer,
01:07
and a couple of others here and there that aren't quite as material as those two. But the important part is that we have a 24/7 environment, aside from a couple of maintenance windows that we always announce to people well in advance.
01:22
The other important thing is that we have required read slaves in production. Our master would fall over if we didn't. So we have two required read slaves at all times, in addition to a number of other slaves and cascading replication and whatnot. So we undertook a process in September of moving from Slony to streaming replication.
01:51
And I'm going to preface this talk with saying I really want to know if people have done the same thing before, what they've come up against, and for people who are about to do this, you get all of our pain.
02:07
So the hard part of that transition was finding out what was going wrong. In an ideal world, when you read the documentation, everything says streaming replication is great. Slony gave us no end of trouble.
02:26
How many of you have used Slony? Yeah, what's it do to your CPU? What's it do to your CPU when you have seven slaves, even on a relay?
02:47
So our answer, and what has saved us quite a bit in the meantime, was we moved to streaming replication. And in doing so, we had a lot of pain, and I hope to save you some of that pain as well.
03:05
So how do you know when your slave is lagging? Because obviously ignorance is bliss. You could just not monitor it, in which case you never know and everything's great. But then your users call you, and that's bad.
03:21
So, Monitoring 101: I'm sure you use this for everything else. Whatever your monitoring and graphing tool of choice is, use it. There's a variety of things that we monitor. We monitor bytes sent, time lag, and byte lag, just for the replication portion of it.
03:44
And it is incredibly helpful. You'll see a lot of those graphs in this presentation, and I apologize. We use Graphite. It's a little hard to read, so I've kind of annotated them. If you can't read them or you have questions on any of those, please speak up and tell me, and I'll explain them in greater detail.
04:00
This is your first go-to. Everybody should know this, and if you don't, I'm sorry I said that. It is the first thing I do when PagerDuty goes off and says, hey, your slave is lagging. Before I go to anything else, I do SELECT * FROM pg_stat_replication on the master.
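As a sketch of that first check, run on the master (column names are the 9.2-era ones this environment used; newer releases rename the *_location columns to *_lsn):

    SELECT application_name,
           client_addr,
           state,
           sent_location,
           replay_location,
           sync_state
    FROM pg_stat_replication;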
04:25
I did not know that, thank you. But then you have to modify it. See, we've had these discussions sometimes. Then you modify the query to fit your workflow; you don't have to change that. vi, emacs, this is the way I do it.
04:47
Pick your flavor. But it's always the first thing I go to. So I didn't even think about this next topic. Well, I did, but it wasn't important to me when I first wrote this presentation.
05:02
And that's because our transaction rate is high enough that we never see a stop. We never have a master stop sending data for any reason. There's always traffic from the master to the slave, no matter what. This came up when I gave this talk at a meetup previously, and it came up that that was kind of amazing.
05:23
And I'm like, what? So if you have a system where your master doesn't tend to have heavy writes, where it can go, say, a second or more without sending data, you're going to want to set up something to artificially create that movement between master and slave.
05:40
It can be totally as simple as creating a timestamp table. Insert into it every second or every half second, whatever you want to do, and read it when it comes out on the slave. Totally simple, kind of heartbeat-ish, just to create artificial traffic, so that if you get a lag alert you know it's not just a quiet master. It might be a corner case, but I had never thought of it just because it wasn't in my world.
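A minimal heartbeat sketch, with table and column names of my own choosing rather than anything from the talk: write a timestamp on the master, read it on the slave.

    -- On the master, once:
    CREATE TABLE replication_heartbeat (
        id int PRIMARY KEY DEFAULT 1,
        ts timestamptz NOT NULL DEFAULT now()
    );
    INSERT INTO replication_heartbeat DEFAULT VALUES;

    -- On the master, every second or half second (cron, or a tiny loop):
    UPDATE replication_heartbeat SET ts = now() WHERE id = 1;

    -- On the slave, to see how stale the heartbeat is:
    SELECT now() - ts AS heartbeat_lag FROM replication_heartbeat;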
06:03
We could never have that. So if you have slow writes, it's something to think about. Time versus bytes. How many of you do SELECT * FROM pg_stat_replication, look at that, see an xlog position, and go, I know exactly how far behind that is?
06:25
Totally makes sense to me. How many of you have ever done that? Because I know I haven't. So most humans think in time. So what we want to do is change what we get from those monitoring tools into time
06:41
so that we can make some type of logical decision about how far behind our slave is. I did not write this query, because I like other people to do things for me. So I Googled really early on, and you'll find this everywhere. This is one of the very easy things to find: how to monitor the lag on your slave.
07:01
So you take the last received xlog location and the last replay location. If they're equal, then obviously you're not lagging; you're totally fine. Otherwise, you take the replay timestamp and determine how long ago that was.
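The widely circulated query looks roughly like this; it runs on the slave and uses the 9.x function names (Postgres 10 and later rename these to pg_last_wal_*):

    SELECT CASE
             WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
               THEN 0
             ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
           END AS replication_lag_seconds;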
07:21
Everyone's cool with that? This is what normal looks like for us. This is a seven-day graph, time spiking upwards. It looks like we've got a couple of three-second spikes. In a Slony world, that was totally normal. In our world, that's like, hey, that's kind of weird.
07:42
So as you see, it's about ten spikes over the course of seven days, no more than three and a half seconds. Most of the time, we're in milliseconds. I mean, that baseline doesn't even register compared to the three and a halfs or the threes. So the bonus that we're getting from this is amazing.
08:02
Yes? So this is only two. This is two production read slaves. There's more on that graph, but those are the only two that I had visible when I took this screenshot. So those are our primary two read slaves. Obviously, one is doing slightly better.
08:21
There's a purple line and a blue line. The purple is kind of obscured because the blue seems to be a little bit lower. I think I have a theory, and I do this a lot when diagnosing lag. I think the theory on that is because one of those hosts is an FDW host. The other is not.
08:40
So it has a slightly different use pattern. This is also normal. Those spikes are very rhythmic. This is a seven-day snapshot. Bingo. Exactly what that is.
09:01
When you take a pg_dump, you have to pause replication. If you use pg_dump, that is; there's a variety of backup methods. We employ many of them. This is not the only one. But in the case that someone in engineering accidentally wiped out a column,
09:25
and you want to restore from two days ago, because it took them that long to build up the courage to say, hey, I fat-fingered that, the easiest way to do that for us is to restore a single table on a standby and then upload it back to the master.
09:40
So there's a reason; I am not just crazy. But this is one of those things where you have to make sure everybody who looks at these graphs understands it, because we have product owners and project managers who look at these graphs sometimes, and they freak out: what's going wrong? Why do you have that? It's totally normal, but this is what backups look like.
10:07
That's a great question. We'll cover some of that later. But if you want to have any type of use on that server while you're taking the pg_dump, you have to pause replication. Otherwise, it's going to cancel any transactions that are running.
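A sketch of what pausing replay around a dump looks like on the standby, using the 9.x function names (pg_wal_replay_* from Postgres 10 on):

    SELECT pg_xlog_replay_pause();      -- before starting the long pg_dump
    SELECT pg_is_xlog_replay_paused();  -- should now return true

    -- ...run pg_dump against the standby...

    SELECT pg_xlog_replay_resume();     -- let the standby catch back up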
10:24
This is also normal, and I wish I had labeled this better for you. But what you see is the same type of spikes over seven days, where you have, in this case, a 10K spike, and that's in seconds, followed by, if you can see down there, a secondary spike, rhythmically, every single day.
10:47
Even my senior DBA filed a ticket in JIRA and said, what's going on? That scares me. I didn't think about it a whole lot because I had a strong feeling I knew what it was. So what is that?
11:02
No. I'll talk about that in a little while, but that's not what this is. Nope. That's it.
11:22
Kind of. It's pg_dump-ing the other database. Two databases, geographically dispersed, but we dump them individually, first one and then the second one. And it's the little things. I wouldn't normally point this out because it's so minor, but the point of this is communication.
11:41
If people don't understand what's going on, even people on the same team, even people that look at these graphs every day, something like this can look like an anomaly and it can cause panic. And I spent the time to work on this JIRA ticket and prove it and write it up because of this. So communication, documentation, these types of things are important.
12:05
It's still normal. That was the explanation of the dump. So is replication paused? Exactly what you were just talking about. What happens when you turn off replication for a period of time, you pause it, say, for three hours,
12:21
and then you turn replication back on and you turn your monitoring back on? Because the entire time that you had replication paused, obviously you don't want to continue to get paged during that time. So you pause your monitoring. What happens when you turn your monitoring back on and you turn replication back on at the same time?
12:43
Bingo. And we do, and it's not painful enough, because, ironically, only the 600 gig database will fire off one page a day when it's getting caught up, and it catches up really fast.
13:00
So it's not enough of a pain to fix that race condition? Yeah, we check that it's catching up, for verification. So we acknowledge it and it just clears itself up. So that was time measurement. Now, byte measurement.
13:22
Again, I didn't write this, because someone wrote it for me. Very, very similar to the last one that we saw. What's the sent position? What's the replay location? And how far apart are they? Pretty straightforward, pretty clear, exactly like we saw.
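A sketch of the byte version, run on the master (pg_xlog_location_diff() and the *_location columns are the 9.2-era names; newer releases use pg_wal_lsn_diff() and *_lsn):

    SELECT application_name,
           client_addr,
           pg_xlog_location_diff(sent_location, replay_location) AS replay_lag_bytes
    FROM pg_stat_replication;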
13:43
I just realized that clock has stopped. Today is not one of those. Now is not one of those. This is an example of completely normal byte lag for seven days. Maximum there is about 4 million. This is not on a cron or a backup or a relay host.
14:03
These are production read slaves. So there's one spike. It doesn't necessarily correlate to any other graphs. Pretty healthy. There's more jitter or there's more difference in the baseline than you see with time. I think it's because it's a better measurement.
14:26
So you have your monitoring in place. You have your PagerDuty on. And what's going wrong? We found that most of our problems were in the initial setup, the move from another type of logical replication into streaming.
14:42
Those came in three flavors: tuning our configuration, hardware, and human error. We think our data is important; I think most people think their data is important.
15:01
Financial institutions definitely think their data is important. We first chose to go with synchronous replication. Who's gone down the road of synchronous replication? Okay, a couple of people. Does it work for you? Is it cool?
15:23
We quickly backed out of that decision. When you're looking at the amount of lag that you get with asynchronous replication, it's not worth the extra single point of failure in this case. Because what you do is when you add a synchronous host and you're waiting on that synchronous
15:40
host, if it goes down, you've gone from the master being a single point of failure to now having two of them. When your slave can take down your master, it's an epically large problem compared to just your slave going away. You have to have more than one synchronous standby.
16:00
We backed out of that. I might come back to this point. You saw this earlier with the graphs of the backups. We not only have our two main read slaves, we have a cascading replication slave and then a full DR cluster that's geographically separate.
16:23
And there's an alternate configuration on the host that we use for cron and reporting. That's pretty key, and does anybody know why? Really?
16:41
The cron host needs to run queries that are much, much longer than you're going to be running on your normal host. Reports tend to take longer, backups, et cetera, and you don't want monitoring going off on what is normal behavior. So therefore you have alternate configuration.
17:01
So this isn't a deep, deep talk on configuration. I'm going to highlight a couple of the parameters that will make your life interesting, so to speak. So, max_standby_streaming_delay and max_standby_archive_delay. Who here actually uses read slaves?
17:23
That's awesome. Of course this would be the place to find that. More often what I hear are people who use their slaves for DR or for HA. They don't have to deal with the problem of transactions or reads on a slave that get interrupted or cause problems.
17:44
One of the first things that we saw when we switched over to using read slaves on streaming, that we didn't see with logical replication, was canceled queries. One of the primary reasons that can happen is your max_standby_archive_delay or your max_standby_streaming_delay.
18:01
The difference? They're very, very similar. When you read them, you might want to read them two or three times, if this is your first time, to get the actual difference. The archive delay applies when WAL data is being read from the archive, whereas the streaming delay applies when the WAL data is received via streaming.
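For reference, these live in postgresql.conf on the standby; the values here are purely illustrative, not our production settings:

    hot_standby = on
    max_standby_streaming_delay = 30s   # how long replay waits on conflicting queries for streamed WAL
    max_standby_archive_delay = 30s     # the same, but for WAL being read from the archive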
18:20
So very similar, kind of cousins, still very different. Replication timeout and wall status, wall receiver status interval. We'll talk about replication timeout first. This is the amount of time that your master will go without hearing from a slave before it terminates the connection.
18:41
So if your slave has gone AWOL and your master can't find it, it will not leave that connection open. The default for this is 60 seconds. The interesting one is wal_receiver_status_interval. That is the maximum amount of time that your slave will go without reporting back to your master. So every now and then your slave says, hi, I'm here, and your master goes, good, you're still there.
19:06
We pass each other on the street, everything's cool. Except for the corner case, when Phil overnight in the UK sets up a brand new cluster, and I go to check it and make sure everything is cool, and I see that the master's there and the slave's there, but they're not talking.
19:24
And I'm like, he told me he did this. What's going on? Turns out, in the odd corner case where, I don't know why, but you have set your wal_receiver_status_interval to a nonstandard value (the standard is 10 seconds). Say we set that to a minute and a half.
19:43
And our default for replication_timeout is 60 seconds. Bingo. Because your master's going, where are you? I didn't hear from you. And your slave's like, oh, sorry, here I am. So what you get is, when you do SELECT * FROM pg_stat_replication, or TABLE pg_stat_replication, your master's like, there's nobody here.
20:08
It causes a little bit of head scratching. Don't ever do that. It's totally possible. Don't ever set your wal_receiver_status_interval higher than your replication_timeout.
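Sketched out, the relationship to keep straight looks like this (parameter names and defaults are the 9.2-era ones; replication_timeout later became wal_sender_timeout):

    # On the master:
    replication_timeout = 60s            # drop a standby that stays silent this long

    # On the standby:
    wal_receiver_status_interval = 10s   # keep this well below the master's timeout
    # Setting it to 90s, as in the story above, makes the master think the slave is gone.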
20:24
I love the giggles because it tells me you guys know this pain. This is one of the reasons I wanted to write this talk. This and the slide you're going to see a lot later. So after we dealt with the configuration issues of getting canceled queries constantly, and we
20:42
tried playing with the timeout settings for archive and streaming delay, and it wasn't helping. And we ratcheted it up. We were like, this is ridiculous. We'll ratchet it to five minutes. And then that was insane, because why would you set a streaming delay of five minutes? Now you've got data that's completely stale, and there's no reason to even read it from your slave. You might as well just fall back to your master, which we do after three seconds anyway.
21:06
And this guy was at fault for that. And it was so frustrating for me. I'm going to vent here. Because there was nothing that I could find that said, hey, by the way, this might be your culprit.
21:20
And by default, it's off. So if you're going to use read slaves with streaming replication, go turn this on immediately. First step: run to your configuration and set that boolean to on. Now, why isn't it on by default already? It came later.
21:43
To be honest, that would be because it changes the behavior, and it came later. It can't be a default. Yeah, we can't default it on. Now, can't is perhaps an overly strong word because it's ridiculous to not have it on.
22:00
I don't remember what the use case was, but in the discussion on this earlier, I think John said there was some use case where this can do very bad things. Oh, absolutely, it's that. So what hot_standby_feedback actually does: it's the one way that I know of, forgive me if I'm wrong, that a slave can affect the master.
22:27
So what it does is it tells the master, I'm still reading this data. There are rows here that I'm looking at. Please don't vacuum them. Please don't take that away. So the master is like, cool, I'll wait, and then you get table bloat.
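On the standby that translates to a single setting, off by default; the comment spells out the trade-off just described:

    hot_standby_feedback = on   # the standby reports its oldest running query to the master,
                                # so vacuum won't remove rows it still needs; the cost is
                                # potential bloat on the master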
22:47
So what happens when you get these two together? When I gave this talk at the meetup, I had a couple of people in the audience give me the monster of all horror stories.
23:04
They had a 30-minute delay in replication because of this combo. For some reason, I don't know why, they had set their max_standby_streaming_delay to 30 minutes. Oh, there's more fun you can have than that. Then you'll never recover.
23:21
Well, you fill up with WAL and then the slave crashes. We had something similar with synchronous replication on ours, yeah, where we hit an issue, and actually that might have been the exact issue, which brings me to the human error portion. When we quickly moved away from synchronous replication, before we had gone to
23:41
production with it, it had originally been checked into our configuration that way. So then we switch over, and all automation updates the configuration. So we're set to use asynchronous replication in the config that CFEngine or Puppet or whatever else rolls out.
24:01
Except that when we did this change, the change was implemented and the DBA who rolled it out didn't go perform a reload on one of the masters. So, to the human error point, we now have a configuration file that says, no, I'm asynchronous, a running configuration that says I'm synchronous, and a slave that has hit a hot standby max_standby_streaming_delay loop.
24:28
That causes an outage, in case you were wondering. It causes an outage that takes quite a while of going, WTF, what just happened, because it's a variety of errors all rolled into one.
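A small sketch of catching that kind of drift between the edited config file and the running server; pg_reload_conf() and pg_settings are standard, and the parameter checked here is just the one from this story:

    SELECT pg_reload_conf();                   -- the step that got skipped
    SELECT name, setting
    FROM pg_settings
    WHERE name = 'synchronous_standby_names';  -- empty string means asynchronous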
24:46
So, hardware. This is the first thing. So most of our cluster in production is Slony at this point. And we've got this one streaming slave, because we're now running a Slony cluster with one streaming slave off to the side.
25:04
This is the first thing we saw. In a world where you're on Slony and you transfer over to streaming replication, all of a sudden the clouds will part and rainbows will come out and all your problems will go away, except they didn't. They didn't.
25:22
This happened. And when you go and try Googling, hey, I've got lag spikes, what do you think you're going to get? There's no good information on how to diagnose these things. There's just not. It's gut intuition, it's experience, it's knowing what might possibly go wrong.
25:44
That's just it. So we looked at this and we scratched our heads, and we knew this was our older hardware. We knew this is the hardware that we took out of being a production read slave because it couldn't hack it, but it wasn't receiving reads. So what was going wrong? It had zero load. All it had to do, the only purpose, it had one job.
26:03
Its one job was to apply the streamed logs. That's all it had to do. Don't do that. We partitioned the xlogs away, got them off the spindle disks, and everything was happy for that moment in time.
26:28
This slide, this one. This and hot_standby_feedback are the reasons that I wanted to give this talk at all. When something gives you grief, write a talk about it.
26:42
Because I guarantee you someone else has felt your pain, and they're hurting, and you will help them. And when it happens again, you're going to look for those problems or you're going to go, oh, what's going on? When I get lag now, I'm like, hey, what's going on? This is cool. Whereas before, I was like, ugh. So this had a very, very slight increase over time.
27:04
So what we're looking at here is those hash marks on the bottom are by hour, and this is by seconds. So we're starting out around a ten second spike in lag there. And I looked at this for about a week, off and on, because other things were going on.
27:21
But I'd go back to it and try and figure out. We'd gotten the spindle disks out of this picture, we'd partitioned our xlogs, everything was supposed to be fine. This was on a brand new piece of hardware. It was actually the hardware we were going to upgrade the entirety of all of our clusters to. So it was faster, harder, stronger, better.
27:42
You got that. It was everything that the other slaves were and more, and yet it was showing a way worse pattern than anything else. Any ideas? No.
28:02
No. No. Those are only hour hash marks. So this is happening all the time. And it's slightly getting worse. Nope, this is hardware that we owned in a data center. Nope.
28:21
Nope. So here's a zoomed out. This is what happens over three weeks. Yeah. That's not good at all. And to give you the trend line on it, there you go.
28:42
Nope. Someone got this in the meetup, and I was floored. I was absolutely floored. No, we're looking at time lag between the master and the slave.
29:04
I wasn't looking at byte lag at the time. We instituted byte lag checking afterwards. What was that? Bingo. Bingo. So what happened was, in not only buying new hardware and setting up all the config we already knew about,
29:23
underneath us, when the systems were built by SysEng, they had upgraded from CentOS 5 to CentOS 6. In that change, the configuration for NTP changed. So the old configuration they were using for CentOS 5 was not being read, and therefore we were getting clock drift.
29:45
So the thing that you're looking at, first off, don't trust it. And second off, there's a lot of factors outside of Postgres. It's not just users running queries. It's not just transactions that are going on. It's not just network latency, which, by the way, I have never run across network latency in streaming replication.
30:06
In the same data center, anyway. I thought about this when writing this talk. I was like, well, you could have the problem if your master has network latency. But on a gigabit link, you're not going to have problems. No, no, no. You don't have problems when everything works.
30:22
True. But I guarantee you that when the network goes down, people know the network is down. No, it's not down, though. Down is well-defined. When it's broken, it's easy. When it's, like, losing 4% of your packets, that's when you have problems with these kinds of things, where you're like,
30:45
why am I getting these weird things going on? I mean, big networks run into that problem, and I have a big network that's dumb, and so it might be degraded for a long time. So we have a network guy named Curtis. Colin works with me, so Colin knows Curtis.
31:03
And there's a reason he's face palming right now. If we had packet loss like that, we'd be getting e-mails in the middle of the night because, man, is he on it. We're kind of fortunate that way. If the system looks at him wrong, he knows about it. And we did have an issue once where we had a switch panic.
31:23
The switch rebooted, and it turns out the one way that this was noticed is because we momentarily lost connection to a master. So one master that happens to connect to everything else via FDWs was down for about two minutes while the switch rebooted,
31:43
which was a memory leak issue with the kernel of that switch, I believe.
32:12
So you're saying that the actual time lag didn't increase? Bingo.
32:21
So what was happening is you're asking the slave, hey, what's the time that you last reported receiving something? And then you're checking that against a server whose clock is different, and you're getting drift between the two, which slowly over time got larger and larger. I wish I could have figured this out sooner, but it
32:43
was a problem like I'd never come across, because it's insane that NTP doesn't work. Why would you monitor it? Why do you have to monitor NTP also? Well, now we do. Not only were we not monitoring NTP at the time, everything about the system had changed, and everything was supposed to be fine.
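The check we bolted on afterwards can be as simple as watching the peer offset that ntpq already reports; the alert threshold here is made up for illustration:

    ntpq -pn    # the "offset" column is the clock offset from each peer, in milliseconds
    # alerting when the offset to the selected peer exceeds, say, 500 ms would have caught this drift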
33:07
So now we have byte monitoring, which goes back to the time versus bytes. Please do both. You might think in time, and that might be the way you want to see it, but check your monitoring and use both, and that works for us.
33:27
And then every so often we get something like this. In the before time, right before November, we didn't have HAProxy.
33:41
And this would have caused a world of hurt. But strides, things are happening. We have load-balanced slaves now, and something like this tends to, well, it'll set off an alarm, but when you look at it, that is between about 12:25 and 12:45, and it seemed to resolve itself decently enough.
34:06
So what causes that? It's not worth it. I mean, it's really not. It resolved itself.
34:21
There's no apparent cause, and the bottom line is it's an anomaly. Somebody could have. With that particular one, I didn't find anything easily in the logs, and it just honestly wasn't worth it.
34:42
You can spend your time trying to track these down, but I don't have three spare DBAs to do that. And they happen irregularly enough that it's not worth the manpower. So sometimes, even if somebody notices, hey, your graph went a little wonky there, that's what we've got load balancing for, and that one just doesn't matter.
35:03
So these are the things that gave us pain, with a couple of additions from the meetup, which was kind of awesome. I would like to know what other people have felt, if anything. Oh, yes, yes, yes. You, down in front. So this guy here, I'll tell you all about this wonderful time.
35:23
When all the database servers just weren't working. We were running a PaaS company on EC2. Well, it turned out the underlying disk stores got remounted as read-only. This wasn't AWS? Yeah, it was.
35:41
Oh, it happened with AWS? Oh, so it wasn't necessarily you. It took us seven, eight, nine hours before somebody decided to check dmesg in real life. It took that long?
36:01
Well, imagine you have hundreds of customers, and they're all trying to call you, and you're basically trying to figure out what's going on and be an answer to all of them at the same time. It was not a pretty day. So that had to be seen in more than just the database servers, if it was an AWS issue.
36:21
The application servers, most of them didn't really care. Yeah, they don't care, because they weren't doing much with the disk. Most of their stuff is cached, so they're usually in memory; they're not using the disk. They weren't checking for read-only status at that point. So that speaks to hardware. Did this happen again on the other systems?
36:41
Oh, a long time ago. Oh, right. File system changes. Yes? Well, there's your problem right there.
37:03
Okay, who's got the modem sound? So they were in Antarctica? Maybe the moon.
37:23
So they actually managed. If you have enough slaves, you can generate anything, right? Oh, yeah. Oh, yeah. No, no. I've killed a switch doing this. Do tell. Yeah, okay. So we have eight servers, 32 shards across these eight servers.
37:43
All of them very beefy, a quarter terabyte of RAM. When you say shards, what type of sharding? It's sharding: there are 32 independent instances of Postgres, each with a subset of the data. We had this grand idea that we were going to have replication work
38:01
by having all of them replicate to each other. So I wrote all this nice stuff. It worked just fine until we kept upping the load. And eventually, we basically got to a point where we saturated the entire switch backplane. You know Slony does a full mesh network?
38:21
Yeah, yeah, yeah. Okay. This was streaming replication; it wasn't Slony. You can get there with that. The good thing is streaming does scale well. I had like 15 of our biggest boxes in RDS at like 35,000 TPS. How many terabytes of WAL were we generating per day with that system?
38:41
Nine terabytes of WAL per day. It was in a rather tight time window. It was like 15 slaves with less than 100 milliseconds of lag, generating nine terabytes of WAL. Most of that would come in about half an hour to 45 minutes.
39:01
So that was the period where it was generating most of that nine terabytes of WAL and trying to ship it to everybody else, all at the same time. That was fun. Just don't do that. You asked for experiences. That's a good experience. It was cool when it worked, because you had green slaves and you could have any
39:23
node go down and automatically bring up the shards from that slave on the other boxes. And it just worked. The good thing about the cluster is it wasn't super time-critical. And also we grew to the point where we didn't have space for replicas anymore.
39:40
I did also fix the WAL problem by using unlogged tables for raw data and other things. We got the WAL down. But basically we had time to restore a backup. We had an extra server, and we had time to restore backups if we had to. So it wasn't a transactional system where people were hitting it constantly. It was a big batch statistical processing system.
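A sketch of the unlogged-table trick mentioned there, with a made-up table; it is a trade-off, not a free win:

    CREATE UNLOGGED TABLE raw_staging (
        id      bigint,
        payload text
    );
    -- Writes to an unlogged table generate no WAL, so they don't flood the standbys,
    -- but the table is truncated after a crash and never appears on replicas at all.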
40:01
Wait, what type of backups? What were you using for your backups? pgBackRest. Of course. What else would you use? Not pg_dump, I hope. Or Barman. We looked at Barman. There's no way Barman would work, because it wasn't parallel.
40:21
Forget it. That's why we wrote pgBackRest. Also compression. Having two uncompressed copies wasn't practical. We needed compression, and compression at rest, and all these sorts of things. And parallelism. So have you met Phil?
40:41
Phil lives in the UK. He handles our backups. He will find you and talk to you at some point. Were you at the pgBackRest talk? No. You should have been. I did that talk just last session. We can just go to the bar. I'm happy to.
41:03
So that's a really good replication issue. Is that it? Yeah. Horror stories.
41:29
Instead it went crossover through the UK. What was your latency on that? The latency went from 125. So the replication delay in one direction was bad enough.
41:49
Yeah, you're looking at my test runs running. I'm not getting any updates. Test runs are running here. So it must be a problem. Two days.
42:03
Eventually it was two days. We badgered corporate IT: are you guys seeing this problem? Oh, yeah, yeah. Oh, that would have been helpful. Why don't you follow along on how to do this? Yeah. I have a question about what people do around upgrades and streaming replicas.
42:26
Oh. Should I come up? If you had been here 15 minutes earlier, we totally covered that. Because we're in the midst of upgrading from 9.2 to 9.4. We're only on 9.2 because, when we went from 8.4 to a version that supported streaming replication,
42:45
Slony couldn't handle the transition from 8.4 up to 9.3, so we had to back down to a version that it supported, 9.2. And now we're left with a three-hour window if we need to swap in a master. So what we're trying to do, because we have to have those necessary read slaves up with the minimal downtime possible,
43:03
is use that method that we just talked about, using rsync. So, yeah, you can totally talk about it if you want. Yeah, so the plan is basically do the... This is actually in the documentation I worked through with Bruce, but what you can do is a hard-link pg_upgrade on the master,
43:20
and then, as long as you are syncing the entire... You are syncing both the old directory and the new directory with rsync --hard-links, and then point that at a common directory, one that's higher up than the current data directory, and it will simply recreate all the hard links and transfer the system tables that have been updated
43:40
that are not actual hard links in the new directory. And then that's it. That's how you can... It's essentially pg_upgrading your slaves, using rsync. Using rsync, yeah. And it works. So I'm going to try that out as soon as we're done drinking tonight. Seems to me like Saturday. Yeah, right. Sunday.
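A rough sketch of that rsync step, with made-up paths and hostname; the full procedure in the pg_upgrade documentation has more steps (stop both servers, run pg_upgrade --link on the master first, and so on), so treat this as the shape of the command, not a recipe:

    # Run on the master, from a directory above both data directories:
    rsync --archive --delete --hard-links --size-only \
          /var/lib/pgsql/9.2/data \
          /var/lib/pgsql/9.4/data \
          standby.example.com:/var/lib/pgsql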
44:03
Sunday, sorry. Maybe on the plane ride back. You want to be very sober when you do those directions. We went through them once already. So basically you're saying you're just copying over just the bits that get changed by pg_upgrade. Just the new catalog. Yeah, which is the bits that...
44:20
Yeah, right. And then pg_upgrade will also go and recreate all the hard-link tree. Is there a reason why we decided, as a community, not to try to support actually making those changes in pg_upgrade itself? It seems like a useful thing. If you ask Bruce to do it, you know what I think he would do?
44:44
Tack on to the end of the script, like the analyze whatever script, like this sequence of commands. But then it would be there. It would be there. And I wouldn't have had to come here to find out about it. Yeah, yeah. I mean, not that I don't love being here. Come on. Right. You'd probably be willing to edit for me.
45:02
I don't know. I mean, it's possible we could do it, but I don't know. It wouldn't be that hard to duplicate, to just essentially do what rsync does and recreate the hard-link tree and all of that. That wouldn't be that hard. That would be useful. I agree. It's a simple matter of coding. I mean, it wasn't... We've been looking at this process for about a month, and then we tried doing the rsync.
45:23
We couldn't finagle it. And it wasn't until I started asking here, whether people knew of a way, that I heard rumors of it. And even then, I had to track down you and Bruce to find this documentation. It's in the docs now, at least. Well, it's in the devel docs. Please do another talk about that when you've worked through it.
45:44
But he developed it. I'm going to talk about it. Sure, why not? I'll tell you how it goes. I would like the real-world experience doc, actually. It's lovely, but I want to see what happens when the rubber meets the road. I was just thinking, I don't have any more talks right now.
46:04
We're targeting this upgrade for mid-July, July 10th to 17th, if we don't push it back any farther. I am happy, totally happy, to... The expertise of not being the person who wrote it is way more important coming from me, as not the person who wrote it, than from you.
46:22
Sorry, love, you mean it. That was beautiful. So I understand why you're in this talk now: because of the pain of somebody who's gone through it. Yeah, I'm happy to share any of the experiences that we go through with that upgrade with any of you. You can come up and get cards or whatnot.
46:40
I think we're almost out of time. Oh, we are. Thank you all for coming.