pg_shard: Shard and Scale Out PostgreSQL
Formal Metadata
Title: pg_shard: Shard and Scale Out PostgreSQL
Series: PGCon 2015, talk 16 of 29
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers: 10.5446/19139 (DOI)
Production Place: Ottawa, Canada
Transcript: English (auto-generated)
00:12
Hey, I'm Ozgun. I'm one of the founders at Citus Data. Prior to Citus, I was a software developer in the distributed systems engineering team at Amazon.com.
00:23
Today I'm going to talk about pg_shard, a sharding and scaling extension for PostgreSQL. I have about 35 slides and it's a fairly technical talk. So if you have any questions, please feel free to interrupt. Before we start, I have just one slide to put things into context.
00:41
As a quick question, prior to this talk, how many of you have heard of CitusDB? And how many of you have heard of pg_shard? And just to clarify, CitusDB and pg_shard are two separate products that complement each other. pg_shard targets the real-time reads and writes use case.
01:02
So the high-throughput reads and writes use case. In other words, it targets the NoSQL use case. And CitusDB is more applicable when you have big data sets. And you want to analyze that data in real time. So you can think of CitusDB as your massively parallel processing database. And both CitusDB and pg_shard shard and replicate your data in exactly the same way.
01:26
They share the same metadata. And that's why the two products are fully compatible with each other. And also CitusDB doesn't fork Postgres, but rather extends it through the planner and executor hooks. So that's the context slide about the two products.
01:42
Now onto the talk outline. First, I'm going to talk about motivating pg_shard. I'm going to first talk about the use cases where pg_shard is applicable. Then I'll talk about the data. How does pg_shard lay out the data in the cluster? How does the cluster dynamically scale out as you add new machines into it?
02:01
And how is the data replicated? Next, I'm going to talk about the execution logic. That's the computation there. What happens when you send a query to the pg_shard cluster? How do we route that query? What happens when there are failures during the routing of the query? And last, I'm going to present pg_shard's product roadmap.
02:22
And Marco is going to do a cool, live, awesome demo. Yeah? Yeah? No expectations. And this is the talk outline. Let's start with the motivation. Why pg_shard? Where did it come from? We initially started out by saying, CitusDB only supports batch data loads.
02:42
So you take your data, you copy your data in. And now we had one customer who came in and said, I'd like to insert my data in real time. We said, CitusDB currently only supports batch loads. And they were like, isn't this PostgreSQL? Can't I just write my own library to do this? And so they went ahead and wrote their own library.
03:01
We had a second customer who came and asked the same thing. And by the third customer, we got the hint. We now took a step back and started talking to existing PostgreSQL users. You could say we did a survey of the use cases where pg_shard was applicable. And what people were doing when they started running into the limitations of a single PostgreSQL database.
03:22
We saw two themes. One was doing sharding at the application level. So you take your data and it's the application that manages the sharding and the replication. It is obviously a lot of effort for the end user. You need to understand distributed systems. You need to think through what happens when you have failures in those distributed systems.
03:43
And the second theme that we saw was people took their data in PostgreSQL, denormalized it, dumped it, and loaded it into a NoSQL database. And of course, when you do that, you're giving away a lot of the functionality that PostgreSQL provides. Then we started asking, if you had the ideal PostgreSQL scaling solution, what are your top two, three wishes?
04:07
Of course, when we asked for the top ten things, it makes up a long list. We were curious, what are the most important things for the end users? And we had two answers. First, people said, I don't want to think about how to balance my cluster when I add new machines.
04:24
And similarly, I don't want to think about failure handling in the cluster. That should just work. The second one was, we want things to be simple. I don't want to set up multiple components and configure them. Also, if I'm using PostgreSQL 9.3, it should just work with PostgreSQL 9.3.
04:42
If I want JSONB out of 9.4, it should just work with that. And we took all of that in and tied it to these two architectural decisions. The first decision, dynamic scaling, is what I'm going to talk about next. There we use the concept of logical shards to make scaling out and failure handling easy.
05:04
For the simplicity decision, we leveraged PostgreSQL's extension APIs. If you look at PostgreSQL's planner and executor, they are built to read in data from disk or memory. In other words, PostgreSQL's executor operates by pulling in the data. But if you're building a distributed database, you want to take that query, transform that query,
05:25
and push your computations to the worker nodes. So, your distributed query planner and executor need to be fundamentally different than those of PostgreSQL and also fully cooperate with PostgreSQL's logic. And we'll cover this part right after the logical sharding part.
05:42
Let's start by looking at scaling out a cluster. First, I'm going to talk about how partitioning has been done in the past. Let's say you have three nodes. You partition your data set into these three nodes. A third of the data goes to node 1, a third of it goes to node 2, and so forth.
06:02
And the partitioning dimension in here is time, but it could really be anything. And the partitioning method could really be anything. The idea is, let's say you have 12 terabytes of data, and now you're putting 4 terabytes of it in one of the nodes into a table there, and 4 terabytes goes to another table on node 2.
06:21
Any ideas how this could introduce problems when you want to scale out, when you add a new machine into the cluster? Yes. So, let's say you add a new machine into your cluster. Now that you added a new machine, you need to rebalance your cluster,
06:40
because that's the point of adding a new machine, obviously. And what you're going to have to do is to transfer large data sets. So, you're transferring a terabyte of data over the network. And this is over a gigabit network, and the transfer itself is going to take hours. And not only that, you need to coordinate the transfer from node 1, node 2, and node 3, and make sure that they complete.
07:02
And while you're transferring this data, because that's a lengthy operation, you may have failures in the transfer operation itself. So, getting this right, doing this right, is a lot of work, and not only is it a lot of work, it's also lengthy work, because of the time it takes to transfer this data around. Now that we've seen the scaling out issue,
07:22
let's also take a look at how failures are handled in traditional partitioning.
07:43
Yeah, for range partitioning, that's true. So, if you had hash partitioning, say you hash partition on customer ID, you'll need to rehash. We now introduce replication into this picture. This is the replication part. And we're using exact replicas in this setup.
08:02
So, node 4 is an exact replica of node 1. Node 5 is an exact replica of node 2. Let's see what happens when we have a failure. So, when you have a temporary failure in node 1, what will happen is node 4 will start taking on twice the load. In this case, your cluster's throughput and latency is bottlenecked on node 4.
08:24
And in a cluster of six nodes, this isn't a big deal, but imagine the case where you had 100 machines in your cluster, and you had a failure. That 100 machine cluster is now bottlenecked on that failed machine's replica. So, you don't get much out of having 100 machines in your cluster if a single machine will become the bottleneck.
08:42
Also, if this node doesn't come back up, if it's a permanent failure, you need to re-replicate four terabytes of data from a node that might already be the bottleneck. Any ideas on how to resolve this issue?
09:05
Yes. Here comes logical sharding. I believe Hadoop distributed file system, HDFS, was the first solution to introduce this at the system level. HDFS doesn't partition the data into nodes.
09:21
It partitions them into much smaller shards. In this case, they are 512 megs each. That number could obviously be configured. And then in this diagram, we have the dynamic scaling out case. We introduce a new node, number 4, into the cluster. And now we want to dynamically scale out the data.
09:40
And this is easy. Your shard rebalancer just needs to move shard number 4 from node 1, shard number 5 from node 2, and shard number 6 from node 3. And if you have a failure during the transfer step, that's okay. Because these shards are small, you can just restart the operation from scratch. In other words, the rebalancing operations that you want to perform,
10:01
they become much easier and much more flexible. The second benefit comes with the replication case. In this example, PgShard replicated your shards using a round robin policy. And the idea is, no two nodes are exact replicas of each other. Let's say we fail node number 1.
10:22
Again, in this case, shard 1's load goes to node 2, shard 6's load goes to node 6, and so forth. So when you have a temporary failure, you're evenly distributing the load across the machines in the cluster. Each node will take on about 20% more load.
10:41
And if this failure were a permanent one, again, re-replication becomes much easier because you're not re-replicating from a single machine. You're effectively re-replicating from the entire cluster, with some pieces of the shards in there. And not so impressive when you only have six machines,
11:01
but imagine you had 100 machines in your cluster. In that case, failures are much more common, and then these operations will become much easier. Quick checkpoint question, how many of you have known about these problems with traditional partitioning before? About half.
11:27
And that's how that cluster maps over to pg_shard. This is an example pg_shard cluster. In here, we have the worker nodes. These worker nodes are just regular PostgreSQL databases. There is nothing special about these worker nodes.
11:43
And now we have a metadata server up there. The metadata server is where you go. It's again a PostgreSQL database. When you go in and create extension pg_shard, that guy becomes a metadata server. That's where you create your distributed table. That's where you send your queries,
12:00
and then that's how the queries get farmed out. The metadata server can also be called the coordinator node or the master node. And that's like the guy that you talk to when you want to run queries. And the metadata server is the authority on this metadata. The metadata is tiny. It's typically on the order of megabytes,
12:21
so you can easily copy it around or even reconstruct it. And the metadata changes only when one of three things happen. One, you create a new table, and as a result you create new shards. Two, you rebalance those shards in the cluster. The previous rebalancing example, either when you had a failure and you had to rebalance,
12:40
or you had the new machines and you had to rebalance. And three, when you fail writing to a particular shard, you need to update that shard's metadata to reflect that it's unhealthy. And the challenge is keeping this metadata consistent in the face of failures. To make this metadata example concrete,
13:01
here's a psql command line on one of the metadata tables. Here, on the left-hand side, we have identifiers for shards. And on the right-hand side, we have the hash token ranges that correspond to each shard. And as an example, a query comes into the system, say for a table that's partitioned on customer ID,
13:22
pg_shard hashes the customer ID using PostgreSQL's hash functions and gets a hash value. It then looks up in this metadata to find the hash range that corresponds to that hash value, and then finds the shard or shards that correspond to the query. Again, let's say in this case we hash the customer ID, and we got a value of minus 1.6 billion.
13:43
You look up minus 1.6 billion in this table, going all the way down, it's here. And then that query maps to shard 10,006, and we forward the request to that particular machine. That's the state on the master node.
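As a minimal sketch of that lookup, for a hypothetical table partitioned on an integer customer ID (the schema and table names below follow pg_shard's pgs_distribution_metadata layout, but treat the exact names as an assumption):

    -- Hash a sample partition key value with PostgreSQL's built-in
    -- integer hash function, then find the shard whose hash token
    -- range covers that value.
    SELECT id AS shard_id
    FROM pgs_distribution_metadata.shard
    WHERE relation_id = 'customer_reviews'::regclass
      AND hashint4(42) BETWEEN min_value::integer AND max_value::integer;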
14:01
For the worker nodes, they are regular PostgreSQL instances. Again, nothing special about the worker nodes. And on the workers, one shard placement corresponds to one PostgreSQL table. If you log into a worker node and do a backslash-d, you will see multiple tables. And then to ensure that the table names are unique,
14:21
pg_shard automatically extends the table name behind the covers. So if your table's name is click_events and your shard ID is 1011, the corresponding shard table's name will be click_events_1011 on the worker nodes. One thing to keep in mind
14:41
is that you currently create your index and constraint definitions before you distribute your table. So that wraps up the part on how we lay out the data in pg_shard. Let's talk about the logic. The way we're thinking about pg_shard is that it strikes a balance between usability and SQL functionality coverage.
15:02
After talking to PostgreSQL users, we found that they just wanted a simple component to get them rolling. And pg_shard is just that. It's a drop-in PostgreSQL extension. You say create extension. It's there. You then start running SQL commands against your distributed tables. In that sense, you don't have to have any special functions
15:22
or do anything special to query your data. A lot of the key value stores out there as a comparison already have the notion of distribution baked into their APIs. So you have a key and a value, and you hash the key and you forward the request, whereas SQL as a language doesn't have the concept of distribution built into it.
15:41
And that's the balance we want to strike here. We want to support a subset of SQL without requiring any changes to your application. Here's a simple example that sets up pg_shard and a distributed table. You go in, you say create extension pg_shard. You create table customer_reviews, a very normal, regular, nothing-special PostgreSQL table there.
16:04
That table will soon become a shell table on the master node. And to do that, you invoke the master_create_distributed_table user-defined function, which designates this customer_reviews table as a distributed table. You also specify the partitioning column in that user-defined function.
16:23
Then you call master_create_worker_shards. And what this function does is it connects to the worker nodes, and then it replays the table schema on the worker nodes, the index definitions, the constraint definitions, and creates the shards and their replicas on the worker nodes.
16:40
There are two final arguments for this function. 16 is the number of shards you want to have in your cluster. This number could be 1,000. This number could be 10,000. It depends on the resources and the size of your cluster. And two is the replication factor. So if you have a replication factor of two, in case one of the replicas fails, that table and its data is still available.
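Put together, the whole setup from this example is just four statements (the column list on the table is illustrative):

    CREATE EXTENSION pg_shard;

    CREATE TABLE customer_reviews (
        customer_id text NOT NULL,
        review_date date,
        review_rating integer,
        review_body jsonb
    );

    -- Designate the table as distributed, partitioned on customer_id.
    SELECT master_create_distributed_table('customer_reviews', 'customer_id');

    -- Create 16 shards for it, each replicated to 2 worker nodes.
    SELECT master_create_worker_shards('customer_reviews', 16, 2);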
17:06
And when we hook into the planning and execution stage... Yes, pg_shard, that's true. So if you have, say, 100 CPU cores,
17:21
one bit of math that existing customers are using: say you have 100 CPU cores in your cluster, it's good to set that number to 1,000. That way you'll be able to scale out efficiently, like by a factor of 10x. You configure a connection to the master,
17:41
so it knows where all the shards are. Yeah, the master node already has that information in the metadata. So when you call master_create_worker_shards, that's when that metadata, the shard and shard placement metadata, gets filled in. But how does it know the additional bits? I mean, for the worker nodes, what?
18:03
Oh, there is a pg_worker_list.conf. So there is a config file, yeah.
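For reference, pg_worker_list.conf lives in the metadata server's data directory and simply lists the worker nodes, one hostname and port per line, roughly like this:

    # hostname port
    worker-101 5432
    worker-102 5432

Yes? Based on your experience, is there a downside to having a large number of shards?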
18:22
At some point there is. If you have a million, then you have a million tables in your cluster. And then there is a limitation. We had 16 in there because the first thing people do is they set this up on their laptops, and then they have two databases, and that's the value where we think if you had a million in there,
18:42
then they would have a million tables on the worker nodes. So the typical math that we do is you want to scale out typically to 10x: with 100 cores, 1,000 shards; with 1,000 cores, like 10,000. And tying this into the planning and execution part,
19:00
I don't know how many of you are familiar with the internal C APIs that Postgres provides, but one of the wonderful things it gives us is hooks, and you have about a dozen of them. With these hooks, you can take over any part of the command's lifetime. These are very granular. You can also chain hooks together to use more than one. And hooks are very flexible. There is no strict contract about what you have to do with them.
19:23
This also means you need to be very careful when overriding them because of the lack of the contracts. In summary, the hook APIs give us a very first-class way of hooking into the PostgreSQL system and cooperating with it. Here's the overview of our planning phase.
19:42
And just as a side note, you can have regular tables and distributed tables within the same database. For example, you can have one large JSONB table that's distributed and a dozen small regular tables all within the same database. And taking a step back, a query comes in.
20:01
PostgreSQL parses the query. We then check if that query is for a distributed table. If it's for a local table, we defer to PostgreSQL. If it's distributed, we find the partition key. We apply partition pruning and find shards that are involved in the query. We then take the parsed query or queries
20:20
and use the same deparse logic as in PostgreSQL proper to generate back SQL statements. Here's a detailed example of planning in pg_shard. An insert query comes into the cluster. PostgreSQL parses it. We then hook into the planner and see that the customer_reviews table
20:40
is a distributed table. We separate out clauses on the hashed partition column. In this case, there is only one. It's that one, customer_id equals a constant. And then we apply PostgreSQL's hashing function and look at the range that that hashed value corresponds to and pick the shard or shards involved in the query. For this, we use the same constraint exclusion mechanism
21:02
as in PostgreSQL. And when we're done there, we take that parsed query and deparse it using PostgreSQL's ruleutils.c to get back the SQL statement or statements to forward to the worker nodes. This step is important because on a worker node,
21:20
you're going to have multiple shards, so we need to identify which shard you want to write to. Any questions on how this planning piece works?
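As a sketch of that rewrite, with hypothetical values: the statement you send to the master references the logical table, and the deparsed statement forwarded to the workers references the shard-extended name.

    -- Sent by the application to the master node:
    INSERT INTO customer_reviews (customer_id, review_rating)
    VALUES ('HN802', 5);

    -- After pruning to shard 10006, the deparsed statement that
    -- pg_shard forwards to the workers holding that shard's placements:
    INSERT INTO customer_reviews_10006 (customer_id, review_rating)
    VALUES ('HN802', 5);

And then there is the part of the actual execution. You have to be a bit careful here because we have a distributed system.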
21:42
Parallel requests are going on. We do have full consistency on writes, which means if you write something, you're going to see it when you read it back. There is none of this potential to read from a stale shard and get back stale results as you do in some of the document stores. Replicas need to be visited in order, in order for the constraints to work.
22:02
And if any of the replica writes succeed during the modification, you consider the query as successful. If a write to a replica has failed, you mark in its metadata that the shard is inactive. Here's a visual representation of that. A new insert query comes into the master node. We determine that shard six is the shard that we want to hit.
22:23
And in this case, the replication factor is two, so we need to touch two nodes. And in here, connections are possibly already open. We use a session-bound connection cache, actually with logic very similar to postgres_fdw, and we send this query to all the nodes that have those replicas.
22:44
Here worker node number three has failed, but number one succeeded, so we consider the query as a success. And the master node needs to do a bit of bookkeeping for the failure. It marks shard number six's placement, on node number three here, as inactive in the metadata. And then shard number six must now be repaired
23:01
in order to restore the replication factor. And to do this repair, the user currently calls a user-defined function.
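As a sketch, the repair call looks roughly like this, assuming pg_shard's master_copy_shard_placement UDF and hypothetical node names; it copies a healthy placement over the inactive one:

    -- Repair shard 10006's inactive placement on worker-103 by
    -- copying the data from the healthy placement on worker-101.
    SELECT master_copy_shard_placement(10006,
                                       'worker-101', 5432,
                                       'worker-103', 5432);

The single-shard select logic is very similar to the insert. We find the shard which contains the result, push down the computation, get the results, and give them back to the user.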
23:22
We can dynamically fail over to other replicas. While that's happening, this will be invisible to the end user. And one difference from the insert is that read failures currently do not modify the metadata state. This is the part that's different from the insert. And the use case we're targeting here with the single shard is the common key-value access pattern.
23:42
So you have JSONB records, you're doing writes, and now you're looking them up. Here's an example. The query comes in. The master node looks at the WHERE clauses and says, okay, this is shard number three. It sends the request to worker node one. Let's say worker one had an intermittent failure,
24:02
and the master node finds another worker with shard three. Still in the context of the same query, because the client doesn't know any better, it sends the query there, gets the results, and gives them back to the client. And one difference between the insert case and this use case is the shard state, the metadata on the master node,
24:21
isn't marked, because it's a read query. So this covers the single-shard select use case. And then there's the multi-shard select. What pg_shard does for the multi-shard is it pushes down the filters and projections, gets the remaining data to the master node, does the aggregations on the master node,
24:41
and any remaining operations and transformations on the master node.
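Two hedged examples against the customer_reviews table from earlier show the difference. The first is a single-shard key lookup that gets routed to the one shard owning the key; the second is a multi-shard select where the filter and projection are pushed down and the final aggregation happens on the master node:

    -- Single-shard select: routed to the one shard owning this key.
    SELECT review_body
    FROM customer_reviews
    WHERE customer_id = 'HN802';

    -- Multi-shard select: filters are pushed to every shard, and the
    -- master node aggregates the partial data it gets back.
    SELECT count(*)
    FROM customer_reviews
    WHERE review_rating >= 4;

The part about full distribution, taking a query and fully distributing it, that's CitusDB's bread and butter. CitusDB will apply many transformations and will handle distributed joins.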
25:01
What's new? I guess the big news is that pg_shard is already in production at Neustar Analytics. It was first released in December; version 1.1 was released in April. And part of the, actually the primary reason we're here is we're looking for your feedback. If you're interested in pg_shard, if there are one or two or three features
25:20
that would make you use pg_shard, please get in touch with us. We have the project on GitHub. There are forums on GitHub. We have issues there, or email us and tell us, hey, if pg_shard had range partitioning, if it was easier to do fully automated recovery, I would use it. Or I need more SQL coverage. Please communicate with us.
25:42
In terms of the new parts, again, version 1.1 was released. It has the shard repair UDF, better integration, select and projection pushdowns, four to five times faster inserts. And one thing to keep in mind is when you're doing inserts, you want to do these inserts concurrently, because each insert is issuing a network round trip.
26:02
And then a good number of bug fixes; thanks to Josh and others, maybe half of them were opened by Josh and the other half were opened by existing customers going into production. pg_shard 1.2 will be released in July. It has support for partitioning by composite types. This is in case you want to partition by two columns rather than one.
26:23
Like one example is you have the app ID and user ID, and you want to partition by two columns. People are currently looking at doing it using composite types, and 1.2 will have that feature. It has distributed copy, certain aggregate pushdowns to do the pushdowns better, and again, bug fixes.
26:42
That's 1.2. What's next for 2.0? As it stands, pg_shard scales out the data. And if your workload is CPU intensive, the single metadata server becomes bottlenecked on CPU between 20 to 40K inserts per second, depending obviously on your hardware.
27:00
This is because the metadata server does the parsing and the routing. And pg_shard 2.0 will resolve that. It will go fully distributed, and it will fully distribute the metadata. We have four proposals on the table, and we discussed them during the unconference day session. One of these designs involves bidirectionally replicating the metadata,
27:22
and the metadata updates to all the nodes. This approach works best when you're capturing immutable events. That is, you have a lot of inserts, but not many updates or deletes. And again, the tricky thing there is, how do you handle failures? Like when these happen, and how do these guys talk to each other and resolve those failures? The second design we discussed is superior
27:42
in that it handles concurrent updates and deletes well, but it requires more work. In this design, we have multiple replication groups, and each query gets served by that replication group's primary. Of course, for this second approach to be simple for the end user, the primary and secondary failure logic needs to be robust and automated.
28:02
And we'll talk more about the pros and cons of these designs on our technical blog in upcoming weeks. To summarize, pg_shard is a simple sharding extension for PostgreSQL. It uses many small logical shards to make moving the data around faster and easier.
28:21
You can dynamically scale out by adding new machines, or handle node failures, thanks to these many small logical shards. It's standard SQL, no special functions, and it's a standard PostgreSQL extension. You say, create extension, create table, distribute table, and you get rolling.
28:42
And it goes great with JSONB. We think JSONB with pg_shard makes PostgreSQL the most interesting, if not the best, NoSQL database. Thank you.
29:04
Let's take two questions, and then do the demo, because the demo is awesome too.
29:21
You'd get a warning message? Yes. So the warning would be, the select on this node failed, and that's how it's communicated to the end user today. That would again be a warning message.
29:40
For purposes of pg_shard, yeah, yeah. For the inactive ones, because for the select it's easy, there is no state change. For the other one, you'll actually see that in the state, in the metadata state. It will be there.
30:05
You can't. The way people typically use a distributed table with other regular tables is, they do an operation on the distributed table, typically create a temporary table or view on the metadata server, the master node, and then they join those results together. Yeah, yeah.
30:29
If that's something you'd like, again, if there is a particular use case that you're interested in, please feel free to join the discussion on GitHub. Last question, and then we can do the questions
30:41
after Marco's demo. I guess it's possible. The metadata server, the metadata itself is tiny,
31:03
so we haven't really thought of it in that sense. If it gets to millions... not with pg_shard, but we have users, again, who are using CitusDB with about a million shards, and they're doing fine. I would expect Hadoop runs into that problem typically on the order of thousands of machines.
31:21
If we get there, I'd be more than happy, and then we could... As in, for the redundancy, I skipped that over in the slides. You can copy the metadata either using a Postgres backup, or you could have an EBS-backed volume.
31:41
You could take a backup of it, and you can even recreate that metadata from the cluster itself. In 2.0, that will all be resolved because it will be fully distributed.
32:02
I think the description was a futuristic demo of a JSONB table with a dynamically changing row and columnar store behind it, which scales and is highly available. So we put it together. So what you're looking at now is the view on the... Now we are looking on you from the future.
32:21
This is me from the future. Right. And we wanted to put something together. Also, it's a bit more interesting than just, you know, this is how we set it up. So we used some actual data. We took the last month of GitHub events data as a publicly available dataset. It's about 22 gigabytes,
32:41
and we put it on a cluster on EC2, about five worker nodes, one master node. So currently, we're looking at the master node. There's an events table. I mean, it's just one big list of events in JSON that GitHub publishes. And so the events table has some ID,
33:02
some timestamp, an event type, and then there are some JSONB fields that come from those event blobs that GitHub publishes. And we have an index on there. And so this table is managed by pg_shard. So this is a distributed table. So any query I do on this table
33:22
goes into, like, the executor hooks and the planner hooks of pg_shard. Now, I set this table up with range partitioning. We don't support range partitioning, but we kind of sort of do. So don't try this at home. You could potentially set it up with range partitioning.
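A sketch of what that events table might look like; the column names here are assumptions, not the exact ones from the demo:

    CREATE TABLE events (
        event_id bigint,
        event_type text,
        created_at timestamp,
        repo jsonb,
        payload jsonb
    );

    CREATE INDEX events_created_at_idx ON events (created_at);

    -- Hand the table over to pg_shard, partitioned on the timestamp.
    SELECT master_create_distributed_table('events', 'created_at');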
33:42
And so what we have here is a bunch of shards, actually, in this case, about 1,000 shards. And the idea in the system, if you run it in production, is you have one kind of open-ended shard. So there's one that ends in the year 3000. So, I mean, the system will only work
34:01
for the next 985 years. But basically, if you do an insert with your current time, or at least something after 6 o'clock, it will end up in that shard. And at some point, we rotate the shard. So we make a new one, and new inserts go to that one. So I can do, I could probably set up
34:21
a file so I don't have to type the inserts myself, because the JSONB columns have a lot of stuff in them. You'll see it in a moment. So I'm going to do a couple of inserts. It's pretty ugly. But, so these are, like, push events. So this is one on GitHub.
34:42
Done a bunch of commits. You do git push, then that registers an event and, like, all the commits are in there. So what just happened is the data didn't actually get stored in this table on this server. Actually, we have one of the worker nodes here. And so these are shards.
35:02
These are shard placements, as we call them. And so this worker node contains about 300 shard placements. And there's five worker nodes. And if I looked at my metadata correctly, so there's a shard with identifier 104543
35:22
that is currently catching all the inserts. So I should now have on my worker node, unless I already inserted the data before, then this will be slightly embarrassing, but yeah, I have five rows I just inserted. I did five inserts in that file.
35:42
So it routed the insert to this shard placement, but also to another replica. So I can select them from there on the worker node, where it's just a regular Postgres table. I can also do stuff with that table. I can set up triggers and stuff. I can also just query the distributed table.
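A sketch of those two checks, using the shard ID from the demo:

    -- On the worker node: the shard placement is just a regular
    -- Postgres table whose name carries the shard ID suffix.
    SELECT count(*) FROM events_104543;

    -- On the master node: query the distributed table transparently.
    SELECT event_type, created_at
    FROM events
    ORDER BY created_at DESC
    LIMIT 5;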
36:04
I have some, not really. So this would be get those five events from one of the replicas in a kind of transparent way, because now I'm querying the events table, which is sharded across five nodes,
36:21
but I'm getting the result as if it was a regular table. A bit of messy result due to JSONB. I have some nicer queries. But another thing I wanted to show you is this other crazy idea of having a dynamically changing row and columnar store.
36:41
So there's the cstore_fdw extension, which lets you kind of... it's a foreign data wrapper, which lets you store PostgreSQL tables in a columnar format, and it can compress the data for you quite well. So I have a script, or I wrote a script
37:00
that actually goes through all the shards and then runs a kind of compression script. So it basically creates a foreign table, which is going to be a cstore table, inserts the existing data into it, and then replaces the old table with the compressed table. So I can actually do this more or less in parallel.
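A rough sketch of what that per-shard compression does, assuming a cstore_fdw version that supports INSERT ... SELECT, and reusing the hypothetical column list from above (shard names are hypothetical too):

    -- One-time setup on each worker node:
    CREATE EXTENSION cstore_fdw;
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

    -- For each shard table being compressed:
    CREATE FOREIGN TABLE events_104500_compressed (
        event_id bigint,
        event_type text,
        created_at timestamp,
        repo jsonb,
        payload jsonb
    ) SERVER cstore_server OPTIONS (compression 'pglz');

    INSERT INTO events_104500_compressed SELECT * FROM events_104500;
    DROP TABLE events_104500;
    ALTER FOREIGN TABLE events_104500_compressed RENAME TO events_104500;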
37:22
So my script takes a date. So let's say I wanted to compress all the data for the month of May. Now before I do that, I want to quickly check how big is that right now. So it's a bit of an ugly query because we're going to have both foreign tables
37:40
and regular tables. So this sums the foreign tables and the regular tables. So there's 8.9, or let's say 9 gigabytes of data at the moment. And so, and all the tables are regular tables. So if I run the compressed table script, I'm going to connect to all the workers, kind of in parallel, and run the compression function.
38:01
So that's going to happen now. What's cool is this is also the master. I can still run queries. This is an interesting query. I can still run queries while I'm doing compression because the compression is sort of transactional. I take a table and then just replace it on the fly. But I lose anything I set up on that table.
38:22
I lose triggers and stuff. So that might be a problem. So this is actually a query that basically gets commit messages; here's one from pg_shard: adds support for pushing down aggregates. So you can see that's all ongoing. I mean, this is all the GitHub data that there is. It's about 17 million events, actually.
38:44
So if I do a count(*), I should get about, yeah, 17 million. Now, so my compression job probably finished by now. And I mean, it ran in parallel. So it's pretty fast.
39:02
If I now look at the size of my table, it went down from nine gigabytes to about 7.4. So I mean, it didn't compress all the data, it compressed about half. Potentially I could get like a 50% reduction in storage size. So for archival purposes, it works quite well.
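That size check is roughly a sum over both kinds of tables, using cstore_fdw's cstore_table_size() for the foreign ones; a sketch, with a hypothetical shard-name pattern:

    SELECT pg_size_pretty(sum(
               CASE WHEN c.relkind = 'f' THEN cstore_table_size(c.oid::regclass)
                    ELSE pg_table_size(c.oid)
               END))
    FROM pg_class c
    WHERE c.relname LIKE 'events\_%';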
39:20
But also, for the old data, rather than doing quick lookups, which is rather what pg_shard is for, I might want to run more analytical workloads on it, and then actually compressing the data can even provide some free performance benefits. Maybe another thing I just wanted to try,
39:42
because I also added... Could you do a backslash-d on this node? Yeah, sure. That's one thing I didn't show you. So now a bunch of the tables are actually foreign tables. So we have a mixed columnar and row-based storage, where the old data is columnar and compressed, and new data is kind of row-based.
40:02
So another thing I wanted to try, because I also promised a demo where I actually kill a server during the demo. Well, let me do it the simple way now and just kill the server.
40:25
So what happens if you do something like this in pg_shard... you get some nasty error, but you will still get the result, right? Because it will try to connect to the worker that's no longer alive.
40:40
I mean, your application can ignore those, and you still get the result because it will just fail over, transparently to your application. So if one of the worker nodes fails, it still answers queries. Yeah, I'm impressed this worked, actually. The answer is correct, yes. No, we just randomly pick a number.
41:03
You will get an error message.
41:20
You will get an error message. It will say, I can't compute the correct result. There are a few cases where, like, you cannot sort of nicely do it
41:55
in a transactional way. So what you would currently have to do
42:01
is probably write a script that alters all the shards first and then alters the master table. It depends a bit. I mean, there's certain things that you probably cannot do in that way. I mean, you can add a column easily. Probably if you remove a column, you can do the master first and then on the workers.
42:21
But probably if you have NOT NULL columns, you might get into some of those cases. But you can work around the lack of DDL propagation. We can provide you with that script. And then I guess if it's important to you, it can be scheduled, prioritized for version 2.0. Yep.
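A sketch of that workaround for adding a column, with hypothetical shard names: run the ALTER against every shard placement on the workers first, then against the shell table on the master.

    -- On each worker node, for every placement of the table:
    ALTER TABLE events_104500 ADD COLUMN org jsonb;
    ALTER TABLE events_104501 ADD COLUMN org jsonb;
    -- ... and so on for the remaining shard placements.

    -- Then on the master node, the shell table itself:
    ALTER TABLE events ADD COLUMN org jsonb;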
42:43
So pg_shard, JSONB, a dynamically changing columnar and row store, and failure handling, all at the same time. Yay!
43:21
That's a good question. I would, because the communication between the master node and the worker nodes is SQL. And so as long as the deparsed SQL queries that get generated on the master node are compatible with what the... They would be compatible, I guess, unless you're... I would expect unless you're going from 7.0 to 9.4,
43:41
I would expect them to be compatible. 9.3. 9.3. Yeah, but that's the expectation. I think if... Yeah, yeah. Why would you want to try?
44:06
Any other questions? Oh, sorry.
44:22
Yep. Yep. pg_shard doesn't. CitusDB does, pg_shard doesn't.
44:48
We don't. Do we? No, because it's a very hard concept when you have a distributed system. Yeah, because we don't know in one shard what's in another shard. So if you want to enforce that,
45:02
like you have to do a lot more network round trips at the very least, and so it becomes much... Like it becomes... You start addressing a different use case, but it's more real-time, you know, reasonably fast. And the happy case is easy to handle. The question is what happens when you have failures in your cluster. Like, how do you ensure consistency
45:20
while also being available. It's not supported. One of the ideas for making it fully distributed is to bidirectionally replicate the metadata.
45:42
That's one of the design proposals. The second one is it's basically using say PostgreSQL streaming replication and sharding into replication groups. So you could have... Which makes the whole update... Because you're hitting the...
46:01
If there is a... And the idea is we take the health information out of the metadata, and then the only thing that remains is the shard and shard placement metadata, which is very easy to replicate. But let's say you have the metadata here, and it could really be cached anywhere. And you have a primary, secondary setup.
46:33
So you have this. You have this.
46:41
And then your shards are here. So you're sharding into replication groups, and the queries can hit either here, here, there. So the replication group would manage the shards, the health of the shards that it has. And then you have...
47:04
We are distributing the database. We are handling the consistency stuff differently in the sense that the metadata without the health information, when you take out the health information, it's very easy to keep that metadata consistent. And then here, the cluster, the replication group would be managing its primary
47:21
and then its secondaries. If this guy fails, these machines in here will be responsible for picking the new primary and keeping that data consistent. When a query comes into the system, you route the query to that replication group, and then that replication group handles that query itself. Well, there is a third alternative, which is this extension would just work on Amazon RDS
47:45
if Amazon supported the pg_shard extension. So if that's something you'd like to see, please talk to Grant. It would just work, just because RDS already takes care of the high-availability part of the picture, and when you have the high-availability part, this is really easy to resolve. And it would just work on RDS.
48:06
Yes. No, you're not.
48:37
You can handle, in this story, you can handle the distributed transactions
48:40
that touch the same shard and are within that machine; and then the distributed transactions that span across machines, that's a use case we're not intending to handle, like the financial transaction use case. You start a transaction, you debit money from my account, you credit it to your account, you commit the transaction, and our accounts live on different machines. We're not handling that in pg_shard currently.
49:11
Could you repeat the question? No, there is nothing special.
49:20
Worker nodes are regular PostgreSQL databases. The only thing, and you may want to do that, is if you want the shard repair function, you can create extension pg_shard there, and that will load the shard repair function. If you don't need the shard repair function, you can just use plain PostgreSQL databases, and they will become regular worker nodes.
49:40
I think we're out of time. Thank you for coming.