Optimizing queries for not so big data in PostgreSQL

EuroPython

Mifsud, Stephanie

Formal Metadata

Title

Optimizing queries for not so big data in PostgreSQL

Series Title

EuroPython 2017

Number of Parts

160

Author

Mifsud, Stephanie

License

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, adapt, copy, distribute, and make the work or its content publicly available, in unchanged or adapted form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and pass on the work or its content, including in adapted form, only under the terms of this license.

Identifiers

10.5446/33803 (DOI)

Publisher

EuroPython

Publication Year

2017

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Optimizing queries for not so big data in PostgreSQL
[EuroPython 2017 - Talk - 2017-07-13 - Arengo] [Rimini, Italy]

Hotjar's user recordings number more than 400 million, with supporting tables containing 4.5 billion records. These 5 TB of data fit nicely into Postgres and don't quite merit the full big data suite of tools. However, at a rate of 1,000 recordings per minute and an overall request rate of 750K per minute, inefficient queries and updates can quickly cause nasty performance spikes if they are not thought out well. This talk is about the challenges we faced at the lower end of big data: the good decisions that helped keep our application running, and other lessons we had to learn the hard way.

Considerations for Database Design
- Design entities for the domain
- Balance normalization with performance
- Sharding later has big migration costs, consider designing for this early

Speak to the database from your Web Application
- Why use ORMs and at which level of abstraction?
- Stored Procedures are fast, should we have more of those?

Bringing data closer to the application
- Materialize Views
- Defer aggregations
- Application Level Caching

Handling Operational Troubles
- Explain(analyze, buffers) is your friend
- Detect and manage Index Bloat
- Reduce Deadlocks

Reducing Impact of Background Maintenance Jobs
- Keep impact on database low with cursors and streaming
- Plan data retention policies early, so cleaning can be an ongoing process
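
The abstract's last topic mentions keeping the impact of maintenance jobs low with cursors and streaming. The talk's own code is not part of this record, so the following is only a minimal sketch of the general idea: read a large result set in fixed-size batches rather than all at once. The names `stream_rows`, `demo`, and the `recordings` table are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so that the example runs without a server.

```python
import sqlite3
from typing import Iterator, List, Tuple


def stream_rows(conn, query: str, batch_size: int = 1000) -> Iterator[List[Tuple]]:
    """Yield query results in batches of at most batch_size rows."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        while True:
            # fetchmany() is part of the standard Python DB-API 2.0 (PEP 249).
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            yield batch
    finally:
        cur.close()


def demo() -> List[int]:
    # Assumption: SQLite replaces PostgreSQL here purely for portability.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recordings (id INTEGER PRIMARY KEY, duration INTEGER)"
    )
    conn.executemany(
        "INSERT INTO recordings (duration) VALUES (?)",
        [(i,) for i in range(2500)],
    )
    query = "SELECT id, duration FROM recordings ORDER BY id"
    # Record only the batch sizes; a real job would process each batch.
    sizes = [len(batch) for batch in stream_rows(conn, query)]
    conn.close()
    return sizes


if __name__ == "__main__":
    print(demo())
```

With psycopg2 against PostgreSQL, the same loop would typically use a named (server-side) cursor, for example `conn.cursor(name="maintenance_scan")`, because a default client-side cursor transfers the whole result set when the query executes. The cursor name here is again only illustrative.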