Vacuum more efficient than ever

PGCon - PostgreSQL Conference for Users and Developers

Sawada, Masahiko

Formal Metadata

Title

Vacuum more efficient than ever

Series Title

PGCon 2018

Number of Parts

Author

Sawada, Masahiko

License

CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.

Identifiers

10.5446/49123 (DOI)

Publisher

PGCon - PostgreSQL Conference for Users and Developers

Publication Year

2018

Language

English

Content Metadata

Subject Area

Information and Documentation

Genre

Conference/Talk

Abstract

VACUUM is an important PostgreSQL feature for reclaiming old row versions. The PostgreSQL community has made great progress in improving VACUUM, but large installations that require stable, good performance still need further improvements to it. For instance, VACUUM on a whole large, heavily updated table can take a long time to complete, which means holding the table lock for a long time. It might also be canceled halfway through. In this talk, I'll explain the state of VACUUM and recent changes to it, and, looking toward a more efficient VACUUM, I'll present improvement ideas and their results.

PostgreSQL uses MVCC to implement transaction isolation, and VACUUM is the feature that gets rid of dead row versions. Without it, the number of dead row versions would grow without bound, and so would the database size. PostgreSQL therefore has to run VACUUM periodically, even during transaction processing. Making VACUUM more efficient is very important for stable, good database system performance.

The PostgreSQL community has made great progress in improving VACUUM over the years, for example by introducing autovacuum, cost-based vacuum delay, and the visibility map. With the latest PostgreSQL, small systems do not have to worry about VACUUM very much, and even larger systems have fewer things to worry about than with earlier versions. However, since many systems that require stable, good performance have started using PostgreSQL, VACUUM still needs to evolve to meet their performance requirements. For instance, because vacuum is a bulk operation, vacuuming a whole very large table can take a long time to complete. It seems worth considering whether VACUUM of a large table could be performed in parallel, or performed only on the portion of the table that is likely to contain a lot of garbage. Also, although recent VACUUM improvements were mainly related to the heap, not much has been done for index vacuuming. Index vacuuming can be improved by reducing unnecessary page scans.

In this talk, I'll share the state of VACUUM and recent changes. I'll also propose some ideas to enhance VACUUM further: parallel vacuum, reducing scans of unnecessary pages by index vacuum, and I/O cost-effective VACUUM for very large tables. For each idea, I'll also share results in the form of a proof-of-concept patch.