Vacuum more efficient than ever

Formal Metadata

Title
Vacuum more efficient than ever
Title of Series
Number of Parts
37
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
VACUUM is an important PostgreSQL feature for reclaiming old row versions. The PostgreSQL community has made great progress in improving VACUUM, but for large installations that require stable, good performance, we still have to do something about VACUUM at some point. For instance, running VACUUM on a whole large, heavily updated table can take a long time to complete, which means holding the table lock for a long time. It might also be canceled halfway through. In this talk, I'll explain the state of VACUUM and recent changes to it, and, with a view to making VACUUM more efficient, I'll present ideas for improving it along with their results.
PostgreSQL uses MVCC to implement transaction isolation, and VACUUM is the feature that gets rid of the resulting dead row versions. Without it, the number of dead row versions would grow without bound, and so would the database size. PostgreSQL therefore has to run VACUUM periodically, even while transactions are being processed. Making VACUUM more efficient is very important for anyone who wants stable, good database performance. The PostgreSQL community has made great progress in improving VACUUM over the years, for example by introducing autovacuum, cost-based vacuum delay, and the visibility map. With the latest PostgreSQL, small systems do not have to worry much about VACUUM, and even larger systems have fewer things to worry about than with earlier versions.
However, now that many systems that require stable, good performance have started using PostgreSQL, VACUUM still needs to evolve to meet their performance requirements. For instance, because vacuum is a bulk operation, vacuuming a whole very large table can take a long time to complete. It seems worth considering whether VACUUM of a large table could be performed in parallel, or restricted to the portions of the table that are likely to contain a lot of garbage. Also, recent VACUUM improvements were mainly related to the heap, and not much has been done for index vacuuming. We can improve index vacuuming by reducing unnecessary page scans.
In this talk, I'll share the state of VACUUM and recent changes. I'll also propose some ideas to enhance VACUUM further: parallel vacuum, reducing the scanning of unnecessary pages during index vacuum, and I/O cost-effective VACUUM for very large tables. For each idea, I'll also share the result in the form of a proof-of-concept patch.