In-Memory Columnar Store for PostgreSQL

CC Attribution 3.0 Unported:
You may use, modify, and reproduce, distribute and make publicly available the work or content, in unchanged or modified form, for any legal purpose, provided that you credit the author/rights holder in the manner specified by them.

Identifiers

10.5446/19081 (DOI)

Publisher

PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross

Year of publication

2014

Language

English

Production location

Ottawa, Canada

Content metadata

Subject area

Computer Science

Genre

Conference/Talk

Abstract

IMCS is an In-Memory Columnar Store for PostgreSQL. A vertical data model is more efficient for analytic queries that perform operations on entire columns. IMCS provides a 10-100 times performance improvement compared with standard SQL queries because of:
* Data skipping: fetching only the data needed for query execution
* Parallel execution: using multiple threads to execute a query
* Vector operations: minimizing interpretation overhead and allowing the use of SIMD instructions
* Reduced locking overhead: simple array-level locking
* No disk IO: all data is kept in memory

IMCS is implemented as a standard PostgreSQL extension. It provides a set of functions and operators for manipulating timeseries. Some of them are analogs of standard SQL operators (arithmetic, comparisons, sorting, aggregation...). There are also complex analytic operators, such as calculation of ranks, percentiles and cross points, and an extended set of aggregates for financial applications, such as split-adjusted price, volume-weighted average price and moving average.

A columnar store manager stores data tables as sections of columns of data rather than as rows of data. In comparison, most relational DBMSs store data in rows. That approach allows a whole record to be loaded with one read operation, which leads to better performance for OLTP queries. OLAP queries, however, mostly perform operations on entire columns, for example calculating the sum or average of a column. In this case a vertical data representation is more efficient. A columnar store, or vertical representation of data, achieves better performance than a horizontal representation due to three factors:
* Data skipping. Only the columns involved in the query are accessed.
* Vector operations. Applying an operator to a set of values minimizes interpretation cost. In addition, the SIMD instructions of modern processors accelerate the execution of vector operations.
* Compression of data.
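The difference between the row and column layouts, and the data-skipping factor, can be illustrated with a small sketch. It is written in Python purely for illustration: IMCS itself is a PostgreSQL extension and does not expose this API, and the table contents, column names and helpers (`avg_price_rows`, `avg_price_columns`, `vwap`, `moving_average`) are hypothetical. Plain Python loops do not use SIMD, so the sketch only shows which data each layout has to touch, not the speedup.

```python
from array import array

# Hypothetical quotes table: (timestamp, price, volume) per row.
rows = [
    (1, 10.0, 100),
    (2, 10.5, 200),
    (3, 11.0, 100),
    (4, 10.0, 400),
]

def avg_price_rows(rows):
    """Row store: the scan iterates over whole records to reach one field."""
    return sum(r[1] for r in rows) / len(rows)

# Column store: each field is kept in its own contiguous typed array.
columns = {
    "ts": array("q", (r[0] for r in rows)),
    "price": array("d", (r[1] for r in rows)),
    "volume": array("q", (r[2] for r in rows)),
}

def avg_price_columns(columns):
    """Data skipping: only the "price" column is read; ts and volume are untouched."""
    price = columns["price"]
    return sum(price) / len(price)

def vwap(price, volume):
    """Volume-weighted average price: sum(price * volume) / sum(volume)."""
    return sum(p * v for p, v in zip(price, volume)) / sum(volume)

def moving_average(values, window):
    """Simple moving average, updating a running sum as the window slides."""
    out = []
    acc = 0.0
    for i, x in enumerate(values):
        acc += x                        # element entering the window
        if i >= window:
            acc -= values[i - window]   # element leaving the window
        if i >= window - 1:
            out.append(acc / window)
    return out

print(avg_price_rows(rows), avg_price_columns(columns))  # 10.375 10.375
print(vwap(columns["price"], columns["volume"]))         # 10.25
print(moving_average(columns["price"], 2))               # [10.25, 10.75, 10.5]
```

In the columnar version, averaging prices reads a single contiguous array. Whole-column functions such as `vwap` and `moving_average` mirror the kind of financial aggregates listed above; an engine written in C can apply such operators to entire column slices at once, which is where vector execution and SIMD pay off.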
To illustrate the compression factor: a simple compression algorithm such as RLE not only reduces the space used but also minimizes the number of operations performed.

IMCS is primarily oriented toward working with timeseries. A timeseries is a sequence of elements, usually small and of fixed size, ordered by a timestamp. Operations on timeseries rarely access a particular element; instead, they operate either on the whole timeseries or on some time interval. This characteristic of timeseries operations requires a special timeseries index, different from traditional database indexes. Such an index does not need to provide an efficient way of locating an arbitrary timeseries element; instead, it should be able to efficiently extract a range of timeseries elements.

Advantages of the IMCS approach:
* Fast execution based on vector operations
* Parallel execution of queries
* No changes in the PostgreSQL core (just a standard extension)
* No MVCC overhead (MURSIW, multiple readers single writer, isolation level)
* No disk IO (in-memory store)
* Optimized for timeseries (massive operations with time slices)
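The RLE point and the range-oriented timeseries index described above can be sketched as follows, assuming integer timestamps and plain sorted Python lists as the timeseries. RLE lets a sum be computed with one multiply-add per run instead of one addition per element, and because elements are ordered by timestamp, a time interval can be located with two binary searches and returned as one contiguous slice. The binary search over a sorted array is a stand-in for the IMCS timeseries index, whose actual structure is not described here, and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def rle_encode(values):
    """Run-length encode a sequence into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1        # extend the current run
        else:
            runs.append([v, 1])     # start a new run
    return runs

def rle_sum(runs):
    """Sum computed on the encoded form: one multiply-add per run, not per element."""
    return sum(value * length for value, length in runs)

def time_slice(timestamps, values, start, end):
    """Return the values whose timestamp lies in the closed interval [start, end].

    Assumes timestamps are sorted ascending (the timeseries order), so two
    binary searches find the interval bounds and the result is a single
    contiguous slice rather than a lookup per element.
    """
    lo = bisect_left(timestamps, start)
    hi = bisect_right(timestamps, end)
    return values[lo:hi]

timestamps = [1, 2, 3, 5, 8, 13]
volumes = [100, 100, 100, 200, 200, 50]

runs = rle_encode(volumes)
print(runs)                                    # [[100, 3], [200, 2], [50, 1]]
print(rle_sum(runs))                           # 750
print(time_slice(timestamps, volumes, 2, 8))   # [100, 100, 200, 200]
```

Here six volume values collapse into three runs, so the sum needs three multiply-adds instead of six additions; the saving grows with run length. The range query never searches for individual elements, matching the access pattern the abstract describes for timeseries.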