GIN in 9.4 and further

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/19101 (DOI)

Publisher

PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross

Release Date

2014

Language

English

Production Place

Ottawa, Canada

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

This talk presents a set of GIN advances in PostgreSQL 9.4 and beyond that bring GIN to a new level of performance and extensibility. The most important advances are posting list compression, the fast-scan algorithm, storage of additional information, and index-based ranking.

Posting list compression. Indexes become 2 times smaller without any work in the opclass. pg_upgrade is supported; old indexes are recompressed on the fly.

Fast-scan algorithm. Fast scan allows GIN to skip parts of large posting trees during an index scan. It dramatically improves the performance of the hstore and json search operators, as well as the FTS "frequentterm & rareterm" case. To use this improvement, the opclass "consistent" method must support three-state logic.

Storing additional (opclass-defined) information in posting lists. Using the additional information for filtering enables new features for GIN opclasses: better phrase search, better array similarity search, inverse FTS search (search for tsqueries matching a tsvector), inverse regex search (search for regexes matching a string), and better string similarity using positioned n-grams.

Index-based ranking. This improvement allows GIN to return results in an opclass-defined order. The most important application is returning results in relevance order for FTS, which dramatically reduces I/O load. There are other applications as well, such as returning arrays in similarity order.

We present benchmark results for FTS on several datasets (6 M and 15 M documents) under a real-life load for the PostgreSQL and Sphinx full-text search engines, and demonstrate that the improved PostgreSQL FTS (with all its ACID overhead) outperforms the standalone Sphinx search engine.
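The skipping idea behind the "frequentterm & rareterm" case can be illustrated with a minimal Python sketch. It is not the PostgreSQL implementation: the function name `intersect_with_skipping` and the use of plain sorted lists in place of posting trees are assumptions made for illustration only. The rare term's short list drives the scan, and a binary search jumps over long runs of the frequent term's list instead of reading every entry.

```python
from bisect import bisect_left


def intersect_with_skipping(rare, frequent):
    """Intersect two sorted lists of item ids.

    Hypothetical illustration: the short (rare) list drives the scan,
    and the long (frequent) list is probed with a binary search so that
    non-matching stretches are skipped rather than read one by one.
    """
    result = []
    pos = 0
    for item in rare:
        # Jump to the first entry >= item, starting from the last position.
        pos = bisect_left(frequent, item, pos)
        if pos == len(frequent):
            break  # the frequent list is exhausted
        if frequent[pos] == item:
            result.append(item)
    return result


# A rare term with three postings and a frequent term with 20,000 postings.
rare = [7, 1000, 50000]
frequent = list(range(0, 100000, 5))
matches = intersect_with_skipping(rare, frequent)
```

In this sketch the cost depends mainly on the length of the rare list, with only a logarithmic factor for the frequent one, which is the intuition for why skipping helps when one term is much more common than the other.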

Keywords

Alexander Korotkov

Oleg Bartunov