
Index support for regular expression search


Formal Metadata

Title
Index support for regular expression search
Number of Parts
20
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
Regular expressions (regex) are a powerful tool for text processing. When dealing with large string collections it is important to search those collections fast (i.e. search using an index). Indexing for regex search is quite a hard task. This talk presents a novel technique (and a WIP patch for PostgreSQL implementing it) for regex search using trigram indexes. The proposed technique provides more comprehensive trigram extraction than its analogues, i.e. higher performance. There are two existing approaches to index-based regex search. The FREE indexing engine is based on extracting continuous text fractions from the regex and performing substring search. The Google Code Search approach presents a more sophisticated recursive analysis of the regex with extraction of various regex attributes. This talk presents a novel technique of regex analysis which is based on automata transformation rather than analysis of the original regex. The superiority of the proposed technique will be shown by examples and tests. The talk is organized as follows: Introduction. Regular expressions. Finite automata. The pg_trgm contrib module. Existing techniques for index-based regular expression search: the FREE indexing engine; Google Code Search. Proposed technique: description; examples; comparison with analogues; limitations; performance results.
Transcript: English (auto-generated)
Okay, let's start. Hello, my name is Alexander Korotkov. My talk is about index support for regular expression search, so it refers to the problem of searching with a regular expression over a large string collection. There are two possible approaches to accelerate it. You can do faster matching of the regular expression itself. Or you can avoid matching each string in the collection by using an index. Modern regular expression engines are already well optimized, and even faster matching still doesn't let you avoid a full read of the whole string collection, so index support for regular expression search is very important when you need to do such a search on a large string collection. At first, I would like to do a gentle introduction to regular expressions,
because almost all of us know regular expressions, know how to write them and use them in practical tasks, but not all of us know the underlying theory. Some minimum of this theory is required to understand my talk.
So, we all know that a regular expression is a powerful tool for text processing, but not all of us know that it is based on formal language theory; now we know it.
Formal language theory introduces the notion of a language. A language is a set of strings, a possibly infinite set of strings. This theory also says that regular expressions express the same class of languages as finite automata.
This means that we can transform a regular expression into a finite automaton, and the regular expression will have the same set of matching strings as the corresponding finite automaton.
Moreover, such a transformation is not only valuable for theory; it is actually used by regular expression engines. So let's talk about what an automaton is.
An automaton is a graph whose vertices are called states, and whose arcs are labeled by characters of the string. We say that the automaton reads a string if we can trace the string by transitions between states, from a special state called the initial state to another special state called the final state. Let's consider an example. We have a simple regular expression which reads strings that contain character A, then the sequence of characters BC, which can be repeated any number of times, possibly zero times, and then character D. This regular expression will be transformed into the following automaton.
In this automaton, the initial state is marked by a star, and the final state is marked by a number sign. The initial state has a self-referencing arc which matches any character.
It means that we can start searching for a regular expression match at any point of the string, so we never lose this initial state. Then we can read character A,
and we can read any number of BC sequences by transitioning to the upper state and back. Then we can read character D and reach the final state. The final state also has a self-referencing arc. It means that once we reach the final state, we will never leave it.
That means that once we find a match, it doesn't matter what comes after; you have a match anyway. Your questions, please. Yes, then another state is activated.
So this state was active, then this state becomes active.
If the next character is not C, neither of these states is active. But the initial state remains active during the whole matching process. So at any moment of time, any number of states can be active, from zero to all states.
Wouldn't you want to keep reading the string? Would you want to go back to the initial state? Actually, if no state is active, we can stop, because if no state is active, it will remain so until the end of the string, and we know that we will not reach the final state.
I will consider an example; probably it will clarify this. For example, we have this string, and initially the initial state is active.
Then we read character X, and no other state is activated, only the initial state. The character is not A, but the self-referencing arc works and the initial state remains active.
The same situation repeats until we meet character A. When we meet character A, it activates the state in the middle.
The initial state remains active, and it will remain active during the whole matching process, because it has a self-referencing arc which matches any character. Then we find character B, and the upper state is activated.
And so on. Then we reach the final state, and this state will remain active until the end of the string. When we reach the end of the string and the final state is active,
we finish the matching process and we can say that the string matches the regular expression. Can you stop the first time you get to a final state? Actually, yes, because it has a self-referencing arc.
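The multi-state simulation just described can be sketched in a few lines. This is a minimal model: the state numbering and the transition table are my own encoding of the talk's /a(bc)*d/ substring example, not taken from the slides.

```python
import re

# NFA for the talk's example, /a(bc)*d/ searched as a substring.
# State 0 is the initial state with a self-loop on any character;
# state 3 is the final state, also with a self-loop on any character.
TRANS = {
    (0, 'a'): {1},   # read 'a'
    (1, 'b'): {2},   # start a "bc" repetition
    (2, 'c'): {1},   # finish a "bc" repetition, back for more
    (1, 'd'): {3},   # read 'd', reach the final state
}
INITIAL, FINAL = 0, 3

def nfa_search(s):
    """Simulate the NFA keeping a set of active states, as in the talk."""
    active = {INITIAL}
    for ch in s:
        if FINAL in active:
            return True          # final state's self-loop: once reached, never lost
        nxt = {INITIAL}          # initial state stays active (self-loop on any char)
        for st in active:
            nxt |= TRANS.get((st, ch), set())
        active = nxt
    return FINAL in active

# Cross-check against Python's regex engine on a few strings
for s in ["xxabcbcd", "ad", "abd", "hello"]:
    assert nfa_search(s) == bool(re.search(r"a(bc)*d", s))
```

Note that, as in the talk, the simulation never needs backtracking: the set of active states captures all possibilities at once.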
But I showed this process without any optimization. So now all of us know something about regular expressions. Actually, this picture is not very relevant, because this guy just writes regular expressions.
This doesn't mean he knows the underlying theory. But anyway, it is fun. OK, now about regular expression-based search in a database. The good news is that PostgreSQL can do regular expression-based search.
The bad news is that it is only a sequential scan. All techniques for index-based regular expression search are based on inverted indexes on q-grams.
So I would like to do a gentle introduction to what inverted indexes on q-grams are. A q-gram is a substring of the original string which has length q, and it can be used as a signature of the original string. What does it mean, a signature of the original string?
It means that you can say something valuable about a string without knowing the string itself, by only knowing that some q-grams are present in it and some are not. The q-gram technique is widely used in various string processing tasks,
not only regular expression matching. For example, it is also used in accelerated edit distance search. And the number of matching q-grams can also be used as a metric of string similarity,
as in the pg_trgm contrib module. An inverted index on q-grams maintains an association between each q-gram and all the strings where this q-gram is present.
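A minimal model of such an inverted index looks like this. This is simplified: the real pg_trgm also lowercases the text and pads words with spaces before extracting trigrams.

```python
from collections import defaultdict

def trigrams(s):
    """All 3-character substrings (a simplified model of pg_trgm's extraction)."""
    return {s[i:i+3] for i in range(len(s) - 2)}

def build_index(strings):
    """Inverted index: trigram -> posting list of string ids, in insertion order."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for tg in trigrams(s):
            index[tg].append(sid)
    return index

titles = ["theory of automata", "regular expressions", "the zzz paper"]
idx = build_index(titles)
# idx["the"] -> [0, 2]; idx["zzz"] -> [2]
```

Looking up a trigram returns the posting list of all strings containing it, which is exactly what the index-based search below consumes.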
pg_trgm is an implementation for one particular q, q equal to 3. Here is a little example of the logical structure of a pg_trgm index; it is not the physical structure.
The physical structure is much more complicated; this is just the logical structure. For each q-gram, you can find the numbers of the strings where this q-gram is present. Now, something about q-grams.
Q-grams have different frequencies in strings. For example, the DBLP paper titles, of which there are 2.5 million, contain very many titles with the trigram "the".
It is the most common trigram in the English language. But if we would like to find titles which contain the trigram "zzz", there is only one title containing it.
So this title caught my interest, and I present the information about it. Probably someone else is also interested in what the only title containing "zzz" is.
So, not all q-grams of a fixed q are equally useful. For example, you probably don't want to get the posting list of the trigram "the", the list of all strings where it is present, because it is very large. Probably you can do better without reading such a list.
It would produce more false positives, but it would anyway be quicker. To avoid these drawbacks of q-grams with fixed q, variable-length grams can be used, so-called V-grams or multigrams.
This means that each q-gram can have a specific q, chosen to make the selectivity of all q-grams similar.
We cannot make their selectivities equal, but we can make them fall into some not very wide range. For example, if some trigram is very frequent, we can use 4-grams in its place.
And if some trigrams are very rare, we would probably like to group such trigrams into one V-gram. This can lead to a more effective index search.
But there are some problems with getting this into Postgres. When a paper describes techniques of V-grams or multigrams, it describes them as if we had some fixed string collection: we collect frequencies once and build the index.
They don't describe how to maintain it online, and in Postgres we can't tell the user: once you build the index, you can't insert, update or delete in the table. That's unreasonable. This is the first problem, and this problem is avoidable.
But there is also a second problem. The second problem is patents and patent trolls, because many scholarly papers have corresponding patents which prevent us from implementing this in an open source database.
It is probably avoidable, because we can use some other algorithms, but I'm not a lawyer, and I don't know whether these patents cover the whole idea, or whether there is a way to avoid them.
So I will pose this question to PG Advocacy, and probably they will help to resolve this problem. It's probably not a good idea to mention or read the patents in the United States.
It's too late anyway. I'm not sure; we can discuss this later.
The next question is how to use such indexes for regular expression search.
The general idea is this. When we have some regular expression, we can write a logical expression which expresses which trigrams must be present in a string in order for the string to possibly match the regular expression.
This doesn't mean that the string necessarily matches. It allows some false positives, but it doesn't allow false negatives. For example, in this case, the first character should be A or B.
Then the trigram ACD or BCD must necessarily be present. And the last three characters are just a continuous fraction of the string, a sequence of exactly three characters, which means that the trigram CDE must necessarily be present.
Then we get the posting lists for all three of these trigrams. For all the string numbers, we get a table which shows, for each trigram,
whether it is present in the given string or not. Then we can evaluate this logical expression, and if the logical expression is true, we can fetch the original string from the heap. When we fetch the original string from the heap, we do a recheck using the actual regular expression.
The result can still be false, because our logical expression allows false positives.
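The whole pipeline, posting-list evaluation followed by a recheck with the real regex, can be sketched like this. The helper names and the toy data are mine, not pg_trgm's actual internals.

```python
import re
from collections import defaultdict

def trigrams(s):
    return {s[i:i+3] for i in range(len(s) - 2)}

def build_index(strings):
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for tg in trigrams(s):
            index[tg].add(sid)
    return index

def search(strings, index, regex, required):
    """required is the trigram expression in CNF, a list of OR-groups:
    [["acd", "bcd"], ["cde"]] means (acd OR bcd) AND cde."""
    candidates = None
    for group in required:
        ids = set().union(*(index.get(tg, set()) for tg in group))
        candidates = ids if candidates is None else candidates & ids
    # Recheck: the trigram filter allows false positives, never false negatives.
    pat = re.compile(regex)
    return sorted(sid for sid in (candidates or set()) if pat.search(strings[sid]))

data = ["xacdex", "bcdef", "acid", "cde"]
idx = build_index(data)
hits = search(data, idx, r"[ab]cde", [["acd", "bcd"], ["cde"]])
# hits == [0, 1]
```

Here "acid" is never fetched because it lacks the required trigrams, and any string passing the filter is still verified against the real regular expression.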
That is the general idea, but the challenge is how to do this in the general case. In this example it is very evident, but doing it in the general case is not so easy. First, let's consider the existing approaches to such q-gram extraction.
The first is a paper called Fast Regular Expression Indexing Engine, and this paper is still widely referenced in state-of-the-art work on this problem. In short, the method is this. We extract only continuous string fractions from the regular expression,
ignore the other parts in some way, and then transform these continuous fractions into multigrams.
When we get a logical expression on multigrams, we can use an inverted index. Let's consider an example. For this example regular expression, we get the following tree. It consists of logical operations, continuous strings,
and a symbol X which can be repeated any number of times. We represent that as a star node which has the character X as a child.
Then we replace the star with null. Null in general means something we don't know anything about. So then, if we have something unknown under an OR,
since it's an OR, we actually don't know anything about the whole subtree, and the parent OR will be replaced with null. But when null is a child of an AND, we can just remove the null,
because we know that this part is present, plus something unknown, so we can just remove the unknown part. We can also remove some unnecessary elements, and then replace the strings with q-grams.
In this example, they are trigrams. Another existing approach is Google Code Search. Google Code Search was launched in 2006,
and it supported regular expression search over open source code. This was very interesting for me when I found it, and I thought that the Google guys are smart: it can't be that they do a sequential scan of the whole corpus of source code.
It also seemed to me that they did something better than that paper, something smarter, but we didn't know what. Actually, we didn't know what they did until Google Code Search closed.
Then Google engineer Russ Cox published a description of this indexing technique in January 2012 on his website. So there were more than five years of intrigue, when we could use Google Code Search
but didn't know how it works. Now we can't use Google Code Search, but we know how it works. So, a brief description of this method.
It collects five characteristics about each part of the regular expression. Emptyable is just a Boolean which says whether this part of the regular expression can match an empty string or not.
Exact: possibly this part of the regular expression matches only some exact set of strings; then we fill the property exact. Prefix and suffix mean that this part necessarily requires the string to have some prefix or suffix.
Match means that this regular expression part requires some substring to appear somewhere in the matching string. These characteristics of regular expression parts are then combined recursively by some rules,
and an inverted index on trigrams is used for query evaluation. This index is very similar to pg_trgm. For example, take the same regular expression as in the first example.
At first we get the properties of the continuous strings; that's easy, they are just exact. When we apply the plus symbol, we know that this regular expression requires the string to have a certain prefix and a certain suffix.
It's pretty easy: when we add characters at the start of the regular expression, we add them to the prefix, and when we add characters at the end, we add them to the suffix.
Then the transformation is done: we get the characteristics of the original expression, and we can easily transform them into a logical expression on trigrams.
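A toy version of that attribute composition might look as follows. This is restricted to literals, concatenation and plus, and my combination rules are a simplification; the rules Russ Cox published also track match sets and handle alternation and star.

```python
# Each node yields: emptyable (bool), exact (set of exact matches, or None),
# and prefix/suffix sets of strings every match must start/end with.

def literal(s):
    return {"empty": False, "exact": {s}, "prefix": {s}, "suffix": {s}}

def plus(e):
    # e+ matches one or more copies of e: no longer an exact set, but every
    # match still starts with a prefix of e and ends with a suffix of e.
    return {"empty": e["empty"], "exact": None,
            "prefix": set(e["prefix"]), "suffix": set(e["suffix"])}

def concat(a, b):
    if a["exact"] is not None and b["exact"] is not None:
        ex = {x + y for x in a["exact"] for y in b["exact"]}
        return {"empty": False, "exact": ex, "prefix": ex, "suffix": ex}
    # An exact left side extends the right side's prefixes, and vice versa.
    prefix = ({x + y for x in a["exact"] for y in b["prefix"]}
              if a["exact"] is not None else set(a["prefix"]))
    suffix = ({x + y for x in a["suffix"] for y in b["exact"]}
              if b["exact"] is not None else set(b["suffix"]))
    return {"empty": False, "exact": None, "prefix": prefix, "suffix": suffix}

# For a regex shaped like ab(cd)+ef, the analysis yields the required
# prefix "abcd" and suffix "cdef", from which trigrams can be extracted.
r = concat(concat(literal("ab"), plus(literal("cd"))), literal("ef"))
# r["prefix"] == {"abcd"}, r["suffix"] == {"cdef"}
```

The talk's point is visible here: the plus destroys exactness, but prefix and suffix information survives and still yields useful trigrams.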
Okay, now we have considered the existing approaches, and I would like to talk about the proposed method, the method which I propose for use. My method is not based on analysis of the original regular expression, but on analysis of the corresponding automaton.
We transform it into an automaton-like graph, but this graph has trigrams on its arcs instead of characters.
Then we can possibly simplify the graph, and then use pg_trgm indexes. It is quite difficult to explain the whole method formally, so I would like to explain it with an example.
Let's consider a regular expression. This regular expression reads character A, then at least one character B or C, repeated any number of times, and in the end it reads character D.
There is a corresponding automaton for this regular expression, and now we can start our transformation. The first step is easy: we just put down the initial state of the resulting graph, and then we start to assign keys to this resulting graph.
A key is two things: the pair of last read characters, and the number of a state.
For example, from the initial state of the automaton, we can read character A and then character B, and reach state 2. So our key corresponds to state 2, and the last read characters were AB.
Similarly, we can read characters AB and reach state 4. And so on: we can read AC and reach state 3, and read AC and reach state 4.
Those are all possible keys for the initial state, and then we should look at which other states we can reach from these keys.
If we are in state 2, and the last read characters were AB, then according to the source automaton we can read character B.
This means that the trigram ABB is necessarily present in the string, the last read characters become BB, and we are still in state 2. According to another arc of the original automaton, we can also reach state 4 with the same trigram and the same last read characters.
Also, we can read character B from state 4, and if the last read characters were AB, then
the trigram ABB is necessarily present in the string, the last read characters become BB, and we are in state 5.
An important point is that if the corresponding state of the original automaton is final, then the corresponding state of the resulting graph is also marked as final.
We do a similar thing for the other keys, and when we have finished processing the initial state, we start processing the states which were produced in the previous step.
We can use the self-referencing arc of state 2: we read the trigram BBB and stay in the same state, so it becomes a self-referencing arc of the resulting graph.
The rest is a similar transformation; I will not describe all of it, I think the idea is understandable by now.
Then we get a resulting graph. What is the important property of this graph? If some string matches the regular expression, and so matches the corresponding automaton, then
we can traverse from the initial state of the resulting graph to its final state by only traversing arcs whose trigrams are present in the string.
This property allows us to treat this graph as a logical expression on trigrams, and to use it in the previously described method of regular expression search.
This result can be simplified: if we can get from here to here reading ABB, we
require only the trigram ABB; we could also get there by reading the trigrams ABB and BBB, but that doesn't matter, since ABB alone is enough. So we can simplify this.
The implemented simplification technique is to build the following matrix. This matrix represents all possible minimal paths which can bring us from the initial
state to the final state, and this matrix can easily be transformed into a logical expression. For example, the first row contains a one only for the trigram ABB, so the trigram ABB is the first summand.
The second row of this table has the trigrams ABB and BBD, which means that the second summand is the conjunction of the ABB and BBD trigrams, and so on for the other rows of the table.
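Evaluating such a path matrix amounts to an OR of ANDs over trigram sets. The rows below are hypothetical, written for a regex like /ab+d/ (the example's final character is garbled in the transcript), so they illustrate the shape of the matrix rather than reproduce the slide.

```python
def trigrams(s):
    return {s[i:i+3] for i in range(len(s) - 2)}

# Hypothetical path matrix for /ab+d/: each row is the trigram set required
# by one minimal path through the transformed graph, and the matrix as a
# whole reads as an OR of ANDs.
PATHS = [
    {"abd"},           # exactly one 'b'
    {"abb", "bbd"},    # two or more 'b's (also covers any run of "bbb")
]

def may_match(s):
    """True if the string MAY match: all trigrams of some row are present.
    False positives are possible; false negatives are not."""
    tgs = trigrams(s)
    return any(row <= tgs for row in PATHS)
```

As in the talk's implementation, iteration stops at the first satisfied row; a True result still requires a recheck with the real regular expression.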
Here is a comparison of the different techniques for extracting a trigram logical expression from a regular expression.
For this regular expression, the Fast Regular Expression Indexing Engine, FREE for short, can extract only the trigrams of the continuous fractions of the string.
The Google Code Search method gives a more comprehensive expression, and my method produces a similar logical expression, but not in such a simple form; it has some duplication.
Another example: here there is no continuous fraction of the string, because C++ will be replaced with null,
so no continuous trigrams can be extracted, and the FREE method can't extract anything. Google Code Search can see that the trigrams ABC and CDE must be present, and my method sees something
more: not only ABC and CDE, but also that there must be BCD, or the pair of trigrams BCC and CCD. There are also examples where neither the FREE method nor the Google Code Search method can extract any trigrams.
The Google Code Search method is confused by this star inside a plus; it's probably too complicated, because it can't see any fixed suffix, since this character C can be repeated any number of times, including zero times.
There are also some examples where the FREE method and the Google Code Search method can do something. Now, performance results on the DBLP paper titles with some regular expressions;
these are not real-life regular expressions, just something I wrote for testing. We can see a pretty dramatic difference between the index scan and the sequential scan,
and we can also see that the time increases when the regular expression becomes longer. That is because from a longer regular expression we can extract more trigrams, and we have to read more posting lists.
This is also avoidable, because if we had the selectivity of trigrams, we could ignore
some of them, getting more false positives but having to read fewer posting lists. So some optimizations are possible; even if we don't know the
selectivity of trigrams, we can still decide to exclude some trigrams from our search. A work-in-progress patch was posted to the mailing list; this patch is based on the pg_trgm contrib module, and any feedback on it is welcome.
This patch is a work in progress, and there are still some problems in it. The first problem is that the transformed graph can be very large. In this case, we should probably decide that when the transformed graph reaches some size,
we should stop and just do a full index scan. Also, the process of simplification can produce a very large table of possible paths. In this
case, we probably should not try to create such a table, but use just the transformed graph. And also, the use of trigrams for this particular task is not optimal; it would be better to use variable-length grams.
In my presentation, I made some simplifications for easier understanding, and I would
like to mention them so that my presentation isn't disinformation for anybody. I didn't mention the difference between deterministic and non-deterministic automata. A deterministic automaton can have only one arc marked with the same character from the same state.
Here we dealt only with non-deterministic automata, but regular expression engines actually transform non-deterministic automata into deterministic ones in order to match faster, because in
a deterministic automaton only one state can be active at a time. This is very good for performance. Also, in real life, regular expression engines don't mark arcs with exact
characters, because, for example, there can be some meta-symbol of the regular expression which represents any numeric character, any space character, or
any word character, and this would mean very many arcs. So regular expression engines group characters into colors. A color is a group of characters which are indistinguishable to the regular expression.
And also, regular expression engines have special methods for handling the start and end of the string, either the whole string or a line of it.
In short, it can be done by representing the start and the end as special characters. In our example of how a regular expression works we didn't use start and end characters, but regular expression engines use them.
My project also needs some help. I need string collections and regular expressions from real-life
tasks to prove the effectiveness of my method, and also to find some corner cases and optimize them. So if you have some datasets for me, it would be great; you should contact me.
So now, thank you for your attention. Ask your questions, please. Trigrams, because we already have indexes on trigrams implemented in Postgres.
But it could be done with, for example, 2-grams or 4-grams, and if we can avoid the problems with patents, it would be better to use variable-length grams.
The next question is about the selection of q in q-grams.
So if you built an index that has trigrams and quadgrams and all the lengths, then you could decide which lengths to look up in the index based on the stats.
If there are actually not enough trigrams to narrow the search down enough, then you can expand further and look for quadgrams. But you'd have to look at the stats to know which ones are more common or less common.
You said building an index which contains, for example, both trigrams and 4-grams. Yes, it's also an interesting possible approach. Okay.
These examples. Which particular slide? Next slide. This? Okay. You have some of your examples. Ah, about...
Does that just generate a lot of trigrams? Are you asking for the logical expression on trigrams for that example? What I'm curious about is that most of your examples use specific characters. Is that the reason the very first example there is the slowest?
Actually not. It just ignores this character, because there are too many characters matching "any character".
It's slower just because it's longer. It's not because of this part, but because the whole number of trigrams of the other part is larger,
and possibly they have greater selectivity and larger posting lists. That is why. So you have more trigrams, but they're more selective, like you said.
Not quite. If I have a more complicated expression on trigrams, I have fewer false positives, so I fetch the heap less. But
I have to read more posting lists, and there is a trade-off between the number of false positives and reading posting lists. So you could decide not to read them all? Yes, you don't need to read them all. If some posting list is large, and it doesn't influence the false positives very much, then with some estimation
I can decide not to read that posting list and remove it from the logical expression. Any other questions?
To what? Yes, it's possible. You may remember that pg_trgm converts multibyte characters to hash sums.
Yes, it's also possible to build the index on multibyte characters. It would require making the keys varlena.
It would add some additional index size, but it's definitely possible. And it would also have some advantages, because, for example, if you index trigrams and you would like to find all trigrams that continue some digram which the trigram starts with,
you can do this using partial match. When you have hash sums, you can't do that. So this would possibly be a better solution for that particular task.
Yes. I apologize if I missed it, but did you say how long it took? I didn't say that, because for now it's not a novelty of my work; it's just the same pg_trgm index as it is in HEAD.
Any other questions? Yes? I guess I have one that's a variation on other comments that were made. If you looked at the list of trigrams that you had, and you had frequencies for those trigrams,
then you could start by doing the queries for the least frequent ones. Yes, sure, but it's not so easy to implement in Postgres, because the collection of statistics does not depend on the index now.
And also, from the interface methods of an index implementation, you can't get any statistical information. There are two problems, but ideally we would collect statistics about trigrams when we have the corresponding index,
and could get the statistics from an interface function and do this optimization. Yes, it is the right way, and I would like to do it. Any other questions?
I'm just wondering how a simple positive match like that compares to regular expressions from which you can't extract trigrams. Like if I say backslash d, dot, backslash d...
Yes, there are many regular expressions from which we can't extract any trigrams. In that case we just have to do a sequential scan. For example, if you are searching using a GIN index, the planner calls extractQuery at the planning stage,
and if extractQuery says that it can't do anything better than a full index scan, then the planner decides not to do an index scan and does a sequential scan,
which is faster than a full index scan in this case. So if we have regular expressions for which no trigrams can be extracted, we just fall back to a sequential scan.
I think it's unavoidable. Any other questions? Yes? Okay, I didn't understand.
Please repeat the question. How do you represent that in memory?
I actually represent in memory this table. So I write this logical expression only for better explanation, but I just iterate over lines of table,
and check if all the trigrams marked with 1 are actually present in string. If it's so, I stop the iterating and say that the result is true. So the string is possibly matching regular expression.
And if I reach the end of the table without it, I return false. This means that string definitely doesn't match regular expression. Is it your own operator with your own name? There is an already existing operator in PostgreSQL.
I just add this operator to operator class in PGTRGM contrib model. My page just contains this additional line in the SQL file.
Any other questions? Yes? Is there any way to use this with negative matches? Negative matches. So you mean, for example, that some string which matches the regular expression
should definitely not contain some trigram. Do you mean that? Possibly, but then the regular expression would have to be both a prefix regular expression and a suffix regular expression.
So if it shouldn't contain some part but can contain anything else, and the regular expression contains this part, then no, it's not possible.
Also, if we don't allow this, we deal with a particular class of Boolean functions, the class called monotonic Boolean functions, because there is no negation.
And dealing with that is easier. Possibly I will find some better ways to do the simplification because it's monotonic; it simplifies the analysis. Any other questions?
It seems that no questions remain.