
#bbuzz: A Journey to Write a New Lucene PostingsFormat

Formal Metadata

Title
#bbuzz: A Journey to Write a New Lucene PostingsFormat
Title of Series
Number of Parts
48
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Some hard technical challenges are better solved by changing the foundations. Our use case was searching across many fields under strong constraints on memory and performance. We needed a massive number of fields to support field-level security at scale and to open the path to machine-learned ranking models. A custom PostingsFormat gave us a more efficient solution than our prior one. We developed a new Lucene PostingsFormat called UniformSplit, deployed it at very large scale, and open-sourced it. We learned a lot during the journey, especially about micro-benchmarking, Java memory consumption, compact data representation, and high-performance Lucene indices. This presentation takes a step back to share what we learned: insights into Lucene's mechanisms, tips, and the pitfalls we encountered. Since development has continued, we will also share the latest work and production measurements.