
Document classification search; joins vs payloads


Formal Metadata

Title
Document classification search; joins vs payloads
Series Title
Number of Parts
69
Author
Contributors
License
CC Attribution 3.0 Unported:
You may use, adapt, and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
Payloads are a powerful though seldom utilized feature in the Lucene-Solr ecosystem. This talk reviews the existing payload support in Lucene and introduces the new features in Lucene and Solr 9 (LUCENE-9659 / SOLR-14787). The main focus of the talk is to explore real-world search and ML use cases that traditionally utilize a query-time join, and the application of Lucene payloads to solve them. This talk is for search practitioners interested in utilizing machine-learned data in search-based analytics dashboards. Many Solr-based applications attempting to deal with machine-learned classifications are forced to implement a parent-child join relationship between a document and its classifications. This model introduces many additional system constraints and costs at both query and index time to maintain the ability to filter results as desired. New features in the payload span query in Lucene provide applications a way to maintain query flexibility without incurring the cost of performing a query-time join. This greatly simplifies system design and architecture and can provide dramatic improvements to query performance. A reference implementation will be presented that compares the join and payload approaches. The demonstration will show how to search for documents that have classifications above a particular confidence threshold at scale.
Transcript: English (automatically generated)
Hello everybody. Glad everybody could join us today. Very excited to talk about some of the features that we've added to Lucene and Solr, thinly veiled in a talk about image search and indexing the output of neural networks.
So, who am I? I'm the founder of KMW Technology. We've been in operation since about 2010, we're based in Boston, and we primarily focus on Solr, Elasticsearch and Lucene. We provide training, search cluster architecture reviews, and application development, and we perform Solr audits.
And we're very big proponents of open source: we're contributors, supporters and committers. So before we get into some of the approaches that we took to solving this problem, I'm just going to do a quick overview of what payloads are, because I feel like they're often overlooked.
People don't necessarily know what payloads are. Payloads are a piece of binary data that can be stored at a position in a field of a document, and they live in the position and payload files of the index: the .pos file in the Lucene index provides the byte offsets for a term's position within a document into the .pay file, the payload file.
And this allows us to very quickly reference that binary data at query time. There are a couple of Lucene queries that support this, the span payload check query and a few others.
But primarily these are exposed through Solr's query syntax via the payload check and payload score query parsers. Previously, the payload check query parser could only perform a pure equality operation:
does the payload equal a value that's specified? We saw a very simple improvement to make there, which is to support inequality operations, that is greater than, less than, greater than or equal to, and less than or equal to. And this really opens it up to a wider range of use cases with a relatively minimal impact on performance compared to normal payload checks.
So what are payloads in Solr, and how do you configure them? Well, they're a special field type, and the important thing in that field type is that it uses the delimited payload filter in the analysis chain.
What this allows you to do is encode a payload with every value. If you look at the data input format below, you have something like a value, a pipe, and then the payload that you want to associate with it, and that becomes your field data. The current payload encoders and decoders support integer, floating point, and string (or identity, as it's also sometimes called) encodings.
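To make that concrete, here is a minimal sketch of such a field type, roughly along the lines of the delimited_payloads_float type that ships in Solr's default configset; the dynamic field rule is just an illustration:

  <fieldType name="delimited_payloads_float" class="solr.TextField" indexed="true" stored="false">
    <analyzer>
      <!-- split on whitespace, then strip the "|payload" suffix and index it as a float payload -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
    </analyzer>
  </fieldType>
  <dynamicField name="*_dpf" type="delimited_payloads_float" indexed="true" stored="true"/>

With that in place, a field value like "cat|0.75 dog|0.11" indexes the terms cat and dog, each carrying its confidence score as a payload.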
Quick review of the Solr query parsers that support this. The first one, which we won't be talking about much today, is the payload score query parser, which allows you to use this payload information as part of your relevancy or scoring calculation. Useful to know that it's there, but not the focus of this talk. More interesting is the payload check query parser, which can make the determination to match a particular term in a document only if the payload equals a particular value.
I think most of this functionality came out of part-of-speech searching use cases: searching for the word train only if train had been tagged as being a noun rather than a verb.
So it's more granular control, not just term matching, but matching on the payload as well. As mentioned before, what we did is extend this payload check query parser with an additional parameter, the operation op, which is applied when matching against the payloads. Here we specify gt, representing a greater-than comparison.
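As a rough sketch of the syntax (the field names here are just illustrative), the original equality form and the extended form look something like this. An equality check in the part-of-speech style, matching train only where its payload equals NOUN:

  q={!payload_check f=words_dps payloads="NOUN" v="train"}

And with the new op parameter, matching train only where its payload is greater than 0.75:

  q={!payload_check f=classifications_dpf payloads="0.75" op="gt" v="train"}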
So if we imagine the word train indexed with a payload of, say, 0.75, we can now search for documents that contain train only where its payload is greater than a value we choose. All right, so let's talk about our use case motivation. Why did we go down this road in the first place?
The example use case we're going to present in this talk is really dealing with the output of neural network model classifications. If you look at most of the bleeding edge in image recognition, more than likely you come across convolutional neural networks; some popular open source ones are VGG16 and YOLO, which we used in this example. We'll show you a demo of what that output looks like a little bit later in the talk. But generally speaking, with a neural network you have something like an image where each pixel is a value on the input, and that goes through the network.
And then you have an output layer where each one of the outputs usually is a particular label or classification, and the score on that output is like a confidence score. So what we end up with, for any given image that we want to classify, is a list of classifications, and each classification has its confidence score associated with it.
Other models like YOLO, in addition to giving you a category and a confidence score, can also give you bounding box information about what was detected in the image.
So we'll come back to this in the demo at the end of the talk, hopefully we have time. But here's just one way to represent payloads in the index for classifications from a machine-learned model. We have here the vgg16_dpfs field, delimited payload floats, with the trailing s for a multi-valued field,
and we see the labels and the confidence scores encoded there. The YOLO classification gives us the object type; in this situation we have an example of a person, and that there was one person detected.
We have positional information, the X and Y coordinates of where in the image that person was detected, and we can even compute things like how large the person is in the image.
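A sketch of such a document, with the vgg16_dpfs field name taken from the talk and the YOLO field names and values being purely hypothetical:

  {
    "id": "image-000123.jpg",
    "vgg16_dpfs": "pizza|0.92 oven|0.41 person|0.13",
    "yolo_count_dpfs": "person|1.0",
    "yolo_size_dpfs": "person|0.27"
  }

Each term carries its model output as a payload, so one multi-valued payload field per model keeps the entire classification output on a single document.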
All of this boils down to a general data model that looks like a one-to-many relationship between documents and classifications: you have a document with some data on it, that's your parent document, and then you potentially have many classifications, each one with its label and its own confidence score.
So the first approach to indexing something like this, the most naive and simplest approach by far, is to filter at index time. If we want to search for all the documents that had been classified as containing a person with 0.75 confidence or greater, the most straightforward thing we can do is filter it at index time: you do your classification up front and you only tag the document with the labels that were above a particular threshold.
Now the pros of this approach: it's incredibly simple, it's very fast, and the index is very small. But the real trade-off is that you can't change your mind about what the threshold is at query time, because you're throwing that information away at index time. So if you wanted to change your query to say medium or high confidence thresholds, you'd need separate fields to include the different sets of labels, and that complexity just grows as you want to tweak what you consider high, medium, and low. So it's a straightforward approach in terms of simplicity, but it does not yield any flexibility at query time.
And if we look at what a document of this style would look like: obviously you have a document ID, and maybe you have a field like your high-confidence labels, where you tag the document as being a dog or a cat or whatever it is, and a simple term query on that field is going to find things that are tagged as being a cat or a dog, because you did that filtering up front at index time.
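A minimal sketch of approach one, with hypothetical field names:

  { "id": "image-000123.jpg", "labels_high_ss": ["cat", "person"] }

  q=labels_high_ss:cat

The threshold decision has already been made at index time, so the query is nothing more than a term query.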
Not every model has the same sort of confidence threshold, and sometimes you want to expose that confidence threshold to the end users, to let them decide what they consider good output from the model and to adjust their recall. So another approach to setting this up is to use a dynamic field, one field per label that came out of the classification.
You could imagine having a field for cat that is a floating point field holding the score of 0.75 or whatever it is, a field for dog, a field for person, and this can work very well. The nice thing is that at query time this becomes a search on the label's field, a range search for whatever numeric value you want in there. So these are pretty straightforward, very performant queries. But one trade-off is that you might have a lot of labels, and as a result you'll end up with a huge number of fields in your index, which, as it turns out, ends up being extremely expensive in terms of memory usage in a Solr or Lucene index. The other trade-off is that you might not know the labels ahead of time, so you can't necessarily facet on field names, although I guess you could use something like Luke to interrogate the index to get all of those out. It just adds a little complexity there.
So what would a document like that look like? Here's an example document with an ID and a field for each one of the labels, cat, dog, person, each holding the score, and the sample query here is very straightforward and very simple.
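A minimal sketch of approach two, again with hypothetical dynamic field names:

  { "id": "image-000123.jpg", "label_cat_f": 0.85, "label_dog_f": 0.10, "label_person_f": 0.42 }

  q=label_cat_f:[0.75 TO *]

One numeric field per label, and the confidence threshold becomes an ordinary range query.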
Approach number three that we looked at was actually to use a join query, to leverage the inherent query-time join capabilities that Solr and Lucene have: index the parent document and the child classification records associated with it, search through the classification records, and perform a join back to the parent document, returning just the parent documents that had a classification record matching the query.
So the pro of this is that it gives you full flexibility in terms of relational-style queries: return this parent document if and only if a classification record has a particular label and a particular value, and you can do filtering on the classification records as complicated as you like and return just the parent record. Now, the big drawback here is of course that join queries are much slower, and we're going to show some benchmarks later on that really drive that point home. Aside from the fact that you have to do the join, that is primarily driven by the vastly increased number of documents in the index as a result of having all these child documents around, and search response times are generally roughly linear in the number of documents per shard. So that means that when you start going with a join approach, you almost immediately have to think about how you're going to shard or scale up this join. And when you do any sort of sharding in an environment where you are doing a join, you need to make sure that you're routing all of your documents to the same shard based on their join key; otherwise that join query is just not going to work as you expect. So that's a little bit of complexity. If you have the freedom to route by the join key, then it's probably not as much of an issue, but it definitely needs to be thought about when you're going with an approach like this.
And here's an example of what a join query with a parent document and the child classification documents would look like. We have a simple parent document with just an ID and maybe some other metadata on it. For our example, we generated a million parent documents, each one having an average of 50 classification documents, and the classification documents themselves have a pointer back to the parent, a label, and a confidence score. And we see an example join query below that searches the classifications for the label foo with a confidence of 0.75 and up, joining on the parent ID back to the parent document's ID field. So it's definitely doable.
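A minimal sketch of approach three, with hypothetical field names:

  parent: { "id": "doc-1", "title_s": "some document" }
  child:  { "id": "doc-1-c42", "parent_id_s": "doc-1", "label_s": "foo", "confidence_f": 0.82 }

  q={!join from=parent_id_s to=id}label_s:foo AND confidence_f:[0.75 TO *]

The join parser searches the classification records and maps the matches back to the parent IDs, so only the parent documents come back.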
One thing to note is that with this query, only the parent documents will be returned. So the metadata of the matching child classification document is not available to the UI unless you use a child doc transformer to fetch the matching classification documents, which adds additional complexity, not just to the query, but also to fetching the matched metadata from the classification record.
All right, so this brings us to the payload approach that we took. We observed this common use case of being able to search for a label or a term on a document, where that term or label came from a machine learning model, and it's really almost like a one-dimensional join: we always knew we were going to be filtering on a single dimension, in this case the confidence score. So we looked at the existing payload check query parser and were a little disappointed to find that it only supported an equality operation. But understanding that, at the end of the day, this is just paging in a byte array, and that it was previously doing an equals comparison, implementing a comparator on that for greater than, less than, or the or-equal variants really added little to no computational overhead. So we were confident that this approach was going to perform at least as well as normal payload queries do. So what we did is encode the confidence scores as floating point payload values, index them, and extend the payload check query parser to support these inequalities.
Let's take a look at what an example document that uses the payload check query parser would look like. Here we have a single field with the classifications; this is effectively the output layer of the neural network with some human-readable labels, cat, dog, and person, with the pipe delimiting the term, or label, from the confidence score. And you see the payload check query parser below, where you specify the field that you're going to query on, the payload and the operation for the comparison, and the original query term, which is cat in this case, to only find cats with a confidence score of 0.75 or better.
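A minimal sketch of approach four, with a hypothetical field name for the classifications:

  { "id": "image-000123.jpg", "classifications_dpf": "cat|0.85 dog|0.10 person|0.42" }

  q={!payload_check f=classifications_dpf payloads="0.75" op="gt" v="cat"}

A single payload field on the parent document replaces the child records, and the confidence filter is applied against the payload at query time.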
So, four different approaches are great, but you really need to make an informed decision about why you're choosing to go with one or the other, and in my opinion the only real way to do that is actually to do some benchmarking. So what we did is generate representative data and indices to prove out some of these benchmarks. For the indexing benchmarks that we're going to talk about, we had a single-threaded Java application that was just feeding documents into Solr with the formats as described in the previous slides.
We generated 1 million documents, each document having an average of 50 classifications, and those classifications had a random confidence score between zero and one assigned to them. There are 10,000 unique labels in the classification data set that we used to generate them. So we have a million documents with on average 50 classifications, spanning 10,000 different labels, each with a score randomly distributed between zero and one: a reasonably representative data set for what we see when we actually use these neural networks at indexing time.
At query time, we wanted to make sure that we were looking at the raw query performance and not being fooled by any caching going on. So we again have a single-threaded app, and all the filter cache sizes were set to zero.
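As a sketch, zeroing out the caches for a benchmark like this is just a matter of setting their sizes to zero in solrconfig.xml (CaffeineCache being the default cache implementation in recent Solr versions):

  <filterCache class="solr.CaffeineCache" size="0" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.CaffeineCache" size="0" initialSize="0" autowarmCount="0"/>
  <documentCache class="solr.CaffeineCache" size="0" initialSize="0" autowarmCount="0"/>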
So really the only effects of caching that we saw in this benchmark were operating-system-level caches and potentially some of the Lucene field-level caches that get triggered, but none of the Solr caches were enabled for these benchmarks.
Getting into the actual benchmarks, a couple of things jump out at us. With approach one, where we're filtering data out at index time, we're able to index these documents very quickly, 11,000 documents a second. The index was the smallest, and the memory usage as reported through the Solr console on the core, the heap usage, was ultimately pretty small.
Approach two was a bit surprising: with all of the fields, potentially 10,000 fields in the index, it was the slowest to index. Perhaps because the JSON representation of the document is just much larger; it's not as tight a format. We don't really know the exact reason why it is so much slower; perhaps it was because, with so many different fields, the index itself has to pay attention to where it's writing that data out, and that wasn't easy to do quickly. So 209 documents a second versus 11,000 documents a second: clearly this per-field approach two has major impacts if you're going to be doing a lot of indexing. The other big one that really jumped out at me is the reported heap usage required to support this index: nearly a thousand times more memory compared to approach one, really highlighting the impact of having a lot of fields in your index.
Approach three, using the child document join, yielded the largest index overall. We're talking about small indices here, but 2.6 gig, or I'm sorry, 2.6 meg, in this sample index; the overhead of the additional documents in this situation really contributed to the index size. Memory usage was not as far out of whack as the per-field approach two, not too bad. But interestingly, approach four with the payloads yielded a slightly larger index than approach two, while the indexing rate was about 15 times faster than approach two. So it's certainly not as fast as throwing away data at index time, but it's nearly 10 times faster than the join approach and 15 times faster than the per-label approach.
Memory usage, surprisingly, was also less than the original approach one; I think we'll just call those roughly equivalent.
Query benchmarks are also very important to pay attention to, because we're not just concerned with indexing, we're also concerned with querying. Join queries were so slow in this benchmark that we just stopped after a thousand queries; we'll say that up front. Approach one: 600 queries a second, running 10,000 queries, one for each label. No big surprise, because this is just a simple term query.
The range query approach on the per-field index was the second fastest, about 350 queries a second, but notably the average result size for these documents was considerably larger, probably because of the overhead of all the JSON formatting, and I think that's what really hurt approach two in this case. With the join parent-child relationship, queries were taking something like two seconds. Of course we turned off caching, so that's affecting this very negatively, but we're not measuring queries in terms of hundreds of queries per second; we're measuring query rates of about half a query per second, 0.5 queries per second.
Compare that to the payload approach, where we don't have the memory hit that we had on the index side and we're still getting about 250 queries a second. The average result size here is smaller thanks to the tighter JSON that we have. Overall, the query response time is three milliseconds versus one or two milliseconds in the other approaches, so still in the ballpark.
So let's talk a little bit about a quick demo that we'd like to show, to see what this looks like. We indexed the COCO image data set, about 118,000 images, through an open source document processing pipeline that has an image processing sub-pipeline handling things like running OpenCV for blur detection, detecting faces, and also running things like VGG16 classification and YOLO classification. We kind of hinted at this document style before, but here is an example of what the documents look like in our example index.
So let me go ahead and switch over to the index here, and make this a little bit bigger. We have here an index of 118,000 images, and we want to start asking some questions based on the outputs of those models. So here, for example, let me ask: of these images, show me the ones where pizza was classified at 0.75 or above. Great, and we see that we've got 855. Maybe I want more recall; I can decrease this threshold, and now we've got 1,100 pictures of pizza. And let's say I want to find ovens. Here I'm looking for pictures that have at least one oven in them. Maybe I'm interested in ovens and pizza, which is kind of interesting, because now we're leveraging the output of one neural network model, YOLO, and another neural network model, VGG16, at the same time, not just to find pizzas or ovens, but pictures of ovens and pizzas, or pizzas and ovens.
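A sketch of what combining the two models in one request could look like under the payload approach, with hypothetical field names: one filter query per model, each checking its own payload field.

  fq={!payload_check f=vgg16_dpfs payloads="0.75" op="gt" v="pizza"}
  fq={!payload_check f=yolo_count_dpfs payloads="0" op="gt" v="oven"}

The second filter reads as: the oven count payload is greater than zero, i.e. at least one oven was detected.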
Maybe I want ovens and pizzas and people, for example. People with, well, that kind of looks like a stove, but that's an oven. You can kind of see a blurry person over here. Here's a person with a pizza, and there's an oven in the background.
Maybe I'm interested in a person with a laptop. I'll find those. Maybe I'm interested in more than one person or a group of people with laptops. Maybe I'm looking for
people at a bakery. Greater than two people at a bakery. Maybe I want less than two people at a bakery. Well, let's see. Or less than or equal to two. Let me get that equal out of there. And there's only one person at a bakery. What are some other things that we need to consider
going forward with this? In some of this work that we did in Lucene, it kind of appeared that the codec, the encoding and decoding of the payloads, was a little bit fragmented, and it would be a nice improvement to the code to make it a little more extensible. Once that codec library were more extensible, it would be a very short lift to do some things like vector matching: if you encoded not just a single floating point value but an array of floating point values, you could start doing things like computing cosine similarities. And once we have these sorts of classification feature vectors from these neural networks, it's also a small jump to look at the classifications that came out for an image and compose a find-similar query, to find other images that had similar classifications in similar ranges.
Another thing that jumped out at me as a nice-to-have: the syntax for this is a little bit difficult to work with; it's definitely not something that an end user would type in. Having some NLU or NLP sort of front end for query parsing would let you say, in natural language, show me a picture of an oven with some pizzas and at least two people, and translate that into the appropriate query.
So this was all contributed back, and it will be in Solr 9 when Solr 9 is released.
The tickets are LUCENE-9659 and SOLR-14787, with myself as a contributor and Gus Heck and Dave Smiley as committers; a big thank you to them for helping usher this through to the community. And yeah, I think we have two minutes for questions, maybe a few more if I'm lucky.
Great talk, Kevin. And as I can read from here, people feel the same; it was a great talk with a great example that you showcased. We do have some questions; some of them have been answered by the community itself. However, someone asked whether this payload approach can be used in Elasticsearch. The link for this has already been provided, but would you like to expand on it?
Sure. So the span payload check query is at the Lucene layer; as soon as that is included in the latest Elasticsearch build, then at least the Lucene query would be there. You would still very likely need some extension to the existing payload support in Elasticsearch. So, being Lucene under the covers, there's no reason why it couldn't be extended to Elasticsearch, but it's not currently supported.
Great. I think along the same lines was the question of how easy it would be to extend the query parser in Elasticsearch the way it's done in Solr in this talk, and I think you've already answered that question. Yeah, absolutely. So, you know, either with separate plugins or by extending the existing Elasticsearch source, you'd be able to do that.
But again, this is going to require that Elasticsearch is pulling in the latest Lucene from the 9x branch. Correct. Yep. I think Max Erwin also mentioned the same thing; he has probably already tried something of that sort. So he asks whether there is a way to reference the payloads in a Painless script, and if that can be provided. We haven't done that. And actually, I was looking at it a little bit more: it would be nice to extend this payload check support into the payload score query parser, so you could start doing things like that. That would be a very nice future enhancement, for certain.