Text to Context: How We Introduced Hybrid Search

Formal Metadata

Title
Text to Context: How We Introduced Hybrid Search
Title of Series
Number of Parts
64
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
With numerous references to current literature, we will explain how we designed our new system and solved the challenges we encountered on both the ML and the engineering side (data pipeline for encoding documents, live service for encoding queries, integration with the search engine), and share insights from analyzing the impact. Our system is based on OpenSearch, but the lessons can be applied to other search engines as well. More specifically, the presentation will cover:
- Status and Shortcomings of our old Search
- Introduction of Hybrid Search
  - general setup
  - recommendations from literature
- Machine Learning
  - model decision (quality vs. latency)
  - fine-tuning and offline evaluation (in particular: using Paid Search / SEM data if you have little historical search performance data of your own)
- Architecture and Implementation (with special consideration of latency)
  - pipeline for encoding documents and indexing the resulting vectors (PySpark)
  - service for live encoding of queries (Python)
  - implementing hybrid search within OpenSearch (including important filter value extraction from the query and ranking scores)
- Learnings and Next Steps
  - observations from our A/B test
  - challenge of cut-off decisions
  - realistic training / evaluation data
  - filter value extraction from query vs. semantic search
  - combining search with auto-complete
  - impact of the call to action in the search bar
  - other use cases for which we successfully apply such semantic vector approaches
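
To illustrate the "ranking scores" aspect of hybrid search mentioned in the abstract, here is a minimal sketch of one common way to merge lexical and semantic results: min-max normalize each score list, then take a weighted average. The abstract does not say which combination method the talk uses, so this is an assumption for illustration only; the function names, the `lexical_weight` parameter, and the sample scores are all hypothetical.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so lexical and vector scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every document as an equally good match.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(
    lexical: Dict[str, float],
    semantic: Dict[str, float],
    lexical_weight: float = 0.5,  # hypothetical tuning parameter
) -> Dict[str, float]:
    """Weighted average of normalized scores; a document missing from one list scores 0 there."""
    lex = min_max_normalize(lexical)
    sem = min_max_normalize(semantic)
    return {
        doc: lexical_weight * lex.get(doc, 0.0)
        + (1.0 - lexical_weight) * sem.get(doc, 0.0)
        for doc in set(lex) | set(sem)
    }


def rank(combined: Dict[str, float]) -> List[str]:
    """Document IDs sorted by combined score, best first (ties broken by ID)."""
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


# Made-up scores: keyword (e.g. BM25-style) scores and vector similarities.
lexical_hits = {"a": 10.0, "b": 5.0, "c": 0.0}
semantic_hits = {"b": 0.9, "c": 0.5, "d": 0.1}
combined = hybrid_scores(lexical_hits, semantic_hits)
ranking = rank(combined)
```

In this sketch a document found by both retrievers (here `b`) can outrank one that only the keyword search found, which is the basic motivation for combining the two. Where to cut off low-scoring semantic matches is a separate decision that this example does not address.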