NrtSearch: Yelp’s fast, scalable, and cost-effective open source search engine

Formal Metadata

Title
NrtSearch: Yelp’s fast, scalable, and cost-effective open source search engine
Series title
Number of parts
56
Author
Contributors
License
CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they have specified.
Identifiers
Publisher
Year of publication
Language

Content Metadata

Subject area
Genre
Abstract
Search and ranking are part of many important features on the Yelp platform, from looking for a plumber to showing relevant photos of the dish you search for. These varied use cases led to the creation of Yelp’s Elasticsearch-based ranking platform, which we presented at Berlin Buzzwords 2019. That platform provided real-time indexing, learning-to-rank, and lower maintenance overhead, and it opened up search functionality to more teams at Yelp. We recently built Nrtsearch, a Lucene-based search engine, to replace Elasticsearch, and have open sourced it under the Apache 2.0 license. This talk will detail the following.

Challenges associated with scaling Elasticsearch costs and performance:
- Issues related mainly to the document-based replication approach.
- Difficulties with real-time autoscaling of Elasticsearch.
- Inefficient use of resources due to hot and cold node issues.

Architecture of Nrtsearch:
- Uses Lucene’s near-real-time (NRT) segment replication.
- Primary-replica architecture: the primary does all writing, including segment merges, while replicas simply copy over segments using Lucene's NRT APIs and serve search queries.
- Cluster orchestration, availability, and management of nodes are left to systems like Kubernetes that excel at resource management and scheduling.
- Truly stateless architecture: deployed as a standard microservice using Kubernetes. State is committed to S3; when a primary or replica restarts, it pulls down the most recent state from S3.

Benefits of this architecture:
- Performance increased by up to 50%.
- Cluster costs lowered by up to 50%.
- Standard tools (k8s) manage the operational aspects of the cluster, freeing ranking infrastructure teams to focus on search-related problems.

Challenges involved in rolling this out to production:
- Lucene’s segment replication approach, and the code itself, is not widely used in the industry, so it had some rough edges.
- Exciting performance bugs!

Future work:
- Enhance feature support via extensible plugins, such as vector embeddings.
- Continue to simplify and open source deployment tooling to help others deploy NrtSearch in their own cloud environments.
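
The primary-replica segment replication and stateless restart described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not use Lucene's or Nrtsearch's actual APIs, and every name in it (`Segment`, `Primary`, `Replica`, `sync`, `restore`, and the `remote` dict standing in for S3) is hypothetical. It shows the general idea that only the primary indexes and merges, while replicas copy finished segments and can be rebuilt from committed state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """Immutable batch of indexed documents (toy stand-in for an index segment)."""
    name: str
    docs: tuple


@dataclass
class Primary:
    """Does all the writing: builds segments from documents and merges them."""
    segments: dict = field(default_factory=dict)
    buffer: list = field(default_factory=list)
    counter: int = 0

    def add_document(self, doc: str) -> None:
        self.buffer.append(doc)

    def _new_segment(self, docs: tuple) -> Segment:
        seg = Segment(f"seg_{self.counter}", docs)
        self.counter += 1
        return seg

    def refresh(self) -> None:
        """Flush buffered documents into a new immutable segment."""
        if self.buffer:
            seg = self._new_segment(tuple(self.buffer))
            self.segments[seg.name] = seg
            self.buffer = []

    def merge(self) -> None:
        """Merge all live segments into one; only the primary pays this cost."""
        if len(self.segments) > 1:
            docs = tuple(d for s in self.segments.values() for d in s.docs)
            seg = self._new_segment(docs)
            self.segments = {seg.name: seg}

    def commit(self, remote: dict) -> None:
        """Persist the current segment set to remote storage (a dict here)."""
        remote.clear()
        remote.update(self.segments)


@dataclass
class Replica:
    """Never indexes documents; it only copies segments and serves searches."""
    segments: dict = field(default_factory=dict)

    def sync(self, primary: Primary) -> int:
        """Copy segments this replica lacks, drop obsolete ones, return copies made."""
        missing = [n for n in primary.segments if n not in self.segments]
        for name in missing:
            self.segments[name] = primary.segments[name]
        for name in list(self.segments):
            if name not in primary.segments:
                del self.segments[name]
        return len(missing)

    def restore(self, remote: dict) -> None:
        """Stateless restart: rebuild local state from the last commit."""
        self.segments = dict(remote)

    def search(self, term: str) -> list:
        return sorted(d for s in self.segments.values() for d in s.docs if term in d)


primary, replica, remote = Primary(), Replica(), {}
primary.add_document("plumber in berlin")
primary.add_document("pizza photo")
primary.refresh()
copied_first = replica.sync(primary)   # copies the first segment

primary.add_document("plumber in paris")
primary.refresh()
copied_second = replica.sync(primary)  # copies only the newly created segment

primary.merge()                        # merge work happens on the primary only
replica.sync(primary)
primary.commit(remote)

restarted = Replica()                  # a fresh node with no local state
restarted.restore(remote)
```

In this toy, a replica never re-indexes a document: it receives the result of the primary's indexing and merging as whole segments. That contrasts with document-based replication, where each copy would index every document itself. The real system replicates segment files over the network and commits to S3, which this sketch does not attempt to model.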