Kaldb: serverless lucene at petabyte scale

Formal Metadata

Title
Kaldb: serverless lucene at petabyte scale
Series Title
Number of Parts
60
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Year of Publication
2023
Language
English

Content Metadata

Subject Area
Genre
Abstract
Running petabyte-scale columnar stores has become a routine operation in today's data-driven world. However, running a petabyte-scale search system is still operationally challenging. Enter Kaldb, an open-source, serverless Lucene serving system designed specifically for petabyte-scale Lucene workloads. We've designed Kaldb to automate and reduce operational toil without sacrificing performance or reliability. But designing a serverless Lucene system at this scale poses several unique challenges, such as ensuring durability of data, modifying replication and caching protocols for high availability, handling high-fanout reads, managing ephemeral nodes, and more. In this talk, we'll delve into the details of how our redesigned Kaldb system overcomes these challenges. We've separated durability of the data from storage, separated compute from storage, modified replication algorithms to handle ephemeral nodes, used Kafka as a write-ahead log, and developed a novel query execution layer to handle high-fanout queries. Our implementation not only reduces operational toil but also adds several self-healing properties to the system. We're proud to say that Kaldb currently runs on Kubernetes at petabyte scale with improved reliability and performance. Join us in this talk to learn more about how Kaldb can help you overcome the challenges of running a petabyte-scale Lucene serving system. We'll share our experiences, best practices, and lessons learned in designing and operating a serverless Lucene serving system at this scale, and provide practical insights and techniques that you can use to optimize your own search systems.
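
The abstract mentions using a write-ahead log so that data survives the loss of ephemeral nodes. The following is a minimal sketch of that general idea, not Kaldb's actual implementation: an in-memory list stands in for a durable log such as a Kafka partition, and the names `WriteAheadLog`, `EphemeralIndexer`, and `catch_up` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WriteAheadLog:
    """Stand-in for a durable, ordered log (assumption: e.g. one Kafka partition)."""
    records: list = field(default_factory=list)

    def append(self, doc: dict) -> int:
        self.records.append(doc)
        return len(self.records) - 1  # offset of the record just written

    def read_from(self, offset: int):
        # Yield (offset, record) pairs starting at the given offset.
        for i in range(offset, len(self.records)):
            yield i, self.records[i]


@dataclass
class EphemeralIndexer:
    """Hypothetical indexer whose in-memory state is lost when the node dies."""
    index: dict = field(default_factory=dict)
    next_offset: int = 0

    def catch_up(self, log: WriteAheadLog) -> int:
        # Apply every record not yet seen and remember where we stopped.
        applied = 0
        for offset, doc in log.read_from(self.next_offset):
            self.index[doc["id"]] = doc
            self.next_offset = offset + 1
            applied += 1
        return applied


log = WriteAheadLog()
for i in range(5):
    log.append({"id": i, "msg": f"event {i}"})

node = EphemeralIndexer()
node.catch_up(log)

# Simulate losing the node: a replacement starts empty and replays the log.
replacement = EphemeralIndexer()
replayed = replacement.catch_up(log)
```

Because the log, not the node, is the source of truth, the replacement rebuilds the same index by replaying from offset 0; a real system would typically resume from a persisted offset or snapshot instead of replaying everything.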