
Scaling your Kafka pipeline can be a pain - but it doesn’t have to be

Formal Metadata

Title
Scaling your Kafka pipeline can be a pain - but it doesn’t have to be
Title of Series
Number of Parts
56
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Kafka data pipeline maintenance can be painful. It usually comes with complicated and lengthy recovery processes, scaling difficulties, traffic ‘moodiness’, and latency issues after downtimes and outages. It doesn’t have to be that way! We’ll examine one of our multi-petabyte-scale Kafka pipelines and go over some of the pitfalls we’ve encountered. We’ll offer solutions that alleviate those problems and compare the before and after. We’ll then explain why some common-sense solutions do not work well and offer an improved, scalable and resilient way of processing your stream. We’ll cover:
- Costs of processing in stream compared to in batch
- Scaling out for bursts and reprocessing
- Making the tradeoff between wait times and costs
- Recovering from outages
- And much more…
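The abstract mentions scaling out for bursts and reprocessing; the usual starting point for that in Kafka is a consumer group, where every instance sharing a group id is assigned a subset of the topic's partitions. Below is a minimal sketch of such a consumer using the confluent-kafka Python client. The broker address, the "events" topic, the "pipeline-workers" group id, and the process() helper are all hypothetical placeholders, not details from the talk.

```python
from confluent_kafka import Consumer

# Hypothetical settings -- replace with your own brokers, topic, and group id.
conf = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "pipeline-workers",   # instances sharing this id split the partitions
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,      # commit only after a record is actually processed
}

consumer = Consumer(conf)
consumer.subscribe(["events"])

def process(payload: bytes) -> None:
    # Placeholder for the real per-record work (parse, transform, write downstream).
    pass

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue                  # no record arrived within the poll timeout
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        process(msg.value())
        consumer.commit(message=msg, asynchronous=False)
finally:
    consumer.close()
```

Because Kafka assigns each partition to at most one consumer in a group, adding instances like this only increases throughput up to the topic's partition count; beyond that point, handling bursts or reprocessing requires other approaches, which is part of the tradeoff space the talk promises to cover.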