#bbuzz: When your stream doesn’t stream very well

Zitieren

Zugehöriges Material

Plain Schwarz

Basjes, Niels

Formale Metadaten

Titel

#bbuzz: When your stream doesn’t stream very well

Serientitel

Berlin Buzzwords 2020

Anzahl der Teile

Autor

Basjes, Niels

Mitwirkende

N. N. (Moderation)

Lizenz

CC-Namensnennung 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.

Identifikatoren

10.5446/68799 (DOI)

Herausgeber

Plain Schwarz

Erscheinungsjahr

2020

Sprache

Englisch

Inhaltliche Metadaten

Fachgebiet

Informatik

Genre

Konferenz/Talk

Abstract

So you got the job of creating this great new streaming analysis on top of an existing data stream. To avoid overloading your laptop you start the project by listening to the data from the test environment which gives you a manageable volume. You get the data into Apache Flink or Beam and you see raw data coming in. Yet your first attempts at doing a very simple analysis on this data results in … nothing coming out. Then you simply take the raw data and use the advised way to write it to something like HBase ... and it takes 30 minutes for the records to appear in the database. What is going wrong? The reality of streaming analytics is that the analytics works great on a continuous and big enough stream. Testing streams are very often “too dry” causing all kinds of basic systems in stream processing frameworks to behave differently from what you want. But not only the streams in the test environment are a problem, also streams that are used to process incoming files (i.e. batches on a stream) can be quite a problem to handle correctly. In this talk I will go into some of the practical problems we ran into while building streaming applications with Apache Kafka, Apache Flink and similar tools over the last years. And I will show the solutions we use that allow people to successfully build analytics/processing solutions on these "not so streaming" streams.