
#bbuzz: When your stream doesn’t stream very well

Formal Metadata

Title
#bbuzz: When your stream doesn’t stream very well
Number of Parts
48
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
So you got the job of creating this great new streaming analysis on top of an existing data stream. To avoid overloading your laptop, you start the project by listening to the data from the test environment, which gives you a manageable volume. You get the data into Apache Flink or Apache Beam and you see raw data coming in. Yet your first attempts at doing a very simple analysis on this data result in … nothing coming out. Then you simply take the raw data and use the advised way to write it to something like HBase … and it takes 30 minutes for the records to appear in the database. What is going wrong?

The reality of streaming analytics is that the analytics work great on a continuous and big enough stream. Test streams are very often "too dry", causing all kinds of basic mechanisms in stream processing frameworks to behave differently from what you want. But the streams in the test environment are not the only problem: streams that are used to process incoming files (i.e. batches on a stream) can also be quite tricky to handle correctly.

In this talk I will go into some of the practical problems we ran into over the last years while building streaming applications with Apache Kafka, Apache Flink and similar tools, and I will show the solutions we use that allow people to successfully build analytics/processing solutions on these "not so streaming" streams.
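The "nothing coming out" symptom the abstract describes can be illustrated without any framework: event-time windows only fire once a watermark (derived from the timestamps of incoming events) passes the window's end, so on a sparse "dry" stream the window holding the last events never closes. A minimal sketch in plain Python follows; the tumbling-window logic and the example timestamps are illustrative assumptions, not Flink's actual implementation:

```python
from collections import defaultdict

def windowed_counts(events, window_size):
    """Count events per tumbling event-time window.

    A window [start, start + window_size) fires only once the
    watermark (here simply the max timestamp seen so far) passes
    its end. `events` is a list of (timestamp, value) pairs.
    """
    buckets = defaultdict(int)   # open windows: window index -> count
    fired = {}                   # closed windows: window index -> count
    watermark = float("-inf")
    for ts, _value in events:
        buckets[ts // window_size] += 1
        watermark = max(watermark, ts)
        # Fire every window whose end the watermark has passed.
        for win in list(buckets):
            if (win + 1) * window_size <= watermark:
                fired[win] = buckets.pop(win)
    return fired, dict(buckets)

# A busy stream: one event per time unit. Every window fires as
# soon as a later event pushes the watermark past its end; only
# the newest window is still pending.
busy = [(t, "x") for t in range(100)]
fired_busy, pending_busy = windowed_counts(busy, 10)

# A "dry" test stream that simply stops: no later event ever
# advances the watermark past the last window's end, so the
# results for that window "never come out".
dry = [(0, "a"), (12, "b"), (25, "c")]
fired_dry, pending_dry = windowed_counts(dry, 10)
```

On the busy stream, windows 0–8 fire and only window 9 is pending; on the dry stream, the event at t=25 is stuck in an open window forever. This is exactly why an analysis that looks fine on production volume appears to produce nothing on a trickle of test data.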