Snorkel Beambell - Real-time Weak Supervision on Apache Flink

FOSDEM VZW

Marthi, Suneel

Formal Metadata

Title

Snorkel Beambell - Real-time Weak Supervision on Apache Flink

Title of Series

FOSDEM 2020

Number of Parts

490

Author

Marthi, Suneel

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/47263 (DOI)

Publisher

FOSDEM VZW

Release Date

2020

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

The advent of Deep Learning models has led to massive growth in real-world machine learning. Deep Learning allows Machine Learning practitioners to achieve state-of-the-art scores on benchmarks without any hand-engineered features. However, these models rely on massive hand-labeled training datasets, and producing those labels is a bottleneck in developing and modifying machine learning models. Most large-scale Machine Learning systems today, such as Google's DryBell, use some form of Weak Supervision to construct lower-quality but large-scale training datasets that can be used to continuously retrain and deploy models in real-world scenarios. The challenge with continuous retraining is that one needs to maintain prior state (e.g., the labeling functions in the case of Weak Supervision, or a pre-trained model such as BERT or Word2Vec for Transfer Learning) that is shared across multiple streams while the model is continuously updated. Apache Beam's stateful stream processing capabilities are a perfect match for this, including support for scalable Weak Supervision.
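To make the Weak Supervision idea more concrete, here is a minimal Python sketch. It is not the Snorkel Beambell implementation and does not use the Apache Beam or Apache Flink APIs. It assumes a toy spam-labeling task, and the labeling functions, the `StreamingWeakLabeler` class, and the label constants are all hypothetical. Snorkel itself combines labeling-function votes with a learned generative label model; this sketch substitutes a plain majority vote for simplicity, and a per-key dictionary stands in for the keyed state a stream processor would maintain.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

ABSTAIN = -1  # a labeling function returns this when it has no opinion
SPAM, HAM = 1, 0  # hypothetical label set for a toy spam task

# Hypothetical labeling functions: cheap heuristics that each vote on a label.
def lf_contains_link(text: str) -> int:
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_mentions_prize(text: str) -> int:
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_reply(text: str) -> int:
    return HAM if len(text.split()) <= 3 else ABSTAIN

LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_contains_link,
    lf_mentions_prize,
    lf_short_reply,
]

def majority_vote(votes: Iterable[int]) -> int:
    """Combine non-abstaining votes; ties or all-abstain yield ABSTAIN.

    This is a simple stand-in for Snorkel's learned label model.
    """
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

class StreamingWeakLabeler:
    """Applies labeling functions to a keyed stream of records.

    The per-key counters imitate keyed state in a stream processor;
    this is an assumption for illustration, not Beam's or Flink's API.
    """

    def __init__(self, lfs: Iterable[Callable[[str], int]]):
        self.lfs = list(lfs)  # shared "prior state": the labeling functions
        self.label_counts: Dict[str, Counter] = defaultdict(Counter)

    def process(self, key: str, text: str) -> Tuple[str, str, int]:
        votes = [lf(text) for lf in self.lfs]
        label = majority_vote(votes)
        self.label_counts[key][label] += 1  # update per-key state
        return key, text, label

if __name__ == "__main__":
    labeler = StreamingWeakLabeler(LABELING_FUNCTIONS)
    stream = [
        ("user-a", "Claim your prize at https://example.com"),
        ("user-b", "ok thanks"),
        ("user-a", "Meeting notes attached for review tomorrow"),
    ]
    for key, text in stream:
        print(labeler.process(key, text))
    print(dict(labeler.label_counts))
```

Records on which every labeling function abstains come out as `ABSTAIN`; a real pipeline would typically drop or defer such records rather than train on them. Keeping the labeling functions in one shared object mirrors the abstract's point that this state is reused across streams while labels keep flowing into retraining.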