Real Time Machine Learning with Python

Formal Metadata

Title
Real Time Machine Learning with Python
Alternative Title
Real Time Stream Processing for Machine Learning at Massive Scale: Processing Massively Parallel Stream of Data with Python (+ Kafka, SKlearn, SpaCy and Seldon)
Series Title
Number of Parts
130
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use and modify the work or its content for any legal, non-commercial purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and pass on the work or its content, including in modified form, only under the terms of this license.
Identifiers
Publisher
Release Year
Language

Content Metadata

Subject Area
Genre
Abstract
This talk will provide practical insight into how to build scalable, streaming machine learning pipelines that process large datasets in real time using Python and popular frameworks such as Kafka, SpaCy and Seldon. We will cover a case study that performs automated content moderation on Reddit comments in real time. Our dataset will consist of 200,000 Reddit comments from /r/science, 50,000 of which have been removed by moderators. We will run the end-to-end pipeline in a Kubernetes cluster, with the stream processing handled by Kafka and the various components leveraging SKLearn, SpaCy and Seldon. We will then dive into fundamental stream processing concepts such as windows, watermarking and checkpointing, and we will show how to use each of these frameworks to build complex data streaming pipelines that perform real-time processing at scale, by building, deploying and monitoring a machine learning model that processes incoming production data. Finally, we will show best practices for using these frameworks, as well as a high-level overview of tools that can be used for in-depth monitoring.
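To make two of the concepts named in the abstract concrete, here is a minimal sketch of tumbling windows with a watermark, written in plain Python. It is not taken from the talk and does not use the Kafka API; the function name `window_counts`, the window size, and the allowed lateness are all hypothetical choices for illustration. Checkpointing is not shown.

```python
from collections import defaultdict

WINDOW_SECONDS = 10    # hypothetical tumbling-window size
ALLOWED_LATENESS = 5   # hypothetical watermark lag, in seconds


def window_counts(events):
    """Count events per tumbling window, dropping events behind the watermark.

    `events` is an iterable of (event_time_seconds, payload) in arrival order.
    Returns (counts, dropped), where counts maps window start -> event count.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = float("-inf")
    for event_time, payload in events:
        # The watermark trails the largest event time seen so far.
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS
        # Assign the event to the fixed-size window containing its timestamp.
        window_start = (event_time // WINDOW_SECONDS) * WINDOW_SECONDS
        window_end = window_start + WINDOW_SECONDS
        if window_end <= watermark:
            # The window is already considered closed, so the event is too late.
            dropped.append((event_time, payload))
            continue
        counts[window_start] += 1
    return dict(counts), dropped
```

With arrival order `[(1, "a"), (4, "b"), (12, "c"), (9, "d"), (17, "e"), (3, "f")]`, the out-of-order event at time 9 is still accepted because its window has not passed the watermark, while the event at time 3 arrives after the watermark has reached 12 and is dropped.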