
Feature store: the missing data layer in ML pipelines?

Formal Metadata

Title
Feature store: the missing data layer in ML pipelines?
Alternative Title
Feature store: A Data Management Layer for Machine Learning Data Management for ML
Series Title
Number of Parts
561
Author
Contributors
License
CC Attribution 2.0 Belgium:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they have specified.
Identifiers
Publisher
Year of Publication
Language

Content Metadata

Subject Area
Genre
Abstract
Data may be the new oil, but refined data is the fuel for AI. Machine learning (ML) systems are only as good as the data they are trained on, and getting the data in the right format at the right time is a challenge. ML systems are trained using sets of features. A feature can be as simple as the value of a column in a database entry, or it can be a complex value computed from diverse sources. A feature store is a central vault for storing documented and curated features, ideally with support for access control. A feature store enables automatic feature analysis and monitoring, feature sharing across models and teams, feature discovery, feature backfilling, and feature versioning. The feature store is a data management layer that fills an important gap in modern machine learning infrastructure: it empowers enterprises to scale their machine learning workflows and make full use of their investment in machine learning. In this talk, we will present key points on how to take your machine learning workflow to the next level using a feature store, and demonstrate how the feature store fits into the larger machine learning pipeline. We will introduce HopsML, an open-source, end-to-end machine learning pipeline built on Hops Hadoop, the world's fastest and most scalable Hadoop distribution. With HopsML you can build production-ready machine learning pipelines using open-source software, where features are stored in a shared feature store that is automatically backfilled as new data arrives, where machine learning models can be trained on datasets on the order of billions of examples using distributed deep learning, where data scientists can follow engineering principles by using versioned and reproducible experiments, and where models can be automatically deployed in an elastic manner using auto-scaling.
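To make the idea of a feature store concrete, the following is a minimal in-memory sketch of three of the capabilities the abstract names: documented features, feature versioning, and feature sharing across models. It is not the HopsML or Hops feature store API. Every name here (`FeatureGroup`, `InMemoryFeatureStore`, `register`, `get`, `search`, `training_rows`) and all sample data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FeatureGroup:
    """A documented, versioned set of feature values keyed by entity id (hypothetical)."""
    name: str
    version: int
    description: str
    rows: Dict[int, Dict[str, float]] = field(default_factory=dict)


class InMemoryFeatureStore:
    """Toy stand-in for a feature store; a real one would persist and govern access."""

    def __init__(self) -> None:
        self._groups: Dict[Tuple[str, int], FeatureGroup] = {}

    def latest_version(self, name: str) -> int:
        # Returns 0 when no version of this feature group exists yet.
        return max((v for (n, v) in self._groups if n == name), default=0)

    def register(self, name: str, description: str,
                 rows: Dict[int, Dict[str, float]]) -> FeatureGroup:
        # Each registration creates a new version instead of overwriting the old one.
        group = FeatureGroup(name, self.latest_version(name) + 1, description, dict(rows))
        self._groups[(name, group.version)] = group
        return group

    def get(self, name: str, version: Optional[int] = None) -> FeatureGroup:
        # Defaults to the latest version; raises KeyError for unknown groups.
        if version is None:
            version = self.latest_version(name)
        return self._groups[(name, version)]

    def search(self, keyword: str) -> List[str]:
        # Feature discovery: match the keyword against the stored descriptions.
        hits = {g.name for g in self._groups.values()
                if keyword.lower() in g.description.lower()}
        return sorted(hits)

    def training_rows(self, names: List[str]) -> Dict[int, Dict[str, float]]:
        # Join the latest version of each group on the entity ids they share.
        groups = [self.get(n) for n in names]
        if not groups:
            return {}
        shared_ids = set(groups[0].rows)
        for g in groups[1:]:
            shared_ids &= set(g.rows)
        joined: Dict[int, Dict[str, float]] = {}
        for entity_id in sorted(shared_ids):
            merged: Dict[str, float] = {}
            for g in groups:
                merged.update(g.rows[entity_id])
            joined[entity_id] = merged
        return joined


store = InMemoryFeatureStore()
store.register("customer_stats", "Average purchase value per customer",
               {1: {"avg_purchase": 20.0}, 2: {"avg_purchase": 35.5}})
store.register("customer_stats", "Average purchase value per customer",
               {1: {"avg_purchase": 22.0}, 2: {"avg_purchase": 35.5}})
store.register("web_activity", "Website visits per customer, last 7 days",
               {1: {"visits_7d": 3.0}})

training = store.training_rows(["customer_stats", "web_activity"])
```

Because old versions are kept, a model trained against version 1 of `customer_stats` can still be reproduced after version 2 is registered, and two teams can build different training sets from the same registered groups without recomputing the features.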