
Facets of Reproducible research on Kubernetes

Formal Metadata

Title
Facets of Reproducible research on Kubernetes
Number of Parts
69
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Kubernetes and software engineering practice are quietly revolutionizing data science by providing practitioners with better infrastructure and more disciplined habits, and many tools build on these primitives and practices to make machine learning deployments on Kubernetes simple, portable, and scalable. However, bringing engineering discipline to data science workflows turns out to be a thorny problem, and reproducible research is harder to achieve than we might assume. In this talk, we'll examine the problem of reproducible research from several angles and present tools we've built on Kubernetes that address different facets of the problem. You'll see how to treat Jupyter notebooks as real software artifacts, not merely as ad hoc environments for discovery, and learn what that change in mindset entails. You'll see how we build workflows from notebooks, how we automatically generate model services with CI/CD pipelines, and which tools we use to generate and track metrics that identify concept drift. You'll learn about some surprising challenges of reproducibility and why some convenient model operationalization workflows might require heroic practitioner discipline to produce consistent results.