Facets of Reproducible research on Kubernetes

Plain Schwarz

Watson, Sophie; Benton, William

Formal Metadata

Title

Facets of Reproducible research on Kubernetes

Title of Series

Berlin Buzzwords 2021

Author

Watson, Sophie

Benton, William

License

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/67357 (DOI)

Publisher

Plain Schwarz

Release Date

2021

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Kubernetes and software engineering practice are quietly revolutionizing data science by providing practitioners with better infrastructure and more disciplined habits, and many tools build on these primitives and practices to make machine learning deployments on Kubernetes simple, portable, and scalable. However, bringing engineering discipline to data science workflows turns out to be a thorny problem, and reproducible research is harder to achieve than we might assume. In this talk, we’ll examine the problem of reproducible research from several angles and present tools we’ve built on Kubernetes that address different facets of the problem. You’ll see how to treat Jupyter notebooks as real software artifacts -- not merely as ad hoc environments for discovery -- and learn about what that mindset change entails. You’ll see how we build workflows from notebooks, how we automatically generate model services with CI/CD pipelines, and the tools we use to generate and track metrics to identify concept drift. You’ll learn about some surprising challenges of reproducibility and learn why some convenient model operationalization workflows might require heroic practitioner discipline to produce consistent results.