Develop and deploy a Machine Learning pipeline in 30 minutes with Ploomber

EuroPython

Blancas, Eduardo

Formal Metadata

Title

Develop and deploy a Machine Learning pipeline in 30 minutes with Ploomber

Series Title

EuroPython 2021

Number of Parts

115

Author

Blancas, Eduardo

Contributors

Pierfederici, Francesco (Moderator)

License

CC Attribution - NonCommercial - ShareAlike 4.0 International:
You may use, adapt, copy, distribute, and make publicly available the work or its content, in unchanged or adapted form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and pass on the work or its content, including in adapted form, only under the terms of this license.

Identifiers

10.5446/58760 (DOI)

Publisher

EuroPython

Year of Publication

2021

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Development tools such as Jupyter are prevalent among data scientists because they provide an environment to explore data visually and interactively. However, when deploying a project, we must ensure the analysis can run reliably in a production environment like Airflow or Argo; this forces data scientists to move code back and forth between their notebooks and these production tools. Furthermore, data scientists have to spend time learning an unfamiliar framework and writing pipeline code, which severely delays the deployment process.

Ploomber solves this problem by providing:

- A workflow orchestrator that automatically infers task execution order using static analysis.
- A sensible layout to bootstrap projects.
- A development environment integrated with Jupyter.
- Capabilities to export to production systems (Kubernetes, Airflow, and AWS Batch) without code changes.

Who and why

This talk is for data scientists (with experience developing Machine Learning projects) looking to enhance their workflow. Experience with production tools such as Airflow or Argo is not necessary.

The talk has two objectives:

- Advocate for more development-friendly tools that let data scientists focus on analyzing data and remove the overhead of popular production tools.
- Demonstrate an example workflow using Ploomber where a pipeline is developed interactively (using Jupyter) and deployed without code changes.
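The idea of inferring task execution order through static analysis, mentioned in the abstract, can be illustrated with a minimal sketch. This is not Ploomber's implementation or API. It assumes each task is a script that declares its dependencies in a top-level `upstream` variable, reads that declaration with Python's `ast` module without running the script, and orders the tasks with the standard-library `graphlib` (Python 3.9+). The task names, sources, and helper functions are hypothetical.

```python
import ast
from graphlib import TopologicalSorter


def extract_upstream(source: str) -> set[str]:
    """Return the names assigned to a top-level `upstream` variable.

    The source is parsed, never executed (assumed convention for this sketch).
    """
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if "upstream" in targets:
                # literal_eval accepts an AST node holding a literal value
                value = ast.literal_eval(node.value)
                return set(value or [])
    return set()


def execution_order(tasks: dict[str, str]) -> list[str]:
    """Order tasks so each one runs after the tasks it declares as upstream."""
    graph = {name: extract_upstream(src) for name, src in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical task sources, deliberately listed out of order
tasks = {
    "fit": "upstream = ['features']\nproduct = None\n",
    "features": "upstream = ['clean']\n",
    "clean": "upstream = ['load']\n",
    "load": "upstream = None\n",
}

order = execution_order(tasks)
print(order)
```

Because the dependencies are read from the source text, the user never writes an explicit ordering; adding a new task only requires declaring what it depends on.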