Ray: Scalability from a Laptop to a cluster
Formal Metadata
Title: Ray: Scalability from a Laptop to a Cluster
Number of Parts: 130
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/49929 (DOI)
EuroPython 2020, 74 / 130
Transcript: English (auto-generated)
00:06
So what I'm going to talk about is Ray, which is a Python library, mostly, although it's got a little C++ core for making it really easy to distribute Python applications across the cluster. And it was kind of inspired by the challenges of doing distributed machine
00:22
learning, you know, training neural networks and so forth. You can reach me at the contact information on the left, including my email address. I'm on Twitter at Dean Wampler. Ray.io is the site for Ray. And I work for Anyscale, the company developing Ray. And we're having a Ray Summit this fall, September 30 and October 1. If you're interested, you can find out more at our events page on our website.
00:43
So this will be a really fast talk because I have 30 minutes. I'll cover a lot. What I hope you'll get out of it is sort of the gist of what's going on with Ray and why you might be interested in it. And we can certainly take questions in the Discord channel afterwards. So let me start with a demo. I'll exit from this presentation and then switch to my browser where I have
01:03
an application running. So what I'm going to do is just walk through first what it's like to work with the core API. Whether you're doing machine learning or not, this is the API you might use for distributing your applications. And then I'll run through an example using reinforcement learning with a library for Ray called RLlib.
01:21
So you can get a sense of, you know, what Ray is doing behind the scenes, as it were. So I went ahead and evaluated a few cells already. I did some imports. I initialized Ray, which here is just running on my laptop. I could also tell it to connect to a cluster. And then there's this what we call the Ray dashboard that comes up
01:40
so I can actually see what's going on if I'm trying to understand performance issues or, you know, see what's going on and so forth. We won't look at that again for time's sake, but just so you know it's there. So the example I'm going to simulate is the case where maybe I need to call an expensive data store, expensive in the sense that it takes a lot of time to compute relative to
02:00
just regular computing. And just to make it a simple case, I've defined two dictionaries, one of which takes the keywords like reinforcement learning, hyperparameter tuning, and so forth, and returns the corresponding high-level library in Ray for those particular tasks. And then there's a second one, Ray URLs, that takes the values returned from the first dictionary
02:21
and gives you the links to the documentation for them. And you'll see why we have a second one in just a moment. I'll mostly start with the first one. So let me define just a regular Python function that will, you know, take in one of these phrases like reinforcement learning and look up the value in the dictionary and return both that key and the value when it's done.
02:40
And to kind of simulate doing something that's compute-intensive, I'm going to sleep for a period of time equal to the length of the phrase divided by 10. So it turns out reinforcement learning, I believe, is 22 characters, so it'll sleep for 2.2 seconds when we call it. And then if I define this function that just iterates through all of the phrases in the dictionary
03:02
and then calls them and time how long this takes, we'll find that it takes about 7.1 seconds because it'll do one at a time and sleep for each one, and it turns out there are 71 characters in the keywords, so that's why we slept for 7.1 seconds. Well, this kind of query could be done in parallel.
03:22
There's nothing that's related between the queries that I did, so let's see how we could use Ray to turn something into an asynchronous process that we can then do in parallel where possible, but without doing a lot of low-level parallel programming. And the way we do that is we define something called a Ray task.
03:41
We annotate a regular function with the @ray.remote decorator, and I can just turn around and call the other one, so I can have it both ways. I can have the original Python function or this Ray task and use whichever I want. And then when we actually call it, we add this .remote to the invocation. It's a visual cue to us that we're actually doing something with Ray
04:02
and not just making a regular call, and this thing that it's returning is actually a future, so we've fired off this asynchronous call somewhere in our cluster, and at some point we can retrieve the value using Ray.get, and this Ray.get will block until the value is ready. I waited long enough so it came back immediately,
04:20
and here's what we got. So let's actually do a similar search iteration that we did before. Notice what I'm going to do. I'm going to fire off all four queries at once, save those IDs, and then call Ray.get to get them all at once, and this will take 2.2 seconds. Why 2.2? Because the longest key is reinforcement learning,
04:42
so even though some of them finish sooner because they have shorter keys and we slept for a shorter period of time, the way I called Ray.get actually means that we'll wait for all of them. I'll show you a workaround for that in just a second, and you can see that, yes, reinforcement learning is 22 characters long. Right, now here's why I have the second dictionary,
05:01
and it's a really nice feature of Ray that lets us do tasks that actually depend on each other in a reasonably intuitive way. So what I'm going to do is define a function that is going to get the doc URL. So recall that when I queried the first one, I got something like Ray RLlib.
05:21
Now I can use that to query the second dictionary to get the URL for the documentation. The way I've written this function is it's actually going to take the tuple that was returned from the first one, and that will actually be very convenient in what I'm about to do, but otherwise it's kind of identical to the first one, and also I'll create a task as well as a function,
05:41
and then here this method will actually first query the original dictionary, you know, with all four keys. Then it will turn around and take those results and query for the doc URLs, and then it will finally return the results there. Let me go ahead and run this. It will take like four seconds or something, but I want you to notice something crucial that's happening in line two.
06:03
We have task dependencies now. I can't schedule the second, these doc URL tasks, until the value from the first task is completed that corresponds to that lookup. So, you know, in a regular distributed system library, I'd have to poll or something, wait for the first result, you know, unpack it, and then pass it to the next one.
06:22
Ray is doing all that for me. This argument is actually one of these IDs. It's not even a regular tuple. It's actually a future ID, but Ray knows that it needs to unpack that for me, and it also waits to schedule this task until the thing is ready. So I can have a graph of dependencies, and Ray will handle that for me,
06:40
but otherwise this code kind of looks like regular Python, you know, synchronous code where I'm not even thinking about distributed systems. So I think the reason I got excited about Ray when I joined AnyScale was I just loved the way that it kind of took normal concepts we like, like functions, added extensions that let us still kind of pretend
07:00
like we're working with synchronous code, but now we have the magic of distributed concurrency across a cluster and the whole bit, so it's pretty nice for that reason. And sure enough, it took about 4.4 seconds to do that whole thing. I did mention that we were blocking for everything. There is an idiom with a function called ray.wait where we can loop getting the results that have already finished
07:23
while we wait for more to go on. I won't take the time to walk through this code because it's a little complicated. I don't have a lot of time, but what we can see happening is as results finish, I go ahead and process them. You know, the first one came back in two seconds while other things are still running, and then eventually I finished the loop
07:42
as I basically drained the queue, if you will. So that's a nice way to do work with things that are already finished while other work is running. And then lastly, and this covers almost everything about the core API, you can also put objects into the distributed object store that Ray is using.
08:01
So really, in a real application, those dictionaries I wrote are reference data. I probably put them in the object store so tasks all over my cluster can query, you know, pull out those objects and use them as needed. So this is a nice little sort of, let's call it the reverse, the dual, if you will, of ray.get.
08:22
All right, one thing I have not done yet is dealt with distributed state. These tasks that I've been writing have all been asynchronous, or sorry, well, they've been asynchronous, but they've all been stateless. What if I want to keep some moving state and I want to have that distributed over the cluster? Well, once again, we start with a familiar idea, something that we know already is a good way to park state,
08:41
which is a class. So I'm going to declare just a regular Python class that now is called a search service. It's going to have these dictionaries just for convenience that we mentioned. But one thing it's going to do differently now is that it's actually going to remember how many times I asked for each key, as well as keys that don't exist in the dictionary. So it's actually keeping a state of, you know,
09:02
sort of metadata about how this service was used. And then we have a query method that does basically what we've done before, where we query for a given phrase. This one also handles the case that it'll determine which dictionary it needs to read and that kind of stuff. And then we'll have a convenience method for getting all of the keys from both dictionaries.
09:21
But otherwise, it's kind of the same code we've already had. As usual, you create a service. You know, we can then do queries over it. And if we run this, we see that, yeah, it returns all of the, here's the known keys, an unknown one that I've appended to it because I'm now going to try timing this thing. And this will take about 11 seconds
09:40
because it's going to, you know, one at a time, go through each of these, you know, sleep for each call and so forth. It's important to note that I've actually gone back to synchronous execution in the sense that I have one actor, as we call it, it's going to process one request at a time. It's not going to do it in parallel the way we were doing it. The way you would get back to parallelism
10:01
is have multiple instances of these, like a farm of them or something. Or you can also do a concurrent invocation. So by default, you'll do one at a time, which is usually what you want because that means I'm not likely to corrupt the state in this class, which is the number of queries. So that's usually the way you want to go.
10:20
And you can see the results we got back, and it took about 11 and a half seconds. And sure enough, we queried each of those keys once, including this one that it had never seen before. Well, we can very much turn this into an actor now, just very much like we did before. We'll subclass our search service. We'll annotate it with ray.remote.
10:41
The one other thing we have to do here is in an actor, you can't reach in and read objects or fields inside the actor. So we have to have getter methods or accessors to get at them. So I've added those to get the dictionaries and also to get that query object that's keeping track of how often we've called something.
11:00
Notice how we construct one. It's service actor.remote. We'll use the same sort of function we used before, but now what we're going to do is fire off all the queries we want. And again, that's the setup here for a list. And then we'll use our wait loop to pull them off as they're done and print out the results.
11:21
So if we time this and watch what happens, we can see that some of them are starting to come back, but this is synchronous again because of the way we've now designed it, but at least it's completely thread safe and robust. So it did take 11 and a half seconds, but it worked as before. And we can see that once again,
11:40
we called all of those keys once. All right, to finish this, let me just rip through very quickly an example of the high-level library in Ray called RLlib for reinforcement learning. If you don't know what reinforcement learning is, it's the thing you may have heard about that beat the world's best Go player. It's the technology used to beat Atari games. Essentially, you have an agent.
12:01
It's looking at an environment. It's making observations about the state of the environment. It's trying to guess what the best action to take is, and then it observes the reward it receives. And then it tries to learn and get better and better at choosing actions based on the state. And what I'm going to do actually is use a popular example
12:20
of a reinforcement learning environment, which is called cartpole. And that's basically where I have a one-dimensional cart moving back and forth, and I want to keep this vertical pole balanced as long as possible. And this is very easy to set up, and Ray, for time's sake, I can't go through all the details we're seeing, but I will say this. We are going to train a two-layer,
12:42
fully connected neural network, with two hidden layers of 50 units each, and we'll actually see how it gets smarter and better as it goes. And while I'm waiting for... There it goes. That's finished. So this is just a loop. It's going to do the training, and it'll print out results as it goes, and now we'll watch it happen. This will take a few seconds,
13:02
but what we'll see is we're going to print out... As it runs, it can do up to 500 points, and that's when it just stops. But what we want is for most of these so-called episodes to get as close to 500 as possible. So the number we really care about is the middle number. That's the mean so that, on average,
13:21
it will do this well at this level of training. And as you can see, this number's getting higher. Excuse me. So it's the maximum score that it got when it was doing a training run, and that means it's getting smarter as it goes. Now, the iterations, I think we're going to do 20, and it turns out that it'll get good but not great.
13:42
If we let it go longer, it could actually get really good, close to 500 every time. So we'll let this continue. I'll go ahead and evaluate the next cells so that they can load when we're ready, and what I'm going to do is just show you the data, which kind of reproduces in a nicer format what we're seeing printed, and then we'll actually plot it just to see what it looks like.
14:03
And then we'll do one last step, and I'll be done with the demo. So it's up to roughly half of a maximum score. I think it gets up to about 350 or something when it's done for this particular size network and for 20 training steps.
14:24
And then the last step we'll actually do when it's finished is we'll take that network that we've trained that's saved as a checkpoint, and we'll actually try running it, and we'll see how it works when it runs like five or six cycles. So there's the data. You can see that the reward mean, which is the center column, got up to about 305,
14:44
and here's what it looks like when we plot. The max, we very quickly got to the point where we could get a maximum score, but the mean score still rose during that whole time. And now we'll try this rollout, and you'll see a window pop up where it's actually running these so-called episodes where it's trying to balance the cart,
15:02
balance the pole on the cart, rather. All right, there we go. So this does about five, I think, episodes. This one's not too bad. It's holding up pretty well. We'll also print out the score.
15:22
I scroll this down a bit. So that first one, it actually got all the way to 500. This one, it looks like it'll stop almost at 500. So this is actually doing pretty well, even though our mean score isn't that high. So hopefully you can get a sense that it's actually pretty easy to use a high-level domain-specific library like this
15:43
to do the work we want to do, but at the same time, it's actually using this distributed compute framework under the hood, and the example we've shown would actually go much faster if we were using a real cluster. And then when you're all done, you can shut down if you want. So let me go back to the talk.
16:03
All right, so why Ray? Well, it's kind of emerged out of two big trends. One is that the size of neural networks is growing enormously, which also translates to how much compute is required to train them. We're far outstripping Moore's law in terms of growth. At the same time, Python, as you all know,
16:21
has been seeing enormous growth of interest in the last decade or so, driven in a large part by interest in machine learning and all the great libraries available in machine learning, data science, and so forth, written in Python. So that kind of means that we really need an easy way to distribute Python over a cluster if we're going to get past the limits of Moore's law,
16:42
but we want something that people who don't really want to care about distributed computing can do so easily. So there's a whole bunch of icons here about different steps that you have to do when you're doing like a typical machine learning pipeline, all of which typically require some sort of distributed computation to meet the scalability requirements.
17:01
And the vision of Ray is that we could have this sort of very generic framework. If you think about the original part of the demo there, it was really nothing about machine learning. It was just about scheduling tasks of arbitrary size, managing distributed state. And then on top of that, we can build these libraries that handle specific domains. These are four of them right now that are part of Ray,
17:21
that are part of the machine learning space. There's others that are being written in natural language processing. And a lot of people are starting to write generic applications with Ray as well. So let me just talk briefly about these libraries. We actually just saw an example of RLlib. I'll see how I'm doing for time, pretty good.
17:42
Here's a different icon or image than the one I showed you. But it's one of these huge spaces that's starting to see a lot of interest for a wide variety of reasons. The first big successes were in gameplay, like beating the world's best Go player. They've been used a lot in robotics. Actually, this one in the middle,
18:00
this bipedal robot is actually implemented with Ray. There's some interesting work being done, getting closer to regular business problems. Industrial automation, workflow management, that sort of thing is becoming a hot area. As well as optimizing systems like network topologies, HVAC systems.
18:21
Netflix and YouTube have published some interesting papers about using reinforcement learning to improve their recommendation systems, which has been an old problem, but now they're finding new ways to do it. And then, of course, in the finance world, they always leverage everything. Finance is a time-oriented problem, and that's what our reinforcement learning is really good at.
18:44
So peeling the onion a bit, what's going on in AlphaGo, the Go player, is that the observations in this case, so the state of the board, the actions are where you're going to play stones, and the rewards are really, in this case, instead of a reward at each step, it's only win or lose. That's all. So you don't really know how you did until the end.
19:01
And behind those things, they built this huge neural network that, at various layers, can identify different patterns in Go play. Reinforcement learning is a very broad field, and there's also lots of different ways that people are using it, and building algorithms that do reinforcement learning.
19:20
So what Ray RLlib tries to do is to give you support for all of these different ways of doing things. OpenAI Gym is where we got that cartpole example, for example, and then give you lots of different algorithms that are built in, or the easy ability to define your own, lots of abstractions that can be glued together in different ways, and then all of this running in a distributed way
19:41
at arbitrary scales using Ray. You can actually try it out in SageMaker. If you're already using SageMaker, there's Ray in the middle here of this picture, and Azure just rolled out support for reinforcement learning with Ray as well. Back to why Ray was created,
20:01
if you think about all of the different compute requirements that you need to be able to support in reinforcement learning, you've got these things like simulators and game engines that look a lot more like regular applications with complex distributed graphs of objects in memory, and the agent itself may be fairly complex, which are much different
20:21
than what we're used to doing in data science and neural network training, but we have to do that too, and we have to do this over and over again as efficiently as possible because the only way we can train in reinforcement learning is to just play over and over again and learn as we play. So the kind of diversity of CPU requirements,
20:41
memory access patterns, and so forth is what drove the need to build something that's very flexible like Ray, but very efficient also so that you can build on top of it tools like RLLib. The second one I'll mention quickly is Tune, and this is for hyperparameter tuning. What is hyperparameter tuning? Well, it's become a research area in its own right
21:02
because of problems like when you need to decide what's the best neural network architecture for my problem. Every number you see here is a hyperparameter, meaning before I even train anything, I have to decide on the structure of this neural network. So like how many layers am I going to have? How big are they?
21:20
If I'm doing convolution, what's the size of the convolution? What about pooling and so forth? So the space of possible neural networks is enormous, and you don't want to naively just search all possibilities. You want some intelligence that makes it easy, hopefully, to get to a good architecture relatively quickly
21:41
without a lot of expensive compute time. So Tune is really great for being very concise in how you declare what you want, and then it integrates in lots of frameworks like our favorite neural network and machine learning libraries, as well as intelligent algorithms that try to optimize the tuning process. In this case, I'm using Bayesian optimization.
22:02
It's also optimized for neural network training because that's where the real problem of hyperparameter tuning exists, and it's designed to be easy to plug in new frameworks and new algorithms. Last thing I want to talk about for those of you who aren't interested in machine learning is what about using Ray to build applications or microservices?
22:24
I won't go into all the reasons we create microservices. I just want to focus on one part, which is the need to separately manage things, and if you think about it, we often have separate microservices because we might need lots of different instances of some things more than others for scalability reasons.
22:40
Some microservices might be evolving much more quickly than others, so we need to swap them out very frequently, but the downside is that we have all of the stuff to manage ourselves. We have to have a lot of instances because no one machine is big enough, probably, and we don't want to rely on one machine that, if it fails, could bring down our application, so we have all of this manual stuff we have to do to deploy our applications.
23:04
Well, the vision of Ray is that we could actually go back to thinking about one logical instance for each microservice, but behind the scenes, Ray is scheduling all of these actors and tasks across our cluster sort of transparently for us, and we don't have to do nearly as much of that management of instances
23:20
that we had to do before. There's a lot more that needs to be done here to make this as fully powerful as we might want, but this, to me, is one of the exciting aspects of Ray. It's really what we've been doing with RLlib and these other tools, but it applies equally well to other general-purpose applications as well,
23:40
and it also nicely integrates with systems like Kubernetes because Ray is working at a very fine grain, a finer level of granularity than Kubernetes, so it integrates nicely with these other frameworks. So if you're interested in adopting Ray, one thing you might look at is,
24:03
if you're already doing multiprocessing with these libraries, joblib or multiprocessing pool, Ray actually provides some drop-in replacements that break the single-node boundaries, so with just changing the import statements, you can now do scheduling across a cluster rather than just across the cores in a single machine,
24:23
and it also integrates nicely with asyncio if you like programming with coroutines. There's a very nice way to use Ray that way as well. So check out ray.io for more information. I've been writing tutorials that you can find at the Anyscale Academy at this URL.
24:42
You're welcome to join the Ray Slack. That's the best place to find out about Ray, to ask questions and so forth, and there's even a Google group for it. Once again, please check out our Ray Summit conference this fall. It's free, it's online, and we hope to see you there.
25:01
Thanks for listening, and I'll be happy to take your questions in the Discord channel. Hey, thank you so much. That's amazing. Actually, I really love it because, you know, I really hate setting up all these, like, pipeline, you know, like, distributed thing for everything. I was doing data science, but that really bugs me, and this is actually really good that you have one thing for everything.
25:23
So let me have a look at the... So because there's no Q&A in Zoom, so I'll have a look at the Parachat to see if anybody's firing any, like, questions in. If not, then I got to... Maybe I would ask one question myself because I'm really interested. Sure, sure. Is it, like, so you said that because, like, it's for everything.
25:44
So, like, is it, like, so is it difficult to set it up? Like, I really, I'm not good at deploying anything. So, like, would you say that it's good for, you know, people to, like, independent, you know, for example, like, my pet project, I just do it myself. I don't have my colleagues. Is it okay for people to do it that way,
26:02
or is it usually for more, like, a corporate solution for that? So... Yeah, that's a really interesting question. Ray clusters, when you go, like, when your laptop isn't big enough and you need to go to a cluster, it can be as easy as just standing up some instances in Amazon and then running the initialization script on each one of them.
26:21
And then it does a pretty good job bootstrapping itself from there. When you submit code to that cluster, and it's transparent, you just do ray.init with the address of the cluster, then it will actually upload libraries and so forth that you need. So it's generally, you know, I've used a lot of distributed system tools, you know, Spark and Hadoop and all this stuff.
26:42
And it's about as easy as it can be in that way. It's not completely seamless. Some of the technology we're working on at any scale will hopefully make it even easier where you don't have to think very much about it at all. But right now, there's a little bit of setup, but it's not too difficult and it can be as lightweight or as big as you want.
27:00
Yeah, that's good. That's good. I love the flexibility. So thank you so much. I think it's really interesting. So if people want to get in touch, you know, this information is there in the slides. And thank you so much. And I hope you enjoy the rest of the conference and the show afterwards as well.
27:20
Thank you. Thanks for having me.