Flux: Solving Exascale Workflow and Resource Challenges

Formal Metadata

Title
Flux: Solving Exascale Workflow and Resource Challenges
Subtitle
Plus - How Open-Source Drives Our Project Design
Title of Series
Number of Parts
637
Author
Contributors
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Many emerging scientific workflows that target high-end HPC systems require a complex interplay with resource and job management software (RJMS). However, portable, efficient, and easy-to-use scheduling of these workflows is still an unsolved problem. In this talk, we present Flux, a next-generation RJMS designed specifically to address the key scheduling challenges of modern workflows in a scalable, easy-to-use, and portable manner. At the heart of Flux lies its ability to be seamlessly nested within batch allocations created by itself as well as by other system schedulers (e.g., SLURM, MOAB, LSF), serving the target workflows as their “personal RJMS instances”. In particular, Flux’s consistent and rich set of well-defined APIs portably and efficiently supports workflows that feature non-traditional patterns such as complex co-scheduling, massive ensembles of small jobs, and coordination among jobs in an ensemble. We will also cover how the Flux-Framework project is structured around open-source development, including our use of the Collective Code Construction Contract (C4), RFCs, LGPL, and various online open-source platforms. We discuss how these choices of open-source processes have influenced the repo structure, the code, our collaborations, and even the sub-teams within the project.

Expected prior knowledge / intended audience: The audience should have basic knowledge of batch job systems; knowledge of or experience with running scientific workflows is a plus. The talk includes some background on common workflows. It will be of interest to HPC users, workflow developers, and admins.

Speaker bio: Stephen Herbein is a computer scientist in Livermore Computing at Lawrence Livermore National Laboratory. His research interests include batch job scheduling, parallel I/O, and data analytics. He is a part of the Flux team, developing next-generation I/O-aware and multi-level schedulers for HPC.