
Keeping the HPC ecosystem working with Spack CI

Formal Metadata

Title
Keeping the HPC ecosystem working with Spack CI
Subtitle
Scaling a modern CI workflow to the Spack ecosystem
Title of Series
Number of Parts
542
Author
Contributors
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The Spack package manager is widely used by HPC sites, users, and developers to install HPC software, and the Spack project began offering a public binary cache in June of 2022. The cache includes builds for x86_64, Power, and aarch64, as well as for AMD and NVIDIA GPUs and Intel's oneAPI compiler. Currently, the system handles nearly 40,000 builds per week to maintain a core set of Spack packages. Keeping this many different stacks working continuously has been a challenge, and this talk will dive into the build infrastructure we use to make it happen. Spack is hosted on GitHub, but the CI system is orchestrated by GitLab CI in the cloud. Builds are automated and triggered by pull requests, with runners both in the cloud and on bare metal. We will talk about the architecture of the CI system, from the user-facing stack descriptions in YAML to backend services such as Kubernetes, Karpenter, S3, and CloudFront, as well as the challenges of tuning runners for good build performance. We'll also talk about how we've implemented security in a completely PR-driven CI system, and the difficulty of serving all the relevant HPC platforms when most commits come from untrusted contributors. Finally, we'll talk about some of the architectural decisions in Spack itself that had to change to better support CI.
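
To make the "user-facing stack descriptions in YAML" concrete, below is a minimal sketch of a Spack environment (spack.yaml) with a CI section in roughly the newer ci:/pipeline-gen: schema. The specs, mirror URL, container image, and runner tags are illustrative placeholders, not values taken from the talk or from Spack's actual pipelines.

# spack.yaml -- illustrative sketch only; all names, tags, and URLs are placeholders
spack:
  specs:                        # the software stack this pipeline keeps building
    - zlib
    - hdf5 +mpi
  mirrors:
    buildcache: s3://example-spack-binaries        # placeholder binary cache backing the builds
  ci:
    pipeline-gen:
      - build-job:
          image: ghcr.io/example/spack-builder:latest   # placeholder build container
          tags: [spack-runner, x86_64]                  # placeholder GitLab runner tags

From an environment like this, the spack ci generate command emits a GitLab pipeline definition with one job per package build, which is how a single pull request fans out into the many per-package jobs behind the roughly 40,000 builds per week mentioned above.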