
Workflow managers in high-energy physics: enhancing analyses with Snakemake

Formal Metadata

Title
Workflow managers in high-energy physics: enhancing analyses with Snakemake
Number of Parts
798
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Workflow management tools have long been used in scientific computing to organise and operate workflows. Many such tools, e.g., Snakemake, Luigi, and Toil, have grown from the foundation of Make (wherein users define simple rules with interdependent inputs and outputs), incorporating additional features to suit increasingly complex user needs. After initially seeing widespread uptake in bioinformatics, workflow managers have become commonplace in many fields, including high-energy physics (HEP). Analyses in HEP typically consist of many non-trivially related processes with widely varying requirements. Workflow managers can vastly simplify such analyses, providing user-friendly methods to define, review, and run analysis workflows. Snakemake has emerged as a leading workflow manager for HEP, with an established user base spread across major experiments. Dialogue between developers and the HEP community has led to integrations for distributed storage and transfer frameworks, e.g., XRootD, FTP, and Amazon S3, and for scheduling frameworks, e.g., HTCondor, Slurm, and DRMAA. These integrations enable analysts to better leverage the distributed computing resources made available by experiments, significantly improving the efficiency of HEP analyses. Further collaboration between analysts and developers has seen Snakemake form the core of several standardised analysis frameworks, such as REANA, aimed at improving analysis reproducibility. This contribution discusses the current use of workflow managers in HEP, including best practices for their application. Additionally, the anticipated requirements of analysts are considered within the context of ever-increasing data scales in HEP.
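
The Make-style model mentioned in the abstract (rules with interdependent inputs and outputs) can be illustrated with a small sketch. This is plain Python, not Snakemake syntax or its API; the `Rule` class, the `plan` function, and all file names are hypothetical and chosen only to show how a target file is resolved into an ordered list of rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical Make-style rule: named step with input and output files."""
    name: str
    inputs: tuple
    outputs: tuple


def plan(rules, target, existing):
    """Return rule names in an order that produces `target`.

    `existing` is the set of files already available (e.g. raw data).
    Raises ValueError if a needed file has no producer or rules are cyclic.
    """
    # Map each output file to the rule that produces it.
    producers = {out: rule for rule in rules for out in rule.outputs}
    order = []
    visiting = set()

    def visit(path):
        if path in existing:
            return
        if path not in producers:
            raise ValueError(f"no rule produces {path!r}")
        rule = producers[path]
        if rule.name in order:
            return  # already scheduled
        if rule.name in visiting:
            raise ValueError(f"cyclic dependency at rule {rule.name!r}")
        visiting.add(rule.name)
        for dep in rule.inputs:
            visit(dep)  # schedule everything this rule depends on first
        visiting.discard(rule.name)
        order.append(rule.name)

    visit(target)
    return order


# Illustrative three-step analysis; names are assumptions, not from the talk.
RULES = [
    Rule("select", ("raw.root",), ("selected.root",)),
    Rule("fit", ("selected.root", "model.yaml"), ("fit_result.json",)),
    Rule("plot", ("fit_result.json",), ("mass_fit.pdf",)),
]

if __name__ == "__main__":
    print(plan(RULES, "mass_fit.pdf", existing={"raw.root", "model.yaml"}))
```

Requesting only the final plot is enough for the planner to work backwards through the fit and the selection. Real workflow managers build on the same idea and add features such as wildcards, remote storage, and batch-system submission.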