With the advent of modern package managers for scientific applications(EasyBuild, Spack, etc.) automated building of large amounts of software isbecoming easier, quickly giving rise to issues related to life cyclemanagement of applications. This makes tracking the applications and librariesthat actually get used considerably more important. Existing solutions (moduleload hooks, launch wrappers) do not account for user-built software, are hardto deploy or produce inconclusive results.
OGRT enables the tracking of jobs on a cluster with process-level granularityand without discernible performance penalty. It tracks used shared libraries,environment variables and loaded modules at the moment of process execution.It also supports watermarking executables and shared objects and reading thosewatermarks out of memory at runtime. Gathered information is collected andshipped to various backends.
OGRT aims to be a versatile tool, which can be used to:
* provide a census of used software (including user-built) * troubleshoot problems with programs picking up unexpected shared libraries * retroactively inform users about buggy libraries * overlay process-level data onto existing job monitoring tools * contribute to reproducibility of application runs
This presentation will give an overview of the design and implementation ofOGRT, as well as demoing some of its capabilities when plugged into anElasticsearch backend. |