
MALT & NUMAPROF, Memory Profiling for HPC Applications

Formal metadata

Title
MALT & NUMAPROF, Memory Profiling for HPC Applications
Series title
Number of parts
561
Author
License
CC Attribution 2.0 Belgium:
You may use, modify, reproduce, distribute, and make publicly available the work or content, in unaltered or modified form, for any lawful purpose, provided you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Year of publication
Language

Content metadata

Subject area
Genre
Abstract
Memory is now a key component to understand and control in order to extract the best performance from High Performance Computing (HPC) architectures. As the gap between CPU and memory speed keeps growing, memory accesses themselves have become one of the largest performance factors, which in turn increases the impact of memory layout on performance. At the same time, the available memory keeps growing, with terabyte-scale systems now easily reachable. Managing such a large space has a cost which, if handled badly, can become non-negligible. To better address these questions, I will present two tools I developed during my post-docs in the HPC field: MALT (MALloc Tracker) and NUMAPROF.

MALT is dedicated to memory management analysis: it tracks calls to malloc and annotates the source code of a C/C++/Fortran application. It provides many metrics about memory management, such as allocation sizes, chunk lifetimes, and time charts, to help you understand the memory-management behavior of your application. The tool comes with a nice web-based graphical interface.

NUMAPROF is built on the same code base and provides NUMA (Non-Uniform Memory Access) performance analysis: using binary instrumentation, it tracks the memory location of every access on a NUMA system. It is useful for easily tracking remote or unbound memory accesses, which can be a source of large performance losses on such systems. The tool has so far mostly been tested on the new Intel Knights Landing architecture.

With terabytes of memory now available on large HPC servers and NUMA architectures, memory management can be one of the biggest and hardest-to-handle bottlenecks to reaching performance at scale on large systems. During my PhD, I studied and developed a parallel memory allocator specifically designed for NUMA architectures and tuned for large-scale applications.
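To make the kind of per-source-line metrics described for MALT more concrete (allocation counts, sizes, chunk lifetimes), here is a minimal C sketch. It is not MALT's implementation: the names `tracked_malloc`, `tracked_free`, `TRACKED_MALLOC`, `site_stats`, and `report_sites` are hypothetical. Instead of intercepting `malloc` itself, as a real profiler would need to do to avoid source changes, it assumes allocations go through an explicit macro that records `__FILE__` and `__LINE__`. Each chunk gets a small header holding its size, call site, and allocation time, so its lifetime can be measured when it is freed.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical sketch of per-call-site allocation metrics; not MALT's code. */
#define MAX_SITES 64

typedef struct {
    const char *file;      /* source file of the allocation site */
    int line;              /* source line of the allocation site */
    size_t count;          /* number of allocations made from this site */
    size_t total_bytes;    /* cumulative requested bytes */
    size_t max_size;       /* largest single request */
    size_t freed;          /* chunks from this site freed so far */
    double total_lifetime; /* summed lifetime of freed chunks, in seconds */
} site_stats;

/* Header stored in front of every user chunk. */
typedef struct {
    size_t size;  /* requested size of the chunk */
    int site;     /* index into the site table */
    double birth; /* allocation time, in seconds */
} chunk_header;

/* Round the header up to 16 bytes so user pointers keep malloc-like alignment. */
#define HEADER_SIZE ((sizeof(chunk_header) + 15u) & ~(size_t)15u)

static site_stats sites[MAX_SITES];
static int n_sites = 0;

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Return the index of the (file, line) site, registering it on first use. */
static int find_site(const char *file, int line) {
    for (int i = 0; i < n_sites; i++)
        if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
            return i;
    assert(n_sites < MAX_SITES && "increase MAX_SITES");
    sites[n_sites].file = file;
    sites[n_sites].line = line;
    return n_sites++;
}

void *tracked_malloc(size_t size, const char *file, int line) {
    char *base = malloc(HEADER_SIZE + size);
    if (base == NULL)
        return NULL;
    chunk_header *h = (chunk_header *)base;
    h->size = size;
    h->site = find_site(file, line);
    h->birth = now_seconds();
    site_stats *s = &sites[h->site];
    s->count++;
    s->total_bytes += size;
    if (size > s->max_size)
        s->max_size = size;
    return base + HEADER_SIZE;
}

void tracked_free(void *ptr) {
    if (ptr == NULL)
        return;
    chunk_header *h = (chunk_header *)((char *)ptr - HEADER_SIZE);
    site_stats *s = &sites[h->site];
    s->freed++;
    s->total_lifetime += now_seconds() - h->birth;
    free(h); /* h is the pointer originally returned by malloc */
}

/* Record the call site automatically, like a source-line annotation. */
#define TRACKED_MALLOC(size) tracked_malloc((size), __FILE__, __LINE__)

/* Print one line of metrics per allocation site. */
void report_sites(FILE *out) {
    for (int i = 0; i < n_sites; i++) {
        const site_stats *s = &sites[i];
        double mean = s->freed ? s->total_lifetime / (double)s->freed : 0.0;
        fprintf(out, "%s:%d count=%zu bytes=%zu max=%zu mean_lifetime=%.6f s\n",
                s->file, s->line, s->count, s->total_bytes, s->max_size, mean);
    }
}
```

Calling `report_sites(stdout)` before the program exits prints one line per allocation site. The sketch is not thread-safe and uses a fixed-size site table with a linear lookup; a real tool would need locking or per-thread buffers and a faster lookup structure.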
With this allocator, I quickly obtained a 2x speedup on a multi-million-line C++ numerical simulation at CEA just by changing the memory allocator (without recompiling the application), when running on a 16-processor machine (128 cores) and making 75 million allocations in 5 minutes. The gains came from better handling of the memory management overhead in the OS (Linux), which did not scale well on this platform, and from explicit handling of the NUMA topology; combined, these provided the 2x performance gap compared to the best production memory allocators available today. During this development, I would have been really happy to have a memory profiler to see how memory was placed and what the allocation profile of the application was.

This is what I am doing with MALT and NUMAPROF, by projecting dedicated memory metrics onto the source code and annotating it with many metrics (allocation size, chunk lifetime, NUMA remote memory accesses...). The tools also provide global views of the application through charts and computed metrics.

Both tools also take an approach that is uncommon for HPC tools: they provide a web-based interface built with libraries like D3.js, Bootstrap, and jQuery, served by a small Python web server. This fixes a big issue in HPC: when running remotely, a profiler GUI needs to be X-forwarded, which makes it slow and badly themed; the alternative is to run it locally, without having the source code in the same place. The web server makes it easy to forward the interface through an SSH port forward, and even lets several people look at the same profile remotely. It also quickly provides nicer rendering with less development overhead.

MALT was developed and funded during a post-doc at the Exascale Computing Research Lab. NUMAPROF was then extrapolated from this work as a side research project at CERN. I will briefly present both tools, along with some motivation showing the potentially huge impact of memory management on HPC applications, including one with roughly one million lines of code.
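The local/remote classification behind NUMA remote-access metrics can be sketched in C as follows. This is an illustration under assumptions, not NUMAPROF's code. The page-to-node map is a toy array, the page size is assumed to be 4 KiB, and "unbound" is taken here to mean that either the accessing thread is not pinned to a NUMA node or the page has no known node. On Linux, a real tool could query a page's current node with `move_pages` called with a NULL `nodes` argument, and would obtain the accesses themselves through binary instrumentation; both are outside this sketch. All names (`classify_access`, `record_access`, `page_map`, `access_stats`, `print_access_stats`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of NUMA access classification; not NUMAPROF's code. */
#define ASSUMED_PAGE_SIZE 4096u /* assumed 4 KiB pages */
#define UNBOUND (-1)            /* unpinned thread, or page with no known node */

typedef enum { ACCESS_LOCAL, ACCESS_REMOTE, ACCESS_UNBOUND, ACCESS_KINDS } access_kind;

typedef struct {
    size_t counts[ACCESS_KINDS]; /* number of accesses per category */
} access_stats;

/* Toy page map: NUMA node of each page of one tracked memory region. */
typedef struct {
    uintptr_t base;       /* first address of the region */
    const int *page_node; /* node of each page, or UNBOUND */
    size_t n_pages;
} page_map;

/* Look up the node holding addr; addresses outside the region count as UNBOUND. */
static int node_of_address(const page_map *map, uintptr_t addr) {
    if (addr < map->base)
        return UNBOUND;
    size_t page = (size_t)((addr - map->base) / ASSUMED_PAGE_SIZE);
    return page < map->n_pages ? map->page_node[page] : UNBOUND;
}

/* Compare the accessing thread's node with the node holding the page. */
access_kind classify_access(const page_map *map, int thread_node, uintptr_t addr) {
    int mem_node = node_of_address(map, addr);
    if (thread_node == UNBOUND || mem_node == UNBOUND)
        return ACCESS_UNBOUND;
    return thread_node == mem_node ? ACCESS_LOCAL : ACCESS_REMOTE;
}

/* In a real tool, this would run for every instrumented memory access. */
void record_access(access_stats *stats, const page_map *map, int thread_node, uintptr_t addr) {
    stats->counts[classify_access(map, thread_node, addr)]++;
}

/* Print the totals and the share of remote accesses. */
void print_access_stats(const access_stats *s) {
    size_t total = s->counts[ACCESS_LOCAL] + s->counts[ACCESS_REMOTE] + s->counts[ACCESS_UNBOUND];
    double remote_pct = total ? 100.0 * (double)s->counts[ACCESS_REMOTE] / (double)total : 0.0;
    printf("local=%zu remote=%zu unbound=%zu (%.1f%% remote)\n",
           s->counts[ACCESS_LOCAL], s->counts[ACCESS_REMOTE],
           s->counts[ACCESS_UNBOUND], remote_pct);
}
```

Keeping one `access_stats` per source line (instead of one global counter) is what would let remote accesses be attributed to the code that performs them, in the spirit of the source annotation described above.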