Collectl: A system monitoring tool like no other

Zitieren

openSUSE

Seger, Mark

Formale Metadaten

Titel

Collectl: A system monitoring tool like no other

Serientitel

openSUSE Conference 2017

Anzahl der Teile

Autor

Seger, Mark

Mitwirkende

Chaos Computer Club e.V.

Lizenz

CC-Namensnennung 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.

Identifikatoren

10.5446/54487 (DOI)

Herausgeber

openSUSE

Erscheinungsjahr

2017

Sprache

Englisch

Inhaltliche Metadaten

Fachgebiet

Informatik

Genre

Konferenz/Talk

Abstract

Collectl is a comprehensive, fine-grained monitoring tool that collects a vast quantity of system metrics Collectl was developed over a dozen years ago as a very lightweight yet highly detailed system monitoring tool, capable of collecting hundreds system performance metrics as frequently as every second. Its companion tool colplot, provides an easy to use web-based plotting package capable of displaying detailed statistics for multiple systems at the same time. Think of colmux, a third tool, as top-anything across a number of machines. It is capable of displaying anything collectl can collect in top-format, sorted by any column of your choice. For example, say you have a 100 node cluster, with colmux you can look at the disk wait times across thousands of disks sorted sorted from slowest to fasted, allowing you to easily identify bad or hot drives, OR look at memory consumption for leaks. How about a bad NIC consuming a CPU by interrupts? Or how about which process is doing the more disk reads, or write, or page faults? And remember, this is across a cluster. The focus of collectl has always been highly efficient metrics collection and display via a CLI, no pretty pictures and no databases to slow it down. However what it does have is an API to allow it to pass those metrics onto whatever high level tools one may wish to communicate with. For example it has native support for ganglia or graphite over a socket. If you have some other favorite tool it can usually be adapted to communicate with it as well. Unfortunately most centralized tools are easily overwhelmed with fine-grained metrics and can only deal with them at granularities in the 1-min range. Not to worry, collectl has the ability to record and save metrics to local disk at one rate and send simultaneously send them to a central tool at a different rate, making it possible to get a coarser-grained centralized view and if there is a problem, still have access to finer-grained data. Collectl has been used for monitoring some of the largest computing clusters in the world and in the last several years has been enhanced for monitoring Open Stack Clouds. It is currently packaged as part of OpenSUSE.