Let There Be Topology-Awareness in Kube-Scheduler!

FOSDEM VZW

Sehgal, Swati

Formal Metadata

Title

Let There Be Topology-Awareness in Kube-Scheduler!

Subtitle

Enhancing Kubernetes Scheduler

Series Title

FOSDEM 2021

Number of Parts

637

Author

Sehgal, Swati

License

CC Attribution 2.0 Belgium:
You may use, modify, and reproduce, distribute, and make publicly available the work or its content, in unchanged or modified form, for any legal purpose, provided that you credit the author/rights holder in the manner they have specified.

Identifiers

10.5446/14104 (DOI)

Publisher

FOSDEM VZW

Year of Publication

2021

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

With Kubernetes gaining popularity for performance-critical workloads such as 5G, Edge, IoT, Telco, and AI/ML, it is becoming increasingly important to meet the stringent networking and resource management requirements of these use cases. Such workloads require topology information in order to use co-located CPU cores and devices. Although Topology Manager successfully aligns the topology of requested resources on a node, the current native scheduler does not take that topology into account when selecting a node. It's time to solve this problem! We will introduce the audience to hardware topology, the current state of Topology Manager, gaps in the current scheduling process, and prior out-of-tree solutions. We'll explain the workarounds available right now: custom schedulers, creating scheduling extensions, using node selectors, or assigning resources manually or semi-automatically. All these methods have their drawbacks. Finally, we will explain how we plan to improve the native scheduler to work with Topology Manager. Attendees will learn about both the current workarounds and the future of topology-aware scheduling in Kubernetes.

Kubernetes has taken the world by storm, attracting unconventional workloads such as HPC, Edge, IoT, Telco and communication service providers, 5G, AI/ML, and NFV solutions. This talk will benefit users, engineers, and cluster admins deploying performance-sensitive workloads on k8s. Adding newer nodes alongside older ones in data centers results in hardware heterogeneity. To save physical space in data centers, newer nodes are packed with more CPUs and enhanced hardware capabilities. Exposing fine-grained topology information for optimised workload placement would help service providers and VNF vendors too. We'll explain the numerous challenges encountered in deploying workloads efficiently when the scheduler cannot understand the hardware topology of the underlying bare-metal infrastructure and schedule based on it.
The scheduler's lack of knowledge of resource topology can lead to unpredictable application performance, under-performance in general, and, in the worst case, a complete mismatch between resource requests and kubelet policies: a pod is scheduled on a node where it is destined to fail, potentially entering a failure loop. Exposing cluster-level topology to the scheduler empowers it to make intelligent NUMA-aware placement decisions that optimize the cluster-wide performance of workloads. This would benefit the Telco User Group in Kubernetes, Kubernetes itself, and the overall CNCF ecosystem by enabling improved application performance without impacting user experience.
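
The mismatch described above can be illustrated with a minimal sketch. It is not the kube-scheduler or Topology Manager implementation: the `Node` type, both `fits_*` functions, the resource name `example.com/nic`, and the node data are hypothetical, and the second check only approximates a strict single-NUMA-node alignment policy. The sketch compares a check on node-level totals with a check that requires one NUMA zone to satisfy the whole request.

```python
from dataclasses import dataclass
from typing import Dict, List

Resources = Dict[str, int]  # resource name -> free quantity


@dataclass
class Node:
    """Hypothetical node model: free resources listed per NUMA zone."""
    name: str
    numa_zones: List[Resources]


def fits_node_totals(node: Node, request: Resources) -> bool:
    """Topology-unaware check: compare the request with node-wide totals."""
    totals: Resources = {}
    for zone in node.numa_zones:
        for res, qty in zone.items():
            totals[res] = totals.get(res, 0) + qty
    return all(totals.get(res, 0) >= qty for res, qty in request.items())


def fits_single_numa_zone(node: Node, request: Resources) -> bool:
    """Topology-aware check: some single zone must satisfy every resource."""
    return any(
        all(zone.get(res, 0) >= qty for res, qty in request.items())
        for zone in node.numa_zones
    )


# Assumed example data: a pod asking for 4 CPUs and 1 NIC.
request = {"cpu": 4, "example.com/nic": 1}
node_a = Node("node-a", [{"cpu": 2, "example.com/nic": 1},
                         {"cpu": 6, "example.com/nic": 0}])
node_b = Node("node-b", [{"cpu": 4, "example.com/nic": 1},
                         {"cpu": 0, "example.com/nic": 0}])

for node in (node_a, node_b):
    print(node.name,
          "totals:", fits_node_totals(node, request),
          "single zone:", fits_single_numa_zone(node, request))
```

In this sketch `node-a` passes the totals check but fails the per-zone check, which is the case where a topology-unaware scheduler could place a pod on a node that then rejects it.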