ML inference acceleration on K8s > with kata containers & AWS Firecracker

CC-Namensnennung 2.0 Belgien:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.

Identifikatoren

10.5446/52450 (DOI)

Herausgeber

FOSDEM VZW

Erscheinungsjahr

2021

Sprache

Englisch

Inhaltliche Metadaten

Fachgebiet

Informatik

Genre

Konferenz/Talk

Abstract

The Serverless computing paradigm facilitates the use of cloud computing resources by developers without the burden of administering and maintaining infrastructure. This simplification of cloud programming appears ideal (in theory) but the catch is that when someone needs to perform a more complex task, things could get a bit more complicated. Hardware acceleration, for instance, has been a pain point, even for traditional cloud computing programming models: IaaS providers chose dedicated solutions to avoid interference and preserve tenant isolation (device passthrough), while losing one of the most important benefits of virtualization, flexibility in workload placement through live migration. Various solutions have been proposed to overcome this limitation (API remoting, hardware slicing etc.). In the Serverless world though, do we need users to interface with a hardware device directly? Most serverless deployments are backed by containers, however, the most popular (and used) one, AWS Lambda, uses a ligthweight VMM (AWS Firecracker) integrated in the container ecosystem, in order to ensure strict isolation, while maintaining scalability. To this end, enabling hardware acceleration on this kind of deployment incurs the same pain points with traditional cloud infrastructure. Kata containers evolved from clear containers and offer hypervisor support for popular orchestrators container deployments such as docker, Kubernetes etc. Through kata containers, AWS Firecracker VMs can be easily provisioned as Pods on a kubernetes system, serving workloads prepared as container images. We build on the kata container runtime and port the necessary components to support vAccel, a lightweight framework for hardware acceleration on VMs, on Firecracker. In this talk, we briefly go through vAccel, its design principles and implementation, while focusing on the integration with kata-containers and the end-to-end system applicability on ML inference workloads. We present a short patch for kata-containers to support AWS Firecracker v0.23.1, and go through the necessary patching to add the vAccel framework on k8s. Finally, we present a short demo that scales image classification purpose-built microVMs across a working K8s cluster with GPUs. Hardware acceleration for serverless deployments has never been more secure!