
Running MPI applications on Toro unikernel

Formal Metadata

Title: Running MPI applications on Toro unikernel
Title of Series:
Number of Parts: 542
Author:
Contributors:
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers:
Publisher:
Release Date:
Language:

Content Metadata

Subject Area:
Genre:
Abstract:
Unikernels aim to improve the way single-purpose systems are built by providing minimalist kernels into which the user's application is compiled. The resulting deployments require less memory, less disk, less CPU and less time to be up and running. In addition, the whole system spends most of its time in the user application or doing I/O for that single application, so CPU time is used more efficiently.

In this presentation, we talk about the use of unikernels for High Performance Computing. We present a work in progress that aims at implementing the MPI standard on top of Toro, an open-source non-POSIX unikernel. In this work, we implement a library that conforms to the Open MPI implementation. This library relies on the Toro API to implement the MPI functions. In particular, the library leverages Toro features such as per-CPU memory allocation, a cooperative scheduler, thread migration and inter-core communication based on VirtIO.

During initialization, Toro creates one instance of the MPI application per core. Each instance is a thread that is migrated to the corresponding core and then executes without any interference. When an application needs to allocate memory, each core has its own memory pool from which memory is allocated. This keeps memory allocation local, thus improving cache usage. Primitives like MPI_Gather() or MPI_Scatter(), which require communication between instances, are implemented on top of a new VirtIO device named virtio-bus that allows core-to-core communication without locking.

At the moment, we have implemented the following APIs:
- MPI_Gather()
- MPI_Scatter()
- MPI_Reduce()
- MPI_Barrier()

The goal of this PoC is to port benchmarks from the OSU micro-benchmarks suite (http://mvapich.cse.ohio-state.edu/benchmarks/) to compare with existing implementations. During the presentation, we show how this is implemented and demonstrate the current implementation by executing different MPI applications on top of Toro.
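To make the programming model concrete, below is a minimal sketch of the kind of collective-communication program the abstract describes: one instance per core contributes a local value and rank 0 gathers the results. It is written against the standard MPI C API that the Open MPI-compatible library exposes; the abstract does not state the host language or build setup for Toro applications, so treat this purely as an illustration, not as Toro-specific code.

/*
 * Illustrative only: each rank (one per core in Toro's model)
 * contributes a value and rank 0 collects them with MPI_Gather().
 * In the Toro library described above, the gather would be carried
 * out over virtio-bus core-to-core communication, and each rank's
 * allocations would come from its core-local memory pool.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* index of this instance */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of instances (cores) */

    /* Value computed locally by this instance. */
    int local = rank * rank;

    int *gathered = NULL;
    if (rank == 0)
        gathered = malloc(size * sizeof(int));

    /* Collect one int from every rank into rank 0's buffer. */
    MPI_Gather(&local, 1, MPI_INT, gathered, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Synchronize all instances before printing. */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("rank %d contributed %d\n", i, gathered[i]);
        free(gathered);
    }

    MPI_Finalize();
    return 0;
}

A program of this shape is also the natural starting point for the OSU micro-benchmark ports mentioned above, since those benchmarks time exactly these collective calls across ranks.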