Kerrighed: Flexible distributed checkpoint/restart

Formal Metadata

Title
Kerrighed: Flexible distributed checkpoint/restart
Title of Series
Number of Parts
97
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Process checkpointing consists of saving the state of a running process so that the process can be restarted later. Uses include fault tolerance, job suspension to free memory resources, and live migration of processes across physical machines. The scope of checkpoint services varies: some checkpoint single processes only, while others capture a full operating system with its processes, file systems, socket states, etc. This talk will present Kerrighed's application checkpoint/restart and show its advantages in flexibility over other checkpoint services.

Kerrighed is a Single System Image operating system for clusters. It offers the view of a single SMP machine on top of a cluster of standard PCs. Kerrighed is implemented as an extension to the Linux operating system (a set of modules and a kernel patch). The current development version is based on Linux 2.6.30. The main available features are:
◦ Cluster-wide process management with customizable load balancing over the cluster (through process migration and remote forking)
◦ Cluster-wide shared memory
◦ Application checkpointing
◦ Node addition/removal
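
To make the idea of checkpoint/restart from the abstract concrete, here is a minimal sketch in Python. It is only an application-level illustration and does not show how Kerrighed works: Kerrighed checkpoints applications from within the kernel, whereas this sketch has the program save its own state explicitly. The function names (`save_checkpoint`, `load_checkpoint`, `run`), the `stop_after` parameter, and the state layout are all hypothetical, chosen for this example.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write `state` to `path` atomically (temp file, then rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step, so a crash never
    # leaves a half-written checkpoint at `path`.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, stop_after=None):
    """Sum the integers 0..total_steps-1, checkpointing after each step.

    `stop_after` (hypothetical) simulates a failure or job suspension
    after that many steps in the current run; the function then returns
    None and a later call resumes from the last checkpoint.
    """
    state = load_checkpoint(path) or {"step": 0, "acc": 0}
    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # simulated interruption
        state["acc"] += state["step"]
        state["step"] += 1
        save_checkpoint(path, state)
        done_this_run += 1
    return state["acc"]
```

Calling `run(path, 10, stop_after=4)` stops partway through, and a second call to `run(path, 10)` with the same path picks up from the saved step instead of starting over. A system-level service such as the one described in the talk aims to provide this kind of restart without requiring the application to be written this way.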