Stable Checkpointing in Distributed Systems without Shared Disks

Peter Sobe

Abstract

Interacting processes in distributed systems save their checkpoints on local disks for efficiency reasons. However, because local checkpoints become unavailable when their hosts fail, redundancy schemes similar to RAID storage have to be used. In such systems, checkpoints are stable under a particular fault model because they can be reconstructed within the distributed system. This paper compares two variants of stable checkpoint storage: (i) parity grouping over local checkpoints and (ii) RAID-like distribution of each checkpoint using a software-based distributed storage system. An analysis compares the costs of collective checkpoint creation, recovery of a single process, and rollback of all processes. The results show that, despite the differences in detail, checkpointing using a distributed storage system is a reasonable solution.
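
The parity-grouping idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain byte-wise XOR parity over one group of checkpoints, zero-padding to a common length, and a single failed host; the function names (`make_parity`, `reconstruct`) are hypothetical and are not taken from the paper, whose actual scheme and cost model may differ.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def pad(checkpoint: bytes, size: int) -> bytes:
    """Zero-pad a checkpoint so that all group members have equal length."""
    return checkpoint.ljust(size, b"\x00")


def make_parity(checkpoints: list[bytes]) -> bytes:
    """Compute one parity block over a group of local checkpoints."""
    size = max(len(c) for c in checkpoints)
    return xor_blocks([pad(c, size) for c in checkpoints])


def reconstruct(surviving: list[bytes], parity: bytes, original_length: int) -> bytes:
    """Rebuild the single missing checkpoint from the survivors and the parity.

    Assumption: the original length of the lost checkpoint is known,
    for example from metadata stored next to the parity block.
    """
    size = len(parity)
    restored = xor_blocks([pad(c, size) for c in surviving] + [parity])
    return restored[:original_length]


# Hypothetical group of three process checkpoints, each on its own host.
checkpoints = [b"state-of-p0", b"p1", b"process-2-state"]
parity = make_parity(checkpoints)

# Simulate the failure of the host that stored checkpoint 1.
lost = 1
surviving = [c for i, c in enumerate(checkpoints) if i != lost]
recovered = reconstruct(surviving, parity, len(checkpoints[lost]))
```

In this sketch each process keeps its full checkpoint locally and only the parity block is stored elsewhere, so recovering one process requires reading every surviving checkpoint in the group. A RAID-like distribution would instead stripe each individual checkpoint across several hosts.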

Original language: English
Title: Proceedings International Parallel and Distributed Processing Symposium
Publisher: IEEE
Publication date: 01.01.2003
Article number: 1213392
ISBN (Print) 0-7695-1926-1
DOIs
Publication status: Published - 01.01.2003
Event: International Parallel and Distributed Processing Symposium - Nice, France
Duration: 22.04.2003 to 26.04.2003
Conference number: 115724

UN SDGs

This output contributes to the following Sustainable Development Goal(s)

  1. SDG 9 – Industry, Innovation and Infrastructure
