Reaching Efficient Fault-Tolerance for Cooperative Applications

Abstract

Cooperative applications are widely used, e.g. as parallel calculations or distributed information processing systems. While such applications meet users' demands and offer a performance improvement, they also raise the susceptibility to faults of any computer node in use. Often a single fault may cause a complete application failure. On the other hand, the redundancy in distributed systems can be utilized for fast fault detection and recovery. We therefore followed an approach based on duplicating each application process to detect crashes and faulty functions of single computer nodes. We concentrate on two aspects of efficient fault-tolerance: fast fault detection and recovery without delaying the application progress significantly. The contribution of this work is, first, a new fault-detecting protocol for duplicated processes. Second, we enhance a roll-forward recovery scheme so that it is applicable to a set of cooperative processes in conformity with the protocol.

Original language: English
Title of host publication: Proceedings IEEE International Computer Performance and Dependability Symposium. IPDS 2000
Number of pages: 10
Publisher: IEEE
Publication date: 01.01.2000
Pages: 48-57
ISBN (Print): 0-7695-0553-8
Publication status: Published - 01.01.2000
Event: The 4th IEEE International Computer Performance and Dependability Symposium, Chicago, United States
Duration: 27.03.2000 - 29.03.2000
Conference number: 56663
