Reconfiguration in Octagonal Mesh-Based Multicomputer Systems with Distributed Checkpointing

Andreas Bauch, Erik Maehle

Abstract

In large multicomputer systems, fault tolerance can no longer be neglected. Dynamic redundancy is a suitable approach to implementing fault tolerance in mesh-based systems. One major problem is the reconfiguration of the interconnection network after a fault. This paper presents two reconfiguration schemes for octagonal mesh-based multicomputer systems that are closely related to the distributed checkpointing approach. The first scheme reconfigures a 2D mesh (the application graph) embedded in an octagonal 2D mesh (the machine graph) after a single fault, provided the checkpoints are organized as a meander. This reconfiguration achieves a dilation of 2 and a congestion of 2. The second scheme reconfigures any application graph that was originally embedded in an octagonal mesh (the machine graph) with a congestion of 1, under the assumption that the checkpoints are organized as a spiral and only single faults occur. In this case the dilation increases by a factor of 2 for a square mesh and by a factor of 4 for a rectangular one, while the congestion is 3 in both cases. Practical experience with a sample implementation of the first reconfiguration scheme, and of a scheme for more general application graphs, on the DAMP multicomputer system is also reported.
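To make the terms "meander" and "dilation" concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that an octagonal mesh links every node to its 8 surrounding neighbours (so the hop count between two nodes is the Chebyshev distance), that a meander visits the mesh row by row with alternating direction, and that after a fault the failed node's process restarts on its successor in that order. All function names are hypothetical.

```python
from itertools import product


def meander_order(rows, cols):
    """Visit a rows x cols mesh row by row, reversing direction on odd rows."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order


def octagonal_distance(a, b):
    """Hop count assuming 8-neighbour links (Chebyshev distance)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def mesh_edges(rows, cols):
    """Edges of a plain 4-neighbour 2D mesh used as the application graph."""
    edges = []
    for r, c in product(range(rows), range(cols)):
        if c + 1 < cols:
            edges.append(((r, c), (r, c + 1)))
        if r + 1 < rows:
            edges.append(((r, c), (r + 1, c)))
    return edges


def dilation(edges, placement):
    """Longest machine-graph distance spanned by any application edge."""
    return max(octagonal_distance(placement[u], placement[v]) for u, v in edges)


def remap_after_fault(placement, failed, order):
    """Toy assumption: restart the failed node's process on its meander successor.

    The last node of the meander has no successor and is not handled here.
    """
    successor = order[order.index(failed) + 1]
    new_placement = dict(placement)
    new_placement[failed] = successor
    return new_placement


rows, cols = 4, 4
order = meander_order(rows, cols)
edges = mesh_edges(rows, cols)
identity = {node: node for node in order}  # process (r, c) runs on node (r, c)
after_fault = remap_after_fault(identity, (1, 2), order)
print(dilation(edges, identity), dilation(edges, after_fault))
```

Under these assumptions, failing node (1, 2) of a 4x4 mesh raises the dilation of this toy placement from 1 to 2, because the relocated process ends up two hops from one of its former neighbours. Congestion is not computed here, since it depends on the routing paths chosen for each edge, which the abstract does not specify.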

Original language: English
Pages: 169-180
Number of pages: 12
Publication status: Published - 01.12.1995
Event: Proceedings of the 1995 Fault-Tolerant Parallel and Distributed Systems - Galveston, United States
Duration: 13.06.1996 to 14.06.1996
Conference number: 44607

