Abstract
Fault tolerance can no longer be neglected in large multicomputer systems. Dynamic redundancy is a suitable way to implement fault tolerance in mesh-based systems, but it raises a major problem: reconfiguring the interconnection network after a fault. This paper presents two reconfiguration schemes for octagonal mesh-based multicomputer systems, both closely related to the distributed checkpointing approach. The first scheme reconfigures a 2D mesh (the application graph) embedded in an octagonal 2D mesh (the machine graph) after a single fault, provided the checkpoints are organized as a meander; the reconfigured embedding has a dilation of 2 and a congestion of 2. The second scheme reconfigures any application graph that was originally embedded in an octagonal mesh (the machine graph) with a congestion of 1, assuming that the checkpoints are organized as a spiral and that only single faults occur. In this case the dilation increases by a factor of 2 for a square mesh and by a factor of 4 for a rectangular one, while the congestion is 3 in both cases. The paper also reports practical experience with sample implementations of the first reconfiguration scheme and of a scheme for more general application graphs on the DAMP multicomputer system.
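To make the term "dilation" concrete, the following minimal sketch measures it for a toy remapping. It is an illustration only and not the paper's algorithm. It assumes that an octagonal mesh links each node to its eight nearest grid neighbours (so the hop count between two nodes is the Chebyshev distance), that the machine graph has one spare row, and that a fault is handled by moving every node at or after the faulty cell one step along a row-by-row meander. All function names are hypothetical. Congestion is not computed, because it depends on how the stretched edges are routed.

```python
from itertools import product


def meander_order(rows, cols):
    """Row-by-row 'snake' ordering of grid cells: even rows run left to right, odd rows right to left."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order


def mesh_edges(rows, cols):
    """Edges of a plain 4-neighbour 2D mesh, used here as the application graph."""
    edges = []
    for r, c in product(range(rows), range(cols)):
        if c + 1 < cols:
            edges.append(((r, c), (r, c + 1)))
        if r + 1 < rows:
            edges.append(((r, c), (r + 1, c)))
    return edges


def octagonal_distance(a, b):
    """Hop count in an 8-neighbour mesh (assumed model): the Chebyshev distance."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def shift_along_meander(rows, cols, faulty):
    """Hypothetical remap: nodes at or after the faulty cell move one step along the meander.

    The machine graph is assumed to be (rows + 1) x cols, i.e. one spare row.
    """
    order = meander_order(rows + 1, cols)
    index = {cell: i for i, cell in enumerate(order)}
    k = index[faulty]
    return {
        cell: (order[index[cell] + 1] if index[cell] >= k else cell)
        for cell in order[: rows * cols]
    }


def dilation(rows, cols, placement):
    """Longest machine-graph distance between the images of two adjacent application nodes."""
    return max(
        octagonal_distance(placement[u], placement[v])
        for u, v in mesh_edges(rows, cols)
    )


if __name__ == "__main__":
    placement = shift_along_meander(4, 4, faulty=(0, 0))
    print(dilation(4, 4, placement))
```

With the identity placement every application edge maps onto a machine link, so the dilation is 1. After the shift, some neighbouring nodes end up two hops apart, which is the sense in which a reconfiguration "has a dilation of 2".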
| Original language | English |
|---|---|
| Pages | 169-180 |
| Number of pages | 12 |
| Publication status | Published - 01.12.1995 |
| Event | Proceedings of the 1995 Fault-Tolerant Parallel and Distributed Systems, Galveston, United States; duration: 13.06.1996 → 14.06.1996; conference number: 44607 |
Conference
| Conference | Proceedings of the 1995 Fault-Tolerant Parallel and Distributed Systems |
|---|---|
| Country/Territory | United States |
| City | Galveston |
| Period | 13.06.96 → 14.06.96 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 9: Industry, Innovation, and Infrastructure
Title
Reconfiguration in Octagonal Mesh-Based Multicomputer Systems with Distributed Checkpointing