This paper presents a simulation study of coordinated checkpointing protocols for message-passing parallel systems. The aim is to estimate the runtime overhead that checkpointing produces and to compare different protocols and their execution on different parallel computing systems. To enable this analysis, a simple application model is derived that serves as a representative of a class of number-crunching programs. The simulations allow general statements to be made about the runtime overhead generated by coordinated checkpointing protocols.
|Title of host publication||Dependable Network Computing|
|Editors||Dimiter R. Avresky|
|Number of pages||20|
|Place of Publication||Boston, MA|
|Publication status||Published - 2000|