Abstract
This paper presents a simulation study of coordinated checkpointing protocols for message-passing parallel systems. The aim is to estimate the overhead that checkpointing produces. In addition, different protocols are compared, as is their execution on different parallel computing systems. To enable this analysis, a simple application model is derived that serves as a representative of a class of number-crunching programs. By means of simulation, general statements can be made about the runtime overhead generated by coordinated checkpointing protocols.
Original language | English |
---|---|
Title of host publication | Dependable Network Computing |
Editors | Dimiter R. Avresky |
Number of pages | 20 |
Volume | 538 |
Place of Publication | Boston, MA |
Publisher | Springer US |
Publication date | 2000 |
Pages | 359-378 |
ISBN (Print) | 978-1-4613-7053-6 |
ISBN (Electronic) | 978-1-4615-4549-1 |
Publication status | Published - 2000 |