Consistent Checkpointing in Distributed Computations: Theoretical Results and Protocols

Francesco Quaglia

Abstract:
This thesis studies consistent checkpointing in distributed computations under an asynchronous model of computation. The checkpointing approach investigated is known as communication-induced checkpointing. In this approach, the processes of the distributed computation take checkpoints at their own pace (called basic checkpoints), while additional checkpoints (called forced checkpoints) are induced by a lazy coordination scheme in order to guarantee the consistency of global checkpoints. The lazy coordination is realized by piggybacking control information on application messages. Upon receiving a message, the recipient process evaluates a predicate based on the incoming control information and on its local context; if the predicate evaluates to TRUE, the process takes a forced checkpoint. The thesis reports both theoretical results on this topic and protocols derived from those results.
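To make the receive-time mechanism concrete, here is a minimal Python sketch. It uses the classic index-based forcing rule (in the style of Briatico, Ciuffoletti and Simoncini), chosen only as an example; it is not the protocol proposed in the thesis. Each process keeps an integer checkpoint index, increments it at every basic checkpoint, and piggybacks it on outgoing messages. The assumed forcing predicate is `msg.index > self.index`. When it holds, the receiver takes a forced checkpoint before delivering the message and adopts the piggybacked index. All names (`Process`, `Message`, `take_basic_checkpoint`, `send`, `receive`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: object
    index: int  # piggybacked control information: sender's checkpoint index


@dataclass
class Process:
    pid: int
    index: int = 0  # local context: index of the latest checkpoint
    # Record of checkpoints taken, as (kind, index) pairs.
    checkpoints: list = field(default_factory=list)

    def take_basic_checkpoint(self) -> None:
        # Basic checkpoint: taken at the process's own pace.
        self.index += 1
        self.checkpoints.append(("basic", self.index))

    def send(self, payload) -> Message:
        # Lazy coordination: piggyback the current index on the application message.
        return Message(payload, self.index)

    def receive(self, msg: Message):
        # Hypothetical forcing predicate on incoming info and local context.
        if msg.index > self.index:
            # Forced checkpoint, taken before the message is delivered.
            self.index = msg.index
            self.checkpoints.append(("forced", self.index))
        return msg.payload  # deliver to the application


if __name__ == "__main__":
    p, q = Process(0), Process(1)
    p.take_basic_checkpoint()   # p moves to index 1
    q.receive(p.send("hello"))  # 1 > 0, so q takes a forced checkpoint
    q.receive(p.send("again"))  # 1 > 1 is false, so no new checkpoint
    print(q.checkpoints)
```

Under this rule, checkpoints that share the same index across processes are intended to form a consistent global checkpoint. The cost is that any message carrying a larger index forces a checkpoint, even when one might not be strictly needed for consistency.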

BibTeX Entry:

@phdthesis{tQuag99,
author = {Quaglia, Francesco},
school = {Sapienza, University of Rome},
title = {Consistent Checkpointing in Distributed Computations: Theoretical Results and Protocols},
year = {1999},
type = {phdthesis},
comment = {Supervisor: B. Ciciani}
}