Autonomic State Management for Optimistic Simulation Platforms

Alessandro Pellegrini, Roberto Vitali, and Francesco Quaglia

Published in: IEEE Transactions on Parallel and Distributed Systems, 2015

We present the design and implementation of an autonomic state manager (ASM) tailored for integration within optimistic parallel discrete event simulation (PDES) environments based on the C programming language and the executable and linkable format (ELF), and developed for execution on x86_64 architectures. With ASM, the state of any logical process (LP), namely the individual (concurrent) simulation unit that is part of the simulation model, can be scattered across dynamically allocated memory chunks managed via the standard API (e.g., malloc/free). Also, the application programmer is not required to provide any serialization/deserialization module in order to take a checkpoint of the LP state or to restore it when a causality error occurs during the optimistic run, nor to indicate which portions of the state are updated by event processing so as to allow incremental checkpointing. All these tasks are handled by ASM in a fully transparent manner via (A) runtime identification (with chunk-level granularity) of the memory map associated with the LP state, and (B) runtime tracking of the memory updates occurring within chunks belonging to the dynamic memory map. The coexistence of the incremental and non-incremental log/restore modes is achieved via dual versions of the same application code, transparently generated by ASM via compile/link time facilities. Also, the dynamic selection of the best-suited log/restore mode is actuated by ASM on the basis of an innovative modeling/optimization approach which takes into account the stability of each operating mode with respect to variations of the model/environmental execution parameters.
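As a rough illustration of points (A) and (B), the sketch below keeps a chunk-level memory map for a single LP and supports both a full and an incremental log. It is not the ASM implementation: the names lp_malloc, lp_free, lp_mark_dirty, lp_log, and lp_restore are hypothetical, the fixed-size table is an arbitrary simplification, only one logged image per chunk is kept (no chain of checkpoints), and the explicit lp_mark_dirty call stands in for the transparent write tracking that the abstract attributes to compile/link time facilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHUNKS 64 /* illustrative fixed capacity, an assumption */

/* One entry per live chunk in the LP's dynamic memory map. */
typedef struct {
    void  *addr;  /* chunk handed to the application */
    void  *copy;  /* last logged image of the chunk */
    size_t size;
    bool   dirty; /* written since the last log? */
} chunk_t;

static chunk_t map[MAX_CHUNKS];
static size_t  n_chunks;

/* Hypothetical malloc wrapper: allocates the chunk and registers it. */
void *lp_malloc(size_t size) {
    assert(n_chunks < MAX_CHUNKS);
    void *p = malloc(size);
    void *c = malloc(size);
    if (!p || !c) { free(p); free(c); return NULL; }
    map[n_chunks++] = (chunk_t){ p, c, size, true };
    return p;
}

/* Hypothetical free wrapper: releases the chunk and drops it from the map. */
void lp_free(void *p) {
    for (size_t i = 0; i < n_chunks; i++) {
        if (map[i].addr == p) {
            free(map[i].addr);
            free(map[i].copy);
            map[i] = map[--n_chunks]; /* swap-remove */
            return;
        }
    }
}

/* Hypothetical write hook: flags the chunk containing addr as updated. */
void lp_mark_dirty(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < n_chunks; i++) {
        uintptr_t base = (uintptr_t)map[i].addr;
        if (a >= base && a < base + map[i].size)
            map[i].dirty = true;
    }
}

/* Log the state; incremental mode skips clean chunks. Returns bytes copied. */
size_t lp_log(bool incremental) {
    size_t copied = 0;
    for (size_t i = 0; i < n_chunks; i++) {
        if (incremental && !map[i].dirty)
            continue;
        memcpy(map[i].copy, map[i].addr, map[i].size);
        map[i].dirty = false;
        copied += map[i].size;
    }
    return copied;
}

/* Roll every chunk back to its last logged image. */
void lp_restore(void) {
    for (size_t i = 0; i < n_chunks; i++) {
        memcpy(map[i].addr, map[i].copy, map[i].size);
        map[i].dirty = false;
    }
}
```

In this sketch, a full log copies every registered chunk, while an incremental log copies only the chunks flagged since the previous log, which is the trade-off between the two modes that the abstract describes. The model-driven choice between the modes is not represented here.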

BibTeX Entry:

author = {Pellegrini, Alessandro and Vitali, Roberto and Quaglia, Francesco},
title = {Autonomic State Management for Optimistic Simulation Platforms},
journal = {IEEE Transactions on Parallel and Distributed Systems},
year = {2015},
issn = {1045-9219},
month = jun,
number = {6},
pages = {1560--1569},
volume = {26},
doi = {10.1109/TPDS.2014.2323967},
publisher = {IEEE Computer Society},
series = {TPDS}