Journal of Information Science and Engineering, Vol. 16, No. 2, pp. 271-290 (March 2000)

A New Log-Based Approach to Independent Recovery in
Distributed Shared Memory Systems*

Jenn-Wei Lin and Sy-Yen Kuo
Department of Electrical Engineering
National Taiwan University
Taipei, Taiwan 106, R.O.C.
E-mail: sykuo@cc.ee.ntu.edu.tw

This paper investigates the problem of rollback recovery in distributed shared memory (DSM) systems. We propose a new log-based recovery approach that can tolerate multiple node failures. The recovery approach employs an independent checkpointing technique and a new logging scheme. The independent checkpointing technique periodically interrupts the execution of a node to save the node's state. The new logging scheme takes advantage of the DSM's unique properties to reduce the logging overhead. Based on the proposed recovery approach, the pre-failure state of a faulty node can be deterministically recreated without involving any fault-free node. In addition, some consistency information may be lost after a node becomes faulty. To reconstruct the lost consistency information, we also present an efficient consistency reconstruction method. Finally, extensive trace-driven simulations are performed to show the effectiveness of the new logging scheme.

Keywords: rollback recovery, distributed shared memory, independent checkpointing, logging, trace-driven simulation

Received March 9, 1999; revised July 19, 1999; accepted September 8, 1999.
Communicated by Chyi-Nan Chen.
*This research was supported by the National Science Council, Taiwan, R.O.C., under Grant NSC 84-2213-E-002-035.