Jenn-Wei Lin and Sy-Yen Kuo
Department of Electrical Engineering
National Taiwan University
Taipei, Taiwan 106, R.O.C.
E-mail: sykuo@cc.ee.ntu.edu.tw
This paper investigates the problem of rollback recovery in distributed shared memory (DSM) systems. We propose a new log-based recovery approach that tolerates multiple node failures. The approach combines an independent checkpointing technique with a new logging scheme. The independent checkpointing technique periodically interrupts the execution of a node to save the node's state. The new logging scheme takes advantage of the unique properties of DSM to reduce the logging overhead. With the proposed approach, the pre-failure state of a faulty node can be deterministically recreated without involving any fault-free node. Because some consistency information may be lost when a node becomes faulty, we also present an efficient method for reconstructing it. Finally, extensive trace-driven simulations show the effectiveness of the new logging scheme.
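To illustrate the general idea of log-based recovery with independent checkpointing, the following is a minimal sketch and not the logging scheme proposed in the paper. It assumes a toy node whose state is a dictionary of page values; the class `Node` and its methods (`take_checkpoint`, `receive`, `crash`, `recover`) are hypothetical names introduced only for illustration. The node saves its state without coordinating with other nodes, logs each incoming update, and after a failure rebuilds its pre-failure state from its own checkpoint and log.

```python
import copy


class Node:
    """Toy node (hypothetical model): state maps page names to integer values."""

    def __init__(self):
        self.state = {}       # volatile state, lost on failure
        self.checkpoint = {}  # stands in for stable storage: last saved state
        self.log = []         # stands in for stable storage: inputs since checkpoint

    def take_checkpoint(self):
        # Independent checkpoint: no coordination with other nodes.
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()      # earlier inputs are already reflected in the checkpoint

    def receive(self, page, value):
        # Log the incoming update before applying it, so it can be replayed.
        self.log.append((page, value))
        self._apply(page, value)

    def _apply(self, page, value):
        # Deterministic state transition: add the value to the page.
        self.state[page] = self.state.get(page, 0) + value

    def crash(self):
        # Simulate a failure: only the volatile state is lost.
        self.state = {}

    def recover(self):
        # Restore the checkpoint, then replay logged inputs in original order.
        self.state = copy.deepcopy(self.checkpoint)
        for page, value in self.log:
            self._apply(page, value)


node = Node()
node.receive("p0", 1)
node.take_checkpoint()
node.receive("p0", 2)
node.receive("p1", 5)
before_failure = dict(node.state)

node.crash()
node.recover()
print(node.state == before_failure)
```

Because every input received after the checkpoint is logged locally, recovery in this sketch needs no help from fault-free nodes. A real DSM system would also have to decide which memory accesses to log and how to restore consistency metadata, which the sketch does not attempt.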
Keywords: rollback recovery, distributed shared memory, independent checkpointing, logging, trace-driven simulation
Received March 9, 1999; revised July 19, 1999; accepted September 8, 1999.
Communicated by Chyi-Nan Chen.
*This research was supported by the National Science Council, Taiwan, R.O.C., under Grant NSC 84-2213-E-002-035.