Journal of Information Science and Engineering, Vol. 20 No. 2, pp. 379-390 (March 2004)

Management of Fault Tolerance Information for Coordinated
Checkpointing Protocol without Sympathetic Rollbacks

Kwang Sik Chung, YoungJun Lee*, HeoChang Yu** and WonGyu Lee**
Department of Computer Science
University College London
Gower Street, WC1E 6BT, London
*Department of Computer Education
Korea National University of Education
Chungbuk, 363791 Korea
**Department of Computer Science Education
Korea University
Seoul, 136-701 Korea

This paper presents the conditions for an extended global recovery line in coordinated checkpointing protocols, together with a new garbage collection protocol for checkpoints and message logs that avoids the sympathetic rollbacks caused by lost messages. Previous work on garbage collection in coordinated checkpointing protocols assumed that the communication channel never loses in-transit messages, and therefore deleted every checkpoint except the last one on each process. However, even when a coordinated checkpointing protocol runs over a reliable communication protocol such as TCP, in-transit messages are lost when a failure occurs, and these lost messages lead to sympathetic rollbacks of the faulty processes or of related processes. Management methods for fault tolerance information are therefore needed that can store and delete coordinated checkpoints and a light message log so as to avoid sympathetic rollbacks. In this paper, we define the extended global recovery line conditions for garbage collection of checkpoints and of the message logs kept for lost messages, and present a new garbage collection algorithm that operates within the extended global recovery line. The proposed algorithm piggybacks process information on each message, so no additional messages are needed for garbage collection or for the extended global recovery line. Because it relies on checkpoint information piggybacked on communication messages, the proposed garbage collection algorithm is called 'the lazy garbage collection algorithm'.

Keywords: coordinated checkpointing protocol, message log, garbage collection, sympathetic rollback
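
The piggybacking idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: it assumes a sender-based message log, per-channel sequence numbers, and a single piggybacked field, and it does not model the extended global recovery line or checkpoint coordination. All names (Message, Process, covered_seq, demo) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    src: str
    dst: str
    seq: int          # per-(src, dst) sequence number (assumed scheme)
    payload: str
    covered_seq: int  # piggyback: highest seq from dst saved in src's last checkpoint


@dataclass
class Process:
    name: str
    next_seq: dict = field(default_factory=dict)  # dst -> last sequence number used
    send_log: dict = field(default_factory=dict)  # dst -> {seq: payload}
    received: dict = field(default_factory=dict)  # src -> highest seq delivered
    covered: dict = field(default_factory=dict)   # src -> highest seq in last checkpoint

    def send(self, dst, payload):
        seq = self.next_seq.get(dst, 0) + 1
        self.next_seq[dst] = seq
        # Keep the payload so a message lost in a failure could be replayed.
        self.send_log.setdefault(dst, {})[seq] = payload
        # Checkpoint information rides on the ordinary application message.
        return Message(self.name, dst, seq, payload, self.covered.get(dst, 0))

    def receive(self, msg):
        self.received[msg.src] = max(self.received.get(msg.src, 0), msg.seq)
        # Lazy garbage collection: drop logged messages that the peer's
        # last checkpoint already covers; no extra control message is sent.
        log = self.send_log.get(msg.src, {})
        for seq in [s for s in log if s <= msg.covered_seq]:
            del log[seq]

    def checkpoint(self):
        # Record which received messages the new checkpoint includes.
        self.covered = dict(self.received)


def demo():
    p, q = Process("p"), Process("q")
    m1, m2 = p.send("q", "a"), p.send("q", "b")
    q.receive(m1)
    q.checkpoint()            # q's checkpoint covers m1 only
    q.receive(m2)
    p.receive(q.send("p", "reply"))
    return p.send_log["q"]    # only the uncovered message remains logged
```

In this sketch, garbage collection is "lazy" in the sense that a log entry is reclaimed only when some later application message from the peer happens to carry the updated checkpoint information.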


Received June 3, 2002; revised July 15, 2003; accepted August 8, 2003.
Communicated by Jang-Ping Sheu.