Phoinix: A Fault-Tolerant Object Service in OMA

Deron Liang, S. C. Chou and S. M. Yuan

TR-IIS-96-009


Keywords:
Fault-tolerance, object-oriented programming, OMA, CORBA, distributed computing environment, distributed object services.

Abstract

The Object Management Architecture (OMA) has been recognized as a de facto standard for developing object services in distributed computing environments. In a distributed system, provision for failure recovery is always a vital design issue. However, fault-tolerant services have not been extensively considered in the current OMA framework, even though an increasing number of useful common services and common facilities have been adopted in OMA. In this paper, we propose a development environment for fault-tolerant object services, called Phoinix, which is compatible with the OMA framework. In Phoinix, object services can be developed with embedded fault-tolerance capability to tolerate both hardware and software failures. The fault-tolerance capability in Phoinix is classified into three levels: restart, rollback-recovery, and replication, where the capability grows stronger as the level increases. Currently, Phoinix has been ported to Orbix 3.0 and SunOS 4.2. Object services provided in the current version of Phoinix can tolerate hardware failures up to level-two fault tolerance, i.e., rollback-recovery. We plan to continue developing Phoinix so that object services can tolerate not only hardware failures but also software failures, such as process hangs, at all three levels of fault tolerance.
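
The ordering of the three levels named in the abstract can be illustrated with a small sketch. It is not Phoinix code and does not use any CORBA or Orbix interface. The class `CounterService`, the enum `Level`, and the helper `surviving_count` are hypothetical names introduced only for this illustration, and the in-memory "checkpoint" and "replica" are simplifying assumptions standing in for stable storage and a backup object. The sketch only shows how much state might survive a crash at each level.

```python
import copy
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical encoding of the three levels named in the abstract."""
    RESTART = 1
    ROLLBACK_RECOVERY = 2
    REPLICATION = 3


class CounterService:
    """Toy stateful service (an assumption for illustration, not a Phoinix API)."""

    def __init__(self, level):
        self.level = level
        self.state = {"count": 0}
        # Stand-ins for stable storage and a backup replica.
        self._checkpoint = copy.deepcopy(self.state)
        self._replica = copy.deepcopy(self.state)

    def increment(self):
        self.state["count"] += 1
        if self.level >= Level.REPLICATION:
            # Level 3: every update is mirrored to the replica.
            self._replica = copy.deepcopy(self.state)

    def checkpoint(self):
        if self.level >= Level.ROLLBACK_RECOVERY:
            # Level 2 and above: save a recoverable snapshot.
            self._checkpoint = copy.deepcopy(self.state)

    def crash_and_recover(self):
        """Simulate a failure and return the count that survives it."""
        if self.level >= Level.REPLICATION:
            self.state = copy.deepcopy(self._replica)      # take over from replica
        elif self.level >= Level.ROLLBACK_RECOVERY:
            self.state = copy.deepcopy(self._checkpoint)   # roll back to snapshot
        else:
            self.state = {"count": 0}                      # restart from scratch
        return self.state["count"]


def surviving_count(level):
    """Run the same workload at a given level and crash after the last update."""
    service = CounterService(level)
    service.increment()
    service.increment()
    service.checkpoint()
    service.increment()  # this update happens after the last checkpoint
    return service.crash_and_recover()


if __name__ == "__main__":
    for level in Level:
        print(level.name, surviving_count(level))
```

In this toy workload, a restart loses all three updates, rollback-recovery keeps the two made before the checkpoint, and replication keeps all three, which mirrors the statement that capability increases with the level.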