Journal of Information Science and Engineering, Vol. 26 No. 6, pp. 2173-2198 (November 2010)

Design and Implementation of High Availability OSPF Router*

CHIA-TAI TSAI, RONG-HONG JAN AND KUOCHEN WANG
Department of Computer Science
National Chiao Tung University
Hsinchu, 300 Taiwan
E-mail: {tai; rhjan; kwang}@cs.nctu.edu.tw

In this paper, we propose a High Availability Open Shortest Path First (HA-OSPF) router, which consists of two OSPF router modules, one active and one standby, to support a high availability network. First, we use a continuous-time Markov chain (CTMC) to analyze the steady-state availability of an HA-OSPF router with one active router and N standby routers (the 1 + N redundancy model). Then, taking the failure detection and recovery rate into account, we show analytically that the HA-OSPF router with the 1 + 1 redundancy model, one active and one standby, is the preferred model for enhancing router availability. We also show that carrier-grade HA-OSPF router availability (i.e., five-nine availability) can be achieved under an appropriate combination of the router module failure rate (f), the repair rate (g), and the failure detection and recovery rate (_). Because little research has addressed the integration of the redundancy model, link state information backup, and failure detection and recovery, we propose a high availability management middleware (HAM middleware) framework that integrates these three elements. The HAM middleware consists of an Availability Management Framework (AMF) service, a Checkpoint service, and a Failure Manager. It supports health checks, state information exchange, and failure detection and recovery. Each HA-OSPF router is designed to run a Linux operating system, the HAM middleware, and an OSPF process. We have implemented the HA-OSPF router on a PC-based system. Experimental results show that the failure detection and recovery times of the proposed PC-based HA-OSPF router are reduced by 98.76% for a software failure and by 91.45% for a hardware failure, compared to those of an industry standard approach, VRRP (Virtual Router Redundancy Protocol).
In addition, we have implemented the HA-OSPF router on an ATCA (Advanced Telecom Computing Architecture) platform, which provides an industry-standard modular architecture for efficient, flexible, and reliable router design. Based on our ATCA-based platform, with 1/_ = 217 ms for a software failure and 1/_ = 1066 ms for a hardware failure, along with the router module data obtained from Cisco, 1/f = 7 years and 1/g = 4 hours, the availabilities of the proposed ATCA-based HA-OSPF router are 99.99999905% for a software failure and 99.99999867% for a hardware failure. The experimental results therefore show that both our proposed ATCA-based and PC-based HA-OSPF routers with the 1 + 1 redundancy model can easily meet the carrier-grade requirement of five-nine availability.
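
The steady-state figures above can be explored with a small CTMC calculation. The sketch below is illustrative only: the abstract does not give the state diagram, so the four-state 1 + 1 model used here (both modules up, failure being detected, one module up, both modules down), the single shared repair rate, the 8760-hour year, and all function and variable names are assumptions rather than the authors' model. Under these assumptions, the quoted parameters give values close to the availabilities reported above.

```python
HOURS_PER_YEAR = 8760.0  # assumption: 365-day year


def unavailability(mttf_hours, mttr_hours, detect_recover_ms):
    """Steady-state unavailability of an ASSUMED 1 + 1 CTMC (not the paper's exact model).

    Assumed states and transitions (rates per hour):
      both_up   -> detecting  at fail_rate    (active fails, service is down)
      both_up   -> one_up     at fail_rate    (standby fails, service stays up)
      detecting -> one_up     at detect_rate  (failover completes)
      one_up    -> both_up    at repair_rate  (failed module repaired)
      one_up    -> both_down  at fail_rate    (remaining module fails)
      both_down -> one_up     at repair_rate  (one module repaired)
    """
    fail_rate = 1.0 / mttf_hours
    repair_rate = 1.0 / mttr_hours
    detect_rate = 3_600_000.0 / detect_recover_ms  # ms per hour / ms per recovery

    # Solve the balance equations relative to the probability of "both_up".
    p_both_up = 1.0
    p_detecting = fail_rate / detect_rate
    p_one_up = 2.0 * fail_rate / repair_rate
    p_both_down = p_one_up * fail_rate / repair_rate

    total = p_both_up + p_detecting + p_one_up + p_both_down
    # Service is down while a failure is being detected or when both modules are down.
    return (p_detecting + p_both_down) / total


def availability_percent(mttf_hours, mttr_hours, detect_recover_ms):
    """Availability as a percentage, for comparison with the reported figures."""
    return 100.0 * (1.0 - unavailability(mttf_hours, mttr_hours, detect_recover_ms))


if __name__ == "__main__":
    mttf = 7 * HOURS_PER_YEAR  # 1/f = 7 years
    mttr = 4.0                 # 1/g = 4 hours
    for label, ms in (("software failure", 217.0), ("hardware failure", 1066.0)):
        print(f"{label}: {availability_percent(mttf, mttr, ms):.8f}%")
```

In this sketch the unavailability splits into two terms: the time spent detecting and recovering from an active-module failure, which shrinks as the detection and recovery rate grows, and the time when both modules are down, which depends only on the failure and repair rates. This is one way to see why a faster failover mechanism raises availability only up to the floor set by double failures.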

Keywords: continuous-time Markov chain, failure detection and recovery mechanism, high availability, OSPF, redundancy model, router availability

Received October 29, 2008; revised December 10, 2009; accepted January 19, 2010.
Communicated by Chung-Ta King.
* The authors would like to express their appreciation to Professor Kishor S. Trivedi for his insightful comments, which helped improve the quality of the paper, and to Dr. Chien Chen, Dr. Chia-Yuan Huang, Lo-Chuan Hu, and Ching-Chun Kao for their helpful assistance in conducting experiments. This work was supported in part by the National Science Council of Taiwan, R.O.C., under Grants No. NSC 96-2219-E-009-023 and NSC 96-2219-E009-008.