CHIA-TAI TSAI, RONG-HONG JAN AND KUOCHEN WANG
Department of Computer Science
National Chiao Tung University
Hsinchu, 300 Taiwan
E-mail: {tai; rhjan; kwang}@cs.nctu.edu.tw
In this paper, we propose a High Availability Open Shortest Path First (HA-OSPF)
router which consists of two OSPF router modules, active and standby, to support a high
availability network. First, we use a continuous-time Markov chain (CTMC) to analyze
the steady-state availability of an HA-OSPF router with one active router and N standby
routers (1 + N redundancy model). Then, taking the failure detection and recovery rate into
account, we show analytically that the HA-OSPF router with the 1 + 1 redundancy
model (one active and one standby) is the preferred model for enhancing router availability.
We also show that the carrier-grade HA-OSPF router availability (i.e., five-nine availability)
can be achieved under an appropriate combination of the router module failure rate
(λ), repair rate (μ), and the failure detection and recovery rate (δ). Since there is a lack of
research on the integration of the redundancy model, link state information backup, and
failure detection and recovery, we propose a high availability management middleware
(HAM middleware) framework to integrate these three elements. The HAM middleware
consists of Availability Management Framework (AMF) service, Checkpoint service, and
Failure Manager. It supports health check, state information exchange, and failure detection
and recovery. Each HA-OSPF router was designed to have a Linux operating system,
HAM middleware, and OSPF process. We have implemented the HA-OSPF router on a
PC-based system. Experimental results show that the failure detection and recovery times
of the proposed PC-based HA-OSPF router were reduced by 98.76% and 91.45% compared
to those of an industry standard approach, VRRP (Virtual Router Redundancy
Protocol), for a software failure and a hardware failure, respectively. We have
also implemented the HA-OSPF router on an ATCA (Advanced Telecom Computing
Architecture) platform, which can provide an industrial standardized modular architecture
for an efficient, flexible, and reliable router design. Based on our ATCA-based platform
with 1/δ = 217 ms for a software failure and 1/δ = 1066 ms for a hardware failure,
along with the router module data, 1/λ = 7 years and 1/μ = 4 hours, obtained from Cisco,
the availabilities of the proposed ATCA-based HA-OSPF router are 99.99999905% for a
software failure and 99.99999867% for a hardware failure. These experimental results
show that both the proposed ATCA-based and PC-based HA-OSPF routers with the
1 + 1 redundancy model can easily meet the carrier-grade requirement of five-nine
availability.
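As a rough illustration of how the module failure rate, the repair rate, and the failure detection and recovery rate combine into a steady-state availability, the following sketch solves a small CTMC for a 1 + 1 router in closed form. The four-state model (both modules up, failover in progress, one module up, no module up), the single repair facility, the 365-day year, and all function and variable names are assumptions made for this example; they are not taken from the paper, whose model may differ in detail.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a 365-day year


def unavailability_1plus1(mttf_h, mttr_h, failover_h):
    """Steady-state unavailability of an assumed four-state 1 + 1 CTMC.

    mttf_h     -- mean time to failure of one router module, in hours
    mttr_h     -- mean time to repair one router module, in hours
    failover_h -- mean failure detection and recovery time, in hours

    Assumed states and transitions (hypothetical model):
      both up     -> failover     at rate lam   (active module fails, service down)
      both up     -> one up       at rate lam   (standby module fails, service up)
      failover    -> one up       at rate delta (standby takes over)
      one up      -> both up      at rate mu    (failed module repaired)
      one up      -> none up      at rate lam   (last module fails, service down)
      none up     -> one up       at rate mu    (one module repaired)
    """
    lam = 1.0 / mttf_h
    mu = 1.0 / mttr_h
    delta = 1.0 / failover_h

    # Unnormalized steady-state probabilities from the balance equations,
    # expressed relative to the "both up" state.
    p_both = 1.0
    p_failover = lam / delta        # delta * p_failover = lam * p_both
    p_one = 2.0 * lam / mu          # mu * p_one = 2 * lam * p_both
    p_none = (lam / mu) * p_one     # mu * p_none = lam * p_one

    total = p_both + p_failover + p_one + p_none
    # Service is down only during failover and when no module is up.
    return (p_failover + p_none) / total


def downtime_seconds_per_year(unavailability):
    """Expected downtime per year implied by a steady-state unavailability."""
    return unavailability * HOURS_PER_YEAR * 3600.0


if __name__ == "__main__":
    mttf = 7 * HOURS_PER_YEAR   # 7 years between module failures
    mttr = 4.0                  # 4 hours to repair a module
    cases = {"software failure": 0.217, "hardware failure": 1.066}  # seconds
    for label, failover_s in cases.items():
        u = unavailability_1plus1(mttf, mttr, failover_s / 3600.0)
        print(f"{label}: availability = {100.0 * (1.0 - u):.8f}%, "
              f"downtime = {downtime_seconds_per_year(u):.3f} s/year")
```

Under these assumptions the unavailability comes out at roughly 9.5e-9 for the 217 ms case and 1.3e-8 for the 1066 ms case, which is close to the availabilities quoted above and far below the five-nine threshold of 1e-5 (about 5.3 minutes of downtime per year). The sketch also makes the two contributions visible: the failover term scales with the detection and recovery time, while the double-failure term depends only on the ratio of repair time to failure time.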
Received October 29, 2008; revised December 10, 2009; accepted January 19, 2010.
Communicated by Chung-Ta King.
* The authors would like to express their appreciation to Professor Kishor S. Trivedi for his insightful comments
that helped improve the quality of the paper and to Dr. Chien Chen, Dr. Chia-Yuan Huang, Lo-Chuan Hu, and
Ching-Chun Kao for their helpful assistance in conducting experiments. This work was supported in part by
the National Science Council of Taiwan, R.O.C., under Grants No. NSC 96-2219-E-009-023 and
NSC 96-2219-E009-008.