JING-XIN WANG+, YUNG-CHANG CHIU++, ALVIN W. Y. SU+ AND CE-KUEN SHIEH++
+Department of Computer Science and Information Engineering
++Department of Electrical Engineering
National Cheng Kung University
Tainan, 701 Taiwan
An H.264/AVC encoder can incorporate many coding schemes, such as rate-distortion
optimization (RDO), into its design to improve compression performance, but at the
cost of dramatically higher computational complexity. In an H.264/AVC RDO encoder,
most of the computation time is spent calculating rate-distortion costs in order to
choose the optimal mode for both inter and intra coding. Parallel computation is one
way to speed up the encoder. However, calculating rate-distortion costs requires a
great amount of reference data obtained from coded adjacent macroblocks in order to
maintain the coding efficiency established by the JM encoder. This is an undesirable
property for any parallel computing strategy. The transmission of such a large amount of
reference data, as well as the frequency of transmission between processing nodes, reduces
the speed of the entire encoding process. Thus, it may become necessary to drop
part of the reference data and decrease the frequency of transmission in order to reduce
the traffic. To investigate this problem, this study implements the H.264/AVC RDO
encoder using three different parallel schemes. Each scheme is run on a software
DSM-based (distributed shared memory) PC cluster system consisting of one to five PCs
(one master node and zero to four slave processing nodes). The amount of data to be
exchanged among processing nodes is analyzed for each scheme. In addition, PSNR
performance and speedup results are reported for each scheme. Experiments show that
coding gain decreases considerably as more information is dropped. At lower bit
rates, performance is reduced to the level of a regular H.264 encoder. Nevertheless,
this paper provides a good reference for implementing such an encoder on a cluster
computing system.
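As a rough illustration of the two quantities the abstract refers to, the sketch below shows the standard Lagrangian form of the rate-distortion cost, J = D + lambda * R, used to pick the cheapest coding mode, and the usual definition of speedup as single-node time divided by parallel time. The class name, function names, and sample numbers are hypothetical and are not taken from the paper or from the JM reference software; this is a minimal sketch, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class ModeCandidate:
    """One candidate coding mode for a macroblock (hypothetical structure)."""
    name: str
    distortion: float  # e.g. sum of squared differences after reconstruction
    rate_bits: float   # bits needed to code the macroblock in this mode


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_mode(candidates: list[ModeCandidate], lam: float) -> ModeCandidate:
    """Return the candidate with the smallest rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c.distortion, c.rate_bits, lam))


def speedup(t_single: float, t_parallel: float) -> float:
    """Speedup of a parallel run relative to a single-node run."""
    return t_single / t_parallel


# Illustrative numbers only (assumed, not measured).
candidates = [
    ModeCandidate("mode_a", distortion=100.0, rate_bits=50.0),
    ModeCandidate("mode_b", distortion=80.0, rate_bits=90.0),
]
best_high_lambda = choose_mode(candidates, lam=1.0)
best_low_lambda = choose_mode(candidates, lam=0.1)
```

A larger lambda penalizes bits more heavily, so the choice shifts toward the mode with the lower rate. Evaluating every candidate this way for each macroblock is the work that dominates encoding time and that the parallel schemes distribute across nodes.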
Received April 11, 2008; revised March 9, 2009; accepted June 4, 2009.
Communicated by Liang-Gee Chen.
* This work was sponsored by the National Science Council of Taiwan, under project No. 96-2221-E-006-193-MY3.