Journal of Information Science and Engineering, Vol. 26 No. 2, pp. 409-426 (March 2010)

On Parallelizing H.264/AVC Rate-Distortion Optimization Baseline Profile Encoder*

JING-XIN WANG+, YUNG-CHANG CHIU++, ALVIN W. Y. SU+ AND CE-KUEN SHIEH++
+Department of Computer Science and Information Engineering
++Department of Electrical Engineering
National Cheng Kung University
Tainan, 701 Taiwan

An H.264/AVC encoder can incorporate many coding schemes, such as rate-distortion optimization (RDO), to improve its compression performance, but these schemes dramatically raise computational complexity. In an H.264/AVC RDO encoder, most of the computation time is spent calculating rate-distortion costs to choose the optimal coding mode for both inter and intra coding. Parallel computation is one way to speed up the encoder. However, calculating rate-distortion costs requires a large amount of reference data from previously coded adjacent macroblocks in order to maintain the coding efficiency established by the JM encoder, which is an undesirable property for any parallel computing strategy. Both the volume of this reference data and the frequency with which it is transmitted between processing nodes slow down the entire encoding process. It may therefore become necessary to drop part of the reference data and to transmit less frequently in order to reduce the traffic. To investigate this problem, this study implements the H.264/AVC RDO encoder using three different parallel schemes. Each scheme is run on a software DSM-based (distributed shared memory) PC cluster consisting of one to five PCs (one master node and up to four slave processing nodes). The amount of data exchanged among processing nodes is analyzed for each scheme, and the PSNR performance and speedup results are reported. Experiments show that a considerable reduction in coding gain is to be expected as more information is dropped. In lower bit rate cases, performance is reduced to the level of a regular H.264 encoder. Nevertheless, this paper provides a good reference for implementing such an encoder on a cluster computing system.

Keywords: H.264/AVC, rate-distortion optimization, distributed shared memory system, parallel video encoder, cluster computing system
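
For readers unfamiliar with the rate-distortion cost mentioned in the abstract, the following is a minimal sketch of Lagrangian mode decision, not the authors' implementation. It assumes the commonly used cost J = D + lambda * R and the QP-dependent multiplier lambda = 0.85 * 2^((QP - 12) / 3) associated with the JM reference software; the abstract itself states neither. The names ModeCandidate, lagrange_multiplier, rd_cost, best_mode and speedup are hypothetical, and the distortion and rate values in the usage note are made up purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ModeCandidate:
    """One candidate coding mode for a macroblock (hypothetical container)."""
    name: str
    distortion: float  # D: e.g. sum of squared differences after reconstruction
    rate_bits: float   # R: bits needed to code the macroblock in this mode


def lagrange_multiplier(qp: int) -> float:
    # Assumed JM-style mode-decision multiplier; not given in the abstract.
    return 0.85 * 2 ** ((qp - 12) / 3)


def rd_cost(candidate: ModeCandidate, lam: float) -> float:
    # Lagrangian rate-distortion cost J = D + lambda * R.
    return candidate.distortion + lam * candidate.rate_bits


def best_mode(candidates: Iterable[ModeCandidate], qp: int) -> ModeCandidate:
    # Evaluate every candidate and keep the one with the lowest cost.
    lam = lagrange_multiplier(qp)
    return min(candidates, key=lambda c: rd_cost(c, lam))


def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    # Conventional speedup: single-node time divided by multi-node time.
    return serial_seconds / parallel_seconds
```

With two illustrative candidates, ModeCandidate("inter16x16", 1000.0, 100.0) and ModeCandidate("skip", 1200.0, 20.0), best_mode selects the low-distortion candidate at QP 12 and the low-rate candidate at QP 36, because the multiplier grows with QP. Every candidate evaluated this way depends on reconstructed data from neighbouring macroblocks, which is why the cost computation is hard to distribute across nodes.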


Received April 11, 2008; revised March 9, 2009; accepted June 4, 2009.
Communicated by Liang-Gee Chen.
* This work was sponsored by the National Science Council of Taiwan, under project No. 96-2221-E-006-193-MY3.