
Journal of Information Science and Engineering, Vol. 22 No. 5, pp. 1059-1075 (September 2006)

A Talking Face Driven by Voice using Hidden Markov Model*

Guang-Yi Wang, Mau-Tsuen Yang, Cheng-Chin Chiang and Wen-Kai Tai
Department of Computer Science and Information Engineering
National Dong Hwa University
Hualien, 974 Taiwan

In this paper, we utilize a Hidden Markov Model (HMM) as a mapping mechanism between two different kinds of correlated signals. Specifically, we develop a voice-driven talking head system by exploiting the physical relationship between the shape of the mouth and the sound being produced. The proposed system can be easily trained, and a talking head can be efficiently animated. In the training phase, Mel-scale Frequency Cepstral Coefficients (MFCC) are extracted from the audio signals and Facial Animation Parameters (FAP) are extracted from the video signals. Both audio and visual features are then integrated to train a single HMM. In the synthesis phase, the HMM maps a completely novel audio track to a FAP sequence, which drives face synthesis via a Facial Animation Engine (FAE). Experiments demonstrate the proposed voice-driven talking head with both male and female speakers, in two styles (speaking and singing) and in three languages (Chinese, English and Taiwanese). Possible applications of the proposed system include computer-aided instruction, online guides, virtual conferencing, lip synchronization, and human-computer interaction.
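The synthesis phase described above can be illustrated with a minimal sketch. The sketch below assumes a single HMM whose states carry joint Gaussian emissions over concatenated [MFCC | FAP] vectors with diagonal covariance, so that once a state sequence is Viterbi-decoded from the audio marginal, each state's FAP mean serves as the predicted visual parameters. All parameter values, dimensions, and names here are illustrative toys, not the paper's actual model (which would be trained with Baum-Welch on synchronized audio-video data):

```python
import numpy as np

# Toy joint-emission means: first 2 entries are MFCC (audio), last 2 are FAP (visual).
means = np.array([
    [0.0, 0.0, 0.1, 0.2],   # state 0: e.g. closed mouth
    [3.0, 3.0, 0.8, 0.9],   # state 1: e.g. open mouth
    [6.0, 6.0, 0.4, 0.5],   # state 2: e.g. rounded mouth
])
var = 1.0                                    # shared isotropic variance
log_A = np.log(np.full((3, 3), 1.0 / 3.0))   # uniform transition matrix
log_pi = np.log(np.full(3, 1.0 / 3.0))       # uniform initial distribution

def audio_loglik(mfcc):
    # log N(mfcc | state audio-mean, var*I), additive constants dropped
    diff = mfcc[None, :] - means[:, :2]
    return -0.5 * np.sum(diff ** 2, axis=1) / var

def viterbi(frames):
    # Standard Viterbi decode over the audio marginal of the joint HMM.
    T = len(frames)
    delta = log_pi + audio_loglik(frames[0])
    back = np.zeros((T, 3), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # scores[i, j]: from state i to j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(3)] + audio_loglik(frames[t])
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# A short synthetic MFCC track; the decoded FAP sequence would feed the FAE.
audio = np.array([[0.1, -0.1], [2.9, 3.1], [6.2, 5.8]])
states = viterbi(audio)
faps = means[states, 2:]   # predicted FAP frame per audio frame
print(states)              # [0, 1, 2]
```

With diagonal covariance the conditional expectation of the FAP block given a decoded state reduces to that state's FAP mean, which is why the last line is a simple row lookup; a full-covariance model would instead regress the visual block on the observed audio within each state.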

Keywords: talking head, audio-to-visual mapping, HMM, FAP, lip synchronization


Received August 16, 2005; accepted January 17, 2006.
Communicated by Jhing-Fa Wang, Pau-Choo Chung and Mark Billinghurst.
*This research was supported in part by the National Science Council of Taiwan, R.O.C., under grant No. NSC 95-2221-E-259-027-MY2.