Guang-Yi Wang, Mau-Tsuen Yang, Cheng-Chin Chiang and Wen-Kai Tai
Department of Computer Science and Information Engineering
National Dong Hwa University
Hualien, 974 Taiwan
E-mail: mtyang@mail.ndhu.edu.tw
In this paper, we use a Hidden Markov Model (HMM) as a mapping mechanism
between two different kinds of correlated signals. Specifically, we developed a
voice-driven talking head system by exploiting the physical relationship between
the shape of the mouth and the sound it produces. The proposed system is easy to
train and animates a talking head efficiently. In the training phase, Mel-scale
Frequency Cepstral Coefficients (MFCC) are extracted from the audio signals and
Facial Animation Parameters (FAP) are extracted from the video signals. Both audio
and video features are then combined to train a single HMM. In the synthesis phase,
the HMM maps a completely novel audio track to a FAP sequence for face synthesis
with the help of a Facial Animation Engine (FAE). The experiments demonstrate the
proposed voice-driven talking head on both male and female speakers, in two styles
(speaking and singing) and in three languages (Chinese, English, and Taiwanese).
Possible applications of the proposed system include computer-aided instruction,
online guides, virtual conferencing, lip synchronization, and human-computer
interaction.
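
The synthesis phase can be illustrated with a minimal sketch. It is not the system's actual implementation: it assumes a trained HMM whose states each carry a diagonal Gaussian over the audio features and a mean FAP vector, and it assumes Viterbi decoding followed by a per-state FAP lookup. The abstract does not state the decoding method, so that choice is an assumption. All names (log_gauss, viterbi, audio_to_fap, toy_model) and the one-dimensional toy features are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def viterbi(obs, log_pi, log_A, means, variances):
    """Most likely hidden-state path for a sequence of audio feature vectors."""
    n = len(log_pi)
    delta = [log_pi[s] + log_gauss(obs[0], means[s], variances[s]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        prev, delta, ptr = delta, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_A[r][s])
            ptr.append(best)
            delta.append(prev[best] + log_A[best][s]
                         + log_gauss(x, means[s], variances[s]))
        back.append(ptr)
    # Trace the best path backwards from the most likely final state.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def audio_to_fap(obs, model):
    """Map audio frames to FAP frames via the decoded state path (assumed scheme)."""
    path = viterbi(obs, model["log_pi"], model["log_A"],
                   model["audio_means"], model["audio_vars"])
    return [model["fap_means"][s] for s in path]

# Toy two-state model: state 0 = mouth closed, state 1 = mouth open.
# Real MFCC and FAP vectors are multi-dimensional; 1-D values keep the sketch short.
toy_model = {
    "log_pi": [math.log(0.5), math.log(0.5)],
    "log_A": [[math.log(0.8), math.log(0.2)],
              [math.log(0.2), math.log(0.8)]],
    "audio_means": [[0.0], [5.0]],
    "audio_vars": [[1.0], [1.0]],
    "fap_means": [[0.0], [1.0]],
}

audio_frames = [[0.1], [0.2], [4.9], [5.1], [0.0]]
faps = audio_to_fap(audio_frames, toy_model)
print(faps)
```

In this toy setting, audio frames near 0 decode to the closed-mouth state and frames near 5 to the open-mouth state, so the output FAP sequence follows the audio frame by frame. A real system would estimate the model parameters from the joint audio and video training data rather than set them by hand.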
Received August 16, 2005; accepted January 17, 2006.
Communicated by Jhing-Fa Wang, Pau-Choo Chung and Mark Billinghurst.
*This research was supported in part by the National Science Council of Taiwan, R.O.C., under grant No. NSC
95-2221-E-259-027-MY2.