HUNG-YAN GU AND CHUNG-CHIEH YANG*
Department of Computer Science and Information Engineering
*Institute of Electrical Engineering
National Taiwan University of Science and Technology
Taipei, 106 Taiwan
In this paper, a method is proposed to generate pitch-contours for Mandarin speech
synthesis. In this method, an HMM (hidden Markov model) is used to model the implicit
(hidden) prosodic states, and a syllable's pitch-contour is treated as an observation
generated from a prosodic state. Such an HMM is called a syllable pitch-contour HMM
(SPC-HMM). For training the SPC-HMM, we developed a feasible method to normalize
a pitch-contour's height. After normalization, each training syllable's pitch-contour is
vector quantized and represented with a VQ (vector quantization) code. Then, the VQ
code and its adjacent syllables' lexical tones are combined to define an observation
symbol for training the SPC-HMM. In the synthesis phase, the most probable sentence-wide
observation symbol sequence is searched for on the SPC-HMM using a dynamic programming
algorithm proposed here. Then, the observation symbol found for a syllable is
decoded to obtain its pitch-contour VQ code. We conducted testing experiments to determine
the size of a pitch-contour codebook and the number of states for an SPC-HMM.
The results indicate that setting the codebook size to eight and using six states are the
best choices. Also, we conducted perception tests to compare the naturalness levels of
synthetic speech files. The results show that the two generation modes for operating an
SPC-HMM studied here are comparable to each other in naturalness level.
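
As an illustration of how a VQ code and the adjacent syllables' lexical tones might be combined into one observation symbol and later decoded, here is a minimal sketch. The abstract does not give the actual encoding, so the mixed-radix layout, the tone numbering (0 for "no neighbor" at a sentence boundary, 1 to 5 for the Mandarin tones with 5 as the neutral tone), and the function names encode_symbol and decode_symbol are all assumptions made for this example. Only the codebook size of eight comes from the text.

```python
# Hypothetical encoding; the paper's real symbol definition may differ.
CODEBOOK_SIZE = 8    # codebook size reported as the best choice in the abstract
NUM_TONE_SLOTS = 6   # assumption: 0 = no neighbor, 1-5 = Mandarin tones (5 = neutral)


def encode_symbol(vq_code: int, prev_tone: int, next_tone: int) -> int:
    """Pack a pitch-contour VQ code and the two neighboring tones into one integer."""
    if not 0 <= vq_code < CODEBOOK_SIZE:
        raise ValueError("vq_code out of range")
    for tone in (prev_tone, next_tone):
        if not 0 <= tone < NUM_TONE_SLOTS:
            raise ValueError("tone out of range")
    # Mixed-radix packing: the tone context forms the high part, the VQ code the low part.
    return (prev_tone * NUM_TONE_SLOTS + next_tone) * CODEBOOK_SIZE + vq_code


def decode_symbol(symbol: int) -> tuple[int, int, int]:
    """Recover (vq_code, prev_tone, next_tone) from an observation symbol."""
    context, vq_code = divmod(symbol, CODEBOOK_SIZE)
    prev_tone, next_tone = divmod(context, NUM_TONE_SLOTS)
    return vq_code, prev_tone, next_tone


if __name__ == "__main__":
    sym = encode_symbol(vq_code=3, prev_tone=2, next_tone=4)
    print(sym, decode_symbol(sym))
```

Under these assumptions the symbol alphabet has 6 x 6 x 8 = 288 entries, and decoding the symbol found for a syllable in the synthesis phase yields its pitch-contour VQ code directly.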
Received January 12, 2010; revised April 21, June 15 & July 20, 2010; accepted August 24, 2010.
Communicated by Chin-Teng Lin.