Journal of Information Science and Engineering, Vol. 30 No. 4, pp. 1149-1166 (July 2014)


Frequency Warping for Speaker Adaptation in HMM-based Speech Synthesis


WEIXUN GAO1 AND QIYING CAO1,2
1School of Information Science and Technology
2College of Computer Science and Technology
Donghua University
Shanghai, 200051 P.R. China

Speaker adaptation in speech synthesis transforms a source utterance into a target utterance that differs from the source in voice characteristics. In this paper, we apply vocal tract length normalization (VTLN), which is generally used in speech recognition to remove individual speaker characteristics, to speaker adaptation in speech synthesis. We propose a frequency warping approach based on a time-varying bilinear function to reduce the weighted spectral distance between the source speaker and the target speaker. The warped spectra of the source speaker are then converted to line spectrum pairs to train hidden Markov models (HMMs). The HMMs are further adapted with the target speaker's data by algorithms based on maximum likelihood linear regression (MLLR). The experimental results show that our frequency warping approach brings the warped spectra of the source speaker closer to those of the target speaker, and that the resulting adapted HMMs outperform HMMs trained on unwarped spectra in terms of the naturalness and speaker similarity of the synthesized speech.

Keywords: frequency warping, VTLN, speaker adaptation, HMM-based speech synthesis, MLLR
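
As an illustration of the bilinear frequency warping named in the abstract, the following is a minimal sketch of the standard first-order all-pass (bilinear) warping function applied to a single spectral frame. It is not the authors' implementation: the function names, the linear interpolation, and the fixed warping factor alpha are assumptions made for this example. The abstract does not give the paper's time-varying estimate of the warping factor or its weighted spectral distance, so neither is reproduced here.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized frequency omega (radians, 0..pi) with factor alpha, |alpha| < 1.

    Standard first-order all-pass mapping; 0 and pi are fixed points.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_spectrum(spectrum, alpha):
    """Resample one spectral frame, sampled uniformly on [0, pi], along the warped axis.

    Hypothetical helper: output bin k takes the source value at the inverse-warped
    frequency (the inverse of a warp by alpha is a warp by -alpha), using
    linear interpolation between neighbouring source bins.
    """
    n = len(spectrum)
    if n < 2:
        return list(spectrum)
    warped = []
    for k in range(n):
        w = k * math.pi / (n - 1)
        src = bilinear_warp(w, -alpha)            # source frequency for this output bin
        pos = min(max(src / math.pi * (n - 1), 0.0), float(n - 1))
        i = min(int(pos), n - 2)                  # left neighbour index
        frac = pos - i
        warped.append((1.0 - frac) * spectrum[i] + frac * spectrum[i + 1])
    return warped


# Example: a single spectral peak moves upward in frequency for alpha > 0.
frame = [0.0] * 65
frame[10] = 1.0
shifted = warp_spectrum(frame, 0.2)
peak_bin = shifted.index(max(shifted))
```

In a time-varying scheme, alpha would be chosen separately for each frame, for instance by searching for the value that minimizes a spectral distance to the target speaker's frame. That selection step is left out of the sketch because its details are not stated here.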

Received March 1, 2012; revised May 13 & August 7, 2012; accepted September 4, 2012.
Communicated by Hsin-Min Wang.