Chung-Hsien Wu, Jau-Hung Chen and Jhing-Fa Wang
Institute of Information Engineering
National Cheng Kung University,
Tainan, Taiwan, R.O.C.
This paper proposes a robust speaker identification system based on continuous Mandarin digits. A set of 20 digit strings is designed to cover all combinations of coarticulation between two digits. Instantaneous and transitional Line Spectrum Pair (LSP) frequencies are combined as the identification features to improve performance. In the training process, a fast-search K-means algorithm is proposed to reduce the time needed for vector quantization (VQ). In the identification process, a hierarchical identification scheme is proposed to improve identification performance: a speaker candidate selector in the first level reduces the identification time, and a Lateral Inhibition Gaussian (LIG) network in the second level gives better discrimination among speakers. In the experiments, with a VQ codebook of 128 vectors for each speaker, an average identification rate of 98.3% was obtained for token lengths from 1 to 10 among a population of 29 speakers (24 male, 5 female). In the speed experiments, the fast-search K-means algorithm took about half the time of the K-means algorithm, and the speaker candidate selector saved about half of the total identification time.
Keywords: speaker identification, Mandarin digits, line spectrum pair, fast-search K-means algorithm, vector quantization, lateral inhibition Gaussian network
Received July 13, 1994; revised April 13, 1995.
Communicated by Hsi-Jian Lee.
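As a rough illustration of the VQ-based identification summarized in the abstract, the following minimal sketch trains one codebook per speaker with plain K-means and identifies a test utterance by the smallest average quantization distortion. It is not the fast-search K-means algorithm, the speaker candidate selector, or the LIG network proposed in this paper. The function names, the codebook size of 4, the synthetic 2-dimensional features standing in for LSP vectors, and the squared Euclidean distortion are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codeword closest to v (exhaustive search)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def train_codebook(vectors, size, iters=20, seed=0):
    """Plain K-means: returns `size` codewords for one speaker."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        cells = [[] for _ in range(size)]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(c) / len(cell) for c in zip(*cell)]
    return codebook

def avg_distortion(codebook, vectors):
    """Average quantization distortion of an utterance against a codebook."""
    total = sum(sq_dist(codebook[nearest(codebook, v)], v) for v in vectors)
    return total / len(vectors)

def identify(codebooks, vectors):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(codebooks[spk], vectors))

def make_features(center, n, rng):
    """Synthetic stand-in for a speaker's feature vectors (assumption)."""
    return [[rng.gauss(c, 0.5) for c in center] for _ in range(n)]

rng = random.Random(1)
train = {
    "speaker_a": make_features([0.0, 0.0], 200, rng),
    "speaker_b": make_features([5.0, 5.0], 200, rng),
}
codebooks = {spk: train_codebook(feats, size=4) for spk, feats in train.items()}
test_utterance = make_features([5.0, 5.0], 30, rng)
result = identify(codebooks, test_utterance)
print(result)
```

In this sketch every codeword is compared against every input vector during both training and identification, which is the exhaustive cost that a faster codeword search or a first-level candidate selection would aim to reduce.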