
Journal of Information Science and Engineering, Vol.6 No.2, pp.101-116 (June 1990)
On Segmentation and Recognition of Connected
Spoken Digits Based on a Neural Network Model*

Chung-Hsien Wu, Jhing-Fa Wang, Ruey-Ching Shyu
and Jau-Yien Lee
Department of Electrical Engineering
National Cheng Kung University
Tainan, Taiwan, Republic of China

In this paper, an automatic connected digits segmentation and recognition system based on a neural network model is proposed. A backpropagation learning algorithm is employed to train these networks. The main new idea for segmentation is to classify the energy, spectral and pitch-period transitions within a data window to determine the boundaries between syllables. These feature transitions are used as the input patterns for training the segmentation network. The segmented syllables are then used as the basic units in the training and recognition process. In speaker-independent segmentation experiments, ten digits (0-9) and syllables spoken in Mandarin are used as test patterns, while only ten digits are used in speaker-dependent recognition experiments. With an average speaking rate of 160 digits per minute, a coincidence rate of 95.7% and a recognition rate of 97.2% can be achieved.
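
As a rough illustration of the idea of describing feature transitions within a data window, the following minimal sketch computes frame-level log energy and the frame-to-frame differences inside a window. It is not the authors' implementation: the function names, the frame length, the window width, and the use of energy alone (without the spectral and pitch-period features) are all assumptions made for this example.

```python
import math

def frame_log_energy(samples, frame_len=4):
    """Log energy of consecutive non-overlapping frames (frame_len is hypothetical)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        # Small offset avoids log10(0) on silent frames.
        energies.append(math.log10(energy + 1e-10))
    return energies

def transition_pattern(features, center, half_width=2):
    """Frame-to-frame differences inside a window centered on frame `center`."""
    lo = center - half_width
    hi = center + half_width
    if lo < 0 or hi >= len(features):
        raise ValueError("window exceeds feature sequence")
    window = features[lo:hi + 1]
    return [b - a for a, b in zip(window, window[1:])]

# Toy signal: three quiet frames followed by three loud frames.
samples = [0.01] * 12 + [1.0] * 12
energies = frame_log_energy(samples)
pattern = transition_pattern(energies, center=2)
print(pattern)  # one large jump where the quiet-to-loud change falls inside the window
```

In a system of the kind described, a vector like `pattern` (extended with spectral and pitch-period transitions) would be one input pattern for a trained classifier that decides whether the window contains a syllable boundary.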

Keywords: segmentation, recognition, neural network, backpropagation

Received August 14, 1989; revised December 31, 1989.
Communicated by Lin-Shan Lee.
*Parts of this paper were presented at the 1990 IEEE International Symposium on Information Theory, San Diego, California, January 1990.