Hsin-Hsi Chen and Yue-Shi Lee
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, Taiwan, R.O.C.
Natural language generation is an indispensable component of many natural language applications. Conventional grammar-based approaches suffer from high rule development cost, rule inconsistency, high time consumption, and difficulty in using situation information. Corpus-based approaches have specific features that overcome these problems. This paper deals with language modeling in natural language generation. It compares the Markov model and the word association model in terms of their parameters, training corpus size, training data storage requirements, correct rates, and speed. It proposes different language models for bag generation and reports large-scale experiments that test their feasibility. The direction of word association, the word/word linear constraint, mutual information, and the distance of association pairs are considered as ways to enhance the word association model. This paper also presents criteria for measuring the language models. The word association model with distance, which joins the constraints of the Markov model and the word association model, has a higher correct rate, a lower gradient, and a higher blurry degree. It can capture both long-distance dependencies and linear precedence relations, so it is the most powerful of these models. This model also facilitates a new design for multi-lingual machine translation.
Keywords: corpus, language generation, language model, Markov model, word association model
Received June 7, 1994; revised April 15, 1995.
Communicated by Hsi-Jian Lee.