Cheng-Huang Tung and Hsi-Jian Lee
Department of Computer Science and Information Engineering
National Chiao Tung University,
Hsinchu, Taiwan 300, R.O.C.
This paper presents an efficient computation method for speeding up contextual postprocessing in Chinese character recognition systems. In contextual postprocessing, the words contained in the candidate character sets generated by a recognition system must be found in order to construct a word transition graph. A language model can then be used to find the most promising sentence hypothesis from the word transition graph. Finding the words contained in the candidate character sets is generally time-consuming. To accomplish this task more efficiently, the words in the dictionary are organized according to their first two characters, which form a two-character index array. Because the size of the two-character index array is much greater than the number of words in the dictionary, the index array is sparse. We therefore compress it by applying a row displacement method, which achieves a compression rate of 224. Experimental results show that the proposed method, utilizing a well-organized dictionary, can find words very efficiently and that the time consumed is almost independent of the word length. Thus, our method provides a more practical technique for contextual postprocessing.
Keywords: contextual postprocessing, two-character index array, sparse array compression, language model
Received August 9, 1994; revised February 9, 1995.
Communicated by Jhing-Fa Wang.
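
To make the idea of a compressed two-character index array concrete, the following is a minimal sketch of first-fit row displacement applied to a toy dictionary. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function names (`compress_rows`, `lookup`, `find_words`), the six-word sample dictionary, the character coding, and the densest-row-first placement order are all hypothetical choices made for the example. Each row of the sparse array (one per first character) is shifted by a displacement so that its occupied columns fall into free slots of a single packed array, and a parallel check array records which row owns each slot.

```python
from collections import defaultdict


def compress_rows(rows):
    """Pack sparse rows into one array by first-fit row displacement.

    rows maps row_id -> {col_id: value}. Returns (disp, values, check):
    disp[row] is the row's offset, values is the packed array, and
    check[pos] is the row that owns slot pos (None if the slot is free).
    """
    values, check, disp = [], [], {}
    # Assumed heuristic: place the densest rows first.
    for row in sorted(rows, key=lambda r: len(rows[r]), reverse=True):
        cols = sorted(rows[row])
        d = 0
        # Slide the row right until none of its entries hits a used slot.
        while any(d + c < len(check) and check[d + c] is not None for c in cols):
            d += 1
        if cols:
            grow = d + cols[-1] + 1 - len(check)
            if grow > 0:
                values.extend([None] * grow)
                check.extend([None] * grow)
        for c in cols:
            values[d + c] = rows[row][c]
            check[d + c] = row
        disp[row] = d
    return disp, values, check


def lookup(disp, values, check, row, col):
    """Return the entry at (row, col), or None if that cell is empty."""
    d = disp.get(row)
    if d is None:
        return None
    pos = d + col
    if pos < len(check) and check[pos] == row:
        return values[pos]
    return None


# Hypothetical toy dictionary; a real one would hold far more words.
WORDS = ["電腦", "電話", "電腦科學", "科學", "科技", "學生"]

# Assumed coding: each character seen in the first two positions gets an integer.
CODE = {ch: i for i, ch in enumerate(sorted({ch for w in WORDS for ch in w[:2]}))}

# Sparse two-character index: (first char, second char) -> words sharing that prefix.
rows = defaultdict(dict)
for w in WORDS:
    rows[CODE[w[0]]].setdefault(CODE[w[1]], []).append(w)

DISP, VALUES, CHECK = compress_rows(rows)


def find_words(first, second):
    """Return all dictionary words that begin with the given two characters."""
    if first not in CODE or second not in CODE:
        return []
    return lookup(DISP, VALUES, CHECK, CODE[first], CODE[second]) or []
```

In this sketch a lookup costs one displacement fetch, one addition, and one ownership check, regardless of how long the retrieved words are. The check array is what makes the packing safe: a slot that belongs to a different row is reported as empty rather than returning that row's words.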