

Journal of Information Science and Engineering, Vol. 26 No. 2, pp. 505-525 (March 2010)

An Automated Term Definition Extraction System Using the Web Corpus in the Chinese Language

FANG-YIE LEU AND CHIH-CHIEH KO
Department of Computer Science
Tunghai University
Taichung, 407 Taiwan
E-mail: leufy@thu.edu.tw

This paper proposes a system, named DefExplorer, that analyzes the type of a given Chinese term, extracts term definitions from the Web, and selects answers from noisy Web pages. DefExplorer filters out invalid data with a semantic approach. Two types of candidate sets, common and domain-specific, are employed to cluster similar candidates into groups. Different approaches are also deployed to evaluate the candidates' importance, which is the key factor in selecting the best answers from the retrieved candidates. Experimental results show that DefExplorer can effectively extract term definitions from the Web, especially for out-of-vocabulary terms.
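The abstract does not specify how DefExplorer measures similarity between candidates or scores their importance. Purely as an illustration of the general "cluster similar candidates, then pick the most important group" idea, the sketch below assumes character-bigram Jaccard similarity (bigrams suit Chinese, which has no spaces between words), greedy single-pass clustering, and cluster size as the importance score. The names `bigrams`, `jaccard`, `cluster_candidates`, `select_best` and the 0.5 threshold are hypothetical and are not the authors' method.

```python
from typing import Optional


def bigrams(text: str) -> set[str]:
    # Character bigrams; fall back to the whole string for texts shorter than 2.
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def jaccard(a: set[str], b: set[str]) -> float:
    # Size of the intersection divided by size of the union (0.0 for two empty sets).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_candidates(candidates: list[str], threshold: float = 0.5) -> list[list[str]]:
    # Greedy clustering: each candidate joins the first group whose representative
    # (its first member) is similar enough; otherwise it starts a new group.
    groups: list[list[str]] = []
    reps: list[set[str]] = []
    for cand in candidates:
        grams = bigrams(cand)
        for group, rep in zip(groups, reps):
            if jaccard(grams, rep) >= threshold:
                group.append(cand)
                break
        else:
            groups.append([cand])
            reps.append(grams)
    return groups


def select_best(candidates: list[str], threshold: float = 0.5) -> Optional[str]:
    # Hypothetical importance score: group size, i.e. how many pages agree.
    # max() keeps the earliest group on ties; its first member is returned.
    if not candidates:
        return None
    groups = cluster_candidates(candidates, threshold)
    return max(groups, key=len)[0]


if __name__ == "__main__":
    print(select_best(["電腦程式的集合", "電腦程式的集合體", "一種水果"]))  # → 電腦程式的集合
```

Using group size as importance rewards definitions that recur across many pages, which is one plausible way to suppress noise; a real system would likely combine it with other signals.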

Keywords: definitions, web corpus, information extraction, Chinese language, text mining


Received April 1, 2008; revised February 27 & April 22, 2009; accepted May 7, 2009.
Communicated by Chin-Teng Lin.