

Journal of Information Science and Engineering, Vol. 31 No. 3, pp. 1133-1148 (May 2015)

Self-Supervised Synonym Extraction from the Web*

Department of Computer Science and Engineering
East China University of Science and Technology
Shanghai, 200237 P.R. China
E-mail: {zshao; ruantong}

Current synonym extraction methods work in a closed way: given a problem word and a set of target words, researchers must choose the target words that are synonymous with the problem word, using features such as lexical patterns and distributional similarity. This paper aims to discover synonyms in an open way and presents a synonym extraction framework based on self-supervised learning. We first analyze the nature of the open method and argue that a trained, pattern-independent model for synonym extraction is feasible. We then model the extraction of synonyms from sentences as a sequential labeling problem and automatically generate labeled training samples using structured knowledge from online encyclopedias together with some generic heuristic rules. Finally, we train several Conditional Random Field (CRF) models and use them to extract synonyms from the web. We extract more than 20 million facts, which contain 826,219 distinct synonym pairs.
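To make the sequential-labeling formulation concrete, the following is a minimal sketch of the two ends of such a pipeline, not the authors' implementation. It assumes a hypothetical BIO-style tag set (`B-SYN`, `I-SYN`, `O`), whitespace tokenization, and a single assumed heuristic (a sentence becomes a training sample only if both terms of a known synonym pair, e.g. one taken from an encyclopedia infobox, occur in it). The functions `make_training_sample`, `decode_spans`, and `synonym_pairs` are illustrative names. CRF training itself is omitted; in practice the generated `(tokens, labels)` samples would be passed to a CRF toolkit, and the predicted labels would be decoded the same way.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

# Hypothetical tag set (an assumption, not the paper's):
# B-SYN begins a synonym mention, I-SYN continues it, O is outside any mention.
B, I, O = "B-SYN", "I-SYN", "O"


def find_term(tokens: Sequence[str], term: Sequence[str]) -> List[int]:
    """Return every start index where `term` occurs as a contiguous run of tokens."""
    n = len(term)
    return [i for i in range(len(tokens) - n + 1) if list(tokens[i:i + n]) == list(term)]


def make_training_sample(
    sentence: str, pair: Tuple[str, str]
) -> Optional[Tuple[List[str], List[str]]]:
    """Auto-label a sentence using a synonym pair from structured encyclopedia data.

    Assumed heuristic: the sentence is kept only if both terms of the pair occur in it.
    Overlapping mentions are not handled in this sketch.
    """
    tokens = sentence.split()
    labels = [O] * len(tokens)
    for term in pair:
        term_toks = term.split()
        starts = find_term(tokens, term_toks)
        if not starts:
            return None  # one term is missing, so the sentence is discarded
        for s in starts:
            labels[s] = B
            for j in range(s + 1, s + len(term_toks)):
                labels[j] = I
    return tokens, labels


def decode_spans(tokens: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Turn a B/I/O label sequence (e.g. a CRF's output) back into mention strings."""
    spans: List[str] = []
    current: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab == B:
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif lab == I and current:
            current.append(tok)
        else:  # O, or a stray I-SYN with no open mention: close any open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def synonym_pairs(tokens: Sequence[str], labels: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair up the distinct mentions labeled in one sentence."""
    spans = list(dict.fromkeys(decode_spans(tokens, labels)))  # dedupe, keep order
    return list(combinations(spans, 2))


if __name__ == "__main__":
    sample = make_training_sample(
        "New York City is also known as the Big Apple", ("New York City", "Big Apple")
    )
    if sample is not None:
        toks, labs = sample
        print(labs)
        print(synonym_pairs(toks, labs))
```

Running the script prints the label sequence `['B-SYN', 'I-SYN', 'I-SYN', 'O', 'O', 'O', 'O', 'O', 'B-SYN', 'I-SYN']` and the pair `[('New York City', 'Big Apple')]`. Because the labels mark mentions rather than fixed lexical patterns such as "also known as", a model trained on such samples can in principle generalize beyond any single pattern, which is the pattern-independence argued for above.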

Keywords: synonym extraction, self-supervised learning, sequential labeling, pattern, encyclopedia


Received August 30, 2013; accepted October 28, 2013.
Communicated by Hsin-Hsi Chen.
* This research is supported by the National Science and Technology Pillar Program of China under Grant No. 2013BAH11F03 and the National Natural Science Foundation of China under Grant No. 61003126.