Journal of Information Science and Engineering, Vol. 32 No. 6, pp. 1613-1634 (November 2016)


Multi-document Summarization using Probabilistic Topic-based Network Models*


CHENG-ZEN YANG, JHIH-SHANG FAN AND YU-FAN LIU
Department of Computer Science and Engineering
Yuan Ze University
Chungli, 32003 Taiwan
E-mail: czyang@syslab.cse.yzu.edu.tw; {s1003305, s1001447}@mail.yzu.edu.tw

Multi-document summarization has received much attention in text summarization research. Probabilistic topic models and network models have both been used to generate summaries, but previous studies have not investigated how different topic models and network models perform in combination. This paper describes an integrated approach that considers both probabilistic topic models and network models. Two probabilistic topic models and four network models are investigated. We conducted experiments on the DUC 2004-2007 datasets to evaluate the effectiveness of the proposed approach and made a systematic comparison between two representative topic models, PLSA and LDA. The results show that the PLSA-based network approach outperforms the TF-IDF baseline on all datasets. Moreover, PLSA achieves better ROUGE performance than LDA for multi-document summarization.

Keywords: multi-document summarization, probabilistic topic models, network models, extraction-based summarization, performance evaluation

Received August 27, 2015; revised October 19, 2015; accepted December 9, 2015.
Communicated by Hsin-Hsi Chen.
* This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant No. MOST 104-221-E-155-004.