
Journal of Information Science and Engineering, Vol. 25 No. 2, pp. 591-601 (March 2009)

Using Redundancy Reduction in Summarization to Improve Text Classification by SVMs

Jiaming Zhan and Han-Tong Loh
Department of Mechanical Engineering
National University of Singapore
Singapore 119260, Singapore

In this paper, we investigate the use of summarization techniques to improve text classification. Because summarization inherently assigns more weight to the more important sentences in an article, classifying the summary rather than the full text may improve classification accuracy. Redundancy in the summaries was reduced to different levels, and the effect on classification performance was investigated. The classification algorithm used was Support Vector Machines (SVMs), which have proven to be very effective and robust for text classification problems. Experimental results showed that the summaries with the lowest redundancy improved classification performance on the Reuters corpus, with an increase of more than 6% in average F1 measure. To explain why summarization can improve performance whereas feature selection does not benefit SVMs, a further experiment was conducted to demonstrate the difference between summarization and traditional feature selection techniques.

Keywords: text classification, text summarization, support vector machines, maximal marginal relevance, text mining
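
The keywords name maximal marginal relevance (MMR), a standard way to trade off relevance against redundancy when selecting summary sentences. The abstract does not give the authors' formulation, so the following is only a minimal sketch under stated assumptions: bag-of-words cosine similarity, the whole document used as the relevance query, and a hypothetical `lam` parameter standing in for the "different levels" of redundancy reduction. The function names `bow`, `cosine`, and `mmr_summary` are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words term counts (an assumed, simplistic representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_summary(sentences, k, lam=0.7):
    """Greedily pick up to k sentences by an MMR-style score.

    lam=1.0 ignores redundancy; smaller values penalise sentences that
    resemble ones already selected. Returns sentences in document order.
    """
    vecs = [bow(s) for s in sentences]
    doc = sum(vecs, Counter())  # whole document acts as the relevance query
    selected = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], doc)
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return [sentences[i] for i in sorted(selected)]
```

In a pipeline of the kind the abstract describes, the selected sentences would replace the full article as input to the classifier. Lowering `lam` yields summaries with less repeated content.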


Received June 5, 2007; revised September 28, 2007; accepted March 20, 2008.
Communicated by Suh-Yin Lee.