Jiaming Zhan and Han-Tong Loh
Department of Mechanical Engineering
National University of Singapore
Singapore 119260, Singapore
In this paper, we investigate the use of summarization techniques to improve text
classification. Because summarization inherently assigns more weight to the more important
sentences in an article, it may improve the accuracy with which the article is classified. Redundancy
in the summaries was reduced to different levels, and the effect of each level on classification
performance was investigated. The classification algorithm used was Support Vector
Machines (SVMs), which have proven to be very effective and robust for text classification
problems. Experimental results showed that summaries with the lowest redundancy improved
classification performance on the Reuters corpus, raising the average F1 measure by more
than 6%. To explain why summarization can improve performance
while feature selection brings little benefit to SVMs, a further experiment was conducted to
demonstrate the difference between summarization and traditional feature selection techniques.
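The abstract does not state which summarizer was used or how redundancy was controlled. As a hedged illustration only, the sketch below swaps in Maximal Marginal Relevance (MMR), a standard way to trade a sentence's relevance against its overlap with sentences already chosen. The function names, the centroid-based relevance score, the bag-of-words cosine similarity and the `lam` parameter are all assumptions made for this example, not the paper's method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_summary(sentences, k, lam=0.7):
    """Hypothetical MMR summarizer: pick up to k sentences.

    relevance  = similarity of a sentence to the whole-document centroid
    redundancy = highest similarity to any sentence already selected
    A smaller `lam` penalises redundancy more strongly (lower-redundancy summary).
    """
    vecs = [Counter(tokenize(s)) for s in sentences]
    centroid = Counter()
    for v in vecs:
        centroid.update(v)  # Counter.update adds counts, giving document-level term frequencies

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], centroid)
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    # Restore original sentence order so the summary reads like the article.
    return [sentences[i] for i in sorted(selected)]
```

With `lam=1.0` the redundancy term vanishes and a repeated sentence can be selected twice; lowering `lam` pushes the selection toward distinct content. In a pipeline like the one the abstract describes, the joined summary would replace the full article text as input to the SVM classifier, though the exact feature representation the authors used is not given here.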
Received June 5, 2007; revised September 28, 2007; accepted March 20, 2008.
Communicated by Suh-Yin Lee.