ZAFER ERENEL, HAKAN ALTINCAY AND EKREM VAROGLU
Department of Computer Engineering
Eastern Mediterranean University
Famagusta, Northern Cyprus via Mersin 10, Turkey
E-mail: {zafer.erenel; hakan.altincay; ekrem.varoglu}@emu.edu.tr
In this paper, the behaviors of leading symmetric and asymmetric term weighting
schemes are analyzed in the context of text categorization. The analysis covers their
weighting patterns in the two-dimensional term occurrence probability space and the dynamic
ranges of the weights they generate. Additionally, a recently proposed term selection
scheme, multi-class odds ratio, is considered as a potential symmetric weighting
scheme. Based on the findings of this study, a novel symmetric weighting scheme derived
as a function of term occurrence probabilities is proposed. Experiments conducted on the
Reuters-21578 ModApte Top10, WebKB, 7-Sectors and CSTR2009 datasets indicate that the
proposed scheme outperforms other leading schemes in terms of macro-averaged and
micro-averaged F1 scores.
Received December 10, 2009; revised March 1, 2010; accepted March 23, 2010.
Communicated by Chin-Teng Lin.
* The numerical calculations reported in this paper were partly performed at TUBITAK ULAKBIM, High Performance
and Grid Computing Center (TR-Grid e-Infrastructure). This work was supported by the research
grant MEKB-09-02 provided by the Ministry of Education and Culture of Northern Cyprus. A preliminary
version of this work was presented at the 2009 International Conference on Soft Computing, Computing with
Words and Perceptions in System Analysis, Decision and Control.