

Journal of Information Science and Engineering, Vol. 30 No. 1, pp. 1-23 (January 2014)

Clustering Non-Ordered Discrete Data*

¹Department of Computer Science and Engineering
Michigan State University
MI 48824-1226, USA
²Department of Computer Science and Engineering
Sogang University
Seoul 121-742, Korea
³School of Information Technology
Indian Institute of Technology
Kharagpur 721302, India

Clustering in continuous vector data spaces is a well-studied problem. In recent years there has also been a significant amount of research on clustering categorical data. However, most of this work deals with market-basket-type transaction data and is not specifically optimized for high-dimensional vectors. Our focus in this paper is to efficiently cluster high-dimensional vectors in non-ordered discrete data spaces (NDDS). We define several geometric concepts in NDDS that form the basis of our clustering algorithm, and we employ several new heuristics that exploit the characteristics of vectors in NDDS. Experimental results on large synthetic datasets demonstrate that the proposed approach is effective in terms of cluster quality, robustness, and running time. We have also applied our clustering algorithm to real datasets with promising results.
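To make the NDDS setting concrete, the following minimal Python sketch illustrates two notions commonly used for vectors whose dimensions take values from finite, unordered alphabets (for example, DNA bases): the Hamming distance between two vectors, and a discrete bounding "rectangle" that records, for each dimension, the set of values observed in a group of vectors. These definitions are assumptions chosen for illustration; the paper's own geometric concepts and heuristics may differ, and the function names `hamming`, `bounding_rectangle`, and `contains` are hypothetical.

```python
from typing import List, Sequence, Set

# A vector in a non-ordered discrete data space (NDDS): each dimension holds a
# symbol from a finite alphabet with no natural ordering (assumed representation).
Vector = Sequence[str]


def hamming(u: Vector, v: Vector) -> int:
    """Count the dimensions in which u and v differ (assumed distance measure)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    return sum(a != b for a, b in zip(u, v))


def bounding_rectangle(vectors: List[Vector]) -> List[Set[str]]:
    """Hypothetical discrete 'rectangle': per-dimension set of observed values."""
    dims = len(vectors[0])
    return [{v[i] for v in vectors} for i in range(dims)]


def contains(rect: List[Set[str]], v: Vector) -> bool:
    """True if every component of v lies in the corresponding value set."""
    if len(rect) != len(v):
        raise ValueError("vector and rectangle must have the same dimensionality")
    return all(value in allowed for allowed, value in zip(rect, v))


if __name__ == "__main__":
    group = ["ACGT", "ACGA", "TCGA"]
    print(hamming("ACGT", "TCGA"))        # differs in dimensions 0 and 3
    rect = bounding_rectangle(group)
    print(contains(rect, "TCGT"))         # every symbol appears in its dimension's set
    print(contains(rect, "GCGT"))         # 'G' never appears in dimension 0
```

Because the alphabets are unordered, a bounding region in NDDS cannot be an interval per dimension as in continuous spaces; a per-dimension value set is one natural substitute, which is why the sketch uses sets rather than (min, max) pairs.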

Keywords: clustering, data mining, categorical data, non-ordered discrete data, vector data


Received November 21, 2012; revised February 1, 2013; accepted March 19, 2013.
Communicated by Hsin-Min Wang.
* This research was partially supported by a grant from the Department of Science & Technology, International Cooperation Division, Government of India, Technology Bhavan, New Delhi, India 110016. This research was also partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (Grant No. 2010-0023101).