Previous [ 1] [ 2] [ 3] [ 4] [ 5] [ 6] [ 7] [ 8] [ 9] [ 10] [ 11]

@

Journal of Information Science and Engineering, Vol. 21 No. 6, pp. 1157-1177 (November 2005)

An Efficient k-Means Clustering Algorithm Using Simple Partitioning*

Ming-Chuan Hung, Jugpin Wu+, Jin-Hua Chang and Don-Lin Yang
Electrical Engineering Department
Department of Information Engineering and Computer Science
+Department of Statistics
Feng Chia University
Taichung, 407 Taiwan

The k-means algorithm is one of the most widely used methods to partition a dataset into groups of patterns. However, most k-means methods require expensive distance calculations of centroids to achieve convergence. In this paper, we present an efficient algorithm to implement a k-means clustering that produces clusters comparable to slower methods. In our algorithm, we partition the original dataset into blocks; each block unit, called a unit block (UB), contains at least one pattern. We can locate the centroid of a unit block (CUB) by using a simple calculation. All the computed CUBs form a reduced dataset that represents the original dataset. The reduced dataset is then used to compute the final centroid of the original dataset. We only need to examine each UB on the boundary of candidate clusters to find the closest final centroid for every pattern in the UB. In this way, we can dramatically reduce the time for calculating final converged centroids. In our experiments, this algorithm produces comparable clustering results as other k-means algorithms, but with much better performance.

Keywords: clustering, k-means algorithm, centroid, k-d tree, data mining

Full Text () Retrieve PDF document (200511_04.pdf)

Received April 30, 2004; revised November 23, 2004; accepted February 3, 2005.
Communicated by Ding-Zhu Du.
* This work was supported by the National Science Council, Taiwan, under contract No. NSC 92-2213-E-035-039.