| Previous | [ 1] | [ 2] | [ 3] | [ 4] | [ 5] | [ 6] | [ 7] | [ 8] | [ 9] | [ 10] | [ 11] |
¡@
Ming-Chuan Hung, Jugpin Wu+, Jin-Hua Chang and Don-Lin Yang
Electrical Engineering Department
Department of Information Engineering and Computer Science
+Department of Statistics
Feng Chia University
Taichung, 407 Taiwan
The k-means algorithm is one of the most widely used methods to partition a dataset
into groups of patterns. However, most k-means methods require expensive distance
calculations of centroids to achieve convergence. In this paper, we present an efficient
algorithm to implement a k-means clustering that produces clusters comparable to slower
methods. In our algorithm, we partition the original dataset into blocks; each block unit,
called a unit block (UB), contains at least one pattern. We can locate the centroid of a
unit block (CUB) by using a simple calculation. All the computed CUBs form a reduced
dataset that represents the original dataset. The reduced dataset is then used to compute
the final centroid of the original dataset. We only need to examine each UB on the
boundary of candidate clusters to find the closest final centroid for every pattern in the
UB. In this way, we can dramatically reduce the time for calculating final converged
centroids. In our experiments, this algorithm produces comparable clustering results as
other k-means algorithms, but with much better performance.
Received April 30, 2004; revised November 23, 2004; accepted February 3, 2005.
Communicated by Ding-Zhu Du.
* This work was supported by the National Science Council, Taiwan, under contract No. NSC
92-2213-E-035-039.