Journal of Information Science and Engineering, Vol. 20 No. 4, pp. 665-677 (July 2004)

An Efficient Approach to Identifying and Validating
Clusters in Multivariate Datasets
with Applications in Gene Expression Analysis

Vincent Shin-Mu Tseng and Ching-Ping Kao
Department of Computer Science and Information Engineering
National Cheng Kung University
Tainan, 701 Taiwan

Gene expression data analysis has become an important topic in bioinformatics due to its wide application in the biomedical industry. Effective analysis of gene expression data relies on various data mining methods, especially clustering techniques. Various kinds of clustering methods have been proposed, yet they do not satisfy the requirements of high efficiency, high quality and automation in the mining of gene expression data. In this paper, we propose an efficient and automatic clustering approach that is suitable for gene expression analysis. The proposed approach primarily employs similarity-matrix based clustering techniques, complemented by new heuristics for reducing the computation cost. In particular, a novel validation technique is incorporated for evaluating the quality of the discovered gene expression patterns. Empirical evaluation on different gene expression datasets shows that the proposed approach performs better than other methods in terms of efficiency, clustering quality and automation.

Keywords: data mining, clustering, gene expression, microarray, validation technique

Received November 7, 2002; revised June 10 & December 26, 2003; accepted February 2, 2004. Communicated by Wen-Lian Hsu.