A huge amount of gene expression data has been generated as a result of the Human Genome Project. Clustering has been used extensively to mine these data for important genetic and biological information. Obtaining high-quality clustering results is very challenging because different clustering algorithms give inconsistent results and because gene expression data are noisy. Many clustering algorithms are available, and they may generate different clusterings of the same data because of their differing biases and assumptions. Choosing the best clustering algorithm, and producing the best clustering results, for a given data set is therefore a daunting task for genomic researchers. In this paper, we present a cluster ensemble framework for gene expression analysis that generates high-quality and robust clustering results. In our framework, the clustering result of each individual clustering algorithm is converted into a distance matrix; these distance matrices are combined, and a weighted graph is constructed from the combined matrix. A graph partitioning approach is then used to partition the graph into the final clusters. The experimental results indicate that the cluster ensemble approach yields better clustering results than the single best clustering algorithm on both a synthetic data set and a yeast gene expression data set.
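The pipeline described above (per-algorithm distance matrices, a combined matrix, a weighted graph, and a partitioning step) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the distance matrix is defined, how the matrices are combined, or which graph partitioner is used. Here the distance is assumed to be 0 when two genes share a cluster and 1 otherwise, the combination is assumed to be a plain average, and connected components of a thresholded graph stand in for the graph partitioning step. All function names and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def distance_matrix(labels):
    """Assumed distance for one clustering: 0 if two genes share a cluster, else 1."""
    n = len(labels)
    return [[0.0 if labels[i] == labels[j] else 1.0 for j in range(n)]
            for i in range(n)]


def combine(matrices):
    """Assumed combination rule: element-wise average of the distance matrices."""
    n = len(matrices[0])
    m = len(matrices)
    return [[sum(mat[i][j] for mat in matrices) / m for j in range(n)]
            for i in range(n)]


def partition(combined, threshold=0.5):
    """Stand-in for graph partitioning (not the paper's partitioner).

    Keeps an edge between two genes when their combined distance is below
    `threshold`, then returns the connected components as cluster labels.
    """
    n = len(combined)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if combined[i][j] < threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def ensemble(clusterings, threshold=0.5):
    """Combine several label vectors over the same genes into one clustering."""
    combined = combine([distance_matrix(c) for c in clusterings])
    return partition(combined, threshold)


if __name__ == "__main__":
    # Three hypothetical clusterings of six genes; the third disagrees on gene 2.
    runs = [[0, 0, 0, 1, 1, 1],
            [1, 1, 1, 0, 0, 0],
            [0, 0, 1, 1, 1, 1]]
    print(ensemble(runs))
```

Note that the second run uses swapped label names yet contributes the same distance matrix as the first, which is why working with pairwise distances sidesteps the problem of matching cluster labels across algorithms. A real graph partitioner would also balance cluster sizes and minimize the weight of cut edges, which the thresholding stand-in does not attempt.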
Cite as: Hu, X. and Yoo, I. (2004). Cluster Ensemble and Its Applications in Gene Expression Analysis. In Proc. Second Asia-Pacific Bioinformatics Conference (APBC2004), Dunedin, New Zealand. CRPIT, 29. Chen, Y.-P. P., Ed. ACS. 297-302.