Quantitative simultaneous monitoring of the expression levels of thousands of genes under various experimental conditions is now possible using microarray experiments. The resulting microarray data are very useful for elucidating the functional relationships among the genes in a genome. However, due to the experimental and biological nature of the data, whole-genome functional classification of genes from microarray data remains a challenging machine learning problem. In this paper, we introduce the application of latent semantic analysis (LSA) to microarray expression data for systematic, genome-wide functional classification of genes. In the LSA approach considered here, singular value decomposition is first applied as a dimension-reducing step on the gene expression data, followed by an unsupervised clustering procedure based on vector similarities in the truncated space. Functional classification is then conducted through majority calling on each of the resulting gene clusters. Using this semi-supervised LSA approach on microarray data, we have performed systematic functional classification of the genes in the partially annotated yeast genome, annotating more than 1,700 unknown genes into 40 distinct functional classes with promising results.
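The three steps named in the abstract (SVD truncation, similarity-based clustering, majority calling) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which clustering procedure is used, so a simple cosine-similarity k-means stands in for it, and all function names and parameters (`truncate_svd`, `cosine_kmeans`, `majority_call`, `classify_unknown_genes`, `k`, `n_clusters`) are hypothetical.

```python
from collections import Counter

import numpy as np


def truncate_svd(X, k):
    """Map each gene (row of X, genes x conditions) into the top-k singular space."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def cosine_kmeans(Z, n_clusters, n_iter=50, seed=0):
    """Assumed stand-in clustering: k-means on unit vectors using cosine similarity."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Zn = Z / np.where(norms == 0, 1.0, norms)  # unit-length gene vectors
    centers = Zn[rng.choice(len(Zn), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = np.argmax(Zn @ centers.T, axis=1)  # most similar center
        for c in range(n_clusters):
            members = Zn[labels == c]
            if len(members) == 0:
                continue  # keep the old center for an empty cluster
            mean = members.mean(axis=0)
            norm = np.linalg.norm(mean)
            if norm > 0:
                centers[c] = mean / norm
    return np.argmax(Zn @ centers.T, axis=1)


def majority_call(labels, annotations):
    """Give each unannotated gene (None) the majority class of its cluster."""
    predictions = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        known = [annotations[i] for i in idx if annotations[i] is not None]
        if not known:
            continue  # no annotated gene in this cluster, so no call is made
        majority = Counter(known).most_common(1)[0][0]
        for i in idx:
            if annotations[i] is None:
                predictions[int(i)] = majority
    return predictions


def classify_unknown_genes(X, annotations, k, n_clusters):
    """Run the whole sketch: SVD truncation, clustering, majority calling."""
    Z = truncate_svd(np.asarray(X, dtype=float), k)
    labels = cosine_kmeans(Z, n_clusters)
    return majority_call(labels, annotations)
```

Here `annotations` is a list with one entry per gene: a class name for annotated genes and `None` for unknown ones. The return value maps the row index of each unknown gene to its predicted class, and clusters without any annotated member are left uncalled.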
Cite as: Ng, S.-K., Zhu, Z. and Ong, Y.-S. (2004). Whole-Genome Functional Classification of Genes by Latent Semantic Analysis on Microarray Data. In Proc. Second Asia-Pacific Bioinformatics Conference (APBC2004), Dunedin, New Zealand. CRPIT, 29. Chen, Y.-P. P., Ed. ACS. 123-129.