TY - GEN
T1 - Spectral feature selection and its application in high dimensional gene expression studies
AU - Wang, Zixing
AU - Qiu, Peng
AU - Xu, Wenlong
AU - Liu, Yin
N1 - Publisher Copyright: Copyright © 2014 ACM.
PY - 2014/9/20
Y1 - 2014/9/20
N2 - Many variable selection techniques have been proposed for clustering analysis of gene expression data. Motivated by spectral learning, we propose a new filtering method that uses the correlation between features and the eigenspace of the sample similarity matrix as the variable selection criterion. Spectral learning theory states that a sample similarity matrix with q strongly connected components tends to have q piecewise almost-constant eigenvectors that represent a specific partition of the sample space. Using the distance correlation metric, our proposed method, spectral correlation (Scorrelation), measures each feature's correlation with the top q eigenvectors of the sample similarity matrix and from it infers the feature's ability to differentiate the underlying clusters of samples. We applied the method to large-scale gene expression datasets. Compared with other filtering methods, it is more effective and yields better clustering results in terms of clustering error rate and the reliability of the selected features. The framework can be readily extended to other types of datasets to address clustering and classification problems.
AB - Many variable selection techniques have been proposed for clustering analysis of gene expression data. Motivated by spectral learning, we propose a new filtering method that uses the correlation between features and the eigenspace of the sample similarity matrix as the variable selection criterion. Spectral learning theory states that a sample similarity matrix with q strongly connected components tends to have q piecewise almost-constant eigenvectors that represent a specific partition of the sample space. Using the distance correlation metric, our proposed method, spectral correlation (Scorrelation), measures each feature's correlation with the top q eigenvectors of the sample similarity matrix and from it infers the feature's ability to differentiate the underlying clusters of samples. We applied the method to large-scale gene expression datasets. Compared with other filtering methods, it is more effective and yields better clustering results in terms of clustering error rate and the reliability of the selected features. The framework can be readily extended to other types of datasets to address clustering and classification problems.
KW - Clustering
KW - Distance correlation
KW - Spectral feature selection
KW - Unsupervised
UR - http://www.scopus.com/inward/record.url?scp=84920729285&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84920729285&partnerID=8YFLogxK
U2 - 10.1145/2649387.2649396
DO - 10.1145/2649387.2649396
M3 - Conference contribution
AN - SCOPUS:84920729285
T3 - ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
SP - 314
EP - 320
BT - ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
PB - Association for Computing Machinery
T2 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014
Y2 - 20 September 2014 through 23 September 2014
ER -