TY - GEN
T1 - A fast convergence clustering algorithm merging MCMC and EM methods
AU - Matusevich, David Sergio
AU - Ordonez, Carlos
AU - Baladandayuthapani, Veerabhadran
N1 - Copyright:
Copyright 2013 Elsevier B.V., All rights reserved.
PY - 2013
Y1 - 2013
N2 - Clustering is a fundamental problem in statistics and machine learning, whose solution is commonly computed by the Expectation-Maximization (EM) method, which finds a locally optimal solution for an objective function called the log-likelihood. Since the surface of the log-likelihood function is non-convex, a stochastic search with Markov Chain Monte Carlo (MCMC) methods can help escape locally optimal solutions. In this article, we tackle two fundamentally conflicting goals: finding higher-quality solutions and achieving faster convergence. With that motivation in mind, we introduce an efficient algorithm that combines elements of the EM and MCMC methods to find clustering solutions that are qualitatively better than those found by the standard EM method. Moreover, our hybrid algorithm allows tuning model parameters and understanding the uncertainty in their estimation. The main issue with MCMC methods is that they generally require a very large number of iterations to explore the posterior of each model parameter. Convergence is accelerated by several algorithmic improvements, including sufficient statistics, simplified model parameter priors, fixed covariance matrices, and iterative sampling from small blocks of the data set. A brief experimental evaluation shows promising results.
KW - Bayesian model
KW - Clustering
KW - EM
KW - Gaussian mixtures
KW - MCMC
UR - http://www.scopus.com/inward/record.url?scp=84889582365&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889582365&partnerID=8YFLogxK
U2 - 10.1145/2505515.2507835
DO - 10.1145/2505515.2507835
M3 - Conference contribution
AN - SCOPUS:84889582365
SN - 9781450322638
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1525
EP - 1528
BT - CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
T2 - 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Y2 - 27 October 2013 through 1 November 2013
ER -