|
| | | |
Improved Consensus Clustering via Linear Optimization
Downing, N., Wirth, A. and Stuckey, P. J.
We consider the problem of Consensus Clustering. Given a finite set of input clusterings over some data items, a consensus clustering is a partitioning of the items which matches as closely as possible the given input clusterings. The best exact approach to tackling this problem is by modelling it as a Boolean Integer Program (BIP). Unfortunately, the size of the BIP grows cubically in the number of data items, hence this method is applicable to only small sets of items. In this paper we show how to tackle the problem progressively, leading to much improved solution times and far less memory usage than previously. For the case where approximate clusterings are acceptable, we show a number of heuristic techniques for extracting good clusterings from the solutions of the linear relaxation of the BIP, and on several very large data sets we demonstrate much higher quality approximations than previously possible. When optimal solutions are desired, the problem is much harder, and we present some novel and existing techniques that can assist in finding candidate answers and proving the optimality thereof. For the first time we present optimal Consensus Clusterings for several complete, albeit small, data sets. |
Cite as: Downing, N., Wirth, A. and Stuckey, P. J. (2010). Improved Consensus Clustering via Linear Optimization. In Proc. 33rd Australasian Computer Science Conference (ACSC 2010) Brisbane, Australia. CRPIT, 102. Mans, B. and Reynolds, M. Eds., ACS. 61-70 |
(from crpit.com)
(local if available)
|
|