Keywords: consensus clustering; differential evolution; ensemble; data
Consensus clustering algorithms are used to improve the properties of traditional clustering methods, especially their accuracy and robustness. In this article, we introduce an approach that refines the set of initial partitions and uses a differential evolution algorithm to find the most valid solution. The properties of the algorithm are demonstrated on four benchmark datasets.
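For readers unfamiliar with differential evolution, the following is a minimal sketch of the classic DE/rand/1/bin scheme applied to a toy partitional clustering objective. It is not the algorithm proposed in the article: the function names (`sse`, `differential_evolution`), the control parameters, the one-dimensional toy data, and the sum-of-squares criterion are all assumptions chosen only for illustration.

```python
import random


def sse(centroids, data):
    """Within-cluster sum of squared distances for 1-D data (illustrative criterion)."""
    return sum(min((x - c) ** 2 for c in centroids) for x in data)


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` over a box using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = objective(trial)
            # Greedy selection: keep the trial only if it is not worse.
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


# Toy data with two obvious groups (hypothetical, not from the article).
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
best, best_cost = differential_evolution(lambda c: sse(c, data), [(0.0, 6.0)] * 2)
print(sorted(best), best_cost)
```

In this sketch each individual encodes two candidate centroids, and the search should settle near the two group means. A consensus approach would instead score candidate partitions against an ensemble of initial partitions with a validity criterion, which is not reproduced here.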