Previous |  Up |  Next


feature selection; curse of dimensionality; over-fitting; stability; machine learning; dimensionality reduction
The purpose of feature selection in machine learning is at least two-fold - saving measurement acquisition costs and reducing the negative effects of the curse of dimensionality with the aim to improve the accuracy of the models and the classification rate of classifiers with respect to previously unknown data. Yet it has been shown recently that the process of feature selection itself can be negatively affected by the very same curse of dimensionality - feature selection methods may easily over-fit or perform unstably. Such an outcome is unlikely to generalize well and the resulting recognition system may fail to deliver the expectable performance. In many tasks, it is therefore crucial to employ additional mechanisms of making the feature selection process more stable and resistant the curse of dimensionality effects. In this paper we discuss three different approaches to reducing this problem. We present an algorithmic extension applicable to various feature selection methods, capable of reducing excessive feature subset dependency not only on specific training data, but also on specific criterion function properties. Further, we discuss the concept of criteria ensembles, where various criteria vote about feature inclusion/removal and go on to provide a general definition of feature selection hybridization aimed at combining the advantages of dependent and independent criteria. The presented ideas are illustrated through examples and summarizing recommendations are given.
[1] Brown, G.: A new perspective for information theoretic feature selection. In: Proc. AISTATS ’09, JMLR: W&CP 5 (2009), pp. 49–56.
[2] Chang, Ch.-Ch., Lin, Ch.-J.: LIBSVM: A Library for SVM, 2001.
[3] Das, S.: Filters, wrappers and a boosting-based hybrid for feature selection. In: Proc. 18th Int. Conf. on Machine Learning (ICML ’01), Morgan Kaufmann Publishers Inc. 2001, pp. 74–81.
[4] Dash, M., Choi, K., Scheuermann, P., Liu, H.: Feature selection for clustering – a filter solution. In: Proc. 2002 IEEE Int. Conf. on Data Mining (ICDM ’02), Vol. 00, IEEE Comp. Soc. 2002, p. 115.
[5] Devijver, P. A., Kittler, J.: Pattern Recognition: A Statistical Approach. Prentice Hall 1982. MR 0692767 | Zbl 0542.68071
[6] Dutta, D., Guha, R., Wild, D., Chen, T.: Ensemble feature selection: Consistent descriptor subsets for multiple qsar models. J. Chem. Inf. Model. 43 (2007), 3, pp. 989–997.
[7] C. Emmanouilidis: Multiple-criteria genetic algorithms for feature selection in neuro-fuzzy modeling. In: Internat. Conf. on Neural Networks, Vol. 6, 1999, pp. 4387–4392.
[8] Frank, A., Asuncion, A.: HASH(0x3185490). UCI Machine Learning Repository, 2010.
[9] Gheyas, I. A., Smith, L. S.: Feature subset seleciton in large dimensionality domains. Pattern Recognition 43 (2010), 1, 5–13. DOI 10.1016/j.patcog.2009.06.009
[10] Glover, F. W., Kochenberger, G. A., eds.: Handbook of Metaheuristics. Internat. Ser. Operat. Research & Management Science 5, Springer 2003. MR 1975894 | Zbl 1058.90002
[11] Günter, S., Bunke, H.: An evaluation of ensemble methods in handwritten word recog. based on feature selection. In: Proc. ICPR ’04, IEEE Comp. Soc. 2004, pp. 388–392.
[12] Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3 (2003), 1157–1182. Zbl 1102.68556
[13] Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L. A., eds.: Feature Extraction – Foundations and Applications. Studies in Fuzziness and Soft Comp. 207 Physica, Springer 2006. Zbl 1114.68059
[14] Ho, Tin Kam: The random subspace method for constructing decision forests. IEEE Trans. PAMI 20 (1998), 8, 832–844. DOI 10.1109/34.709601
[15] Hussein, F., Ward, R., Kharma, N.: Genetic algorithms for feature selection and weighting, a review and study. In: Proc. 6th ICDAR, Vol. 00, IEEE Comp. Soc. 2001, pp. 1240–1244.
[16] Jensen, R.: Performing feature selection with ACO. Studies Comput. Intelligence 34, Springer 2006, pp. 45–73.
[17] Special issue on variable and feature selection. J. Machine Learning Research., 2003.
[18] Kalousis, A., Prados, J., Hilario, M.: Stability of feature selection algorithms: A study on high-dimensional spaces. Knowledge Inform. Systems 12 (2007), 1, 95–116. DOI 10.1007/s10115-006-0040-8
[19] Kittler, J., Hatef, M., Duin, R. P. W., Matas, J.: On combining classifiers. IEEE Trans. PAMI 20 (1998), 3, 226–239. DOI 10.1109/34.667881
[20] Kohavi, R., John, G. H.: Wrappers for feature subset selection. Artificial Intelligence 97 (1997), 1–2, 273–324. DOI 10.1016/S0004-3702(97)00043-X | Zbl 0904.68143
[21] Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF. In: Proc. ECML-94, Springer 1994, pp. 171–182.
[22] Kuncheva, L. I.: A stability index for feature selection. In: Proc. 25th IASTED Internat. Mul.-Conf. AIAP’07, ACTA Pr. 2007, pp. 390–395.
[23] Lai, C., Reinders, M. J. T., Wessels, L.: Random subspace method for multivariate feature selection. Pattern Recognition Lett. 27 (2006), 10, 1067–1076. DOI 10.1016/j.patrec.2005.12.018
[24] Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers 1998. Zbl 0908.68127
[25] Liu, H., Yu, L.: Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. KDE 17 (2005), 4, 491–502.
[26] Nakariyakul, S., Casasent, D. P.: Adaptive branch and bound algorithm for selecting optimal features. Pattern Recognition Lett. 28 (2007), 12, 1415–1427. DOI 10.1016/j.patrec.2007.02.015
[27] Nakariyakul, S., Casasent, D. P.: An improvement on floating search algorithms for feature subset selection. Pattern Recognition 42 (2009), 9, 1932–1940. DOI 10.1016/j.patcog.2008.11.018 | Zbl 1178.68503
[28] Novovičová, J., Pudil, P., Kittler, J.: Divergence based feature selection for multimodal class densities. IEEE Trans. PAMI 18 (1996), 2, 218–223. DOI 10.1109/34.481557
[29] Pudil, P., Novovičová, J., Choakjarernwanit, N., Kittler, J.: Feature selection based on the approximation of class densities by finite mixtures of special type. Pattern Recognition 28 (1995), 9, 1389–1398. DOI 10.1016/0031-3203(94)00009-B
[30] Pudil, P., Novovičová, J., Kittler, J.: Floating search methods in feature selection. Pattern Recognition Lett. 15 (1994), 11, 1119–1125. DOI 10.1016/0167-8655(94)90127-9
[31] Raudys, Š. J.: Feature over-selection. In: Proc. S+SSPR, Lecture Notes in Comput. Sci. 4109, Springer 2006, pp. 622–631.
[32] al., V. C. Raykar et: Bayesian multiple instance learning: automatic feature selection and inductive transfer. In: Proc. ICML ’08, ACM 2008, pp. 808–815.
[33] Reunanen, J.: A pitfall in determining the optimal feature subset size. In: Proc. 4th Internat. Workshop on Pat. Rec. in Inf. Systs (PRIS 2004), pp. 176–185.
[34] Reunanen, J.: Less biased measurement of feature selection benefits. In: Stat. and Optimiz. Perspectives Workshop, SLSFS, Lecture Notes in Comput. Sci. 3940, Springer 2006, pp. 198–208.
[35] Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23 (2007), 19, 2507–2517. DOI 10.1093/bioinformatics/btm344
[36] Salappa, A., Doumpos, M., Zopounidis, C.: Feature selection algorithms in classification problems: An experimental evaluation. Optimiz. Methods Software 22 (2007), 1, 199–212. DOI 10.1080/10556780600881910 | MR 2272838 | Zbl 1116.62069
[37] Sebastiani, F.: Machine learning in automated text categorization. ACM Comput. Surveys 34 (2002), 1, 1–47. DOI 10.1145/505282.505283
[38] Sebban, M., Nock, R.: A hybrid filter/wrapper approach of feature selection using information theory. Pattern Recognition 35 (2002), 835–846. DOI 10.1016/S0031-3203(01)00084-X | Zbl 0997.68115
[39] Somol, P., Grim, J., Pudil, P.: Criteria ensembles in feature selection. In: Proc. MCS, Lecture Notes in Comput. Sci. 5519, Springer 2009, pp. 304–313.
[40] Somol, P., Grim, J., Pudil, P.: The problem of fragile feature subset preference in feature selection methods and a proposal of algorithmic workaround. In: ICPR 2010. IEEE Comp. Soc. 2010.
[41] Somol, P., Novovičová, J., Pudil, P.: Flexible-hybrid sequential floating search in statistical feature selection. In: Proc. S+SSPR, Lecture Notes in Comput. Sci. 4109, Springer 2006, pp. 632–639.
[42] Somol, P., Novovičová, J.: Evaluating the stability of feature selectors that optimize feature subset cardinality. In: Proc. S+SSPR, Lecture Notes in Comput. Sci. 5342 Springer 2008, pp. 956–966.
[43] Somol, P., Novovičová, J., Grim, J., Pudil, P.: Dynamic oscillating search algorithms for feature selection. In: ICPR 2008. IEEE Comp. Soc. 2008.
[44] Somol, P., Novovičová, J., Pudil, P.: Improving sequential feature selection methods performance by means of hybridization. In: Proc. 6th IASTED Int. Conf. on Advances in Computer Science and Engrg. ACTA Press 2010.
[45] Somol, P., Pudil, P.: Oscillating search algorithms for feature selection. In: ICPR 2000, IEEE Comp. Soc. 02 (2000), 406–409.
[46] Somol, P., Pudil, P., Kittler, J.: Fast branch & bound algorithms for optimal feature selection. IEEE Trans. on PAMI 26 (2004), 7, 900–912. DOI 10.1109/TPAMI.2004.28
[47] Sun, Y.: Iterative RELIEF for feature weighting: Algorithms, theories, and applications. IEEE Trans. PAMI 29 (2007), 6, 1035–1051. DOI 10.1109/TPAMI.2007.1093
[48] al, M.-A. Tahir et: Simultaneous feature selection and feature weighting using hybrid tabu search/k-nearest neighbor classifier. Patt. Recognition Lett. 28 (2007), 4, 438–446. DOI 10.1016/j.patrec.2006.08.016
[49] Whitney, A. W.: A direct method of nonparametric measurement selection. IEEE Trans. Comput. 20 (1971), 9, 1100–1103. Zbl 0227.68047
[50] Yang, Y., Pedersen, J. O.: A comparative study on feature selection in text categorization. In: Proc. 14th Internat. Conf. on Machine Learning (ICML ’97), Morgan Kaufmann 1997, pp. 412–420.
[51] Yu, L., Liu, H.: Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Proc. 20th Internat. Conf. on Machine Learning (ICML-03), Vol. 20, Morgan Kaufmann 2003, pp. 856–863.
[52] Zhu, Z., Ong, Y. S., Dash, M.: Wrapper-filter feature selection algorithm using a memetic framework. IEEE Trans. Systems Man Cybernet., Part B 37 (2007), 1, 70. DOI 10.1109/TSMCB.2006.883267
Partner of
EuDML logo