
Article

Keywords:
machine learning; inductive logic programming; propositionalization
Summary:
Systems that aim to discover interesting knowledge in data, now commonly called data mining systems, are typically employed to find patterns in a single relational table. Most mainstream data mining tools are not applicable to the more challenging task of finding knowledge in structured data represented by a multi-relational database. Although a family of methods known as inductive logic programming has been developed to tackle that challenge directly, the idea of adapting structured data into a simpler form digestible by the wealth of attribute-value learning (AVL) systems has always been tempting to data miners. To this end, we present a method, based on constructing first-order logic features, that performs this kind of conversion, also known as propositionalization. It incorporates some basic principles suggested in previous research and provides significant enhancements that lead to remarkable improvements in the efficiency of the feature-construction process. We begin by motivating the propositionalization task with an illustrative example, review some previous approaches to propositionalization, and formalize the concept of a first-order feature, elaborating mainly on the points that influence the efficiency of the designed feature-construction algorithm.
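To make the idea of propositionalization concrete, the following minimal sketch flattens a toy two-table database into a single attribute-value table. It is not the algorithm of the article: the tables, attribute names, the helper `exists_car`, and the three features are all hypothetical, and the features are written by hand rather than constructed automatically. Each feature plays the role of an existentially quantified first-order condition, for example "the train has some car that is both long and rectangular".

```python
# Toy multi-relational data: one row per train, several cars per train.
# All table names, attributes, and features here are hypothetical.
trains = {"t1": "east", "t2": "west"}
cars = [
    {"train": "t1", "shape": "rectangle", "length": "short", "roof": "none"},
    {"train": "t1", "shape": "rectangle", "length": "long", "roof": "flat"},
    {"train": "t2", "shape": "ellipse", "length": "short", "roof": "flat"},
]


def exists_car(**conditions):
    """Build a feature that holds for a train if at least one of its
    cars satisfies all the given attribute=value conditions."""
    def feature(train_id):
        return any(
            car["train"] == train_id
            and all(car[attr] == value for attr, value in conditions.items())
            for car in cars
        )
    return feature


# Hand-written stand-ins for first-order features.
features = {
    "has_short_car": exists_car(length="short"),
    "has_long_rectangle": exists_car(shape="rectangle", length="long"),
    "has_roofed_short_car": exists_car(length="short", roof="flat"),
}


def propositionalize(train_ids):
    """Return one flat row per train: its class plus one boolean per feature."""
    rows = []
    for train_id in train_ids:
        row = {"train": train_id, "class": trains[train_id]}
        for name, feature in features.items():
            row[name] = feature(train_id)
        rows.append(row)
    return rows


table = propositionalize(sorted(trains))
for row in table:
    print(row)
```

The resulting `table` has one row per train and one boolean column per feature, which is the single-table form that an attribute-value learner expects. Note that `has_roofed_short_car` requires both conditions to hold for the same car, which is what distinguishes such a feature from a plain combination of per-attribute aggregates.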