
Article

Title: Hierarchical text categorization using fuzzy relational thesaurus (English)
Author: Tikk, Domonkos
Author: Yang, Jae Dong
Author: Bang, Sun Lee
Language: English
Journal: Kybernetika
ISSN: 0023-5954
Volume: 39
Issue: 5
Year: 2003
Pages: [583]-600
Summary lang: English
Category: math
Summary: Text categorization is the task of assigning a text document to an appropriate category from a predefined set of categories. We present a new approach to text categorization by means of a Fuzzy Relational Thesaurus (FRT). An FRT is a multilevel category system that stores and maintains an adaptive local dictionary for each category. The goal of our approach is twofold: to develop a reliable text categorization method for a given subject domain, and to expand the initial FRT with automatically added terms, thereby obtaining an incrementally defined knowledge base of the domain. We implemented the categorization algorithm and compared it with several other hierarchical classifiers. Experimental results show that our algorithm outperforms its rivals on all document corpora investigated. (English)
Keyword: text mining
Keyword: knowledge base management
Keyword: multi-level categorization
Keyword: hierarchical text categorization
MSC: 62P30
MSC: 68T30
MSC: 68T37
MSC: 68U15
MSC: 68W99
idZBL: Zbl 1249.68241
Date available: 2009-09-24T19:56:59Z
Last updated: 2015-03-24
Stable URL: http://hdl.handle.net/10338.dmlcz/135557
Reference: [1] Aas L., Eikvil L.: Text Categorisation: A Survey. Raport NR 941, Norwegian Computing Center, 1999
Reference: [2] Apte C., Damerau F. J., Weiss S. M.: Automated learning of decision rules for text categorization. ACM Trans. Information Systems 12 (1994), 3, 233–251 10.1145/183422.183423
Reference: [3] Baker K. D., McCallum A. K.: Distributional clustering of words for text classification. In: Proc. 21st Annual Internat. ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’98), Melbourne, Australia 1998, pp. 96–103
Reference: [4] Chakrabarti S., Dom B., Agrawal R., Raghavan P.: Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies. The VLDB Journal 7 (1998), 3, 163–178 10.1007/s007780050061
Reference: [5] Choi J. H., Park J. J., Yang J. D., Lee D. K.: An Object-based Approach to Managing Domain Specific Thesauri: Semiautomatic Thesaurus Construction, Query-based Browsing. Technical Report TR 98/11, Dept. of Computer Science, Chonbuk National University, 1998. http://cs.chonbuk.ac.kr/~jdyang/publication/techpaper.html
Reference: [6] Chuang W., Tiyyagura A., Yang J., Giuffrida G.: A fast algorithm for hierarchical text classification. In: Proc. 2nd Internat. Conference on Data Warehousing and Knowledge Discovery (DaWaK’00), London–Greenwich, UK 2000, pp. 409–418
Reference: [7] Dagan I., Karov Y., Roth D.: Mistake-driven learning in text categorization. In: Proc. Second Conference on Empirical Methods in Natural Language Processing (C. Cardie and R. Weischedel, eds.), Association for Computational Linguistics, Somerset, NJ 1997, pp. 55–63
Reference: [8] Dumais S. T.: Improving the retrieval of information from external sources. Behaviour Research Methods, Instruments and Computers 23 (1991), 2, 229–236 10.3758/BF03203370
Reference: [9] Dumais S. T., Platt J., Heckerman D., Sahami M.: Inductive learning algorithms and representations for text categorization. In: Proc. 7th ACM Internat. Conference on Information and Knowledge Management (CIKM-98), Bethesda, MD 1998, pp. 148–155
Reference: [10] Fisher D. H.: Knowledge acquisition via incremental conceptual clustering. Machine Learning 2 (1987), 139–172 10.1007/BF00114265
Reference: [11] Joachims T.: Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Technical Report, University of Dortmund, Dept. of Informatics, Dortmund, Germany 1997
Reference: [12] Koller D., Sahami M.: Hierarchically classifying documents using a very few words. In: International Conference on Machine Learning, Volume 14, San Mateo, CA, Morgan-Kaufmann 1997
Reference: [13] Korfhage R.: Information Storage and Retrieval. Wiley, New York 1997
Reference: [14] Larsen H. L., Yager R. R.: The use of fuzzy relational thesaurus for classificatory problem solving in information retrieval and expert systems. IEEE Trans. on Systems, Man, and Cybernetics 23 (1993), 1, 31–40 10.1109/21.214765
Reference: [15] Lewis D. D., Ringuette M.: A comparison of two learning algorithms for text classification. In: Proc. Third Annual Symposium on Document Analysis and Information Retrieval, 1994, pp. 81–93
Reference: [16] McCallum A., Rosenfeld R., Mitchell T., Ng A.: Improving text classification by shrinkage in a hierarchy of classes. In: Proceedings of ICML-98, 1998. http://www-2.cs.cmu.edu/~mccallum/papers/hier-icml98.ps.gz
Reference: [17] Mitchell T. M.: Machine Learning. McGraw Hill, New York 1996 Zbl 0913.68167
Reference: [18] Miyamoto S.: Fuzzy Sets in Information Retrieval and Cluster Analysis. (Number 4 in Theory and Decision Library D: System Theory, Knowledge Engineering and Problem Solving.) Kluwer, Dordrecht 1990 Zbl 0716.68030, MR 1060316
Reference: [19] Mladenić D., Grobelnik M.: Feature selection for classification based on text hierarchy. In: Working Notes of Learning from Web, Conference on Automated Learning and Discovery (CONALD), 1998
Reference: [20] Nigam K., McCallum A., Thrun S., Mitchell T.: Learning to classify text from labeled and unlabeled documents. In: Proc. 15th National Conference on Artificial Intelligence (AAAI-98), 1998
Reference: [21] Radecki T.: Fuzzy set theoretical approach to document retrieval. Information Processing and Management 15 (1979), 5, 247–259 Zbl 0413.68101, 10.1016/0306-4573(79)90031-1
Reference: [22] Ruspini E. H., Bonissone P. P., Pedrycz W. (eds.): Handbook of Fuzzy Computation. Oxford University Press and Institute of Physics Publishing, Bristol and Philadelphia 1998 Zbl 0902.68068, MR 1668348
Reference: [23] Salton G.: Automatic Text Processing: the Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA 1989
Reference: [24] Salton G., McGill M. J.: An Introduction to Modern Information Retrieval. McGraw-Hill, New York 1983
Reference: [25] Sebastiani F.: Machine learning in automated text categorization. ACM Computing Surveys 34 (2002), 1, 1–47 10.1145/505282.505283
Reference: [26] Rijsbergen C. J. van: Information Retrieval. Second edition. Butterworths, London 1979. http://www.dcs.gla.ac.uk/Keith
Reference: [27] Weiss S. M., Apte C., Damerau F. J., Johnson D. E., Oles F. J., Goetz T., Hampp T.: Maximizing text-mining performance. IEEE Intelligent Systems 14 (1999), 4, 2–8
Reference: [28] Wiener E., Pedersen J. O., Weigend A. S.: A neural network approach to topic spotting. In: Proc. 4th Annual Symposium on Document Analysis and Information Retrieval, 1993, pp. 22–34
Reference: [29] Yang Y.: An evaluation of statistical approaches to text categorization. Information Retrieval 1 (1999), 1–2, 69–90. http://citeseer.nj.nec.com/yang97evaluation.html 10.1023/A:1009982220290
