Analysis of Textual Data Based on Inductive Learning Techniques

Published: 2013-04 Issue: 2 Volume: 3 Pages: 40-57
ISSN: 2155-6377
Container-title: International Journal of Information Retrieval Research
Language: en

Author:

Sakurai Shigeaki¹

Affiliation:

1. IT Research and Development Center, Toshiba Solutions Corporation, Tokyo, Japan

Abstract

This paper introduces methods for discovering knowledge from textual data using inductive learning techniques. The author discusses three methods for extracting features from the textual data: the first uses a key concept dictionary, the second a key phrase pattern dictionary, and the third a named entity extractor. The extracted features are used to generate rules that represent relationships between the features and text classes; the rules are expressed as a fuzzy decision tree. The features are also used to train a classification model based on a support vector machine (SVM), which can classify new textual data into the text classes with high accuracy. Finally, the paper introduces two application tasks based on these methods and verifies their effectiveness.
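As a rough illustration of the first feature extraction method, the sketch below turns a text into binary features using a key concept dictionary. The abstract does not describe the dictionary's structure or the matching procedure, so everything here is an assumption: the `KEY_CONCEPTS` entries are invented, the `extract_features` helper is hypothetical, and plain substring matching stands in for whatever linguistic processing the paper actually applies.

```python
# Hypothetical key concept dictionary: concept -> surface expressions.
# These entries are invented for illustration; they are not from the paper.
KEY_CONCEPTS = {
    "complaint": ["broken", "does not work", "refund"],
    "praise": ["excellent", "works well"],
    "inquiry": ["how do i", "where can i"],
}


def extract_features(text, dictionary=KEY_CONCEPTS):
    """Return one binary feature per concept.

    A feature is 1 if any expression listed for the concept occurs
    in the lowercased text (simple substring matching, an assumption).
    """
    lowered = text.lower()
    return {
        concept: int(any(expr in lowered for expr in expressions))
        for concept, expressions in dictionary.items()
    }


features = extract_features("The charger is broken. How do I get a refund?")
print(features)
```

Feature dictionaries of this kind, paired with text class labels, could then serve as training examples for a rule learner or an SVM; that step is omitted here because the abstract gives no details about it.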

Publisher

IGI Global

References (25 articles)

1. Abe, K., Kawasoe, S., Asai, T., Arimura, H., & Arikawa, S. (2002). Optimized substructure discovery for semi-structured data. In Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases, Helsinki, Finland (pp. 1-14).

2. Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, Santiago de Chile, Chile (pp. 487-499).

3. Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. In Proceedings of the 11th International Conference on Data Engineering, Taipei, Taiwan (pp. 3-14).

4. Cardoso-Cachopo, A., & Oliveira, A. L. (2003). An empirical comparison of text categorization methods. In Proceedings of the 10th International Symposium on String Processing and Information Retrieval, Manaus, Brazil (pp. 183-196).

5. Chiang, J.-H., Yin, Z.-X., & Chen, C.-Y. (2004). Discovering gene-gene relations from fuzzy sequential sentence patterns in biomedical literature. In Proceedings of the 13th IEEE International Conference on Fuzzy Systems, Budapest, Hungary (Vol. 2, pp. 1165-1168).