Automated learning of decision rules for text categorization

Published:1994-07 Issue:3 Volume:12 Page:233-251
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Apté Chidanand¹,Damerau Fred¹,Weiss Sholom M.²

Affiliation:

1. IBM T. J. Watson Research Center, Yorktown Heights, NY

2. Rutgers Univ., New Brunswick, NJ

Abstract

We describe the results of extensive experiments using optimized rule-based induction methods on large document collections. The goal of these methods is to discover automatically classification patterns that can be used for general document categorization or personalized filtering of free text. Previous reports indicate that human-engineered rule-based systems, requiring many man-years of developmental efforts, have been successfully built to “read” documents and assign topics to them. We show that machine-generated decision rules appear comparable to human performance, while using the identical rule-based representation. In comparison with other machine-learning techniques, results on a key benchmark from the Reuters collection show a large gain in performance, from a previously reported 67% recall/precision breakeven point to 80.5%. In the context of a very high-dimensional feature space, several methodological alternatives are examined, including universal versus local dictionaries, and binary versus frequency-related features.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/183422.183423

Reference: 24 articles.

1. The automatic indexing system AIR/PHYS - from research to applications

2. BREIMAN, L., FRIEDMAN, J., OLSHEN, R., AND STONE, C. 1984. Classification and Regression Trees. Wadsworth, Monterey, Calif.

3. CLARK, P. AND NIBLETT, T. 1989. The CN2 induction algorithm. Mach. Learn. 3, 261-283. 10.1023/A:1022641700528

Cited by 400 articles.

1. Regional bias in monolingual English language models;Machine Learning;2024-07-09

2. CoocNet: a novel approach to multi-label text classification with improved label co-occurrence modeling;Applied Intelligence;2024-07-02

3. An optimal feature selection method for text classification through redundancy and synergy analysis;Multimedia Tools and Applications;2024-06-28

4. Chinese Fraudulent Text Message Detection Based on Graph Neural Networks;2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE);2024-05-10

5. Anchor graph-based multiview spectral clustering;Neurocomputing;2024-05