Information extraction as a basis for high-precision text classification

Published:1994-07 Issue:3 Volume:12 Page:296-333
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Riloff Ellen¹, Lehnert Wendy¹

Affiliation:

1. Univ. of Massachusetts, Amherst

Abstract

We describe an approach to text classification that represents a compromise between traditional word-based techniques and in-depth natural language processing. Our approach uses a natural language processing task called “information extraction” as a basis for high-precision text classification. We present three algorithms that use varying amounts of extracted information to classify texts. The relevancy signatures algorithm uses linguistic phrases; the augmented relevancy signatures algorithm uses phrases and local context; and the case-based text classification algorithm uses larger pieces of context. Relevant phrases and contexts are acquired automatically using a training corpus. We evaluate the algorithms on the basis of two test sets from the MUC-4 corpus. All three algorithms achieved high precision on both test sets, with the augmented relevancy signatures algorithm and the case-based algorithm reaching 100% precision with over 60% recall on one set. Additionally, we compare the algorithms on a larger collection of 1700 texts and describe an automated method for empirically deriving appropriate threshold values. The results suggest that information extraction techniques can support high-precision text classification and, in general, that using more extracted information improves performance. As a practical matter, we also explain how the text classification system can be easily ported across domains.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/183422.183428

References: 36 articles.

1. Information filtering and information retrieval

2. Borko, H. and Bernick, M. 1963. Automatic document classification. J. ACM 10, 2, 151-162. doi:10.1145/321160.321165

Cited by 90 articles.

1. Hidden Variable Models in Text Classification and Sentiment Analysis;Electronics;2024-05-10

2. Maximizing total yield in safety hazard monitoring of online reviews;Expert Systems with Applications;2023-11

3. Simulation of Big Data Order-Preserving Matching and Retrieval Model Based on Deep Learning;2023 International Conference on Power, Electrical Engineering, Electronics and Control (PEEEC);2023-09-25

4. Learning Relation Ties with a Force-Directed Graph in Distant Supervised Relation Extraction;ACM Transactions on Information Systems;2023-01-09

5. Classification of Traffic Event Tweets in Portuguese Language Using Deep Learning;2022 International Wireless Communications and Mobile Computing (IWCMC);2022-05-30