Mining the Frequent Patterns of Named Entities for Long Document Classification

Published:2022-02-28 Issue:5 Volume:12 Page:2544
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Wang Bohan,Qi Rui,Gao Jinhua,Zhang Jianwei,Yuan Xiaoguang,Ke Wenjun

Abstract

Nowadays, a large amount of information is stored as text, and numerous text mining techniques have been developed for applications such as event detection, news topic classification, public opinion detection, and sentiment analysis. Although significant progress has been achieved in short text classification, document-level text classification requires further exploration. Long documents often contain irrelevant, noisy information that obscures indicative features and limits the interpretability of classification results. To alleviate this problem, this paper presents MIPELD (mining the frequent pattern of a named entity for long document classification), a model for long document classification that mines the frequent patterns of named entities as features. The discovered patterns allow semantic generalization among documents and provide clues for verifying the results. Experiments on several datasets yielded good accuracy and macro-F1 values, meeting the requirements for practical application. Further analysis validated the effectiveness of MIPELD in mining interpretable information for text classification.
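
The idea of using frequent named-entity patterns as features can be illustrated with a minimal sketch. This is not the MIPELD implementation: it assumes entities have already been extracted by some upstream NER step, and it uses brute-force enumeration of small entity sets rather than whatever mining algorithm the paper uses. The function names, the entity labels, and the `min_support` and `max_size` parameters are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_patterns(docs, min_support=2, max_size=2):
    """Return {pattern: support} for entity sets found in >= min_support documents.

    docs is a list of sets of entity labels (assumed output of an NER step).
    Patterns are sorted tuples of up to max_size entities.
    """
    counts = Counter()
    for entities in docs:
        ordered = sorted(entities)
        for size in range(1, max_size + 1):
            # Each combination is counted once per document (document-level support).
            for combo in combinations(ordered, size):
                counts[combo] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def to_features(entities, patterns):
    """Binary vector: 1 if the document contains every entity in the pattern."""
    return [1 if set(p) <= entities else 0 for p in patterns]


# Toy documents, invented for illustration only.
docs = [
    {"ORG:central_bank", "MONEY", "PERSON:governor"},
    {"ORG:central_bank", "MONEY"},
    {"ORG:football_club", "PERSON:coach"},
]

patterns = mine_frequent_patterns(docs, min_support=2)
ordered_patterns = sorted(patterns)  # fixed column order for the feature matrix
features = [to_features(d, ordered_patterns) for d in docs]
```

Each feature column corresponds to a readable pattern such as ("MONEY", "ORG:central_bank"), so a prediction can be traced back to the entity combinations that triggered it, which is the kind of interpretability the abstract refers to. The resulting vectors could be passed to any standard classifier.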

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Link

https://www.mdpi.com/2076-3417/12/5/2544/pdf

References: 63 articles.

1. Jang, B., Kim, I., and Kim, J.W. (2019). Word2vec convolutional neural networks for classification of news articles and tweets. PLoS ONE, 14.

2. Bai, J., Shim, I., and Park, S. (2020). MEXN: Multi-Stage Extraction Network for Patent Document Classification. Appl. Sci., 10.

3. Wang, X., and Tong, Y. (2021). Application of an emotional classification model in e-commerce text based on an improved transformer model. PLoS ONE, 16.

4. Semantic text-pairing for relevant provision identification in construction specification reviews. Autom. Constr., 2021.

5. Venkataraman, G.R., Pineda, A.L., Bear Don’t Walk IV, O.J., Zehnder, A.M., Ayyar, S., Page, R.L., Bustamante, C.D., and Rivas, M.A. (2020). FasTag: Automatic text classification of unstructured medical narratives. PLoS ONE, 15.

Cited by 1 article.

1. End-to-end speech topic classification based on pre-trained model Wavlm. 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022-12-11.