A Rule-Based Approach to Embedding Techniques for Text Document Classification

Published:2020-06-10 Issue:11 Volume:10 Page:4009
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Aubaid Asmaa M., Mishra Alok

Abstract

With the growth of online information and the sudden expansion in the number of electronic documents provided on websites and in electronic libraries, categorizing text documents has become difficult. A rule-based approach is one solution to this problem; the purpose of this study is to classify documents by using a rule-based approach. This paper combines the rule-based approach with the document-to-vector (doc2vec) embedding technique. An experiment was performed on two data sets, Reuters-21578 and 20 Newsgroups, to classify the top ten categories of these data sets by using a document-to-vector rule-based algorithm (D2vecRule). The method provided good classification results according to the F-measure and implementation time metrics. In conclusion, the D2vecRule algorithm performed well when compared with other algorithms such as JRip, OneR, and ZeroR applied to the same Reuters-21578 data set.

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/10/11/4009/pdf

Reference 51 articles.

1. C4.5: Programs for Machine Learning;Quinlan,1993

2. Knowledge Based Information Systems;Partridge,1994

3. Rule Based Systems for Big Data: A Machine Learning Approach;Han,2015

Cited by 23 articles.

1. A New Decision Support System for Enhancing Tourism Destination Management and Competitiveness;2024 11th International Conference on Wireless Networks and Mobile Communications (WINCOM);2024-07-23

2. Privacy BERT-LSTM: a novel NLP algorithm for sensitive information detection in textual documents;Neural Computing and Applications;2024-05-16

3. Comprehensive Evaluation of CNN, RNN, and CRNN for document retrieval in Healthcare Systems;2024 5th International Conference on Recent Trends in Computer Science and Technology (ICRTCST);2024-04-09

4. Contextual Word Embedding for Biomedical Knowledge Extraction: a Rapid Review and Case Study;Journal of Healthcare Informatics Research;2024-01-03

5. Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review;Artificial Intelligence in Medicine;2023-12