Systematic Comparison of Vectorization Methods in Classification Context

Published:2022-05-19 Issue:10 Volume:12 Page:5119
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Krzeszewska Urszula, Poniszewska-Marańda Aneta, Ochelska-Mierzejewska Joanna

Abstract

Natural language processing has been the subject of numerous studies in the last decade. These have focused on the various stages of text processing, from text preparation through vectorization to final text comprehension. The goal of vector space modeling is to project the words of a language corpus into a vector space so that words with similar meanings lie close to each other. Two approaches to vectorization are currently in common use. The first creates word vectors that take the entire linguistic context into account, while the second creates document vectors in the context of the linguistic corpus of the analyzed texts. The paper compares existing text vectorization methods in natural language processing, especially in text mining. The methods are compared by checking classification accuracy with two classifiers, the naive Bayes classifier (NBC) and k-nearest neighbors (k-NN). These were chosen because they are among the simplest classification methods, which limits the influence of the classifier choice on the final result. The conducted experiments provide a basis for further research toward better automatic text analysis.

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Link

https://www.mdpi.com/2076-3417/12/10/5119/pdf

Reference 22 articles.

1. Automated content analysis for construction safety: A natural language processing system to extract precursors and outcomes from unstructured injury reports

2. Retrieving similar cases for construction project risk management using Natural Language Processing techniques

3. Natural Language Processing

4. Natural Language Processing: State of The Art, Current Trends and Challenges;Khurana;arXiv,2017

Cited by 14 articles.

1. Combining Topological Signature with Text Embeddings: Multi-Modal Approach to Fake News Detection;2024 35th Irish Signals and Systems Conference (ISSC);2024-06-13

2. Exploring the interpretability of legal terms in tasks of classification of final decisions in administrative procedures;Quality & Quantity;2024-05-03

3. Fine-Tuned Understanding: Enhancing Social Bot Detection With Transformer-Based Classification;IEEE Access;2024

4. Selecting the Best Compiler Optimization by Adopting Natural Language Processing;IEEE Access;2024

5. Enhancing Classification of Low-Quality Text Documents through Multimodal Approach;2023 IEEE International Workshop on Mechatronic Systems Supervision (IW_MSS);2023-11-02