Continuous-bag-of-words and Skip-gram for word vector training and text classification

Published:2023-11-01 Issue:1 Volume:2634 Page:012052
ISSN:1742-6588
Container-title:Journal of Physics: Conference Series
Short-container-title:J. Phys.: Conf. Ser.

Author:

Xia Haowen

Abstract

Natural language processing is one of the most challenging areas of artificial intelligence research and is widely used in real-life applications. One of its basic questions is how to calculate the probability that a particular text sequence appears in a given context. Word2Vec addresses this question by transforming words into word vectors and by training efficiently on large datasets and corpora. Of its models, Continuous-Bag-Of-Words and Skip-gram are the most significant and the most widely known. Several extended techniques related to these models have also been proposed to reduce the required training time while increasing training accuracy. Although a number of papers now describe these fundamental concepts, their quality varies greatly. To better understand the models and their extensions, and how well they behave on real tasks, this paper evaluates different combinations of the models and techniques, comparing their performance in processing large input data and their prediction accuracy in text classification. The aim is to provide more detail and a better understanding of the models for subsequent research in this field.
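
As a rough illustration of the two models named in the abstract, the sketch below shows how each one frames training examples from a sliding context window: Continuous-Bag-Of-Words predicts a center word from its surrounding words, while Skip-gram predicts each surrounding word from the center word. This is not the paper's code; the function names, the window size, and the sample sentence are assumptions chosen for illustration only.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW framing: (context words, center word) for each position."""
    pairs = []
    for i, target in enumerate(tokens):
        # Context = up to `window` words on each side of the center word.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram framing: one (center word, context word) pair per neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = "the quick brown fox".split()  # hypothetical sample input
    print(cbow_pairs(sentence, window=1))
    print(skipgram_pairs(sentence, window=1))
```

Skip-gram yields one training pair per neighbor, so it produces more examples per sentence than Continuous-Bag-Of-Words, which groups all neighbors into a single example per position.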

Publisher

IOP Publishing

Subject

Computer Science Applications,History,Education

Link

https://iopscience.iop.org/article/10.1088/1742-6596/2634/1/012052/pdf

References: 12 articles.

1. Distributed representations of words and phrases and their compositionality;Mikolov;Advances in neural information processing systems,2013

Cited by 3 articles.

1. Suicide Ideation Detection Using ML and DL Algorithms Assisted by NLP Techniques;2024 5th International Conference on Recent Trends in Computer Science and Technology (ICRTCST);2024-04-09

2. Intelligent Content Generation Mechanism for Diversified Music Teaching and Learning;Applied Mathematics and Nonlinear Sciences;2024-01-01

3. Innovative Cultural Tourism in the Perspective of Cultural Ecology in the Context of Big Data: Theory, Strategy, Practice;Applied Mathematics and Nonlinear Sciences;2024-01-01