Authors:
Putri Syarifah K., Amalia A., Nababan E. B., Sitompul O. S.
Abstract
Word embedding, or distributed representation, is a popular method for representing words. In this method, the resulting vector is a set of real values with a specific dimension, which is more effective than the Bag of Words (BoW) method. A further advantage of distributed representations is that they produce word vectors containing semantic and syntactic information, so that words with close meanings have close word vectors. However, distributed representation requires a huge corpus and a long training time. For this reason, many researchers have created pre-trained word vectors that can be reused. The problem is that the available pre-trained word vectors are usually general-domain vectors. This study aims to build pre-trained word vectors for a specific domain, namely computers and information technology. The researchers used a dataset of student scientific papers from the Universitas Sumatera Utara (USU) repository and trained the word2vec model, which has two architectures: Continuous Bag-of-Words (CBOW) and Skip-gram. The result of this research is that the word2vec model with the CBOW architecture is more effective than the Skip-gram architecture.
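The abstract names the two word2vec architectures but not how they are selected in practice. The sketch below shows the standard way to train both with gensim's Word2Vec class (gensim >= 4.0 API, where the architecture is chosen via the sg flag). The tiny inline corpus and the hyperparameter values are illustrative placeholders, not the study's actual USU dataset or settings.

```python
# Minimal sketch of training both word2vec architectures with gensim.
# The corpus is a hypothetical stand-in for the tokenized student papers
# from the USU repository; real training would use the full dataset.
from gensim.models import Word2Vec

# Each document is a list of tokens (here, a few Indonesian CS terms).
corpus = [
    ["jaringan", "saraf", "tiruan", "untuk", "klasifikasi"],
    ["algoritma", "klasifikasi", "data", "mining"],
    # ... many more tokenized sentences in practice
]

# sg=0 selects CBOW (predict a word from its surrounding context);
# sg=1 selects Skip-gram (predict the context from a word).
cbow = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=0)
skipgram = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)

# Words with close meanings should end up with nearby vectors.
print(cbow.wv.most_similar("klasifikasi", topn=3))
```

With a realistic corpus, the two resulting models can then be compared on the same similarity queries, which is the kind of evaluation the study uses to conclude that CBOW is the more effective architecture here.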
Subject
General Physics and Astronomy
Cited by
1 article.
1. Improving document representation using KPCA and clustered word embeddings. 2021 5th International Conference on Electrical, Electronics, Communication, Computer Technologies and Optimization Techniques (ICEECCOT), 2021-12-10.