Abstract
Recently, transformer-based pretrained language models have demonstrated outstanding performance on natural language understanding (NLU) tasks. For example, bidirectional encoder representations from transformers (BERT) has achieved strong results through masked self-supervised pretraining and transformer-based modeling. However, the original BERT may be effective only for English-based NLU tasks; its effectiveness for other languages, such as Korean, is limited. Thus, the applicability of BERT-based language models pretrained in languages other than English to NLU tasks in those languages must be investigated. In this study, we comparatively evaluated seven BERT-based pretrained language models and assessed their applicability to Korean NLU tasks. We used the climate technology dataset, a large Korean text classification dataset of research proposals spanning 45 classes. We found that the BERT-based model pretrained on the most recent Korean corpus performed best on Korean multiclass text classification. This suggests the necessity of optimal pretraining for specific NLU tasks, particularly those in languages other than English.
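The abstract describes fine-tuning BERT-based pretrained models for 45-class Korean text classification. The following is a minimal sketch of that setup using the Hugging Face transformers API; the checkpoint name (klue/bert-base), example texts, and labels are illustrative assumptions, not necessarily the models or data used in the study.

```python
# Minimal sketch: fine-tuning a Korean BERT checkpoint for 45-class text classification.
# Model name, example texts, and labels are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "klue/bert-base"  # assumed Korean BERT checkpoint; any compared model could be swapped in
NUM_CLASSES = 45               # the climate technology dataset comprises 45 classes

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_CLASSES)

# Hypothetical research-proposal texts with integer labels in the range 0..44.
texts = ["태양광 발전 효율 향상을 위한 신소재 연구", "탄소 포집 및 저장 공정 최적화"]
labels = torch.tensor([3, 17])

# Tokenize, run a forward pass, and obtain the cross-entropy loss used during fine-tuning.
batch = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")
outputs = model(**batch, labels=labels)
print(outputs.loss.item(), outputs.logits.argmax(dim=-1))
```

In practice, the loss above would be minimized over the full dataset with an optimizer (e.g., AdamW) or a Trainer loop, and the argmax over the 45 logits gives the predicted class for each proposal.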
Funder
BK21 Four project funded by the Ministry of Education, Korea
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science