Affiliation:
1. Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
Abstract
Multitask learning (MTL) and data augmentation are becoming increasingly popular in natural language processing (NLP). These techniques are particularly useful when data are scarce. In MTL, knowledge learned from one task is applied to another. Data augmentation addresses data scarcity by providing additional synthetic data during model training. In NLP, the bidirectional encoder representations from transformers (BERT) model is a default choice for various tasks. MTL and data augmentation using BERT have yielded promising results. However, no detailed study has examined the effect of applying MTL at different layers of BERT or the benefit of data augmentation in these configurations. In this study, we investigate the use of MTL and of data augmentation from generative models for category classification, sentiment classification, and aspect-opinion sequence labeling using BERT. The layers of BERT are categorized into top, middle, and bottom layers, each of which can be frozen, shared, or unshared. Experiments are conducted to identify the layer configuration that yields the best performance relative to single-task learning. Generative models are used to produce augmented data, and experiments are performed to assess their effectiveness. The results demonstrate the advantage of the MTL configurations over single-task learning, as well as the effectiveness of data augmentation with generative models for the classification tasks.
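The abstract's "top/middle/bottom" layer scheme can be illustrated with a minimal sketch, assuming PyTorch and Hugging Face Transformers. The specific layer boundaries (bottom layers 0-3 frozen, middle layers 4-7 shared, top layers 8-11 unshared per task), the two classification heads, and the label counts are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of one possible MTL layer configuration over BERT:
# frozen bottom layers, shared middle layers, unshared task-specific
# top layers. Boundaries and heads are assumptions for illustration.
import copy
import torch.nn as nn
from transformers import BertModel

class MultiTaskBert(nn.Module):
    def __init__(self, num_category_labels=5, num_sentiment_labels=3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")

        # Bottom: freeze the embeddings and encoder layers 0-3.
        for p in self.bert.embeddings.parameters():
            p.requires_grad = False
        for layer in self.bert.encoder.layer[:4]:
            for p in layer.parameters():
                p.requires_grad = False

        # Top: give each task its own trainable copy of layers 8-11;
        # the middle layers 4-7 remain shared between both tasks.
        self.top_category = copy.deepcopy(self.bert.encoder.layer[8:])
        self.top_sentiment = copy.deepcopy(self.bert.encoder.layer[8:])

        hidden = self.bert.config.hidden_size
        self.category_head = nn.Linear(hidden, num_category_labels)
        self.sentiment_head = nn.Linear(hidden, num_sentiment_labels)

    def forward(self, input_ids, attention_mask, task):
        ext_mask = self.bert.get_extended_attention_mask(
            attention_mask, input_ids.shape
        )
        states = self.bert.embeddings(input_ids=input_ids)
        # Frozen bottom + shared middle layers (0-7).
        for layer in self.bert.encoder.layer[:8]:
            states = layer(states, attention_mask=ext_mask)[0]
        # Unshared, task-specific top layers (copies of 8-11).
        top = self.top_category if task == "category" else self.top_sentiment
        for layer in top:
            states = layer(states, attention_mask=ext_mask)[0]
        cls = states[:, 0]  # [CLS] representation for sentence classification
        head = self.category_head if task == "category" else self.sentiment_head
        return head(cls)
```

A batch for either task is passed through the same shared trunk, with `task` selecting the unshared top stack and head. The generative augmentation step can be hedged similarly; here GPT-2 stands in for the paper's unspecified generative models, and reusing the seed sentence's label for each sampled continuation is an assumed labeling strategy.

```python
# Hedged sketch of generative data augmentation: sample synthetic
# in-domain sentences from GPT-2 and reuse the seed's label for them.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
seeds = [("The battery drains far too quickly.", "negative")]
augmented = []
for text, label in seeds:
    for out in generator(text, max_new_tokens=20, do_sample=True,
                         num_return_sequences=3):
        augmented.append((out["generated_text"], label))
```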
Publisher
Fuji Technology Press Ltd.
Cited by
2 articles.