Do Not Have Enough Data? Deep Learning to the Rescue!

Published: 2020-04-03 Issue: 05 Volume: 34 Page: 7383-7390
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Anaby-Tavor Ateret, Carmeli Boaz, Goldbraich Esther, Kantor Amir, Kour George, Shlomov Segev, Tepper Naama, Zwerdling Naama

Abstract

Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the state-of-the-art techniques for data augmentation, specifically those applicable to text classification tasks with little data.
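The filtering step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract says only that generated sentences are filtered by a classifier trained on the original data, so the specific rule used here (keep a sentence when the classifier agrees with the label it was generated for, then keep the most confident `top_k` per class) is an assumption. The names `filter_generated` and `toy_classifier` and the keyword lists are hypothetical, and the keyword counter merely stands in for a real trained classifier.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from the paper.
Candidate = Tuple[str, str]                      # (generated sentence, intended label)
Classifier = Callable[[str], Tuple[str, float]]  # sentence -> (predicted label, confidence)


def filter_generated(
    candidates: List[Candidate], classify: Classifier, top_k: int
) -> Dict[str, List[str]]:
    """Keep, per class, up to top_k sentences whose predicted label matches the intended one.

    Assumed rule: agreement with the intended label, ranked by classifier confidence.
    """
    kept: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for sentence, intended in candidates:
        predicted, confidence = classify(sentence)
        if predicted == intended:
            kept[intended].append((confidence, sentence))

    result: Dict[str, List[str]] = {}
    for label, scored in kept.items():
        # Most confident sentences first.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        result[label] = [sentence for _, sentence in scored[:top_k]]
    return result


def toy_classifier(sentence: str) -> Tuple[str, float]:
    """Stand-in for a classifier trained on the original labeled data (keyword counting)."""
    keywords = {
        "flight": {"flight", "fly", "plane"},
        "fare": {"fare", "price", "cost"},
    }
    words = set(sentence.lower().split())
    scores = {label: len(words & kws) for label, kws in keywords.items()}
    total = sum(scores.values())
    if total == 0:
        return "unknown", 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


if __name__ == "__main__":
    generated = [
        ("what is the price of the fare", "fare"),
        ("show me a flight and the cost", "flight"),
        ("which plane will fly there", "fare"),   # classifier disagrees, so it is dropped
        ("book a flight on a plane", "flight"),
    ]
    print(filter_generated(generated, toy_classifier, top_k=1))
```

In a real pipeline, the candidates would come from the fine-tuned language generator conditioned on a class label, and the surviving sentences would be added to the training set.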

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 102 articles.

1. Multimodal Deep Learning for Classifying Student-generated Questions in Computer-supported Collaborative Learning;Proceedings of the Eleventh ACM Conference on Learning @ Scale;2024-07-09

2. Intent aware data augmentation by leveraging generative AI for stress detection in social media texts;PeerJ Computer Science;2024-07-08

3. Data Augmentation with Knowledge Graph-to-Text and Virtual Adversary for Specialized-Domain Chinese NER;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

4. Cognitive Tracing Data Trails: Auditing Data Provenance in Discriminative Language Models Using Accumulated Discrepancy Score;Cognitive Computation;2024-06-14

5. Semi-Supervised SAR Image Classification via Adaptive Threshold Selection;Journal of the Korea Institute of Military Science and Technology;2024-06-05