Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

Published:2020-12 Volume:8 Page:264-280
ISSN:2307-387X
Container-title:Transactions of the Association for Computational Linguistics
language:en

Author:

Rothe Sascha¹, Narayan Shashi¹, Severyn Aliaksei¹

Affiliation:

1. Google Research.

Abstract

Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2, and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation, Text Summarization, Sentence Splitting, and Sentence Fusion.

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Human-Computer Interaction, Communication

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00313

Reference 57 articles.

1. Learning To Split and Rephrase From Wikipedia Edit History

Cited by 126 articles.

1. Abstractive text summarization: State of the art, challenges, and improvements;Neurocomputing;2024-10

2. MGCoT: Multi-Grained Contextual Transformer for table-based text generation;Expert Systems with Applications;2024-09

3. Don’t Complete It! Preventing Unhelpful Code Completion for Productive and Sustainable Neural Code Completion Systems;ACM Transactions on Software Engineering and Methodology;2024-08-16

4. De novo generation of SARS-CoV-2 antibody CDRH3 with a pre-trained generative large language model;Nature Communications;2024-08-10

5. Natural language processing with transformers: a review;PeerJ Computer Science;2024-08-07