Masked transformer through knowledge distillation for unsupervised text style transfer

Published:2023-07-25 Page:1-36
ISSN:1351-3249
Container-title:Natural Language Engineering
language:en
Short-container-title:Nat. Lang. Eng.

Author:

Scalercio Arthur, Paes Aline

Abstract

Text style transfer (TST) aims at automatically changing a text’s stylistic features, such as formality, sentiment, authorial style, humor, and complexity, while still trying to preserve its content. Although the scientific community has investigated TST since the 1980s, it has recently regained attention by adopting deep unsupervised strategies to address the challenge of training without parallel data. In this manuscript, we investigate how relying on sequence-to-sequence pretraining models affects the performance of TST when the pretraining step leverages pairs of paraphrase data. Furthermore, we propose a new technique to enhance the sequence-to-sequence model by distilling knowledge from masked language models. We evaluate our proposals on three unsupervised style transfer tasks with widely used benchmarks: author imitation, formality transfer, and polarity swap. The evaluation relies on quantitative and qualitative analyses and comparisons with the results of state-of-the-art models. For the author imitation and formality transfer tasks, we show that using the proposed techniques improves all measured metrics and, in the author imitation domain, leads to state-of-the-art (SOTA) results in content preservation and in the overall score. In the formality transfer domain, we match the SOTA method on the style control metric. Regarding the polarity swap domain, we show that the knowledge distillation component improves all measured metrics, while the paraphrase pretraining increases content preservation at the expense of style control. Based on the results reached in these domains, we also discuss whether the tasks we address have the same nature and should be equally treated as TST tasks.

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software

References: 84 articles.

1. Large-Scale, Diverse, Paraphrastic Bitexts via Sampling and Clustering

2. Papineni, K., Roukos, S., Ward, T. and Zhu, W. (2002). Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, July 6-12, 2002, Philadelphia, PA, USA. ACL, pp. 311–318. https://aclanthology.org/P02-1040/

3. Paraphrase Diversification Using Counterfactual Debiasing

4. A Neural Attention Model for Abstractive Sentence Summarization

Cited by 1 article.

1. Advanced hybrid LSTM-transformer architecture for real-time multi-task prediction in engineering systems;Scientific Reports;2024-02-28