SpanBERT: Improving Pre-training by Representing and Predicting Spans

Published: 2020-12
Volume: 8
Pages: 64-77
ISSN: 2307-387X
Container title: Transactions of the Association for Computational Linguistics
Language: English
Short container title: Transactions of the Association for Computational Linguistics

Authors:

Mandar Joshi¹, Danqi Chen²,³, Yinhan Liu³, Daniel S. Weld¹,⁴, Luke Zettlemoyer¹,³, Omer Levy³

Affiliations:

1. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA.

2. Computer Science Department, Princeton University, Princeton, NJ.

3. Facebook AI Research, Seattle.

4. Allen Institute for Artificial Intelligence, Seattle.

Abstract

We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0 respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even show gains on GLUE.
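The two changes described in the abstract, span masking and predicting a span from its boundaries, can be illustrated with a short plain-Python sketch. This is not the authors' implementation. The function names `span_mask` and `sbo_examples` are hypothetical.

The default settings are taken from the paper as we recall it and should be checked against it:

- a 15% masking budget;
- span lengths drawn from a geometric distribution with p = 0.2, truncated at 10 and renormalized, which the paper reports gives a mean span length of about 3.8.

The sketch also makes several simplifying assumptions:

- Spans are sampled over tokens rather than whole words.
- The mask/random/keep replacement scheme is omitted.
- The sequence is assumed to start with [CLS] and end with [SEP], so every masked span has a boundary token on each side.

```python
import random

# Hypothetical helper names; default settings are assumptions meant to mirror the paper.
def span_mask(tokens, mask_ratio=0.15, p=0.2, max_len=10,
              mask_token="[MASK]", seed=None):
    """Mask contiguous random spans; tokens[0] and tokens[-1] are assumed special."""
    rng = random.Random(seed)
    interior = len(tokens) - 2                  # maskable positions: 1 .. len-2
    if interior < 1:
        return list(tokens), []
    budget = max(1, round(interior * mask_ratio))
    lengths = list(range(1, max_len + 1))
    # Truncated geometric: P(k) proportional to p * (1 - p) ** (k - 1), k = 1..max_len
    weights = [p * (1 - p) ** (k - 1) for k in lengths]
    masked, spans = set(), []
    for _ in range(100 * interior):             # bounded number of attempts
        if len(masked) >= budget:
            break
        # Shorten the span if it would exceed the remaining budget.
        length = min(rng.choices(lengths, weights=weights)[0], budget - len(masked))
        start = rng.randint(1, interior - length + 1)  # span stays inside the interior
        span = set(range(start, start + length))
        if span & masked:
            continue                            # reject spans that overlap earlier ones
        masked |= span
        spans.append((start, start + length - 1))
    out = [mask_token if i in masked else tok for i, tok in enumerate(tokens)]
    return out, sorted(spans)


def sbo_examples(spans):
    """For span (s, e), pair each target i with boundaries s-1, e+1 and position i-s+1."""
    return [(s - 1, e + 1, i - s + 1, i) for s, e in spans for i in range(s, e + 1)]


tokens = ["[CLS]"] + [f"w{i}" for i in range(18)] + ["[SEP]"]
masked_tokens, spans = span_mask(tokens, seed=0)
print(masked_tokens)
print(sbo_examples(spans))
```

Each tuple from `sbo_examples` lists four indices: the two boundary tokens, the target's relative position inside the span, and the target token itself. Under the assumptions above, a model would encode the two boundary tokens, add a relative-position embedding, and predict the target token from that combination alone. This is why the target's own representation is not needed, matching the abstract's "without relying on the individual token representations within it."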

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Human-Computer Interaction, Communication

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00300

References: 55 articles.

Cited by 671 articles.

1. HTR-VT: Handwritten text recognition with vision transformer. Pattern Recognition, 2025-02.

2. Enhancing Turkish Coreference Resolution: Insights from deep learning, dropped pronouns, and multilingual transfer learning. Computer Speech & Language, 2025-01.

3. Exposing the Achilles’ heel of textual hate speech classifiers using indistinguishable adversarial examples. Expert Systems with Applications, 2024-11.

4. TPKE-QA: A gapless few-shot extractive question answering approach via task-aware post-training and knowledge enhancement. Expert Systems with Applications, 2024-11.

5. InA: Inhibition Adaption on pre-trained language models. Neural Networks, 2024-10.