Affiliation:
1. Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Champaign, IL, USA, zzeng13@illinois.edu
2. Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Champaign, IL, USA, spbhat2@illinois.edu
Abstract
Idiomatic expressions (IEs), characterized by their non-compositionality, are an important part of natural language. They have long been a challenge for NLP, including for the pre-trained language models that drive today’s state of the art. Prior work has identified deficiencies in their contextualized representation stemming from the underlying compositional paradigm of representation. In this work, we take a first-principles approach to build idiomaticity into BART using an adapter as a lightweight non-compositional language expert trained on idiomatic sentences. Intrinsic and extrinsic evaluations show improved capability over baselines (e.g., BART): idiom embeddings score 0.19 points higher in homogeneity score for embedding clustering, and sequence accuracy is up to 25% higher on the idiom processing tasks of IE sense disambiguation and span detection.
Subject
Artificial Intelligence, Computer Science Applications, Linguistics and Language, Human-Computer Interaction, Communication