A Brief Overview of Universal Sentence Representation Methods: A Linguistic View

Published:2022-03-26 Issue:3 Volume:55 Page:1-42
ISSN:0360-0300
Container-title:ACM Computing Surveys
language:en
Short-container-title:ACM Comput. Surv.

Author:

Li Ruiqi¹, Zhao Xiang², Moens Marie-Francine³

Affiliation:

1. KU Leuven, Belgium and National University of Defense Technology, Changsha, China

2. National University of Defense Technology, Changsha, China

3. KU Leuven, Celestijnenlaan, Heverlee, Belgium

Abstract

How to transfer the semantic information in a sentence to a computable numerical embedding form is a fundamental problem in natural language processing. An informative universal sentence embedding can greatly promote subsequent natural language processing tasks. However, unlike universal word embeddings, a widely accepted general-purpose sentence embedding technique has not been developed. This survey summarizes the current universal sentence-embedding methods, categorizes them into four groups from a linguistic view, and ultimately analyzes their reported performance. Sentence embeddings trained from words in a bottom-up manner are observed to have different, nearly opposite, performance patterns in downstream tasks compared to those trained from logical relationships between sentences. By comparing differences of training schemes in and between groups, we analyze possible essential reasons for different performance patterns. We additionally collect incentive strategies handling sentences from other models and propose potentially inspiring future research directions.

Funder

NSFC

NSF of Hunan Province

Science and Technology Innovation Program of Hunan Province

European Research Council

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science, Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3482853

Reference: 164 articles.

1. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks;Adi Yossi;CoRR,2016

2. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability

3. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity

4. SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation

5. Eneko Agirre, Daniel M. Cer, Mona T. Diab, and Aitor Gonzalez-Agirre. 2012. SemEval-2012 task 6: a pilot on semantic textual similarity. In Proceedings of the 6th International Workshop on Semantic Evaluation. 385–393.

Cited by 6 articles.

1. Mining User Privacy Concern Topics from App Reviews;2024

2. A Retrieval-Augmented Generation Strategy to Enhance Medical Chatbot Reliability;Lecture Notes in Computer Science;2024

3. Using the interest theory of rights and Hohfeldian taxonomy to address a gap in machine learning methods for legal document analysis;Humanities and Social Sciences Communications;2023-05-19

4. Link prediction for heterogeneous information networks based on enhanced meta-path aggregation and attention mechanism;International Journal of Machine Learning and Cybernetics;2023-03-28

5. Fall-Attention: An Attention-Based Fall Detection Method for Adjoint Activities;IEEE Transactions on Mobile Computing;2023