Measurement of Text Similarity: A Survey

Published:2020-08-31 Issue:9 Volume:11 Page:421
ISSN:2078-2489
Container-title:Information
language:en
Short-container-title:Information

Author:

Wang Jiapeng, Dong Yihong

Abstract

Text similarity measurement is the basis of natural language processing tasks and plays an important role in information retrieval, automatic question answering, machine translation, dialogue systems, and document matching. This paper systematically reviews the state of research on similarity measurement, analyzes the advantages and disadvantages of current methods, develops a more comprehensive classification system for text similarity measurement algorithms, and summarizes future development directions. To provide a reference for related research and applications, text similarity measurement methods are described from two aspects: text distance and text representation. Text distance is divided into length distance, distribution distance, and semantic distance; text representation is divided into string-based, corpus-based, single-semantic text, multi-semantic text, and graph-structure-based representation. Finally, the development of text similarity is summarized in the discussion section.

Publisher

MDPI AG

Subject

Information Systems

Link

https://www.mdpi.com/2078-2489/11/9/421/pdf

References: 63 articles.

1. Semantic Matching in Search

2. Learning deep transformer models for machine translation;Wang;arXiv,2019

Cited by 120 articles.

1. INCEPT: A Framework for Duplicate Posts Classification with Combined Text Representations;ACM Transactions on the Web;2024-08-16

2. Mixture-of-languages Routing for Multilingual Dialogues;ACM Transactions on Information Systems;2024-08-05

3. Practical Evaluation of ChatGPT Performance for Radiology Report Generation;Academic Radiology;2024-08

4. Federated Learning Framework for Collaborative Time Series Anomaly Detection on Distributed Machines;2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC);2024-07-02

5. When surface-enhanced Raman spectroscopy meets complex biofluids: A new representation strategy for reliable and comprehensive characterization;Analytica Chimica Acta;2024-07