Interactive optimization of embedding-based text similarity calculations

Published:2022-08-03 Issue:4 Volume:21 Page:335-353
ISSN:1473-8716
Container-title:Information Visualization
language:en
Short-container-title:Information Visualization

Author:

Witschard Daniel¹, Jusufi Ilir¹, Martins Rafael M¹, Kucher Kostiantyn¹², Kerren Andreas¹²

Affiliation:

1. Linnaeus University, Växjö, Sweden

2. Linköping University, Linköping, Sweden

Abstract

Comparing text documents is an essential task for a variety of applications within diverse research fields, and several different methods have been developed for this. However, calculating text similarity is an ambiguous and context-dependent task, so many open challenges still exist. In this paper, we present a novel method for text similarity calculations based on the combination of embedding technology and ensemble methods. By using several embeddings, instead of only one, we show that it is possible to achieve higher quality, which in turn is a key factor for developing high-performing applications for text similarity exploitation. We also provide a prototype visual analytics tool which helps the analyst to find optimally performing ensembles and gain insights into the inner workings of the similarity calculations. Furthermore, we discuss the generalizability of our key ideas to fields beyond the scope of text analysis.
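The abstract does not say how the ensemble is formed, so the following is only a rough illustration of the general idea of combining several embeddings into one similarity score, not the authors' method. It assumes each document has one vector per embedding and that the per-embedding cosine similarities are merged by a weighted mean. The names `ensemble_similarity`, `emb_1`, and `emb_2`, the weights, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def ensemble_similarity(doc_a, doc_b, weights):
    """Weighted mean of per-embedding cosine similarities.

    doc_a, doc_b: dicts mapping a (hypothetical) embedding name to the
    document's vector under that embedding.
    weights: dict mapping the same names to non-negative weights.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        w * cosine(doc_a[name], doc_b[name]) for name, w in weights.items()
    )
    return weighted / total


# Toy data (assumption): two documents, each represented under two embeddings.
doc_a = {"emb_1": [1.0, 0.0], "emb_2": [1.0, 1.0]}
doc_b = {"emb_1": [0.0, 1.0], "emb_2": [2.0, 2.0]}
weights = {"emb_1": 1.0, "emb_2": 3.0}

score = ensemble_similarity(doc_a, doc_b, weights)
print(round(score, 6))
```

In this toy case the two embeddings disagree: the vectors are orthogonal under `emb_1` and parallel under `emb_2`, so the combined score depends directly on the weights. Choosing such weights is the kind of decision the abstract says the visual analytics tool is meant to support.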

Funder

Stiftelsen för miljöstrategisk forskning

Publisher

SAGE Publications

Subject

Computer Vision and Pattern Recognition

Link

http://journals.sagepub.com/doi/pdf/10.1177/14738716221114372

References: 102 articles.

1. Semantic text similarity using corpus-based word similarity and string similarity

2. Measurement of Text Similarity: A Survey

3. Representation Learning: A Review and New Perspectives

4. Sentence Embedding Based Semantic Clustering Approach for Discussion Thread Summarization

Cited by 4 articles.

1. Automatic determination of semantic similarity of student answers with the standard one using modern models; Modeling and Analysis of Information Systems; 2024-06-13

2. Design of a financial reporting management generation system based on Bi-LSTM model and MultiWord-Embedding method; International Journal of Data Mining and Bioinformatics; 2024

3. VA + Embeddings STAR: A State‐of‐the‐Art Report on the Use of Embeddings in Visual Analytics; Computer Graphics Forum; 2023-06

4. Visually Guided Network Reconstruction Using Multiple Embeddings; 2023 IEEE 16th Pacific Visualization Symposium (PacificVis); 2023-04