Chinese long text similarity calculation of semantic progressive fusion based on Bert

Published:2024-08-14 Issue:4-5 Volume:24 Page:2213-2225
ISSN:1472-7978
Container-title:Journal of Computational Methods in Sciences and Engineering
Short-container-title:JCM

Author:

Li Xiao¹²,Hu Lanlan¹

Affiliation:

1. School of Computer and Information Engineering, Anyang Normal University, Anyang, Henan, China

2. Key Laboratory of Oracle Bone Inscriptions Information Processing, Ministry of Education, Anyang Normal University, Anyang, Henan, China

Abstract

Text similarity measures how closely two or more texts resemble each other, and it is widely used across natural language processing tasks. As deep learning has matured, many neural network models have been applied to text similarity and have achieved good results on sentences and short texts. Among them, the BERT model has become a research hotspot because of its strong performance. However, existing similarity algorithms perform poorly on long texts and fail to extract the richer semantic information hidden in the structure of long documents. This paper takes Chinese long text as its research object. It proposes a long text similarity calculation method that operates on sentence sequences instead of word-level sequences, and it constructs a long text semantic representation model based on semantic progressive fusion. The aim is to address practical problems faced by applications and natural language processing tasks that depend on long text semantics, and to break through the bottleneck of long text similarity calculation.
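The abstract gives no implementation details, so the following is only a rough sketch of the general idea of comparing long texts at the sentence level. It is not the authors' model. It assumes that a long text is split into sentences, each sentence is mapped to a vector, and the vectors are folded into one document vector in reading order. The function `embed_sentence` is a hypothetical stand-in for a BERT sentence encoder (here a hashed character-bigram count, so the sketch runs without model weights), and the blending weight `alpha`, the vector size `DIM`, and all function names are illustrative assumptions.

```python
import hashlib
import math
import re

DIM = 64  # size of the toy sentence vectors (arbitrary, assumed)


def split_sentences(text):
    """Split text on common Chinese/Latin sentence-final punctuation."""
    parts = re.split(r"[。！？!?；;\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def embed_sentence(sentence):
    """Hypothetical placeholder for a BERT sentence encoder.

    Counts hashed characters and character bigrams into DIM buckets.
    """
    vec = [0.0] * DIM
    chars = list(sentence)
    grams = chars + [a + b for a, b in zip(chars, chars[1:])]
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    return normalize(vec)


def fuse_progressively(sentence_vecs, alpha=0.5):
    """Fold sentence vectors into one document vector, in reading order.

    Each step blends the running vector with the next sentence vector.
    This simple recurrence is an assumption, not the paper's fusion model.
    """
    if not sentence_vecs:
        return [0.0] * DIM
    fused = sentence_vecs[0]
    for vec in sentence_vecs[1:]:
        blended = [alpha * f + (1.0 - alpha) * v for f, v in zip(fused, vec)]
        fused = normalize(blended)
    return fused


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def long_text_similarity(text_a, text_b, alpha=0.5):
    """Similarity of two long texts via sentence-level fused vectors."""
    doc_a = fuse_progressively([embed_sentence(s) for s in split_sentences(text_a)], alpha)
    doc_b = fuse_progressively([embed_sentence(s) for s in split_sentences(text_b)], alpha)
    return cosine(doc_a, doc_b)
```

In practice `embed_sentence` would be replaced by a real sentence encoder. Working on sentence vectors keeps each encoder input short, which is one plausible reason to prefer sentence sequences over word-level sequences for long documents.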

Publisher

IOS Press

References (12 articles)

1. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. in: International Conference on Learning Representations. 2013. arXiv preprint arXiv:1301.3781v3.

2. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. in: Conference of the North American Chapter of the Association for Computational Linguistics Human Language Technologies. 2019; 1: 4171-4186.

3. Radford A, Narasimhan K, Salimans T, Sutskever I. Improving language understanding by generative pre-training. OpenAI. 2018.

4. Howard. Universal language model fine-tuning for text classification. in: 56th Annual Meeting of the Association for Computational Linguistics. 2018.

5. Cho K, Merrienboer BV, Gulcehre C, Bahdanau D, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014, pp. 1724-1734.