Affiliation:
1. P.G. Demidov Yaroslavl State University
Abstract
This paper studies how well modern text models identify the semantic similarity of English-language texts. Determining semantic similarity is an important component of many areas of natural language processing, including machine translation, information retrieval, question answering systems, and artificial intelligence in education. The authors address the task of classifying how close student answers are to the teacher’s reference answer. The study covers the neural network language models BERT and GPT, which have previously been used to determine the semantic similarity of texts; the newer neural network model Mamba; and stylometric features of the text. Experiments were carried out on two text corpora: the Text Similarity corpus from open sources and a custom corpus collected with the help of philologists. Classification quality was assessed by precision, recall, and F-measure. All neural network language models achieved similar F-measure values: about 86% on the larger Text Similarity corpus and 50–56% on the custom corpus. The successful application of the Mamba model to this task is a new result. The most interesting result, however, came from vectors of stylometric features, which reached an F-measure of 80% on the custom corpus and matched the quality of the neural network models on the other corpus.
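For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F-measure can be computed for a binary "similar / not similar" classification. The function name `precision_recall_f1`, the label encoding (1 = similar, 0 = not similar), and the toy labels are illustrative assumptions; they are not taken from the paper, and the authors' actual evaluation setup (for example, averaging over several classes) may differ.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive label.

    Hypothetical helper for illustration; not the authors' code.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (assumed data): 1 = answer is similar to the reference answer.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The F-measure is used here in its balanced form (F1), which weights precision and recall equally; the abstract does not state which variant the authors report.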
Publisher
P.G. Demidov Yaroslavl State University