A Scientific Document Retrieval and Reordering Method by Incorporating HFS and LSD
Published:2023-10-12
Issue:20
Volume:13
Page:11207
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences
Author:
Feng Ziyang (1, 2, 3), Tian Xuedong (1, 2, 3)
Affiliation:
1. School of Cyber Security and Computer, Hebei University, Baoding 071002, China
2. Institute of Intelligent Image and Document Information Processing, Hebei University, Baoding 071002, China
3. Hebei Machine Vision Engineering Research Center, Hebei University, Baoding 071002, China
Abstract
Scientific document retrieval increasingly needs to account for both the many mathematical expressions and the semantic text that scientific documents contain. Current scientific document matching models focus solely on the textual features of expressions and, in pursuit of better performance, often suffer from growing parameter counts and slow inference. To address these problems, this paper proposes a scientific document retrieval method based on hesitant fuzzy sets (HFS) and local semantic distillation (LSD). First, to extract both spatial and semantic features for each symbol within a mathematical expression, the paper introduces an expression analysis module that uses HFS to build feature indices. Second, to improve contextual semantic alignment, knowledge distillation is used to refine the pretrained language model and build a twin network for semantic matching. Finally, mathematical expression features are combined with contextual semantic features so that the retrieval results are more efficient and reasonable. Experiments were conducted on the NTCIR dataset and an expanded Chinese dataset. The average MAP for mathematical expression retrieval was 83.0%, and the average nDCG for ranking scientific documents was 85.8%.
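For readers unfamiliar with the two reported metrics, the sketch below shows the standard textbook definitions of MAP and nDCG. It is illustrative only: the abstract does not state the evaluation protocol, so the cutoff, the relevance scale, and the function names (`average_precision`, `mean_average_precision`, `dcg`, `ndcg`) are assumptions, and average precision is normalized here by the number of relevant items that appear in the ranked list.

```python
import math
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """AP for one ranked list; relevance[i] is 1 if the item at rank i+1 is relevant.

    Assumption: every relevant item appears somewhere in the list, so the
    normalizer is the number of relevant items found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """MAP: mean of the per-query average precision values."""
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains: Sequence[float]) -> float:
    """nDCG: DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance judgments for two queries (not data from the paper).
    print(mean_average_precision([[1, 0, 1], [0, 1]]))
    print(ndcg([1, 3, 2]))
```

Both metrics lie in [0, 1]; the percentages in the abstract correspond to these values multiplied by 100 and averaged over queries.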
Funder
Natural Science Foundation of Hebei Province of China
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science