Investigating Models for the Transcription of Mathematical Formulas in Images
Published: 2024-01-29
Applied Sciences, Volume 14, Issue 3, Page 1140
ISSN: 2076-3417
Language: en
Authors:
Christian Feichter 1, Tim Schlippe 1
Affiliation:
1. IU International University of Applied Sciences, 99084 Erfurt, Germany
Abstract
The automated transcription of mathematical formulas represents a complex challenge that is of great importance for digital processing and comprehensibility of mathematical content. Consequently, our goal was to analyze state-of-the-art approaches for the transcription of printed mathematical formulas on images into spoken English text. We focused on two approaches: (1) The combination of mathematical expression recognition (MER) models and natural language processing (NLP) models to convert formula images first into LaTeX code and then into text, and (2) the direct conversion of formula images into text using vision-language (VL) models. Since no dataset with printed mathematical formulas and corresponding English transcriptions existed, we created a new dataset, Formula2Text, for fine-tuning and evaluating our systems. Our best system for (1) combines the MER model LaTeX-OCR and the NLP model BART-Base, achieving a translation error rate of 36.14% compared with our reference transcriptions. In the task of converting LaTeX code to text, BART-Base, T5-Base, and FLAN-T5-Base even outperformed ChatGPT, GPT-3.5 Turbo, and GPT-4. For (2), the best VL model, TrOCR, achieves a translation error rate of 42.09%. This demonstrates that VL models, predominantly employed for classical image captioning tasks, possess significant potential for the transcription of mathematical formulas in images.
Cited by: 1 article.