Investigating Models for the Transcription of Mathematical Formulas in Images
Published: 2024-01-29
Applied Sciences, Volume 14, Issue 3, Page 1140
ISSN: 2076-3417
Language: en
Authors:
Christian Feichter 1, Tim Schlippe 1
Affiliation:
1. IU International University of Applied Sciences, 99084 Erfurt, Germany
Abstract
The automated transcription of mathematical formulas represents a complex challenge that is of great importance for digital processing and comprehensibility of mathematical content. Consequently, our goal was to analyze state-of-the-art approaches for the transcription of printed mathematical formulas on images into spoken English text. We focused on two approaches: (1) The combination of mathematical expression recognition (MER) models and natural language processing (NLP) models to convert formula images first into LaTeX code and then into text, and (2) the direct conversion of formula images into text using vision-language (VL) models. Since no dataset with printed mathematical formulas and corresponding English transcriptions existed, we created a new dataset, Formula2Text, for fine-tuning and evaluating our systems. Our best system for (1) combines the MER model LaTeX-OCR and the NLP model BART-Base, achieving a translation error rate of 36.14% compared with our reference transcriptions. In the task of converting LaTeX code to text, BART-Base, T5-Base, and FLAN-T5-Base even outperformed ChatGPT, GPT-3.5 Turbo, and GPT-4. For (2), the best VL model, TrOCR, achieves a translation error rate of 42.09%. This demonstrates that VL models, predominantly employed for classical image captioning tasks, possess significant potential for the transcription of mathematical formulas in images.
Cited by: 1 article.