Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models

Published: 2024-03-30
Journal: Mathematics
Volume: 12, Issue: 7, Page: 1036
ISSN: 2227-7390
Language: en

Author:

Alahmadi Mohammad D.¹, Alshangiti Moayad¹

Affiliation:

1. Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah 23890, Saudi Arabia

Abstract

The rapid evolution of video programming tutorials as a key educational resource has highlighted the need for effective code extraction methods. These tutorials, varying widely in video quality, present a challenge for accurately transcribing the embedded source code, crucial for learning and software development. This study investigates the impact of video quality on the performance of optical character recognition (OCR) engines and the potential of large language models (LLMs) to enhance code extraction accuracy. Our comprehensive empirical analysis utilizes a rich dataset of programming screencasts, involving manual transcription of source code and the application of both traditional OCR engines, like Tesseract and Google Vision, and advanced LLMs, including GPT-4V and Gemini. We investigate the efficacy of image super-resolution (SR) techniques, namely, enhanced deep super-resolution (EDSR) and multi-scale deep super-resolution (MDSR), in improving the quality of low-resolution video frames. The findings reveal significant improvements in OCR accuracy with the use of SR, particularly at lower resolutions such as 360p. LLMs demonstrate superior performance across all video qualities, indicating their robustness and advanced capabilities in diverse scenarios. This research contributes to the field of software engineering by offering a benchmark for code extraction from video tutorials and demonstrating the substantial impact of SR techniques and LLMs in enhancing the readability and reusability of code from these educational resources.

Funder

University of Jeddah

Publisher

MDPI AG

Link

https://www.mdpi.com/2227-7390/12/7/1036/pdf

Cited by 1 article.

1. SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers;Mathematics;2024-07-07