Author:
Lowe Michael, Prusa Joseph D., Leevy Joffrey L., Khoshgoftaar Taghi M.
Abstract
OCR2SEQ represents an innovative advancement in Optical Character Recognition (OCR) technology, leveraging a multi-modal generative augmentation strategy to overcome traditional limitations in OCR systems. This paper introduces OCR2SEQ’s unique approach, tailored to enhance data quality for sequence-to-sequence models, especially in scenarios characterized by sparse character sets and specialized vocabularies. At the heart of OCR2SEQ lies a set of novel augmentation techniques designed to simulate realistic text extraction errors. These techniques are adept at generating diverse and challenging data scenarios, thereby substantially improving the training efficacy and accuracy of text-to-text transformers. The application of OCR2SEQ has shown notable improvements in data processing accuracy, particularly in sectors heavily dependent on OCR technologies such as healthcare and library sciences. This paper demonstrates the capability of OCR2SEQ to transform OCR systems by enriching them with augmented, domain-specific data, paving the way for more sophisticated and reliable machine learning interpretations. This advancement in OCR technology, as presented in the study, not only enhances the accuracy and reliability of data processing but also sets a new benchmark in the integration of augmented data for refining OCR capabilities.
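The abstract does not specify how the augmentation techniques are implemented. As a rough illustration of the general idea of simulating text extraction errors to build (noisy, clean) pairs for a text-to-text model, a minimal Python sketch might look like the following. The confusion table, the function names `add_ocr_noise` and `make_training_pairs`, and the `rate` parameter are all assumptions made for this example and are not taken from the paper.

```python
import random

# Hypothetical confusion table of visually similar glyphs.
# These pairs are illustrative assumptions, not OCR2SEQ's actual error model.
CONFUSIONS = {
    "0": "O", "O": "0",
    "1": "l", "l": "1",
    "5": "S", "S": "5",
    "m": "rn",
}


def add_ocr_noise(text, rate=0.1, seed=None):
    """Return a copy of `text` with simulated OCR-style errors.

    Each character is selected for corruption with probability `rate`.
    A selected character is swapped for its look-alike if it has one in
    CONFUSIONS, dropped if it is a space, and otherwise left unchanged.
    """
    rng = random.Random(seed)
    out = []
    for ch in text:
        if rng.random() >= rate:
            out.append(ch)              # character not selected: keep as-is
        elif ch in CONFUSIONS:
            out.append(CONFUSIONS[ch])  # substitute a look-alike glyph
        elif ch == " ":
            continue                    # simulate a lost word boundary
        else:
            out.append(ch)              # no known confusion: keep as-is
    return "".join(out)


def make_training_pairs(clean_lines, rate=0.1, seed=0):
    """Build (noisy_input, clean_target) pairs for a seq2seq model."""
    rng = random.Random(seed)
    return [
        (add_ocr_noise(line, rate, seed=rng.randrange(2**32)), line)
        for line in clean_lines
    ]
```

Under these assumptions, `make_training_pairs(["Dose 5 ml"], rate=0.3)` yields pairs whose second element is always the untouched source line, which is the target a correction model would be trained to recover.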
Publisher
Springer Science and Business Media LLC