TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models

Published: 2023-06-26
Volume: 37
Issue: 11
Pages: 13094-13102
ISSN: 2374-3468
Container title: Proceedings of the AAAI Conference on Artificial Intelligence
Short container title: AAAI

Authors:

Li Minghao, Lv Tengchao, Chen Jingye, Cui Lei, Lu Yijuan, Florencio Dinei, Zhang Cha, Li Zhoujun, Wei Furu

Abstract

Text recognition is a long-standing research problem in document digitalization. Existing approaches are usually built on CNNs for image understanding and RNNs for character-level text generation, and a separate language model is usually needed as a post-processing step to improve overall accuracy. In this paper, we propose TrOCR, an end-to-end text recognition approach built on pre-trained image Transformer and text Transformer models, which uses the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective: it can be pre-trained on large-scale synthetic data and fine-tuned on human-labeled datasets. Experiments show that TrOCR outperforms the current state-of-the-art models on printed, handwritten, and scene text recognition tasks. The TrOCR models and code are publicly available at https://aka.ms/trocr.
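To make the encoder-decoder idea in the abstract concrete, here is a minimal plain-Python sketch of the two ends of such a pipeline. It is not the TrOCR code release. First, an image Transformer typically does not consume pixels directly: it cuts the (resized) image into fixed-size square patches and flattens each one into a vector, giving a sequence the encoder can attend over. For example, a 384×384 input with 16×16 patches yields (384/16)² = 576 patches. Second, wordpiece-level generation is autoregressive: the decoder repeatedly predicts the next wordpiece given the ones already produced, stopping at an end-of-sequence token. The function names `to_patch_sequence`, `greedy_decode`, and `toy_next`, the grayscale list-of-rows image format, the `<s>`/`</s>` tokens, and the toy token script are all hypothetical illustrations. Greedy search is shown only because it is the simplest decoding strategy; a real system might use beam search instead.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # hypothetical format: grayscale image as rows of pixels


def to_patch_sequence(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch x patch tiles,
    flatten each tile row-major, and return the tiles in raster order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("patch size must divide both image dimensions")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(flat)
    return patches


def greedy_decode(next_token: Callable[[Sequence[str]], str],
                  bos: str = "<s>", eos: str = "</s>",
                  max_len: int = 32) -> List[str]:
    """Greedy autoregressive decoding: repeatedly ask the decoder for the
    most likely next wordpiece given the tokens generated so far.
    In a real model, next_token would also attend to the encoder's
    patch representations; here that is hidden inside the callable."""
    tokens = [bos]
    while len(tokens) < max_len:
        tok = next_token(tokens)
        if tok == eos:
            break
        tokens.append(tok)
    return tokens[1:]  # drop the start token


# Usage: a 4x4 image with 2x2 patches gives (4*4)/(2*2) = 4 patches of 4 values.
img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
seq = to_patch_sequence(img, 2)
print(len(seq), seq[0])  # 4 [0.0, 1.0, 4.0, 5.0]

# Toy stand-in for a trained decoder: emits a fixed, made-up wordpiece script.
script = ["Trans", "former", "</s>"]


def toy_next(prefix: Sequence[str]) -> str:
    return script[len(prefix) - 1]


print(greedy_decode(toy_next))  # ['Trans', 'former']
```

Because the decoder emits wordpieces rather than single characters, a word can be produced in a few steps, and the text side can be initialized from a pre-trained language model. This is the property the abstract contrasts with character-level RNN decoding plus a separate post-processing language model.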

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 66 articles.

1. HTR-VT: Handwritten text recognition with vision transformer. Pattern Recognition, 2025-02.

2. Hyper-Local Deformable Transformers for Text Spotting on Historical Maps. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024-08-24.

3. Detecting Omissions in Geographic Maps through Computer Vision. 2024 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), 2024-08-15.

4. Sentiment Analysis of YouTube Users on Blackpink Kpop Group Using IndoBERT. INTENSIF: Jurnal Ilmiah Penelitian dan Penerapan Teknologi Sistem Informasi, 2024-08-01.

5. Preprocesado de imagen y OCR para mejorar deteccion de smishing [Image preprocessing and OCR to improve smishing detection]. Jornadas de Automática, 2024-07-23.