Author:
Li Minghao, Lv Tengchao, Chen Jingye, Cui Lei, Lu Yijuan, Florencio Dinei, Zhang Cha, Li Zhoujun, Wei Furu
Abstract
Text recognition is a long-standing research problem for document digitalization. Existing approaches are usually built on CNNs for image understanding and RNNs for character-level text generation. In addition, another language model is usually needed as a post-processing step to improve overall accuracy. In this paper, we propose TrOCR, an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments show that the TrOCR model outperforms the current state-of-the-art models on the printed, handwritten and scene text recognition tasks. The TrOCR models and code are publicly available at https://aka.ms/trocr.
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
66 articles.
1. HTR-VT: Handwritten text recognition with vision transformer;Pattern Recognition;2025-02
2. Hyper-Local Deformable Transformers for Text Spotting on Historical Maps;Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining;2024-08-24
3. Detecting Omissions in Geographic Maps through Computer Vision;2024 International Conference on Multimedia Analysis and Pattern Recognition (MAPR);2024-08-15
4. Sentiment Analysis of YouTube Users on Blackpink Kpop Group Using IndoBERT;INTENSIF: Jurnal Ilmiah Penelitian dan Penerapan Teknologi Sistem Informasi;2024-08-01
5. Preprocesado de imagen y OCR para mejorar deteccion de smishing;Jornadas de Automática;2024-07-23