Historical document image analysis using controlled data for pre-training

Published:2023-05-10 Issue:3 Volume:26 Page:241-254
ISSN:1433-2833
Container-title:International Journal on Document Analysis and Recognition (IJDAR)
Language:en
Short-container-title:IJDAR

Author:

Rahal Najoua, Vögtlin Lars, Ingold Rolf

Abstract

Using neural networks for semantic labeling has become a dominant technique for layout analysis of historical document images. However, to train or fine-tune appropriate models, large labeled datasets are needed. This paper addresses the case when only limited labeled data are available and promotes a novel approach using so-called controlled data to pre-train the networks. Two different strategies are proposed: The first addresses the real labeling task by using artificial data; the second uses real data to pre-train the networks with a pretext task. To assess these strategies, a large set of experiments has been carried out on a text line detection and classification task using different variants of U-Net. The observations, obtained from two different datasets, show that globally the approach reduces the training time while offering similar or better performance. Furthermore, the effect is bigger on lightweight network architectures.

Funder

University of Fribourg

Publisher

Springer Science and Business Media LLC

Subject

Computer Science Applications, Computer Vision and Pattern Recognition, Software

Link

https://link.springer.com/content/pdf/10.1007/s10032-023-00437-8.pdf

References: 37 articles.

1. He, Z.: Deep learning in image classification: a survey report. In: 2020 2nd International Conference on Information Technology and Computer Application (ITCA), pp. 174–177. IEEE (2020)

2. Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: a survey. IEEE Trans. Pattern Anal. Mach. Intell. (2021)

3. Zaidi, S., Ansari, M., Aslam, A., Kanwal, N., Asghar, M., Lee, B.: A survey of modern deep learning based object detection models. arXiv preprint arXiv:2104.11892 (2021)

4. Würsch, M., Ingold, R., Liwicki, M.: Divaservices—a restful web service for document image analysis methods. Digit. Scholarsh. Humanit. 32(suppl_1), 150–156 (2017). https://doi.org/10.1093/llc/fqw051

5. Vögtlin, L., Drazyk, M., Pondenkandath, V., Alberti, M., Ingold, R.: Generating synthetic handwritten historical documents with OCR constrained GANs. In: International Conference on Document Analysis and Recognition, pp. 610–625. Springer, Berlin (2021)

Cited by 3 articles.

1. Approximate ground truth generation for semantic labeling of historical documents with minimal human effort;International Journal on Document Analysis and Recognition (IJDAR);2024-06-12

2. Digitizing History: Transitioning Historical Paper Documents to Digital Content for Information Retrieval and Mining—A Comprehensive Survey;IEEE Transactions on Computational Social Systems;2024

3. Impact of the ground truth quality for handwriting recognition;Proceedings of the 12th International Symposium on Information and Communication Technology;2023-12-07