Historical Text Line Segmentation Using Deep Learning Algorithms: Mask-RCNN against U-Net Networks
Published: 2024-03-05
Issue: 3
Volume: 10
Page: 65
ISSN: 2313-433X
Container-title: Journal of Imaging
Language: en
Short-container-title: J. Imaging
Author:
Fizaine Florian Côme (1, 2), Bard Patrick (1), Paindavoine Michel (1), Robin Cécile (2, 3), Bouyé Edouard (2), Lefèvre Raphaël (4), Vinter Annie (1)
Affiliation:
1. LEAD-CNRS, Université de Bourgogne, 21000 Dijon, France
2. Archives Départementales de Côte d’Or, 21000 Dijon, France
3. Institut National du Patrimoine, 75002 Paris, France
4. Société Nationale des Chemins de fer Français, 93200 Saint Denis, France
Abstract
Text line segmentation is a necessary preliminary step before most text transcription algorithms are applied. The leading deep learning networks used in this context (ARU-Net, dhSegment, and Doc-UFCN) are based on the U-Net architecture. They are efficient, but they share the same approach and therefore require a post-processing step to perform instance (e.g., text line) segmentation. In the present work, we test the advantages of Mask-RCNN, which is designed to perform instance segmentation directly. This work is the first to directly compare Mask-RCNN- and U-Net-based networks on text segmentation of historical documents, showing the superiority of the former over the latter. Three studies were conducted: one comparing these networks on different historical databases, another comparing Mask-RCNN with Doc-UFCN on a private historical database, and a third comparing the handwritten text recognition (HTR) performance of the tested networks. The results showed that Mask-RCNN outperformed ARU-Net, dhSegment, and Doc-UFCN on relevant line segmentation metrics, that performance evaluation should not focus on the raw masks generated by the networks, that light mask processing is an efficient and simple way to improve evaluation, and that Mask-RCNN leads to better HTR performance.
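To illustrate why a U-Net-style output needs a post-processing step before it yields individual text lines, the sketch below labels connected regions in a binary mask. It is only an assumption about what such a step could look like, not the post-processing used by ARU-Net, dhSegment, or Doc-UFCN; the function name `label_lines` and the toy mask are hypothetical.

```python
from collections import deque


def label_lines(mask):
    """Label 4-connected foreground regions of a binary mask.

    `mask` is a list of rows containing 0 (background) or 1 (text).
    Returns (labels, count), where `labels` has the same shape as `mask`
    and holds 0 for background or an instance id from 1 to `count`.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not labels[r][c]:
                # Unlabelled foreground pixel: start a new instance.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Toy semantic mask (hypothetical data): two separate text lines.
mask = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
labels, n_lines = label_lines(mask)
print(n_lines)
```

In this sketch, two lines whose pixels touch anywhere in the mask would be merged into a single instance. A network that predicts one mask per instance directly, as Mask-RCNN is designed to do, does not depend on this kind of grouping step.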