SHIBR—The Swedish Historical Birth Records: a semi-annotated dataset

Published:2021-06-27 Issue:22 Volume:33 Page:15863-15875
ISSN:0941-0643
Container-title:Neural Computing and Applications
language:en
Short-container-title:Neural Comput & Applic

Author:

Cheddad Abbas, Kusetogullari Hüseyin, Hilmkil Agrin, Sundin Lena, Yavariabdi Amir, Aouache Mustapha, Hall Johan

Abstract

This paper presents a digital image dataset of historical handwritten birth records stored in the archives of several parishes across Sweden, together with the corresponding metadata that supports the evaluation of document analysis algorithms’ performance. The dataset is called SHIBR (the Swedish Historical Birth Records). The contribution of this paper is twofold. First, we believe it is the first and the largest Swedish dataset of its kind provided as open access (15,000 high-resolution colour images of the era between 1800 and 1840). We also perform some data mining of the dataset to uncover some statistics and facts that might be of interest and use to genealogists. Second, we provide a comprehensive survey of contemporary datasets in the field that are open to the public along with a compact review of word spotting techniques. The word transcription file contains 17 columns of information pertaining to each image (e.g., child’s first name, birth date, date of baptism, father’s first/last name, mother’s first/last name, death records, town, job title of the father/mother, etc.). Moreover, we evaluate some deep learning models, pre-trained on two other renowned datasets, for word spotting in SHIBR. However, our dataset proved challenging due to the unique handwriting style. Therefore, the dataset could also be used for competitions dedicated to a large set of document analysis problems, including word spotting.

Funder

Stiftelsen för Kunskaps- och Kompetensutveckling

Swedish Foundation for International Cooperation in Research and Higher Education

Blekinge Institute of Technology

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Software

Link

https://link.springer.com/content/pdf/10.1007/s00521-021-06207-z.pdf

References: 62 articles.

1. H Balk, A Conteh (2011) IMPACT: centre of competence in text digitisation. In: Proceedings of the 2011 workshop on historical document imaging and processing (pp. 155–160)

2. H Balk (2009) Poor access to digitised historical texts: the solutions of the IMPACT project. In: Proceedings of the third workshop on analytics for noisy unstructured text data (pp. 1–1)

3. M Krystyna, AH Qasem (2009) Digitizing the historical periodical collection at the Al-Aqsa Mosque Library in East Jerusalem. In: Proceedings IFLA world library and information Congress, Milan, Italy, August 24

4. Z Zakariah, N Janom, NH Arshad, SS Salleh, SRS Aris (2014) Crowdsourcing: the trend of prior studies. In: Proceedings of the 2014 4th international conference on artificial intelligence with applications in engineering and technology (ICAIET’14). IEEE computer society, USA, 129–133. DOI: https://doi.org/10.1109/ICAIET.2014.30

5. C Clausner, J Hayes, A Antonacopoulos (2019) Crowdsourcing historical tabular data: 1961 Census of England and Wales. In: Proceedings of the 5th international workshop on historical document imaging and processing (HIP’19). Association for Computing Machinery, New York, NY, USA, 42–47. DOI: https://doi.org/10.1145/3352631.3352643

Cited by 8 articles.

1. On the improvement of handwritten text line recognition with octave convolutional recurrent neural networks;International Journal on Document Analysis and Recognition (IJDAR);2024-02-20

2. Gated Convolution and Stacked Self-Attention Encoder–Decoder-Based Model for Offline Handwritten Ethiopic Text Recognition;Information;2023-12-09

3. KOHTD: Kazakh offline handwritten text dataset;Signal Processing: Image Communication;2022-10

4. Evaluation and Recognition of Handwritten Chinese Characters Based on Similarities;Applied Sciences;2022-08-25

5. Low-Computational-Cost Algorithm for Inclination Correction of Independent Handwritten Digits on Microcontrollers;Electronics;2022-03-29