How to Improve Optical Character Recognition of Historical Finnish Newspapers Using Open Source Tesseract OCR Engine – Final Notes on Development and Evaluation

Author:

Koistinen Mika, Kettunen Kimmo, Kervinen Jukka

Publisher

Springer International Publishing

References: 27 articles.

1. Kettunen, K., Honkela, T., Lindén, K., Kauppinen, P., Pääkkönen, T., Kervinen, J.: Analyzing and improving the quality of a historical news collection using language technology and statistical machine learning methods. In: IFLA World Library and Information Congress, Lyon (2014). http://www.ifla.org/files/assets/newspapers/Geneva_2014/s6-honkela-en.pdf

2. Kettunen, K., Pääkkönen, T.: Measuring lexical quality of a historical Finnish newspaper collection – analysis of garbled OCR data with basic language technology tools and means. In: Calzolari, N., et al. (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) (2016). http://www.lrec-conf.org/proceedings/lrec2016/pdf/17_Paper.pdf

3. Pääkkönen, T., Kervinen, J., Nivala, A., Kettunen, K., Mäkelä, E.: Exporting Finnish digitized historical newspaper contents for offline use. D-Lib Mag. 22, July/August 2016 (2016)

4. Pääkkönen, T., Kettunen, K.: Kansalliskirjaston sanomalehtiaineistot: käyttäjät ja tutkijat kesällä 2018. Informaatiotutkimus 37(3), 15–19 (2018). https://doi.org/10.23978/inf.76067

5. Piotrowski, M.: Natural language processing for historical texts. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, San Rafael (2012)

Cited by 3 articles.

1. Quality Inspection Algorithm of Printed Character Based on OCR Technology;2024 Second International Conference on Data Science and Information System (ICDSIS);2024-05-17

2. GPU-based and streaming-enabled implementation of pre-processing flow towards enhancing optical character recognition accuracy and efficiency;Cluster Computing;2023-09-20

3. Google Tesseract: Optical Character Recognition (OCR) on HDD / SSD Labels Using Machine Vision;2022 14th International Conference on Computer and Automation Engineering (ICCAE);2022-03-25
