metaGraphos: a Web-based system for transcribing, proofreading and publishing scanned documents

Authors

Varthis Evagelos, Poulos Marios

Abstract

Purpose: This study presents metaGraphos, a crowdsourcing system that aids the transcription and semantic enhancement of scanned documents by drawing on a pool of volunteers or of people willing to participate in exchange for a financial reward.

Design/methodology/approach: metaGraphos can be used where optical character recognition fails to produce satisfactory results, where semantic tagging or the assignment of thematic headings to texts is considered necessary, or where ground-truth data has to be collected in raw form.

Findings: The system automatically provides a Web-based interface, comprising a static HTML page and JavaScript code, that displays the scanned images of the document side by side with the corresponding incomplete texts, allowing users to correct or complete the texts in parallel.

Social implications: By assisting the parallel transcription and semantic enhancement of difficult scanned documents, the system helps reveal hidden cultural wealth and aids knowledge dissemination, which contributes significantly to academic and scientific dialog and feedback.

Originality/value: Individual researchers, libraries and organizations in general may benefit from the system because it is a cost-effective, practical and simple-to-set-up client–server architecture that provides a reliable way to transcribe texts or revise transcriptions on a large scale.
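The side-by-side interface described in the abstract (a static HTML page pairing each scanned image with its incomplete text) can be illustrated with a minimal sketch. This is not the metaGraphos implementation: the function name `build_transcription_page`, the page layout, and the field names are all assumptions made for illustration, and the JavaScript and server-side parts of the real system are omitted.

```python
from html import escape


def build_transcription_page(pages):
    """Return a static HTML page showing each scan beside its draft text.

    `pages` is a list of (image_url, draft_text) tuples. Both the name and
    the layout are hypothetical; they are not taken from metaGraphos.
    """
    rows = []
    for number, (image_url, draft_text) in enumerate(pages, start=1):
        rows.append(
            '<div style="display:flex;gap:1em">\n'
            # Left half: the scanned image of the page.
            f'  <img src="{escape(image_url, quote=True)}" '
            f'alt="Scanned page {number}" style="width:50%">\n'
            # Right half: the incomplete text, editable by the volunteer.
            f'  <textarea name="page-{number}" style="width:50%">'
            f'{escape(draft_text)}</textarea>\n'
            '</div>'
        )
    body = "\n".join(rows)
    return (
        "<!DOCTYPE html>\n<html>\n"
        '<head><meta charset="utf-8"><title>Transcription</title></head>\n'
        f"<body>\n{body}\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Example input: one scan with a partially recognized line of text.
    print(build_transcription_page([("scan-001.png", "Incmplete OCR text")]))
```

Escaping both the image URL and the draft text keeps OCR output that happens to contain `<` or `&` from breaking the generated page.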

Publisher

Emerald

Subject

Library and Information Sciences, Museology

