The Entropy of Digital Texts—The Mathematical Background of Correctness

Author:

Csernoch Mária (1), Nagy Keve (1), Nagy Tímea (1)

Affiliation:

1. Faculty of Informatics, University of Debrecen, Kassai út 26., 4028 Debrecen, Hungary

Abstract

Building on Shannon’s communication theory, this paper provides the theoretical background for an objective measure, the text-entropy, that describes the quality of digital natural-language documents handled with word processors. The text-entropy is calculated from the formatting, correction, and modification entropies, and these values indicate how correct or how erroneous a digital text-based document is. To show how the theory applies to real-world texts, three erroneous MS Word documents were selected for the study. These examples demonstrate how to build the correcting, formatting, and modification algorithms of such documents, and how to calculate the time spent on modification and the entropy of the completed tasks, in both the original erroneous documents and their corrected versions. In general, using and modifying properly edited and formatted digital texts was found to require fewer knowledge-items, or at most the same number. In information-theoretic terms, less data must be put on the communication channel than for erroneous documents. The analysis also revealed that in the corrected documents the quantity of data is lower and the quality of the data (knowledge pieces) is higher. As a consequence of these two findings, it is proven that modifying an erroneous document takes several times as long as modifying a correct one, even for minimal first-level actions. It is also proven that, to avoid repeating time- and resource-consuming actions, documents must be corrected before they are modified.
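The abstract does not give the formula for text-entropy or say how its three components are combined, so the sketch below illustrates only the general Shannon entropy on which the paper builds, H = sum of p_i * log2(1 / p_i) over the observed outcomes. The function name `shannon_entropy` and the two action logs are hypothetical, chosen only for illustration; they are not the authors' data or their method.

```python
import math
from collections import Counter


def shannon_entropy(events):
    """Shannon entropy, in bits, of the empirical distribution of `events`."""
    counts = Counter(events)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over distinct events of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical logs of editing actions (assumed for illustration only).
# An erroneous document forces a mix of different repair actions ...
erroneous_doc_actions = [
    "delete_space", "delete_space", "delete_tab", "set_indent",
]
# ... while a properly formatted document needs one kind of action.
corrected_doc_actions = ["type_text", "type_text", "type_text", "type_text"]

print(shannon_entropy(erroneous_doc_actions))
print(shannon_entropy(corrected_doc_actions))
```

In this toy setting the more varied action log has the higher entropy, which matches the abstract's statement that erroneous documents put more data on the communication channel. How the paper itself defines and weights the formatting, correction, and modification entropies is not stated in the abstract.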

Funder

KDP-2021 Program of the Ministry for Innovation and Technology from the Source of the National Research, Development and Innovation Fund

Publisher

MDPI AG

Subject

General Physics and Astronomy

