Comparison of Entropy and Dictionary Based Text Compression in English, German, French, Italian, Czech, Hungarian, Finnish, and Croatian

Published:2020-07-01 Issue:7 Volume:8 Page:1059
ISSN:2227-7390
Container-title:Mathematics
Language:en
Short-container-title:Mathematics

Author:

Ignatoski Matea, Lerga Jonatan, Stanković Ljubiša, Daković Miloš

Abstract

The rapid growth in the amount of data in the digital world creates a need for data compression, that is, reducing the number of bits needed to represent a text file, an image, audio, or video content. Compressing data saves storage capacity and speeds up data transmission. In this paper, we focus on text compression and compare algorithms (in particular, the entropy-based arithmetic method and the dictionary-based Lempel–Ziv–Welch (LZW) method) for text compression in different languages (Croatian, Finnish, Hungarian, Czech, Italian, French, German, and English). The main goal is to answer the question: “How does the language of a text affect the compression ratio?” The results indicate that the compression ratio is affected by the size of the language alphabet and by the size or type of the text. For example, The European Green Deal was compressed by 75.79%, 76.17%, 77.33%, 76.84%, 73.25%, 74.63%, 75.14%, and 74.51% using the LZW algorithm, and by 72.54%, 71.47%, 72.87%, 73.43%, 69.62%, 69.94%, 72.42%, and 72% using the arithmetic algorithm for the English, German, French, Italian, Czech, Hungarian, Finnish, and Croatian versions, respectively.
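
To make the dictionary-based method named in the abstract concrete, here is a minimal sketch of textbook LZW in Python. It is not the authors' implementation. The function names, the 256-entry byte dictionary, the fixed-width code size, and the reading of "compressed by X%" as the percentage of bits saved are all assumptions made for illustration; this record does not state how the paper defines or measures them.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode data as a list of LZW codes (dictionary seeded with all 256 byte values)."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the phrase while it is still in the dictionary.
            current = candidate
        else:
            # Emit the longest known phrase and register the new, longer one.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes by regenerating the same dictionary on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the phrase being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


def space_saving_percent(data: bytes, codes: list[int]) -> float:
    """Percentage of bits saved, ASSUMING every code is stored with the same width."""
    if not data:
        return 0.0
    original_bits = 8 * len(data)
    # Assumption: at least 9 bits per code, widened to fit the largest code emitted.
    bits_per_code = max(9, max(codes).bit_length())
    return 100.0 * (1 - bits_per_code * len(codes) / original_bits)


if __name__ == "__main__":
    sample = "TOBEORNOTTOBEORTOBEORNOT".encode("utf-8")
    codes = lzw_compress(sample)
    assert lzw_decompress(codes) == sample
    print(len(sample), len(codes), space_saving_percent(sample, codes))
```

One design point relevant to the language comparison: the sketch works on UTF-8 bytes, so letters outside ASCII (common in Czech, Hungarian, Finnish, or Croatian text) occupy more than one byte before compression. Whether the paper counts characters or bytes is not stated here, so figures from this sketch are not comparable to the percentages in the abstract.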

Funder

Hrvatska Zaklada za Znanost

European Cooperation in Science and Technology

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/8/7/1059/pdf

Cited by 9 articles.

1. Arithmetic N-gram: an efficient data compression technique;Discover Computing;2024-03-13

2. A hybrid approach to secure and compress data streams in cloud computing environment;Journal of King Saud University - Computer and Information Sciences;2024-03

3. Exploring Text Data Compression: A Comparative Study of Adaptive Huffman and LZW Approaches;BIO Web of Conferences;2024

4. Single and Binary Performance Comparison of Data Compression Algorithms for Text Files;Bitlis Eren Üniversitesi Fen Bilimleri Dergisi;2023-09-28

5. Performance evaluation of various source code techniques;THE FOURTH SCIENTIFIC CONFERENCE FOR ELECTRICAL ENGINEERING TECHNIQUES RESEARCH (EETR2022);2023