The relationship between greedy parsing and symbolwise text compression

Author:

Bell, Timothy C. (1); Witten, Ian H. (2)

Affiliation:

1. Univ. of Canterbury, Christchurch, New Zealand

2. Univ. of Waikato, Hamilton, New Zealand

Abstract

Text compression methods can be divided into two classes: symbolwise and parsing. Symbolwise methods assign codes to individual symbols, while parsing methods assign codes to groups of consecutive symbols (phrases). The set of phrases available to a parsing method is referred to as a dictionary. The vast majority of parsing methods in the literature use greedy parsing (including nearly all variations of the popular Ziv-Lempel methods). When greedy parsing is used, the coder processes a string from left to right, at each step encoding as many symbols as possible with a phrase from the dictionary. This parsing strategy is not optimal, but an optimal method cannot guarantee a bounded coding delay. An important problem in compression research has been to establish the relationship between symbolwise methods and parsing methods. This paper extends prior work that shows that there are symbolwise methods that simulate a subset of greedy parsing methods. We provide a more general algorithm that takes any nonadaptive greedy parsing method and constructs a symbolwise method that achieves exactly the same compression. Combined with the existence of symbolwise equivalents for two of the most significant adaptive parsing methods, this result gives added weight to the idea that research aimed at increasing compression should concentrate on symbolwise methods, while parsing methods should be chosen for speed or temporary storage considerations.
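The greedy strategy described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the toy dictionary, and the simplifying assumption that every phrase costs the same number of bits (so that "optimal" means "fewest phrases") are all introduced here for illustration only.

```python
def greedy_parse(text, dictionary):
    """Parse left to right, always taking the longest dictionary phrase."""
    max_len = max(len(p) for p in dictionary)
    phrases = []
    i = 0
    while i < len(text):
        # Try the longest possible phrase first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                phrases.append(candidate)
                i += length
                break
        else:
            raise ValueError(f"no dictionary phrase matches at position {i}")
    return phrases


def fewest_phrases_parse(text, dictionary):
    """Dynamic-programming parse that minimizes the phrase count.

    Assumption: all phrases cost the same to encode. The table is filled
    from the end of the text, so the whole input must be available before
    the first phrase can be emitted (an unbounded coding delay).
    """
    n = len(text)
    best = [None] * (n + 1)  # best[i] = shortest phrase list covering text[i:]
    best[n] = []
    for i in range(n - 1, -1, -1):
        for phrase in dictionary:
            rest = best[i + len(phrase)] if text.startswith(phrase, i) else None
            if rest is not None and (best[i] is None or len(rest) + 1 < len(best[i])):
                best[i] = [phrase] + rest
    if best[0] is None:
        raise ValueError("text cannot be parsed with this dictionary")
    return best[0]


# Hypothetical toy dictionary, chosen so that greedy parsing is suboptimal.
toy_dictionary = {"a", "b", "ab", "bbb"}
print(greedy_parse("abbb", toy_dictionary))          # ['ab', 'b', 'b']
print(fewest_phrases_parse("abbb", toy_dictionary))  # ['a', 'bbb']
```

On this input the greedy parser commits to "ab" and then needs two more phrases, whereas looking ahead allows "a" followed by "bbb". The look-ahead is what prevents the second parser from offering a bounded coding delay.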

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software


Cited by 8 articles.

1. On optimal parsing for LZ78-like compressors;Theoretical Computer Science;2018-02

2. Compressing Big Data: When the Rate of Convergence to the Entropy Matters;Mathematical Aspects of Computer and Information Sciences;2016

3. Pattern Matching in Compressed Texts and Images;Foundations and Trends® in Signal Processing;2013

4. Dictionary-symbolwise flexible parsing;Journal of Discrete Algorithms;2012-07

5. Dictionary-Symbolwise Flexible Parsing;Lecture Notes in Computer Science;2011
