Affiliation:
1. Univ. of Chicago, Chicago, IL
Abstract
The emergence of the CD-ROM as a storage medium for full-text databases raises the question of how large a database this medium can hold. As an example, the problem of storing the Trésor de la Langue Française on a CD-ROM is examined in this paper. The text of this database alone is 700 megabytes long, more than a CD-ROM can hold. In addition, the dictionary and concordance needed to access these data must be stored. A further constraint is that some of the material is copyrighted, and it is desirable that such material be difficult to decode except through software provided by the system. Pertinent approaches to compression of the various files are reviewed, and the compression of the text is related to the problem of data encryption: specifically, it is shown that, under simple models of text generation, Huffman encoding produces a bit-string indistinguishable from a representation of coin flips.
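The closing claim of the abstract can be illustrated with a small experiment. The sketch below is not the authors' method: it assumes a hypothetical four-symbol alphabet with dyadic probabilities (1/2, 1/4, 1/8, 1/8), draws symbols independently, Huffman-encodes them, and measures the fraction of 1 bits in the output. The alphabet, the function names `huffman_code` and `ones_fraction`, and the sample size are all illustrative assumptions.

```python
import heapq
import random
from itertools import count


def huffman_code(probs):
    """Return {symbol: bitstring} for a dict of symbol probabilities."""
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codes.
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def ones_fraction(probs, n_symbols, seed=0):
    """Encode n_symbols independent draws and return the share of 1 bits."""
    rng = random.Random(seed)
    code = huffman_code(probs)
    symbols = list(probs)
    weights = [probs[s] for s in symbols]
    text = rng.choices(symbols, weights=weights, k=n_symbols)
    bits = "".join(code[s] for s in text)
    return bits.count("1") / len(bits)


# Hypothetical dyadic source: every probability is a power of 1/2.
DYADIC = {"e": 0.5, "t": 0.25, "a": 0.125, "o": 0.125}
CODE = huffman_code(DYADIC)
FRACTION = ones_fraction(DYADIC, 100_000)

if __name__ == "__main__":
    print(CODE)
    print(f"fraction of 1 bits: {FRACTION:.4f}")
```

For a dyadic source each codeword length equals the negative base-2 logarithm of its symbol's probability, so the fraction of 1 bits should come out close to one half. This checks only single-bit frequency, which is a much weaker property than indistinguishability from coin flips, and real text is neither dyadic nor independent from symbol to symbol.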
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications; General Business, Management and Accounting; Information Systems
Cited by
30 articles.
1. Effect of Data Compression on Cipher Text Aiming Secure and Improved Data Storage;Information and Communication Technology for Competitive Strategies (ICTCS 2021);2022-06-23
2. Integrated encryption in dynamic arithmetic compression;Information and Computation;2021-08
3. On the Randomness of Compressed Data;Information;2020-04-07
4. HEliOS;Proceedings of the 16th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services;2019-11-12
5. Integrated Encryption in Dynamic Arithmetic Compression;Language and Automata Theory and Applications;2017