Affiliation:
1. Univ. of Sheffield, Sheffield, U.K.
Abstract
Using direct access computer files of bibliographic information, an attempt is made to overcome one of the problems often associated with information retrieval, namely, the maintenance and use of large dictionaries, the greater part of which is used only infrequently. A novel method is presented which maps the hyperbolic frequency distribution of text characteristics onto a rectangular distribution, one that is better suited to implementation on direct access storage devices.
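The contrast between the two distributions can be written out as follows. This assumes "hyperbolic" refers to the usual Zipf-type rank-frequency law; the symbols $C$, $V$, $N$ and $K$ are introduced here for illustration and do not come from the abstract.

```latex
% Hyperbolic (Zipf-type) law: r = rank, C = constant, V = number of word types
f(r) \approx \frac{C}{r}, \qquad r = 1, 2, \dots, V
% so the most frequent type occurs about V times as often as the rarest:
\frac{f(1)}{f(V)} \approx V
% Rectangular target: N total occurrences shared evenly by K index keys
f(r) \approx \frac{N}{K}, \qquad r = 1, 2, \dots, K
```

Under the rectangular target every key has roughly the same number of postings, which is what allows fixed-size storage units instead of a large dictionary dominated by rarely used entries.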
This method treats text as a string of characters rather than as words bounded by spaces, and chooses a subset of strings whose frequencies of occurrence are more even than those of word types. The members of this subset are then used as index keys for retrieval. The rectangular distribution of key frequencies results in a much simplified file organization and promises considerable cost advantages.
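One way to picture the key-selection step is a greedy sketch, assuming a frequency ceiling (the hypothetical parameter `ceiling`) and a maximum key length (`max_len`). Any character string, spaces included, that occurs more often than the ceiling is replaced by its one-character right extensions, so its occurrences are spread over several rarer keys. This is an illustrative technique, not the authors' published selection procedure, and the function names `substring_counts` and `select_keys` are hypothetical.

```python
from collections import Counter


def substring_counts(text: str, length: int) -> Counter:
    """Count overlapping occurrences of every substring of the given length."""
    return Counter(text[i:i + length] for i in range(len(text) - length + 1))


def select_keys(text: str, ceiling: int, max_len: int = 6) -> dict[str, int]:
    """Greedily choose variable-length character strings as index keys.

    A string occurring more than `ceiling` times (hypothetical parameter) is
    split into the longer strings that extend it by one character, until every
    key is at or below the ceiling or has reached `max_len`. The text is
    treated as a plain character string, so spaces can appear inside keys.
    """
    counts_by_len = {n: substring_counts(text, n) for n in range(1, max_len + 1)}
    keys: dict[str, int] = {}
    frontier = list(counts_by_len[1].items())  # start from single characters
    while frontier:
        s, freq = frontier.pop()
        n = len(s)
        if freq <= ceiling or n == max_len:
            keys[s] = freq  # rare enough, or cannot be extended further
            continue
        # Too frequent: replace s by every (n+1)-character string beginning with s.
        # An occurrence of s at the very end of the text has no extension and is dropped.
        for t, f in counts_by_len[n + 1].items():
            if t.startswith(s):
                frontier.append((t, f))
    return keys


text = "the cat sat on the mat and the rat sat on the hat"
keys = select_keys(text, ceiling=3)
print(max(keys.values()), len(keys))
```

Frequent characters such as `t` or the space are split into longer strings, while rare characters remain single-character keys. Because a split key is always replaced by its extensions, the resulting key set is prefix-free, so each text position starts at most one key and the total number of postings cannot exceed the text length.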
Publisher
Association for Computing Machinery (ACM)
Cited by
31 articles.
1. Similarity methods in chemoinformatics;Annual Review of Information Science and Technology;2009
2. Applications of n‐grams in textual information systems;Journal of Documentation;1998-03-01
3. Recursive hashing functions for n-grams;ACM Transactions on Information Systems;1997-07
4. Highlights: Language- and domain-independent automatic indexing terms for abstracting;Journal of the American Society for Information Science;1995-04
5. Text compression methods;Journal of Soviet Mathematics;1991-08