Affiliation:
1. Univ. of Denver, Denver, CO
2. Univ. of Colorado at Boulder, Boulder, CO
Abstract
Given a finite set of texts S = {w_1, …, w_k} over some fixed finite alphabet Σ, a complete inverted file for S is an abstract data type that provides the functions find(w), which returns the longest prefix of w that occurs (as a subword of a word) in S; freq(w), which returns the number of times w occurs in S; and locations(w), which returns the set of positions where w occurs in S. A data structure that implements a complete inverted file for S that occupies linear space and can be built in linear time, using the uniform-cost RAM model, is given. Using this data structure, the time for each of the above query functions is optimal. To accomplish this, techniques from the theory of finite automata and the work on suffix trees are used to build a deterministic finite automaton that recognizes the set of all subwords of the set S. This automaton is then annotated with additional information and compacted to facilitate the desired query functions. The result is a data structure that is smaller and more flexible than the suffix tree.
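To make the three query functions concrete, the following is a minimal brute-force sketch of the interface described in the abstract. It is not the linear-space automaton the paper constructs, and its queries are far from optimal; it only pins down what find, freq, and locations return. The class name NaiveInvertedFile, the representation of a position as a (text index, start offset) pair, and the choice that the empty word has no locations are assumptions made for illustration.

```python
from typing import List, Set, Tuple


class NaiveInvertedFile:
    """Brute-force reference for the find/freq/locations interface.

    Hypothetical helper class: it scans every text on every query,
    so it illustrates the semantics only, not the paper's structure.
    """

    def __init__(self, texts: List[str]) -> None:
        self.texts = list(texts)

    def locations(self, w: str) -> Set[Tuple[int, int]]:
        """Return (text index, start offset) for every occurrence of w."""
        hits: Set[Tuple[int, int]] = set()
        if not w:
            return hits  # assumption: the empty word has no locations
        for i, text in enumerate(self.texts):
            start = text.find(w)
            while start != -1:
                hits.add((i, start))
                # Advance by one so overlapping occurrences are counted.
                start = text.find(w, start + 1)
        return hits

    def freq(self, w: str) -> int:
        """Number of occurrences of w across all texts."""
        return len(self.locations(w))

    def find(self, w: str) -> str:
        """Longest prefix of w that occurs as a subword of some text."""
        for end in range(len(w), 0, -1):
            if any(w[:end] in text for text in self.texts):
                return w[:end]
        return ""


index = NaiveInvertedFile(["abab", "bab"])
print(index.find("abba"))
print(index.freq("ab"))
print(sorted(index.locations("bab")))
```

With the texts "abab" and "bab", the word "ab" occurs at offsets 0 and 2 of the first text and offset 1 of the second, so freq counts occurrences across the whole set rather than per text.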
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software
Reference
22 articles.
1. Linear size finite automata for the set of all subwords of a word: An outline of results; BLUMER A.; Bull. Eur. Assoc. Theoret. Comput. Sci., 1983
Cited by
148 articles.