Affiliation:
1. University of Illinois at Urbana-Champaign, Urbana, IL
Abstract
The popularity of XML has exacerbated the need for an easy-to-use, high-precision query interface for XML data. When traditional document-oriented keyword search techniques do not suffice, natural language interfaces and keyword search techniques that take advantage of XML structure make it very easy for ordinary users to query XML databases. Unfortunately, current approaches to processing these queries rely heavily on heuristics that are intuitively appealing but ultimately ad hoc. These approaches often retrieve false positive answers, overlook correct answers, and cannot rank answers appropriately. To address these problems for data-centric XML, we propose coherency ranking (CR), a domain- and database design-independent ranking method for XML keyword queries that is based on an extension of the concepts of data dependencies and mutual information. With coherency ranking, the results of a keyword query are invariant under a class of equivalency-preserving schema reorganizations. We analyze the way in which previous approaches to XML keyword search approximate coherency ranking, and present efficient algorithms to process queries and rank their answers using coherency ranking. Our empirical evaluation with two real-world XML data sets shows that coherency ranking has better precision and recall and provides better ranking than all previous approaches.
Funder
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Cited by
26 articles.