Indexing temporal information for web pages

Published: 2011, Volume: 8, Issue: 3, Pages: 711-737
ISSN: 1820-0214
Container-title: Computer Science and Information Systems
Language: en
Short-container-title: COMSIS J

Author:

Jin Peiquan¹, Chen Hong¹, Zhao Xujian¹, Li Xiaowen¹, Yue Lihua¹

Affiliation:

1. School of Computer Science and Technology, University of Science and Technology of China, Hefei, China

Abstract

Temporal information plays an important role in Web search: every Web page has a crawl time, and most Web pages contain time keywords in their content. How to integrate temporal information into Web search engines has been a research focus in recent years, and key issues such as temporal-textual indexing and temporal information extraction must be addressed first. In this paper, we first present a framework for a temporal-textual Web search engine. We then concentrate on designing a new hybrid index structure for the temporal and textual information of Web pages. In particular, we propose to integrate the B+-tree, the inverted file, and a typical temporal index called the MAP21-tree to handle temporal-textual queries. We study five mechanisms for implementing such a hybrid index, which differ in how they organize the inverted file, the B+-tree, and the MAP21-tree. After a theoretical analysis of the performance of these five index structures, we conduct experiments on both simulated and real data sets to compare their performance. The experimental results show that, among all the index schemes, the first-inverted-file-then-MAP21-tree structure has the best query performance and is thus an acceptable choice as the temporal-textual index for future time-aware search engines.
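As a rough illustration of the "inverted file first, temporal index second" layout named in the abstract, the Python sketch below looks up a term in an inverted file and then filters that term's postings by time interval. It is a minimal sketch under stated assumptions, not the paper's implementation: the class name `InvertedThenTemporalIndex`, the integer timestamps, and the per-term list sorted by start time are all hypothetical, and the sorted list is a simple stand-in for the MAP21-tree, whose internals are not described here.

```python
from bisect import bisect_right
from collections import defaultdict


class InvertedThenTemporalIndex:
    """Toy two-level index: term -> time intervals -> documents.

    Assumption: each term's postings are kept as a list of
    (start, end, doc_id) tuples sorted by start time. This list is a
    stand-in for a real temporal index; it is NOT a MAP21-tree.
    """

    def __init__(self):
        self._postings = defaultdict(list)  # term -> [(start, end, doc_id)]
        self._unsorted = set()              # terms whose postings need sorting

    def add(self, doc_id, terms, start, end):
        """Index a document whose content refers to the interval [start, end]."""
        for term in set(terms):
            self._postings[term].append((start, end, doc_id))
            self._unsorted.add(term)

    def query(self, term, q_start, q_end):
        """Return ids of documents containing `term` whose interval
        overlaps the query interval [q_start, q_end]."""
        if term not in self._postings:
            return []
        postings = self._postings[term]
        if term in self._unsorted:
            postings.sort()
            self._unsorted.discard(term)
        # Level 1 (inverted file) selected `postings`; level 2 narrows by time.
        # The start list is rebuilt on every query for clarity, not speed.
        starts = [start for start, _, _ in postings]
        upper = bisect_right(starts, q_end)  # intervals starting after q_end cannot overlap
        return sorted(doc for _, end, doc in postings[:upper] if end >= q_start)


index = InvertedThenTemporalIndex()
index.add(1, ["olympic", "games"], 20080808, 20080824)
index.add(2, ["olympic"], 20120727, 20120812)
index.add(3, ["games"], 20100101, 20101231)
print(index.query("olympic", 20080101, 20081231))
```

Putting the textual lookup first means the temporal filter only touches postings for the queried term, which is the intuition behind ordering the two levels this way; the actual cost trade-offs between the five schemes are analyzed in the paper itself.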

Publisher

National Library of Serbia

Subject

General Computer Science

References: 5 articles.

1. Indexing valid time databases via B+-trees

2. Automatic TIMEX2 tagging of Korean news

3. Inverted files for text search engines

4. Designing access methods for bitemporal databases

5. Supporting temporal text-containment queries in temporal document databases

Cited by 3 articles.

1. Research on Digital Serialization Method of Program Code; 2019 IEEE 9th International Conference on Electronics Information and Emergency Communication (ICEIEC); 2019-07

2. A Unified Index for Spatio-Temporal Keyword Queries; Proceedings of the 25th ACM International on Conference on Information and Knowledge Management; 2016-10-24

3. Focused crawling enhanced by CBP–SLC; Knowledge-Based Systems; 2013-10