Abstract
Search, exploration and social experience on the Web has recently undergone tremendous changes with search engines, web portals and social networks offering a different perspective on information discovery and consumption. This new perspective is aimed at capturing user intents, and providing richer and highly connected experiences. The new battleground revolves around technologies for the ingestion, disambiguation and enrichment of entities from a variety of structured and unstructured data sources - we refer to this process as knowledge base synthesis. This paper presents the design, implementation and production deployment of the Web Of Objects (WOO) system, a Hadoop-based platform tackling such challenges. WOO has been designed and implemented to enable various products in Yahoo! to synthesize knowledge bases (KBs) of entities relevant to their domains. Currently, the implementation of WOO we describe is used by various Yahoo! properties such as Intonow, Yahoo! Local, Yahoo! Events and Yahoo! Search. This paper highlights: (i) challenges that arise in designing, building and operating a platform that handles multi-domain, multi-version, and multi-tenant disambiguation of web-scale knowledge bases (hundreds of millions of entities), (ii) the architecture and technical solutions we devised, and (iii) an evaluation on real-world production datasets.
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
13 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献