Affiliation:
1. Tsinghua University
2. University of Massachusetts Amherst
3. University of Illinois at Urbana-Champaign
Abstract
In this article, we study the problem of Web user profiling, which aims to find, extract, and fuse "semantic"-based user profiles from the Web. Previously, Web user profiling was often undertaken by creating a list of keywords for the user, which is insufficient (sometimes highly so) for main applications. This article formalizes the profiling problem as several subtasks: profile extraction, profile integration, and user interest discovery. We propose a combination approach to deal with the profiling tasks. Specifically, we employ a classification model to identify relevant documents for a user from the Web and propose a Tree-Structured Conditional Random Fields (TCRF) model to extract the profile information from the identified documents. We propose a unified probabilistic model to deal with the name ambiguity problem (several users with the same name) when integrating the profile information extracted from different sources. Finally, we use a probabilistic topic model to model the extracted user profiles and construct the user interest model. Experimental results on an online system show that the combination approach to the different profiling tasks clearly outperforms several baseline methods. The extracted profiles have been applied to expert finding, an important application on the Web. Experiments show that taking advantage of the profiles improves the accuracy of expert finding by 6% to 26% in terms of mean average precision (MAP).
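For readers unfamiliar with the MAP metric used to report the expert-finding results, the following is a minimal sketch of how mean average precision is commonly computed. The function names and the toy rankings are hypothetical illustrations and are not taken from the article or its evaluation data.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against its set of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example (hypothetical data): two queries, each with a ranked list of experts.
toy_runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"y"}),            # AP = 1/2
]
toy_map = mean_average_precision(toy_runs)
```

In this sketch a relevant expert that never appears in the ranked list still counts in the denominator, so missing experts lower the score.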
Funder
Chinese Young Faculty Research Fund
National Natural Science Foundation of China
Ministry of Science and Technology of the People's Republic of China
Publisher
Association for Computing Machinery (ACM)
Reference
60 articles.
Cited by
97 articles.