A Combination Approach to Web User Profiling

Author:

Tang Jie (1), Yao Limin (2), Zhang Duo (3), Zhang Jing (1)

Affiliation:

1. Tsinghua University

2. University of Massachusetts Amherst

3. University of Illinois at Urbana-Champaign

Abstract

In this article, we study the problem of Web user profiling, which aims to find, extract, and fuse "semantic"-based user profiles from the Web. Previously, Web user profiling was often undertaken by creating a list of keywords for the user, which is insufficient, sometimes highly so, for many applications. This article formalizes the profiling problem as several subtasks: profile extraction, profile integration, and user interest discovery. We propose a combination approach to deal with the profiling tasks. Specifically, we employ a classification model to identify relevant documents for a user from the Web and propose a Tree-Structured Conditional Random Field (TCRF) model to extract the profile information from the identified documents. We propose a unified probabilistic model to deal with the name ambiguity problem (several users sharing the same name) when integrating the profile information extracted from different sources. Finally, we use a probabilistic topic model to model the extracted user profiles and construct the user interest model. Experimental results on an online system show that the combination approach clearly outperforms several baseline methods on the different profiling tasks. The extracted profiles have been applied to expert finding, an important application on the Web. Experiments show that taking advantage of the profiles improves the accuracy of expert finding by 6% to 26% in terms of MAP.
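The abstract reports expert-finding gains in terms of MAP (mean average precision). As a minimal sketch of how that metric is conventionally computed, and not the authors' evaluation code, the following assumes each query has a ranked list of candidate experts and a set of experts judged relevant. The function names, query IDs, and expert IDs are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            precision_sum += hits / rank
    # Normalize by the number of relevant items, so missed experts lower the score.
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], judgments: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all queries in `rankings`."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranked, judgments.get(query, set()))
        for query, ranked in rankings.items()
    )
    return total / len(rankings)


# Hypothetical toy data: two queries with ranked expert IDs and relevance judgments.
example_rankings = {
    "q1": ["a", "b", "c", "d"],
    "q2": ["x", "y"],
}
example_judgments = {
    "q1": {"a", "c"},
    "q2": {"y"},
}
example_map = mean_average_precision(example_rankings, example_judgments)
```

In this toy data, a relevant expert ranked lower (as in `q2`) contributes less than one ranked first, which is why a ranking informed by richer profiles can raise MAP without retrieving any additional experts.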

Funder

Chinese Young Faculty Research Fund

National Natural Science Foundation of China

Ministry of Science and Technology of the People's Republic of China

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science


Cited by 97 articles.

1. An enterprise portrait tag extraction method based on context embedding and knowledge distillation;Soft Computing;2024-07-18

2. Understanding user intent modeling for conversational recommender systems: a systematic literature review;User Modeling and User-Adapted Interaction;2024-06-06

3. Knowledge-driven profile dynamics;Artificial Intelligence;2024-06

4. Artificial Intelligence Algorithms for Expert Identification in Medical Domains: A Scoping Review;European Journal of Investigation in Health, Psychology and Education;2024-04-28

5. An Anchor Learning Approach for Citation Field Learning;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14
