Efficient and Effective Academic Expert Finding on Heterogeneous Graphs through (k, 𝒫)-Core based Embedding

Published: 2023-03-22 Issue: 6 Volume: 17 Page: 1-35
ISSN: 1556-4681
Container-title: ACM Transactions on Knowledge Discovery from Data
Language: en
Short-container-title: ACM Trans. Knowl. Discov. Data

Author:

Wang Yuxiang¹, Liu Jun¹, Xu Xiaoliang¹, Ke Xiangyu², Wu Tianxing³, Gou Xiaoxuan¹

Affiliation:

1. Hangzhou Dianzi University, Zhejiang Province, China

2. Zhejiang University, Hangzhou, Zhejiang Province, China

3. Southeast University, Nanjing, Jiangsu Province, China

Abstract

Expert finding is crucial for a wealth of applications in both academia and industry. Given a user query and a trove of academic papers, expert finding aims to retrieve the experts most relevant to the query from those papers. Existing studies focus on embedding-based solutions that consider academic papers’ textual semantic similarities to a query via document representation and extract the top-n experts from the most similar papers. Beyond implicit textual semantics, however, papers’ explicit relationships (e.g., co-authorship) in a heterogeneous graph (e.g., DBLP) are critical for expert finding, because they help improve the representation quality. Despite their importance, the explicit relationships of papers have generally been ignored in the literature. In this article, we study expert finding on heterogeneous graphs by considering both the explicit relationships and implicit textual semantics of papers in one model. Specifically, we define the cohesive (k, 𝒫)-core community of papers w.r.t. a meta-path 𝒫 (i.e., relationship) and propose a (k, 𝒫)-core based document embedding model to enhance the representation quality. Based on this, we design a proximity graph-based index (PG-Index) of papers and present a threshold algorithm (TA)-based method to efficiently extract the top-n experts from papers returned by PG-Index. We further optimize our approach in two ways: (1) we boost effectiveness by considering the (k, 𝒫)-core community of experts and the diversity of experts’ research interests, to achieve high-quality expert representation from paper representation; and (2) we streamline expert finding, going from “extract top-n experts from top-m (m > n) semantically similar papers” to “directly return top-n experts”. This avoids returning a large number of top-m papers as intermediate data, thereby improving efficiency. Extensive experiments using real-world datasets demonstrate our approach’s superiority.
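
The (k, 𝒫)-core idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the definition, so the sketch assumes the common one: two papers are neighbors if an instance of the meta-path 𝒫 connects them (here the hypothetical Paper-Author-Paper path, i.e., a shared author), and a paper stays in the core only if it has at least k such neighbors that also stay. The function names and the toy data are illustrative and are not taken from the article.

```python
from collections import defaultdict
from itertools import combinations


def meta_path_neighbors(paper_authors):
    """Link two papers when they share an author (assumed meta-path: Paper-Author-Paper)."""
    author_papers = defaultdict(set)
    for paper, authors in paper_authors.items():
        for author in authors:
            author_papers[author].add(paper)
    neighbors = {paper: set() for paper in paper_authors}
    for papers in author_papers.values():
        for p, q in combinations(papers, 2):
            neighbors[p].add(q)
            neighbors[q].add(p)
    return neighbors


def k_p_core(paper_authors, k):
    """Peel papers with fewer than k meta-path neighbors until none remain.

    Returns every paper that has at least k surviving neighbors. This is an
    illustrative peeling routine, not the algorithm from the article.
    """
    neighbors = meta_path_neighbors(paper_authors)
    alive = set(neighbors)
    queue = [p for p in alive if len(neighbors[p]) < k]
    while queue:
        p = queue.pop()
        if p not in alive:
            continue  # already peeled via an earlier queue entry
        alive.remove(p)
        for q in neighbors[p]:
            if q in alive:
                neighbors[q].discard(p)
                if len(neighbors[q]) < k:
                    queue.append(q)
    return alive


# Toy data (hypothetical): paper id -> set of author ids.
example = {
    "p1": {"a", "b"},
    "p2": {"a", "b"},
    "p3": {"a", "c"},
    "p4": {"c"},
    "p5": {"d"},
}
core = k_p_core(example, 2)
```

With k = 2, papers p4 and p5 are peeled because they have fewer than two co-authorship neighbors, leaving p1, p2, and p3. The result may span several connected components; a community for a particular paper would be the component that contains it.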

Funder

National NSF of China

Primary R&D Plan of Zhejiang

Center-initiated Research Project of Zhejiang Lab

Fundamental Research Funds for the Provincial Universities of Zhejiang

Project for the Doctor of Entrepreneurship and Innovation in Jiangsu Province

Fundamental Research Funds for the Central Universities, and ZhiShan Young Scholar Program of Southeast University

Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3578365

Cited by 6 articles.

1. Scalable Community Search over Large-scale Graphs based on Graph Transformer;Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval;2024-07-10

2. Routing-Guided Learned Product Quantization for Graph-Based Approximate Nearest Neighbor Search;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

3. Scalable Community Search with Accuracy Guarantee on Attributed Graphs;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

4. Efficient and effective (k,P)-core-based community search over attributed heterogeneous information networks;Information Sciences;2024-03

5. Random Walk-Based Community Key-Members Search Over Large Graphs;2023