Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation

Published: 2024-06-08
ISSN: 1556-4681
Container-title: ACM Transactions on Knowledge Discovery from Data
Language: en
Short-container-title: ACM Trans. Knowl. Discov. Data

Author:

Xiao Meng¹, Wu Min², Qiao Ziyue³, Fu Yanjie⁴, Ning Zhiyuan⁵, Du Yi⁵, Zhou Yuanchun⁵

Affiliation:

1. Computer Network Information Center, Chinese Academy of Sciences, Beijing; University of Chinese Academy of Sciences, Beijing, China

2. Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore

3. School of Computing and Information Technology, Great Bay University, China

4. School of Computing and AI, Arizona State University, USA

5. Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China

Abstract

Topic inference for research proposals aims to obtain the most suitable disciplinary division from the discipline system defined by a funding agency. The agency then uses this division to find appropriate peer-review experts in its database. Automated topic inference can reduce human errors caused by manual topic filling, bridge the knowledge gap between funding agencies and project applicants, and improve system efficiency. Existing methods model this as a hierarchical multi-label classification problem and use generative models to iteratively infer the most appropriate topic information. However, these methods overlook the gap in scale between interdisciplinary and non-interdisciplinary research proposals. As a result, the automated inference system tends to categorize interdisciplinary proposals as non-interdisciplinary, which causes unfairness during expert assignment. How can we address this data imbalance under a complex discipline system and thereby resolve this unfairness? In this paper, we implement a topic label inference system based on a Transformer encoder-decoder architecture. Furthermore, during training we use interpolation techniques to create a series of pseudo-interdisciplinary proposals from non-interdisciplinary ones, guided by non-parametric indicators such as cross-topic probabilities and topic occurrence probabilities. This approach aims to reduce the bias of the system during model training. Finally, we conduct extensive experiments on a real-world dataset to verify the effectiveness of the proposed method. The experimental results demonstrate that our training strategy can significantly mitigate the unfairness generated in the topic inference task. To improve the reproducibility of our research, we have released the accompanying code via Dropbox.
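
The interpolation idea mentioned in the abstract can be illustrated with a generic mixup-style sketch. This is not the authors' selective interpolation method: the abstract does not give the selection rule or the exact form of the indicators, so the function names (`cross_topic_probability`, `mixup_pair`), the toy labels, and the fixed mixing weight `lam` below are all assumptions made for illustration.

```python
def cross_topic_probability(label_sets, topic_a, topic_b):
    """Empirical share of proposals labeled topic_a that also carry topic_b.

    Hypothetical stand-in for a non-parametric cross-topic indicator.
    """
    with_a = [labels for labels in label_sets if topic_a in labels]
    if not with_a:
        return 0.0
    return sum(topic_b in labels for labels in with_a) / len(with_a)


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Linearly interpolate two feature vectors and their label vectors.

    Generic mixup: the same weight lam is applied to features and labels,
    so two single-discipline samples yield one pseudo-interdisciplinary sample.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


# Toy data (assumed): each set holds the disciplines assigned to one proposal.
label_sets = [{"CS"}, {"CS", "Bio"}, {"Bio"}, {"CS", "Math"}]
p_cs_bio = cross_topic_probability(label_sets, "CS", "Bio")

# Mix a "CS-only" sample with a "Bio-only" sample using equal weights.
x_mix, y_mix = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.5)
```

In a selective scheme, an indicator such as `p_cs_bio` could decide which pairs of disciplines are worth mixing, for example by sampling partner topics in proportion to it; that choice is left open here because the abstract does not specify it.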

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3671149

References: 49 articles.

1. An improved algorithm for neural network classification of imbalanced training sets

2. Guillaume P Archambault, Yongyi Mao, Hongyu Guo, and Richong Zhang. 2019. Mixup as directional adversarial training. arXiv preprint arXiv:1906.06875 (2019).

3. David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin A Raffel. 2019. Mixmatch: A holistic approach to semi-supervised learning. Advances in Neural Information Processing Systems 32 (2019).

4. Enriching Word Vectors with Subword Information

5. Xunxin Cai, Meng Xiao, Zhiyuan Ning, and Yuanchun Zhou. 2023. Resolving the Imbalance Issue in Hierarchical Disciplinary Topic Inference via LLM-based Data Augmentation. 2023 IEEE International Conference on Data Mining Workshops (ICDMW) (2023).

Cited by 1 article.

1. Automated taxonomy alignment via large language models: bridging the gap between knowledge domains. Scientometrics, 2024-07-26.