Novel approach for quantitative and qualitative authors research profiling using feature fusion and tree-based learning approach

Author:

Umer Muhammad (1), Aljrees Turki (2), Ullah Saleem (1), Bashir Ali Kashif (3)

Affiliation:

1. Department of Computer Science, Khwaja Fareed University of Engineering & IT, Rahim Yar Khan, Punjab, Pakistan

2. Department of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin, Saudi Arabia

3. Department of Computing and Mathematics, The Manchester Metropolitan University, Manchester, United Kingdom

Abstract

Article citation creates a link between the cited and citing articles and serves as the basis for several measures of scientific achievement, such as author and journal impact factor, H-index, and i10-index. Citations also include self-citations, in which authors cite their own articles. Self-citation is important for evaluating an author's research profile and has gained attention recently. Although the literature offers differing criteria for what counts as appropriate self-citation, self-citation has a large impact on a researcher's scientific profile. This study examines two cases in this regard. In case 1, the qualitative aspect of the author's profile is analyzed using hand-crafted feature engineering techniques. The sentiments conveyed through citations are integral to assessing research quality, as they can signify appreciation or critique, or indicate that the cited work serves as a foundation for further research. Analyzing sentiments within in-text citations remains a formidable challenge, even with automated sentiment annotation. For this purpose, this study employs machine learning models using term frequency (TF) and term frequency-inverse document frequency (TF-IDF) features. Random forest using TF with the Synthetic Minority Oversampling Technique (SMOTE) achieved an accuracy of 0.9727. Case 2 deals with quantitative analysis and investigates direct and indirect self-citation. The top 2% of researchers in 2020 are taken as the baseline. The data for the top 25 Pakistani researchers are manually retrieved from this dataset, along with their citation information from the Web of Science (WoS). Self-citation is estimated using the proposed model, and the results are compared with those obtained from WoS. Experimental results show a substantial difference between the two, as the self-citation ratio from the proposed approach is higher than that from WoS. It is observed that the citations from the WoS for authors are overstated. For a comprehensive evaluation of a researcher's profile, both direct and indirect self-citation must be included.
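
To make the distinction between direct and indirect self-citation concrete, here is a minimal Python sketch of how the two ratios could be computed for one author. It is not the authors' proposed model. The abstract does not define the two terms, so the definitions used here are assumptions: a citation is "direct" when the profiled author appears on the citing paper, and "indirect" when the profiled author is absent but a co-author of the cited paper appears on it. The names `Citation`, `classify`, and `self_citation_ratios`, and the example data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One citation link: a citing paper that references a cited paper."""
    citing_authors: frozenset
    cited_authors: frozenset


def classify(citation, author):
    """Label one citation to a paper by `author`.

    Assumed definitions (not taken from the abstract):
      direct   - `author` is also an author of the citing paper
      indirect - `author` is not on the citing paper, but a co-author
                 of the cited paper is
      external - the citing and cited papers share no authors
    """
    if author in citation.citing_authors:
        return "direct"
    if citation.citing_authors & citation.cited_authors:
        return "indirect"
    return "external"


def self_citation_ratios(citations, author):
    """Return direct, indirect, and combined self-citation ratios.

    `citations` is assumed to contain only citations to papers
    that `author` wrote or co-wrote.
    """
    counts = {"direct": 0, "indirect": 0, "external": 0}
    for citation in citations:
        counts[classify(citation, author)] += 1
    total = len(citations)
    if total == 0:
        return {"direct": 0.0, "indirect": 0.0, "combined": 0.0}
    return {
        "direct": counts["direct"] / total,
        "indirect": counts["indirect"] / total,
        "combined": (counts["direct"] + counts["indirect"]) / total,
    }


# Made-up example: four citations to papers by author "A".
example = [
    Citation(frozenset({"A", "X"}), frozenset({"A", "B"})),  # A cites own paper
    Citation(frozenset({"B", "Y"}), frozenset({"A", "B"})),  # co-author B cites it
    Citation(frozenset({"Z"}), frozenset({"A", "B"})),       # unrelated author
    Citation(frozenset({"W"}), frozenset({"A"})),            # unrelated author
]
ratios = self_citation_ratios(example, "A")
print(ratios)
```

Under these assumed definitions, counting only direct self-citations would understate the self-citation share in the example, which is the kind of gap the abstract's comparison with WoS is concerned with.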

Funder

University of Hafr Al-Batin

Publisher

PeerJ

Subject

General Computer Science

