A Deep Learning Model for the Normalization of Institution Names by Multisource Literature Feature Fusion: Algorithm Development Study

Author:

Chen YifeiORCID,Li XiaoyingORCID,Li AihuaORCID,Li YongjieORCID,Yang XuemeiORCID,Lin ZiluoORCID,Yu ShiruiORCID,Tang XiaoliORCID

Abstract

Background The normalization of institution names is of great importance for literature retrieval, statistics of academic achievements, and evaluation of the competitiveness of research institutions. Differences in authors’ writing habits and spelling mistakes lead to various names of institutions, which affects the analysis of publication data. With the development of deep learning models and the increasing maturity of natural language processing methods, training a deep learning–based institution name normalization model can increase the accuracy of institution name normalization at the semantic level. Objective This study aimed to train a deep learning–based model for institution name normalization based on the feature fusion of affiliation data from multisource literature, which would realize the normalization of institution name variants with the help of authority files and achieve a high specification accuracy after several rounds of training and optimization. Methods In this study, an institution name normalization–oriented model was trained based on bidirectional encoder representations from transformers (BERT) and other deep learning models, including the institution classification model, institutional hierarchical relation extraction model, and institution matching and merging model. The model was then trained to automatically learn institutional features by pretraining and fine-tuning, and institution names were extracted from the affiliation data of 3 databases to complete the normalization process: Dimensions, Web of Science, and Scopus. Results It was found that the trained model could achieve at least 3 functions. First, the model could identify the institution name that is consistent with the authority files and associate the name with the files through the unique institution ID. Second, it could identify the nonstandard institution name variants, such as singular forms, plural changes, and abbreviations, and update the authority files. Third, it could identify the unregistered institutions and add them to the authority files, so that when the institution appeared again, the model could identify and regard it as a registered institution. Moreover, the test results showed that the accuracy of the normalization model reached 93.79%, indicating the promising performance of the model for the normalization of institution names. Conclusions The deep learning–based institution name normalization model trained in this study exhibited high accuracy. Therefore, it could be widely applied in the evaluation of the competitiveness of research institutions, analysis of research fields of institutions, and construction of interinstitutional cooperation networks, among others, showing high application value.

Publisher

JMIR Publications Inc.

Subject

Health Informatics,Medicine (miscellaneous)

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3