Visualization of Materials Science Topics in Publications of Institutional Repository using Natural Language Processing

Authors

Dieb Sae, Sodeyama Keitaro, Tanifuji Mikiko

Abstract

SAMURAI (NIMS 2022), a directory service for researchers at the National Institute for Materials Science (NIMS) in Japan, was launched in 2009 following the development of the NIMS institutional repository (Tanifuji et al. 2019). The concept is to synchronize researchers' profile information with their publications, which are self-archived in the repository system. SAMURAI was renewed in 2017 with functions that interoperate with ORCID. SAMURAI supports links not only to individual articles and patents but also to databases such as KAKEN (Database of Grants-in-Aid for Scientific Research by NII). By implementing a unique ResearcherID, the service has yielded fully identified authors of journal articles written by NIMS research members. Through this directory, NIMS is promoting materials research, supporting the management of its researchers' activities, and introducing NIMS researchers and their work to the public. In this work, we present an application that automatically describes each researcher's output topics from the research papers archived in the repository. It implements the materials-science-specific natural language processing developed in our study (Dieb et al. 2021) and visualizes the research trend of each SAMURAI researcher. The approach can maximize information uptake by a general audience and is fully consistent with open science policy. A list of publication digital object identifiers (DOIs; DOI 2022) for each researcher was constructed from their profile in SAMURAI. (In SAMURAI, the DOIs are stored in a PostgreSQL database.) Using the DOIs, recent publications were retrieved in XML format from the NIMS text data mining platform (TDM-PF); these were mainly available from 2003 onward. Representative topic terms related to materials science and engineering were extracted from each researcher's publications. We used term frequency analysis and automatic extraction of material names to extract these informative terms.
Additionally, domain knowledge resources such as dictionaries were used. The data were preprocessed with noise reduction, such as removing general English stop words and filtering out physical units, because such words carry no significance on their own. A word cloud approach was used for visualization (Fig. 1). This work gives us an opportunity to apply our NLP experience to mining information from research papers for public knowledge, as a step towards data-driven materials science.
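The preprocessing and term frequency steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop word and unit lists, the tokenizer, and the function name `term_frequencies` are all assumptions made for illustration, and the dictionary lookup and material name extraction mentioned in the abstract are left out.

```python
import re
from collections import Counter

# Illustrative lists only (assumptions): the abstract does not give the actual
# stop word list, unit list, or domain dictionaries that were used.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was", "were",
              "for", "with", "on", "by", "at", "as", "that", "this", "are"}
UNITS = {"nm", "mm", "cm", "k", "ev", "gpa", "mpa", "wt", "mol", "hz"}

# A token starts with a letter and may contain letters, digits, and hyphens,
# so bare numbers such as "500" are never matched.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9\-]*")


def term_frequencies(texts, top_n=10):
    """Count terms across texts after dropping stop words and unit tokens."""
    counts = Counter()
    for text in texts:
        for token in TOKEN_RE.findall(text):
            key = token.lower()  # note: lowercasing also flattens formulas
            if key in STOP_WORDS or key in UNITS:
                continue
            counts[key] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = [
        "The perovskite film was annealed at 500 K.",
        "Perovskite solar cells with a 300 nm film.",
    ]
    print(term_frequencies(sample, top_n=2))
```

The resulting (term, count) pairs are the kind of input a word cloud renderer needs, since it scales each term by its frequency. A real pipeline would also need to preserve case for chemical formulas, which this sketch deliberately ignores.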

Publisher

Pensoft Publishers

Subject

General Medicine

Cited by 1 article.
