Comparing research trends through author-provided keywords with machine extracted terms: A ML algorithm approach using publications data on neurological disorders
Published:2023-05-16
Issue:1
Volume:3
ISSN:2709-3158
Container-title:Iberoamerican Journal of Science Measurement and Communication
Short-container-title:Iberoamerican Journal of Science Measurement and Communication
Author:
Tiwari Priya, Chaudhary Saloni, Majhi Debasis, Mukherjee Bhaskar
Abstract
Objective. This study aimed to identify the primary research areas, countries, and organizations involved in publications on neurological disorders by analyzing human-assigned keywords. These results were then compared with terms extracted from the titles and abstracts of the publications by unsupervised and machine-algorithm-based methods, in order to reveal the deficiencies of both techniques. The comparison shows how far machine-derived terms from titles and abstracts can substitute for the human-assigned keywords of scientific research articles.
Design/Methodology/Approach. Significant research areas on neurological disorders were identified from the author-provided keywords of publications downloaded from Web of Science and PubMed. These results were compared with the terms extracted from titles and abstracts using an unsupervised tool, VOSviewer, and machine-algorithm-based techniques, YAKE and CountVectorizer.
Results/Discussion. We observed that the post-COVID-19 era witnessed more research on various neurological disorders, but authors still chose generic terms more often than specific ones in their keyword lists. The unsupervised extraction tool, VOSviewer, identified many extraneous and insignificant terms along with significant ones. However, our self-developed machine learning algorithm using CountVectorizer and YAKE provided precise results, on the condition that additional stop words were added to the stop-word list of the NLTK toolkit.
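The effect of extending the stop-word list can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library as a stand-in for the count-based extraction that CountVectorizer performs, and the titles, the base stop-word list (a small substitute for the NLTK list), and the domain additions are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for the NLTK English stop-word list (heavily shortened).
BASE_STOP_WORDS = {"a", "an", "and", "the", "of", "in", "with", "for",
                   "to", "on", "this", "that", "after"}

# Hypothetical domain-specific additions of the kind the abstract mentions.
EXTRA_STOP_WORDS = {"study", "results", "patients", "reported"}

# Made-up titles; real input would be titles and abstracts of publications.
DOCS = [
    "Cognitive dysfunction in patients with epilepsy: a study of seizure frequency",
    "Seizure frequency and cognitive dysfunction after stroke",
    "This study reported results on stroke rehabilitation in patients",
]


def tokenize(text):
    """Lowercase the text and keep tokens of two or more characters."""
    return re.findall(r"[a-z][a-z0-9-]+", text.lower())


def count_terms(documents, stop_words, max_n=2):
    """Count unigrams up to max_n-grams after dropping stop words."""
    counts = Counter()
    for doc in documents:
        tokens = [t for t in tokenize(doc) if t not in stop_words]
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


base_counts = count_terms(DOCS, BASE_STOP_WORDS)
extended_counts = count_terms(DOCS, BASE_STOP_WORDS | EXTRA_STOP_WORDS)

# Terms ranked by frequency once the extended stop-word list is applied.
for term, freq in extended_counts.most_common(5):
    print(f"{freq}  {term}")
```

With the base list alone, generic words such as "study" and "patients" are counted as terms; once they are added to the stop-word list they disappear, and topical phrases such as "cognitive dysfunction" remain.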
Conclusion. We observed that author-provided keywords play a vital role, because authors assign them in a broad sense to increase readability, but these concept terms lacked the specificity needed for in-depth analysis. We suggest that the ML algorithm, being more compatible with unstructured data, is a valid alternative to author-generated keywords for more accurate results.
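One simple way to quantify how far the two term sources agree is set overlap. The abstract does not say how the comparison was scored, so the sketch below is an assumption for illustration only: the function name, the Jaccard measure, and the example terms are hypothetical and not taken from the study.

```python
def normalize(term):
    """Lowercase a term and collapse internal whitespace."""
    return " ".join(term.lower().split())


def compare_term_sets(author_keywords, extracted_terms):
    """Split two term lists into shared and source-specific terms."""
    author = {normalize(t) for t in author_keywords}
    machine = {normalize(t) for t in extracted_terms}
    shared = author & machine
    union = author | machine
    return {
        "shared": sorted(shared),
        "author_only": sorted(author - machine),
        "machine_only": sorted(machine - author),
        # Jaccard similarity: shared terms divided by all distinct terms.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Made-up example: broad author keywords versus more specific extracted terms.
author_keywords = ["Epilepsy", "Stroke", "Neurological disorders"]
extracted_terms = ["epilepsy", "stroke", "seizure frequency", "cognitive dysfunction"]

result = compare_term_sets(author_keywords, extracted_terms)
print(result)
```

In this made-up example the author-only term is the generic one and the machine-only terms are the specific ones, which is the pattern the conclusion describes.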
Originality/Value. To our knowledge, this is the first study to compare author-provided keywords with machine-extracted terms on real datasets, which may be an essential lead in the machine learning domain. Replicating these techniques with large datasets from different fields may provide a valuable knowledge resource for experts and stakeholders.
References: 20 articles.
1. Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., & Jatowt, A. (2020). YAKE! Keyword extraction from single documents using multiple local features. Information Sciences, 509, 257–289. doi: 10.1016/j.ins.2019.09.013
2. Cheng, Q., Wang, J., Lu, W., Huang, Y., & Bu, Y. (2020). Keyword-citation-keyword network: A new perspective of discipline knowledge structure analysis. Scientometrics, 124(3), 1923–1943. doi: 10.1007/s11192-020-03576-5
3. Duvvuru, A., Radhakrishnan, S., More, D., Kamarthi, S., & Sultornsanee, S. (2013). Analyzing Structural & Temporal Characteristics of Keyword System in Academic Research Articles. Procedia Computer Science, 20, 439–445. doi: 10.1016/j.procs.2013.09.300
4. Graham, E. L., Clark, J. R., Orban, Z. S., Lim, P. H., Szymanski, A. L., Taylor, C., … Koralnik, I. J. (2021). Persistent neurologic symptoms and cognitive dysfunction in non-hospitalized Covid-19 “long haulers.” Annals of Clinical and Translational Neurology, 8(5), 1073–1085. doi: 10.1002/acn3.51350
5. Huang, T.-Y., & Zhao, B. (2019). Measuring popularity of ecological topics in a temporal dynamical knowledge network. PLOS ONE, 14(1), e0208370. doi: 10.1371/journal.pone.0208370
Cited by
23 articles.