Abstract
Purpose
The purpose of this study is to identify latent topics in 9,102 Web of Science (WoS)-indexed research articles by Sri Lankan authors, published in 2,645 journals between 1989 and 2021, by applying Latent Dirichlet Allocation (LDA) to their abstracts. The article discusses the dominant topics in the corpus, the posterior probabilities of terms within the topics and the publication proportions of the topics.
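The abstract names LDA and reports "posterior probabilities of terms" but does not say how the model was estimated. As a hedged illustration of what those quantities are, the sketch below fits LDA with collapsed Gibbs sampling in pure Python. This is one standard estimation method, not necessarily the one used in the study, which relied on R tooling. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count and the toy corpus are all assumptions for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (vocab, phi, theta), where phi[k][v] estimates
    the posterior probability of term vocab[v] in topic k and theta[d][k] estimates
    the proportion of topic k in document d. Default hyperparameters are assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-term and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Point estimates of the topic-term and document-topic distributions.
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


if __name__ == "__main__":
    # Hypothetical toy corpus of already-preprocessed abstracts, for illustration only.
    docs = [
        ["rice", "irrigation", "water", "yield"],
        ["water", "irrigation", "paddy", "rice"],
        ["mortality", "hospital", "patients", "care"],
        ["patients", "care", "mortality", "health"],
    ]
    vocab, phi, theta = lda_gibbs(docs, num_topics=2)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
        print(k, [vocab[v] for v in top])
```

Each row of `phi` is a probability distribution over the vocabulary. Sorting a row gives the highest-probability terms that characterise a topic.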
Design/methodology/approach
The authors collected the abstracts and other details of the articles from the WoS database and preprocessed the text before analysis. The R package “ldatuning” was then applied to the preprocessed text to choose the number of topics on statistical grounds. Based on its four metrics, twenty topics were extracted as latent topics.
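ldatuning scores candidate numbers of topics with four metrics: Griffiths2004, CaoJuan2009, Arun2010 and Deveaud2014. The Python sketch below illustrates two ingredients under stated assumptions. The first is a minimal preprocessing step. The second is a criterion modelled on CaoJuan2009: the average pairwise cosine similarity between topic-term distributions, where lower values indicate more distinct topics. The stop-word list, the token rules and the names `preprocess` and `mean_topic_cosine` are hypothetical. The abstract does not specify the preprocessing actually performed, and this is not ldatuning's code.

```python
import math
import re
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a full list.
STOP_WORDS = {
    "the", "of", "and", "in", "a", "an", "to", "is", "was", "for",
    "on", "with", "this", "that", "were", "are", "by", "from",
}


def preprocess(abstract):
    """Lower-case, keep alphabetic tokens only, drop stop words and tokens of <= 2 letters."""
    tokens = re.findall(r"[a-z]+", abstract.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def mean_topic_cosine(phi):
    """Average pairwise cosine similarity between topic-term rows of phi (needs >= 2 topics).

    Modelled on the CaoJuan2009 criterion: smaller values mean less overlap between topics.
    """
    def cosine(x, y):
        dot = sum(a * b for a, b in zip(x, y))
        return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

    pairs = list(combinations(phi, 2))
    return sum(cosine(x, y) for x, y in pairs) / len(pairs)
```

In use, one would fit a model for each candidate number of topics and compute this value on each model's topic-term matrix. Candidates where the value is low would be favoured, read alongside the other metrics rather than in isolation.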
Findings
Topics related to medical science, agriculture, research and development, and chemistry dominate the subject categories as a whole. The topics “irrigation” and “mortality and health care” show significant growth in publication proportion from 2019 to 2021. Among the most frequently occurring latent topics, terms such as “activity” and “acid” carry higher posterior probabilities.
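The abstract does not define “publication proportion”. One common definition, assumed here, is the mean document-topic probability over all articles published in a given year. The sketch computes that value and also lists each topic's highest-probability terms, the quantity behind observations such as “activity” and “acid”. The function names `topic_proportions_by_year` and `top_terms` and any inputs passed to them are hypothetical.

```python
def topic_proportions_by_year(theta, years):
    """Mean document-topic probability per year (assumed definition of publication proportion).

    theta[d][k] is the proportion of topic k in document d; years[d] is its publication year.
    Returns {year: [mean proportion of topic 0, topic 1, ...]} ordered by year.
    """
    totals, counts = {}, {}
    for row, year in zip(theta, years):
        if year not in totals:
            totals[year] = [0.0] * len(row)
            counts[year] = 0
        for k, p in enumerate(row):
            totals[year][k] += p
        counts[year] += 1
    return {year: [s / counts[year] for s in totals[year]] for year in sorted(totals)}


def top_terms(phi, vocab, n=5):
    """Return, for each topic, the n terms with the highest posterior probability."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:n]]
        for row in phi
    ]
```

An alternative definition counts the share of articles whose dominant topic is k. The two can differ when articles mix several topics, so the growth of a topic such as “irrigation” could look somewhat different under each.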
Practical implications
Topic models make it possible to answer broad, high-level questions about a large body of text quickly and efficiently, without human intervention, and they are also helpful in information retrieval and document clustering. This study highlights how the growth of a specific country's body of knowledge can be studied using the LDA topic model.
Originality/value
This study is expected to encourage further research in text analysis and information retrieval. Its results provide an understanding of how the research output of Sri Lankan authors has developed across subject areas and over the period studied. The trends and intensity of publications by Sri Lankan authors on different latent topics help to trace their research interests and the areas in which they are most active.
Subject
Library and Information Sciences