BACKGROUND
Background: Latent Dirichlet Allocation (LDA) is a topic modelling method for rapidly synthesising meaning from ‘big data’, but its outputs can be sensitive to decisions made throughout the analytic pipeline. This review focuses on the complex analytical practices specific to LDA, which existing practical guides for conducting LDA have not addressed.
OBJECTIVE
Objectives: This scoping review used key analytical steps (data selection, data pre-processing, and data analysis) as a framework for examining the methodological approaches taken in psychology research that uses LDA.
METHODS
Methods: Four psychology and health databases were searched. Studies were included if they used LDA to analyse written text and focussed on a psychological construct or issue. The data charting process was structured around common data selection, pre-processing, and data analysis steps.
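Pre-processing decisions like those charted here are easiest to report when they are written down as explicit parameters. The following is a minimal Python sketch, assuming a simple regular-expression tokeniser, a small hand-picked stop-word list, a minimum token length, and a document-frequency threshold. The `CONFIG` values and the function names are hypothetical and are not drawn from any reviewed study.

```python
import re
from collections import Counter

# Hypothetical pre-processing settings; illustrative values, not taken from the review.
CONFIG = {
    "lowercase": True,
    "stop_words": {"the", "a", "an", "and", "of", "to", "is", "i", "my", "it"},
    "min_token_length": 3,
    "min_doc_freq": 2,  # drop terms that occur in fewer than 2 documents
}


def tokenize(text, config):
    """Optionally lowercase, split into alphabetic tokens, and drop stop words and short tokens."""
    if config["lowercase"]:
        text = text.lower()
    tokens = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    return [
        t for t in tokens
        if t not in config["stop_words"] and len(t) >= config["min_token_length"]
    ]


def preprocess(docs, config):
    """Tokenise every document, then remove terms below the document-frequency threshold."""
    tokenized = [tokenize(doc, config) for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    keep = {term for term, n in doc_freq.items() if n >= config["min_doc_freq"]}
    return [[t for t in tokens if t in keep] for tokens in tokenized]


docs = [
    "I feel anxious about work and sleep",
    "Sleep problems make my anxiety worse",
    "Work stress and anxious thoughts at night",
]
print(preprocess(docs, CONFIG))
# [['anxious', 'work', 'sleep'], ['sleep'], ['work', 'anxious']]
```

Because the sketch applies no stemming or lemmatisation, "anxious" and "anxiety" remain separate terms, and "anxiety" is then discarded by the frequency threshold. Changing any single setting changes the vocabulary that LDA receives, which is why each one needs to be reported.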
RESULTS
Results: Forty-seven studies were included. They explored a range of research areas, and most sourced their data from social media platforms. While some studies reported the pre-processing and data analysis steps taken, most did not provide sufficient detail for reproducibility. The review also revealed debate about whether certain pre-processing and data analysis steps are necessary.
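LDA inference is stochastic, so a reported model can only be reproduced if the number of topics, the Dirichlet priors, the number of iterations, and the random seed are all documented. The sketch below illustrates this with collapsed Gibbs sampling, one common inference method for LDA (many packages use variational inference instead). The function names `lda_gibbs` and `top_words` and the default values of `alpha`, `beta`, and `iterations` are assumptions for illustration, not the procedure of any reviewed study. It uses only the Python standard library and takes already-tokenised documents.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of tokenised documents. Returns (doc_topic_counts, topic_word_counts, vocab).
    """
    rng = random.Random(seed)  # seeded generator: the seed is part of the analytic record
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # topic counts per document
    nkw = [[0] * V for _ in range(K)]   # word counts per topic
    nk = [0] * K                        # total tokens assigned to each topic
    z = []                              # current topic assignment of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = j | other assignments), up to a constant.
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def top_words(nkw, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in nkw
    ]


tokenised = [
    ["sleep", "insomnia", "tired", "sleep"],
    ["anxious", "worry", "panic"],
    ["sleep", "tired", "worry"],
]
run_a = lda_gibbs(tokenised, num_topics=2, seed=42)
run_b = lda_gibbs(tokenised, num_topics=2, seed=42)
print(run_a == run_b)  # True: same data, settings, and seed give identical counts
print(top_words(run_a[1], run_a[2], n=2))
```

Re-running with identical data, settings, and seed reproduces the counts exactly; omitting any of these from a write-up leaves readers unable to regenerate the reported topics.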
CONCLUSIONS
Conclusions: Findings highlight the growing use of LDA in psychological science. However, analytical reporting standards need to improve, and comprehensive, evidence-based best practice recommendations need to be identified. To work towards this, we have developed an LDA Preferred Reporting Checklist to support consistent documentation of LDA analytic decisions and reproducible research outcomes.