Abstract
Background
Early diagnosis of dengue fever is important for individual treatment and monitoring disease prevalence in the population. To assist diagnosis, previous studies have proposed classification models to detect dengue from symptoms and clinical measurements. However, there has been little exploration of whether existing models can be used to make predictions for new populations.
Methods
We trained logistic regression models on five publicly available dengue datasets from previous studies, using three explanatory variables identified as important in prior work: age, white blood cell count, and platelet count. These five datasets were collected at different times in different locations, with a variety of disease rates and patient ages. A model was trained on each dataset, and predictive performance was evaluated on both the original (training) dataset, and the other (test) datasets from different studies.
Results
In-sample area under the receiver operating characteristic curve (AUC) values for the logistic regression models ranged from 0.74 to 0.89, while out-of-sample AUCs ranged from 0.55 to 0.89. Matching age ranges in training/test datasets increased AUC values and balanced the sensitivity and specificity. Adjusting the predicted probabilities to account for differences in dengue prevalence improved calibration in 20/28 training-test pairs.
Conclusions
The in-sample performance of the logistic regression model was consistent with previous dengue classifiers, suggesting the chosen model is a good choice in a variety of settings and has decent overall performance. However, adjustments are required to make predictions on new datasets. Practitioners can use existing dengue classifiers in new settings but should be careful with different patient ages and disease rates.
Author summary
Dengue fever is an acute mosquito-borne infection with a substantial and growing disease burden.
Early diagnosis of dengue fever is important for treatment and disease monitoring, but gold-standard tests are not always readily available. To supplement diagnostic tests, previous studies have investigated the use of classifiers trained on common diagnostic measurements such as symptoms, blood work, and demographic variables. However, comparisons of existing methods are limited, and in particular there has been little exploration of whether existing models can be used to make predictions for new populations. Generalizability of existing models to new settings could save the substantial time and effort required to collect new data and fit a model, but this generalizability may be limited by differences between populations. In this study, we assess performance of logistic regression models on five publicly available dengue datasets from previous studies, using three explanatory variables identified as important in prior work: age, white blood cell count, and platelet count. Our results show that practitioners may be able to use existing models in new settings, but care is needed to account for differences in patient demographics and dengue prevalence.
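The abstract does not say which adjustment was used to account for differences in dengue prevalence. As an illustration only, the sketch below applies one common choice, a prior-shift correction on the odds scale, which assumes that the relationship between the predictors and dengue status is the same in both populations and that only the prevalence differs. The function name `adjust_for_prevalence` and its parameters are hypothetical and do not come from the study.

```python
def adjust_for_prevalence(p, train_prev, target_prev, eps=1e-12):
    """Rescale a predicted probability from the training prevalence to a
    target prevalence (illustrative prior-shift correction; the method
    actually used in the study is not stated in the abstract)."""
    # Clip to avoid division by zero when p is exactly 0 or 1.
    p = min(max(p, eps), 1.0 - eps)
    # Convert the prediction to odds.
    odds = p / (1.0 - p)
    # Ratio of the prior odds in the target population to those in training.
    prior_ratio = (target_prev / (1.0 - target_prev)) / (
        train_prev / (1.0 - train_prev)
    )
    # Shift the odds by the prior ratio, then convert back to a probability.
    adjusted_odds = odds * prior_ratio
    return adjusted_odds / (1.0 + adjusted_odds)


# Example: a model trained where half of patients had dengue, applied
# to a setting where an assumed 20% of patients have dengue.
adjusted = adjust_for_prevalence(0.5, train_prev=0.5, target_prev=0.2)
```

When the training and target prevalences are equal, the correction leaves the prediction unchanged, so it only has an effect where the disease rates differ.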
Publisher
Cold Spring Harbor Laboratory