Abstract
Purpose
The Youth Risk Behavior Survey (YRBS) of high school students includes standard questions about sexual identity and sex of sexual contacts, but not every state that conducts the survey includes these questions. This study aimed to develop and apply a method to predict state-level proportions of high school students identifying as lesbian, gay, or bisexual (LGB) or reporting any same-sex sexual contacts in states that did not include these questions in their 2017 YRBS.
Methods
We used state-level high school YRBS data from 2013, 2015, and 2017. We defined two primary outcomes: self-reported LGB identity and reported same-sex sexual contacts. We developed machine learning models to predict the two outcomes from other YRBS variables and compared different modeling approaches. We used leave-one-out cross-validation and report results from the best-performing models.
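As a rough illustration of the leave-one-out comparison described above, the following minimal sketch holds out one state at a time, fits each candidate model on the remaining states, and scores the held-out prediction. Everything in it is an assumption made for illustration: the data are made up (not YRBS values), the two candidate models (a mean baseline and a one-variable least-squares line) are simple stand-ins for the linear and ensemble models the study compares, and mean absolute error is an assumed scoring metric.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline stand-in: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """One-variable least-squares line (stand-in for a richer model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def loocv_mae(fit, xs, ys):
    """Leave-one-out CV: hold out each unit once, train on the rest."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)
        errors.append(abs(predict(xs[i]) - ys[i]))
    return mean(errors)

# Hypothetical state-level values, invented for this sketch.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
outcome = [2.0, 4.0, 6.0, 8.0, 10.0]

candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: loocv_mae(fit, predictor, outcome)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # model with the lowest held-out error
```

The model selected this way would then be refit on all states with observed outcomes and applied to states where the outcome was not measured.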
Results
Modern ensemble models outperformed traditional linear models at predicting state-level proportions for the two outcomes, and we identified prediction methods that performed well across different years and prediction tasks. In states that did not measure LGB identity directly, predicted proportions of respondents reporting LGB identity ranged from 9.4% to 12.9%. Where same-sex contacts were not directly observed, predicted proportions of respondents reporting any same-sex contacts ranged from 7.0% to 10.4%.
Conclusion
Comparable population estimates of sexual minority adolescents can raise awareness among state policy makers and the public about the proportion of youth who may be exposed to the disparate health risks and outcomes associated with sexual minority status. This information can help decision makers in public health and education agencies design, implement, and evaluate community and school interventions to improve the health of LGB youth.
Funder
Centers for Disease Control and Prevention
Publisher
Public Library of Science (PLoS)