Affiliation:
1. Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
2. Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University, Urumqi, China
3. Institute of HIV/AIDS/STI Prevention and Control, Jiangsu Provincial Center for Diseases Control and Prevention, Nanjing, Jiangsu 210009, China
Abstract
Objective. To establish a machine learning model for identifying patients coinfected with hepatitis B virus (HBV) and human immunodeficiency virus (HIV) through two sexual transmission routes in Jiangsu, China. Methods. A total of 14197 HIV cases transmitted by homosexual and heterosexual routes were recruited. After data processing, 12469 cases (HIV and HBV, 1033; HIV, 11436) were left for further analysis, including 7849 cases with homosexual transmission and 4620 cases with heterosexual transmission. Univariate logistic regression was used to select variables with significant
value and odds ratio for multivariable analysis. In homosexual transmission and heterosexual transmission groups, 10 and 6 variables were selected, respectively. For identifying HIV individuals coinfected with HBV, a machine learning model was constructed with four algorithms, including Decision Tree, Random Forest, AdaBoost with decision tree (AdaBoost), and extreme gradient boosting decision tree (XGBoost). The detective value of each variable was calculated using the optimal machine learning algorithm. Results. AdaBoost algorithm showed the highest efficiency in both transmission groups (homosexual transmission group:
,
,
,
, and
; heterosexual transmission group:
,
,
,
, and
). Calculated by AdaBoost algorithm, the detective value of PLA was the highest in homosexual transmission group, followed by CR, AST, HB, ALT, TBIL, leucocyte, age, marital status, and treatment condition; in the heterosexual transmission group, the detective value of PLA was the highest (consistent with the condition in the homosexual group), followed by ALT, AST, TBIL, leucocyte, and symptom severity. Conclusions. The univariate logistics regression combined with the AdaBoost algorithm could accurately screen the risk factors of HBV in HIV coinfection without invasive testing. Further studies are needed to evaluate the utility and feasibility of this model in various settings.
Funder
Priority Academic Program Development of Jiangsu Higher Education Institutions
Subject
General Immunology and Microbiology,General Biochemistry, Genetics and Molecular Biology,General Medicine
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献