Author:
Tao Yanyun,Zhang Yuzhen,Jiang Bin
Abstract
Abstract
Background
Vitamin K antagonist (warfarin) is the most classical and widely used oral anticoagulant with assuring anticoagulant effect, wide clinical indications and low price. Warfarin dosage requirements of different patients vary largely. For warfarin daily dosage prediction, the data imbalance in dataset leads to inaccurate prediction on the patients of rare genotype, who usually have large stable dosage requirement. To balance the dataset of patients treated with warfarin and improve the predictive accuracy, an appropriate partition of majority and minority groups, together with an oversampling method, is required.
Method
To solve the data-imbalance problem mentioned above, we developed a clustering-based oversampling technique denoted as DBCSMOTE, which combines density-based spatial clustering of application with noise (DBCSCAN) and synthetic minority oversampling technique (SMOTE). DBCSMOTE automatically finds the minority groups by acquiring the association between samples in terms of the clinical features/genotypes and the warfarin dosage, and creates an extended dataset by adding the new synthetic samples of majority and minority groups. Meanwhile, two ensemble models, boosted regression tree (BRT) and random forest (RF), which are built on the extended dataset generateed by DBCSMOTE, accomplish the task of warfarin daily dosage prediction.
Results
DBCSMOTE and the comparison methods were tested on the datasets derived from our Hospital and International Warfarin Pharmacogenetics Consortium (IWPC). As the results, DBCSMOTE-BRT obtained the highest R-squared (R2) of 0.424 and the smallest mean squared error (mse) of 1.08. In terms of the percentage of patients whose predicted dose of warfarin is within 20% of the actual stable therapeutic dose (20%-p), DBCSMOTE-BRT can achieve the largest value of 47.8% among predictive models. The more important thing is that DBCSMOTE saved about 68% computational time to achieve the same or better performance than the Evolutionary SMOTE, which was the best oversampling method in warfarin dose prediction by far. Meanwhile, in warfarin dose prediction, it is discovered that DBCSMOTE is more effective in integrating BRT than RF for warfarin dose prediction.
Conclusion
Our finding is that the genotypes, CYP2C9 and VKORC1, no doubt contribute to the predictive accuracy. It was also discovered left atrium diameter, glutamic pyruvic transaminase and serum creatinine included in the model actually improved the predictive accuracy; When congestive heart failure, diabetes mellitus and valve replacement were absent in DBCSMOTE-BRT/RF, the predictive accuracy of DBCSMOTE-BRT/RF decreased. The oversampling ratio and number of minority clusters have a large impact on the effect of oversampling. According to our test, the predictive accuracy was high when the number of minority clusters was 6 ~ 8. The oversampling ratio for small minority clusters should be large (> 1.2) and for large minority clusters should be small (< 0.2). If the dataset becomes larger, the DBCSMOTE would be re-optimized and its BRT/RF model should be re-trained. DBCSMOTE-BRT/RF outperformed the current commonly-used tool called Warfarindosing. As compared to Evolutionary SMOTE-BRT and RF models, DBCSMOTE-BRT and RF models take only a small computational time to achieve the same or higher performance in many cases. In terms of predictive accuracy, RF is not as good as BRT. However, RF still has a powerful ability in generating a highly accurate model as the dataset increases; the software “WarfarinSeer v2.0” is a test version, which packed DBCSMOTE-BRT/RF. It could be a convenient tool for clinical application in warfarin treatment.
Publisher
Springer Science and Business Media LLC
Subject
Genetics(clinical),Genetics
Reference30 articles.
1. Kirchhof P, Benussi S, Kotecha D, et al. 2016 ESC Guidelines for the management of atrial fibrillation developed in collaboration with EACTS. Europace. 2016;18(11):1609:1678.
2. Valgimigli M, Bueno H, Byrne AR, et al. ESC focused update on dual antiplatelet therapy in coronary artery disease developed in collaboration with EACTS: the task force for dual antiplatelet therapy in coronary artery disease of the European Society of Cardiology (ESC) and of the European Association for Cardio-Thoracic Surgery (EACTS). Eur Heart J, Aug. 2017;26:2017.
3. Johnson JA, Caudle KE, Gong L, et al. Clinical Pharmacogenetics Implementation Consortium (CPIC) Guideline for Pharmacogenetics-Guided Warfarin Dosing: 2017 Update. Clin Pharmacol Ther. 2017;102(3):397:404.
4. Gage BF, Eby C, Milligan PE, et al. Use of pharmacogenetics and clinical factors to predict the maintenance dose of warfarin. Thromb Haemost. 2004;91(1):87–94.
5. Fung E, Patsopoulos NA, Belknap SM, et al. Effect of Genetic Variants, Especially CYP2C9 and VKORC1, on the Pharmacology of Warfarin. Semin Thromb Hemost. 2012;38(8):893–904.
Cited by
11 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献