Authors:
Li Runchuan, Shen Shengya, Chen Gang, Xie Tiantian, Ji Shasha, Zhou Bing, Wang Zongmin
Abstract
Background: In cardiovascular disease (CVD) diagnosis, previous studies used large, complete (no missing values) two-category datasets and obtained good results. However, when historical records are entered electronically, a large number of attribute values are lost, and disease risk is graded into multiple levels rather than two.
Goal: Using an imbalanced dataset with a large number of missing values, this paper focuses on predicting five levels of cardiovascular disease risk.
Methods: A new Adaboost+RF prediction model is constructed, using the information gain ratio to analyze how much each feature in the dataset contributes. The model's performance is evaluated with Precision, Recall, F-measure and ROC Area (AUC).
Results: On the five-category imbalanced dataset, the Adaboost+RF model achieves Precision, Recall, F1 and AUC values of 40.9%, 49.3%, 41.4% and 71.6%, respectively.
Conclusion: The experimental results demonstrate that, on five-category imbalanced datasets with missing values, the Adaboost+RF model outperforms the other machine learning algorithms compared on all four key indicators.
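The Methods statement ranks features by information gain ratio. The sketch below shows how that quantity is commonly computed for categorical features (information gain divided by split information, as in C4.5), in plain Python. It is not the authors' code: the function names `gain_ratio` and `rank_features` are hypothetical, and dropping rows whose feature value is missing (`None`) is an assumed, deliberately simple way of handling missing data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain ratio of one categorical feature w.r.t. the class labels.

    Assumption (not from the paper): rows whose feature value is None
    (missing) are dropped before the computation.
    """
    pairs = [(v, y) for v, y in zip(values, labels) if v is not None]
    if not pairs:
        return 0.0
    n = len(pairs)
    ys = [y for _, y in pairs]

    # Partition the class labels by feature value.
    groups = {}
    for v, y in pairs:
        groups.setdefault(v, []).append(y)

    # Information gain = class entropy minus expected entropy after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(ys) - conditional

    # Split information = entropy of the feature's own value distribution;
    # dividing by it penalizes features with many distinct values.
    split_info = entropy([v for v, _ in pairs])
    return gain / split_info if split_info > 0 else 0.0


def rank_features(columns, labels):
    """Hypothetical helper: sort features by gain ratio, highest first."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with labels `[0, 0, 1, 1]`, a feature `["a", "a", "b", "b"]` has gain ratio 1.0 (it separates the classes perfectly), while `["a", "b", "a", "b"]` has gain ratio 0.0, so `rank_features` would place the first feature ahead of the second.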
Cited by 8 articles.