Authors:
Öztornaci R. Onur, Syed Hamzah, Morris Andrew P., Taşdelen Bahar
Abstract
Machine learning (ML) methods are increasingly used in genetic research to uncover single nucleotide polymorphisms (SNPs) in genome-wide association study (GWAS) data that can predict disease outcomes. Two challenges in applying ML models are choosing an appropriate method for handling imbalanced data and training the models. This article compares three ML models for identifying SNPs that predict type 2 diabetes (T2D) status. The models are combined with the imbalance-handling methods Synthetic Minority Oversampling Technique (SMOTE), Support Vector Machine SMOTE (SVM SMOTE), the Adaptive Synthetic Sampling Approach (ADASYN) and random under-sampling (RUS), and are applied to GWAS data from elderly male participants (165 cases and 951 controls) of the Uppsala Longitudinal Study of Adult Men (ULSAM). The same methods (SMOTE, SVM SMOTE, ADASYN and RUS) were also applied to SNPs selected by clumping. The analysis used three ML models: (i) support vector machine (SVM), (ii) multilayer perceptron (MLP) and (iii) random forest (RF), and case-control classification accuracy was compared across them. The best-performing combination was MLP with SMOTE (97% accuracy), and RF and SVM both achieved accuracies above 90%. Overall, all three ML algorithms showed improved prediction accuracy when combined with methods for handling imbalanced data.
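The resampling step described above can be illustrated in plain Python. The following is a minimal sketch, assuming binary case/control labels and genotypes coded as 0/1/2 minor-allele counts; the function names (`random_undersample`, `smote`, `smote_balance`) and the toy data are hypothetical and are not the authors' implementation. It covers only basic SMOTE and RUS. SVM SMOTE and ADASYN differ mainly in which minority samples they interpolate from (SVM SMOTE focuses on borderline samples found with an SVM, and ADASYN generates more synthetic samples for minority samples surrounded by majority neighbours).

```python
import random

# Hypothetical illustration of two imbalance-handling methods named in the
# abstract (SMOTE and RUS); not the study's code. Binary labels assumed.

def random_undersample(X, y, seed=0):
    """Random under-sampling (RUS): keep a random subset of each class so
    that every class ends up with as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_keep = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_keep):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote(X_min, n_new, k=5, seed=0):
    """Basic SMOTE: each synthetic row lies on the segment between a randomly
    chosen minority row and one of its k nearest minority neighbours
    (squared Euclidean distance)."""
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        # Indices of the other minority rows, nearest first.
        others = [j for j in range(len(X_min)) if j != i]
        others.sort(key=lambda j: sq_dist(base, X_min[j]))
        neighbour = X_min[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def smote_balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class with SMOTE until both classes match."""
    X_min = [row for row, label in zip(X, y) if label == minority_label]
    n_major = len(y) - len(X_min)  # assumes exactly two classes
    new_rows = smote(X_min, n_major - len(X_min), k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


if __name__ == "__main__":
    # Toy genotype matrix: 3 SNPs coded 0/1/2 (hypothetical, not ULSAM data).
    rng = random.Random(42)
    X = [[rng.randint(0, 2) for _ in range(3)] for _ in range(26)]
    y = [1] * 6 + [0] * 20  # 6 "cases", 20 "controls"

    X_bal, y_bal = smote_balance(X, y, minority_label=1)
    print(y_bal.count(1), y_bal.count(0))  # 20 20

    X_rus, y_rus = random_undersample(X, y)
    print(y_rus.count(1), y_rus.count(0))  # 6 6
```

Because SMOTE interpolates between rows, the synthetic genotypes are fractional values rather than 0/1/2 counts. For real analyses, the imbalanced-learn package provides `SMOTE`, `SVMSMOTE`, `ADASYN` and `RandomUnderSampler`; the abstract does not state which software the study used.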
Publisher
Cold Spring Harbor Laboratory