Statistical Machine-Learning Methods for Genomic Prediction Using the SKM Library
Author:
Montesinos López Osval (1), Mosqueda González Brandon (2), Montesinos López Abelardo (3), Crossa José (4, 5, 6)
Affiliation:
1. Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico
2. Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Mexico City 07738, Mexico
3. Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico
4. International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Estado de Mexico, Mexico
5. Colegio de Postgraduados, Montecillo 56230, Estado de Mexico, Mexico
6. Centre for Crop & Food Innovation, Food Futures Institute, Murdoch University, Murdoch 6150, Australia
Abstract
Genomic selection (GS) is revolutionizing plant breeding. However, because it is a predictive methodology, a basic understanding of statistical machine-learning methods is necessary for its successful implementation. GS uses a reference population that contains both the phenotypic and genotypic information of genotypes to train a statistical machine-learning method. After optimization, this method is used to make predictions for candidate lines for which only genotypic information is available. However, due to a lack of time and appropriate training, it is difficult for breeders and scientists in related fields to learn all the fundamentals of prediction algorithms. With smart or highly automated software, these professionals can appropriately apply any state-of-the-art statistical machine-learning method to their collected data without an exhaustive understanding of statistical machine-learning methods and programming. For this reason, we introduce state-of-the-art statistical machine-learning methods using the Sparse Kernel Methods (SKM) R library, with complete guidelines on how to implement the seven statistical machine-learning methods available in this library for genomic prediction (random forest, Bayesian models, support vector machine, gradient boosted machine, generalized linear models, partial least squares, and feed-forward artificial neural networks). This guide details the functions required to implement each method, as well as functions for easily implementing different tuning strategies, cross-validation strategies, metrics for evaluating prediction performance, and summary functions that compute them. A toy dataset illustrates how to implement these statistical machine-learning methods and facilitates their use by professionals who do not have a strong background in machine learning and programming.
Funder
Bill & Melinda Gates Foundation; USAID projects; CIMMYT CRP; Foundation for Research Levy on Agricultural Products; Agricultural Agreement Research Fund; International Wheat Yield Partnership (IWYP) Hub Project; Heat and Drought Wheat Improvement Consortium
Subject
Genetics (clinical), Genetics
Cited by
2 articles.