Statistical Machine-Learning Methods for Genomic Prediction Using the SKM Library

Author:

Montesinos López Osval1,Mosqueda González Brandon2,Montesinos López Abelardo3,Crossa José456ORCID

Affiliation:

1. Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico

2. Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Mexico City 07738, Mexico

3. Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico

4. International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Estado de Mexico, Mexico

5. Colegio de Postgraduados, Montecillo 56230, Estado de Mexico, Mexico

6. Centre for Crop & Food Innovation, Food Futures Institute, Murdoch University, Murdoch 6150, Australia

Abstract

Genomic selection (GS) is revolutionizing plant breeding. However, because it is a predictive methodology, a basic understanding of statistical machine-learning methods is necessary for its successful implementation. This methodology uses a reference population that contains both the phenotypic and genotypic information of genotypes to train a statistical machine-learning method. After optimization, this method is used to make predictions of candidate lines for which only genotypic information is available. However, due to a lack of time and appropriate training, it is difficult for breeders and scientists of related fields to learn all the fundamentals of prediction algorithms. With smart or highly automated software, it is possible for these professionals to appropriately implement any state-of-the-art statistical machine-learning method for its collected data without the need for an exhaustive understanding of statistical machine-learning methods and programing. For this reason, we introduce state-of-the-art statistical machine-learning methods using the Sparse Kernel Methods (SKM) R library, with complete guidelines on how to implement seven statistical machine-learning methods that are available in this library for genomic prediction (random forest, Bayesian models, support vector machine, gradient boosted machine, generalized linear models, partial least squares, feed-forward artificial neural networks). This guide includes details of the functions required to implement each of the methods, as well as others for easily implementing different tuning strategies, cross-validation strategies, and metrics to evaluate the prediction performance and different summary functions that compute it. A toy dataset illustrates how to implement statistical machine-learning methods and facilitate their use by professionals who do not possess a strong background in machine learning and programing.

Funder

Bill & Melinda Gates Foundation

USAID projects

CIMMYT CRP

Foundation for Research Levy on Agricultural Products

Agricultural Agreement Research Fund

International Wheat Yield Partnership (IWYP) Hub Project

Heat and Drought Wheat Improvement Consortium

Publisher

MDPI AG

Subject

Genetics (clinical),Genetics

Cited by 2 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3