A Penalized Regression Method for Genomic Prediction Reduces Mismatch between Training and Testing Sets

Author:

Montesinos-López Osval A. (1), Pulido-Carrillo Cristian Daniel (1), Montesinos-López Abelardo (2), Larios Trejo Jesús Antonio (3), Montesinos-López José Cricelio (4), Agbona Afolabi (5, 6), Crossa José (7, 8, 9, 10)

Affiliation:

1. Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico

2. Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico

3. Facultad de Ciencias de la Educación, Universidad de Colima, Colima 28040, Mexico

4. Department of Public Health Sciences, University of California Davis, Davis, CA 95616, USA

5. International Institute of Tropical Agriculture (IITA), Ibadan 200113, Nigeria

6. Molecular & Environmental Plant Sciences, Texas A&M University, College Station, TX 77843, USA

7. International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, Texcoco 52640, Mexico

8. Louisiana State University, Baton Rouge, LA 70803, USA

9. Distinguished Scientist Fellowship Program and Department of Statistics and Operations Research, King Saud University, Riyadh 11451, Saudi Arabia

10. Colegio de Postgraduados, Montecillos 56230, Mexico

Abstract

Genomic selection (GS) is changing plant breeding by significantly reducing the resources needed for phenotyping. However, its accuracy can be compromised by mismatches between training and testing sets, which impact efficiency when the predictive model does not adequately reflect the genetic and environmental conditions of the target population. To address this challenge, this study introduces a straightforward method using binary-Lasso regression to estimate β coefficients. In this approach, the response variable assigns 1 to testing set inputs and 0 to training set inputs. Subsequently, Lasso, Ridge, and Elastic Net regression models use the inverse of these β coefficients (in absolute values) as weights during training (WLasso, WRidge, and WElastic Net). This weighting method gives less importance to features that discriminate more between training and testing sets. The effectiveness of this method is evaluated across six datasets, demonstrating consistent improvements in terms of the normalized root mean square error. Importantly, the model’s implementation is facilitated using the glmnet library, which supports straightforward integration for weighting β coefficients.
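The two-stage idea described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' glmnet implementation. Several points are assumptions made for illustration only: the abstract does not say how the weights enter the penalty or how zero β coefficients are inverted, so the sketch takes w_j = 1/(|β_j| + eps) and penalizes coefficient j in proportion to 1/w_j, which is one way to give less importance to features that discriminate between the sets. The function names, the toy data, `eps`, and the rescaling of the penalty factors to sum to the number of features are all hypothetical choices.

```python
import math

def soft_threshold(z, t):
    """Soft-thresholding operator used by L1-penalized solvers."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def binary_lasso(X, d, lam, step=0.1, iters=2000):
    """L1-penalized logistic regression by proximal gradient descent.
    d[i] is 1 for a testing-set row and 0 for a training-set row."""
    n, p = len(X), len(X[0])
    beta, b0 = [0.0] * p, 0.0
    for _ in range(iters):
        resid = []
        for i in range(n):
            eta = b0 + sum(X[i][j] * beta[j] for j in range(p))
            resid.append(1.0 / (1.0 + math.exp(-eta)) - d[i])
        b0 -= step * sum(resid) / n  # intercept is not penalized
        for j in range(p):
            grad = sum(resid[i] * X[i][j] for i in range(n)) / n
            beta[j] = soft_threshold(beta[j] - step * grad, step * lam)
    return beta

def penalty_factors(beta, eps=1e-2):
    """ASSUMPTION: weight w_j = 1 / (|beta_j| + eps); the penalty factor is 1 / w_j,
    rescaled so the factors sum to the number of features."""
    raw = [abs(b) + eps for b in beta]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]

def weighted_lasso(X, y, lam, pf, iters=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j pf[j] * |b_j|.
    Assumes centered columns and response, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    b, resid = [0.0] * p, list(y)
    for _ in range(iters):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            rho = sum(col[i] * (resid[i] + col[i] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * pf[j]) / z
            resid = [resid[i] - col[i] * (new - b[j]) for i in range(n)]
            b[j] = new
    return b

def center(X):
    """Subtract each column mean."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

# Toy data (hypothetical): feature 0 is shifted between the sets, feature 1 is not.
X_train = [[-1.0, 1.0], [-1.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]]
X_test = [[1.0, 1.0], [1.0, -1.0], [2.0, 1.0], [2.0, -1.0]]
y_train = [1.5, -0.5, 0.5, -1.5]  # centered response, observed for training rows only

# Stage 1: binary Lasso separating testing rows (1) from training rows (0).
d = [0] * len(X_train) + [1] * len(X_test)
beta_d = binary_lasso(X_train + X_test, d, lam=0.05)

# Stage 2: weighted Lasso on the training set, compared with an unweighted fit.
pf = penalty_factors(beta_d)
Xc = center(X_train)
coef_plain = weighted_lasso(Xc, y_train, 0.05, [1.0, 1.0])
coef_weighted = weighted_lasso(Xc, y_train, 0.05, pf)
```

In this toy setting the shifted feature receives the larger penalty factor, so its coefficient is shrunk more in the weighted fit than in the unweighted one. With glmnet, per-feature penalties of this kind are supplied through the `penalty.factor` argument; whether the authors pass the weights in exactly this form is not stated in the abstract.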

Funder

Bill and Melinda Gates Foundation

Publisher

MDPI AG
