Affiliation:
1. Florida Agricultural and Mechanical University
Abstract
Plant breeding is gaining importance as a sustainable tool for meeting the challenges posed by a growing global population and for improving food security. Advanced high-throughput omics technologies are used to accelerate crop improvement and to develop resilient, higher-yielding varieties. These technologies generate vast amounts of genetic data that can be exploited to manipulate key plant characteristics for crop improvement. The integration of big data and AI into plant breeding has the potential to revolutionize the field and increase food security. Using branching data (phenotype) from 1,918 soybean accessions and 42k polymorphic SNPs (genotype), this study systematically compared 11 non-linear regression AI models: four deep learning models (DBN, ANN, autoencoder, and MLP regression) and seven machine learning models (SVR, XGBoost, random forest, LightGBM, GPS, decision tree, and polynomial regression). Evaluation with four metrics, R² (R-squared), MAE (mean absolute error), MSE (mean squared error), and MAPE (mean absolute percentage error), showed that SVR, ANN, and the autoencoder outperformed the other models and therefore achieve better accuracy when used for phenotype prediction. To support the evaluation of deep learning methods, feature importance and GO enrichment analyses were conducted. A comprehensive comparison of four feature importance algorithms found no significant difference among their feature importance rankings; however, SHAP values provided richer information on genes with negative contributions, so SHAP importance was chosen for feature selection. The genes identified by the combination of the SVR model and SHAP importance clearly grouped into three clusters across the whole soybean genome.
Our GO enrichment results also confirmed the prediction accuracy of this combination of methods. The results of this study offer valuable insights for AI-mediated plant breeding and address challenges faced by traditional breeding programs. The method developed here is broadly applicable to phenotype prediction, minor QTL mining, and smart plant-breeding systems, contributing significantly to the advancement of AI-based breeding practices and to the transition from experience-based to data-based breeding.
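For readers less familiar with the four evaluation metrics named in the abstract (R², MAE, MSE, and MAPE), the following is a minimal sketch of how they are conventionally computed from observed and predicted phenotype values. It is not the authors' evaluation code: the function name `regression_metrics` is hypothetical, and MAPE is returned as a fraction rather than a percentage (an assumption; some tools report it multiplied by 100).

```python
from math import fsum


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: return R2, MAE, MSE and MAPE for paired values.

    MAPE is returned as a fraction (assumption); multiply by 100 for percent.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]  # residuals

    mae = fsum(abs(e) for e in errors) / n  # mean absolute error
    mse = fsum(e * e for e in errors) / n  # mean squared error

    # R2 = 1 - SS_res / SS_tot; undefined if all observed values are equal
    mean_true = fsum(y_true) / n
    ss_res = fsum(e * e for e in errors)
    ss_tot = fsum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for constant observed values")
    r2 = 1.0 - ss_res / ss_tot

    # MAPE divides by each observed value, so a zero observation breaks it
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")
    mape = fsum(abs(e / t) for e, t in zip(errors, y_true)) / n

    return {"R2": r2, "MAE": mae, "MSE": mse, "MAPE": mape}
```

For example, `regression_metrics([2.0, 4.0, 6.0], [2.5, 3.5, 6.0])` gives R² = 0.9375 and MAPE = 0.125. Because MAPE divides by the observed value, it is unstable for traits whose measurements can be zero or near zero, which is one reason to report it alongside R², MAE, and MSE rather than on its own.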
Publisher
Research Square Platform LLC