Authors:
Novielli Pierfrancesco, Romano Donato, Pavan Stefano, Losciale Pasquale, Stellacci Anna Maria, Diacono Domenico, Bellotti Roberto, Tangaro Sabina
Abstract
Background: Advances in DNA sequencing have revolutionized plant genomics and contributed significantly to the study of genetic diversity. Despite this progress, accurately predicting phenotypes from high-dimensional genomic data remains a challenge, particularly in plant breeding and in identifying the key genetic factors that drive the predictions. This study aims to bridge this gap by integrating explainable artificial intelligence (XAI) techniques with advanced machine learning (ML) models. The approach is intended to enhance both the predictive accuracy and the interpretability of genotype-to-phenotype models, thereby improving their reliability and supporting more informed breeding decisions.

Results: This study compares several ML methods for genotype-to-phenotype prediction using data from an almond germplasm collection. After preprocessing and feature selection, regression models are used to predict almond shelling fraction. The best predictions were obtained with the Random Forest method (correlation = 0.727 ± 0.020, R² = 0.511 ± 0.025, RMSE = 7.746 ± 0.199). Notably, applying the SHAP (SHapley Additive exPlanations) algorithm to explain the results highlighted several genomic regions associated with the trait, including the one with the highest feature importance, which is located in a gene potentially involved in seed development.

Conclusions: Employing explainable artificial intelligence algorithms enhances model interpretability and identifies genetic polymorphisms associated with the shelling percentage. These findings underscore the efficacy of XAI in predicting phenotypic traits from genomic data and highlight its significance in optimizing crop production for sustainable agriculture.
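The three accuracy figures reported in the Results (correlation, R², and RMSE) can be illustrated with a minimal sketch. The abstract does not state how each metric was computed, so the definitions below (Pearson correlation, coefficient of determination, root mean squared error) are assumptions, and the function names and toy shelling values are hypothetical, not data from the study.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the trait."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical observed and predicted shelling fractions (percent).
observed = [30.0, 45.0, 60.0, 25.0, 50.0]
predicted = [35.0, 40.0, 55.0, 30.0, 50.0]

print("correlation:", round(pearson_r(observed, predicted), 3))
print("R2:", round(r_squared(observed, predicted), 3))
print("RMSE:", round(rmse(observed, predicted), 3))
```

Under these definitions R² need not equal the square of the correlation, which would be consistent with the reported values (0.727² ≈ 0.529 versus R² = 0.511), although the abstract itself does not confirm which definitions were used.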