Benchmarking Parametric and Machine Learning Models for Genomic Prediction of Complex Traits

Published:2019-11-01 Issue:11 Volume:9 Page:3691-3702
ISSN:2160-1836
Container-title:G3 Genes|Genomes|Genetics
language:en

Author:

Azodi Christina B¹, Bolger Emily², McCarren Andrew³, Roantree Mark³, de los Campos Gustavo⁴⁵⁶, Shiu Shin-Han¹⁷

Affiliation:

1. Department of Plant Biology

2. Department of Mathematics, Moravian College, Bethlehem, PA

3. Insight Centre for Data Analytics, School of Computing, Dublin City University, Dublin 9, Ireland

4. Department of Epidemiology & Biostatistics

5. Department of Statistics & Probability

6. Institute for Quantitative Health Science and Engineering, and

7. Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, 48824

Abstract

The usefulness of genomic prediction in crop and livestock breeding programs has prompted efforts to develop new and improved genomic prediction algorithms, such as artificial neural networks and gradient tree boosting. However, the performance of these algorithms has not been compared systematically across a wide range of datasets and models. Using data on 18 traits across six plant species with different marker densities and training population sizes, we compared the performance of six linear and six non-linear algorithms. First, we found that hyperparameter selection was necessary for all non-linear algorithms and that feature selection prior to model training was critical for artificial neural networks when the markers greatly outnumbered the training lines. Across all species and trait combinations, no one algorithm performed best; however, predictions based on a combination of results from multiple algorithms (i.e., ensemble predictions) performed consistently well. While linear and non-linear algorithms performed best for a similar number of traits, the performance of non-linear algorithms varied more between traits. Although artificial neural networks did not perform best for any trait, we identified strategies (i.e., feature selection, seeded starting weights) that boosted their performance to near the level of other algorithms. Our results highlight the importance of algorithm selection for the prediction of trait values.
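
The ensemble idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the ensemble is a simple unweighted mean of the per-line predictions from each algorithm, and that accuracy is scored with the Pearson correlation between predicted and observed trait values; the abstract specifies neither. The function names, algorithm labels, and toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def ensemble_predict(predictions_by_algorithm):
    """Combine predictions by taking the unweighted mean per line.

    Assumption: a plain average; the abstract does not say how the
    authors combined the results of multiple algorithms.
    """
    per_algorithm = list(predictions_by_algorithm.values())
    if not per_algorithm:
        raise ValueError("need predictions from at least one algorithm")
    n_lines = len(per_algorithm[0])
    if any(len(p) != n_lines for p in per_algorithm):
        raise ValueError("all algorithms must predict the same lines")
    # zip(*...) groups the predictions for each line across algorithms.
    return [mean(values) for values in zip(*per_algorithm)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: observed trait values for four lines and
# predictions from two made-up models.
observed = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "linear_model": [1.5, 1.5, 3.5, 3.5],
    "nonlinear_model": [0.5, 2.5, 2.5, 4.5],
}

ensemble = ensemble_predict(predictions)
for name, predicted in predictions.items():
    print(name, round(pearson_r(predicted, observed), 3))
print("ensemble", round(pearson_r(ensemble, observed), 3))
```

In this toy case the two models err in opposite directions on each line, so their average tracks the observed values more closely than either model alone. Real data will not cancel this neatly.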

Publisher

Oxford University Press (OUP)

Subject

Genetics (clinical), Genetics, Molecular Biology

Link

http://academic.oup.com/g3journal/article-pdf/9/11/3691/37177268/g3journal3691.pdf

Cited by 119 articles.

1. Applications of Artificial Intelligence for Heat Stress Management in Ruminant Livestock;Sensors;2024-09-11

2. Explainable artificial intelligence for genotype-to-phenotype prediction in plant breeding: a case study with a dataset from an almond germplasm collection;Frontiers in Plant Science;2024-09-09

3. Comparison of machine learning methods for genomic prediction of selected Arabidopsis thaliana traits;PLOS ONE;2024-08-28

4. Improving genomic prediction of rhizomania resistance in sugar beet (Beta vulgaris L.) by implementing epistatic effects and feature selection;F1000Research;2024-08-28

5. Prediction of plant complex traits via integration of multi-omics data;Nature Communications;2024-08-10