An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat

Published:2019-10-23 Issue:2 Volume:109 Page:251-277
ISSN:0885-6125
Container-title:Machine Learning
language:en
Short-container-title:Mach Learn

Author:

Grinberg Nastasiya F., Orhobor Oghenejokpeme I., King Ross D.

Abstract

In phenotype prediction the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are central to medicine, crop breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods (elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM)) with two state-of-the-art classical statistical genetics methods: genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forest. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when this is present.
We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which method is likely to perform well on any given problem is elusive and non-trivial.
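
The kind of comparison the abstract describes can be illustrated with a minimal sketch in pure Python. It is not the authors' pipeline: the data are simulated, the -1/+1 marker coding, the effect sizes, the penalty strength of 0.1, the 5-fold split, and all function names are assumptions chosen for illustration. It covers only the three penalised linear models (ridge, elastic net, lasso), fitted by coordinate descent and scored by cross-validated R^2. Random forest, GBM, SVM, and BLUP are left out.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha, l1_ratio, n_sweeps=30):
    """Coordinate descent for the penalised least-squares objective
    (1/2n)||y - b - Xw||^2 + alpha*l1_ratio*||w||_1
                           + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    l1_ratio=1 gives lasso, l1_ratio=0 gives ridge."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre the marker columns and the phenotype so the intercept drops out.
    cols = [[row[j] - col_mean[j] for row in X] for j in range(p)]
    resid = [yi - y_mean for yi in y]  # residual when all weights are zero
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            col = cols[j]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # marker with no variation carries no signal
            rho = sum(c * r for c, r in zip(col, resid)) / n + z * w[j]
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                w[j] = new_w
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, col_mean))
    return w, intercept


def r_squared(truth, preds):
    """Coefficient of determination on held-out data."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - q) ** 2 for t, q in zip(truth, preds))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(X, y, alpha, l1_ratio, k=5):
    """Mean held-out R^2 over k interleaved folds."""
    n = len(X)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        w, b = fit_elastic_net(X_tr, y_tr, alpha, l1_ratio)
        held_out = sorted(test_idx)
        preds = [b + sum(wj * xj for wj, xj in zip(w, X[i])) for i in held_out]
        scores.append(r_squared([y[i] for i in held_out], preds))
    return sum(scores) / k


def simulate(n, p, n_causal, noise_sd, rng):
    """Hypothetical data: markers coded -1/+1; the first n_causal markers
    each add their value to the phenotype, plus Gaussian noise."""
    X = [[rng.choice((-1.0, 1.0)) for _ in range(p)] for _ in range(n)]
    y = [sum(row[:n_causal]) + rng.gauss(0.0, noise_sd) for row in X]
    return X, y


rng = random.Random(0)
X, y = simulate(n=150, p=30, n_causal=5, noise_sd=0.5, rng=rng)
settings = {"ridge": 0.0, "elastic net": 0.5, "lasso": 1.0}
cv_r2 = {name: cross_validated_r2(X, y, alpha=0.1, l1_ratio=ratio)
         for name, ratio in settings.items()}
for name, score in cv_r2.items():
    print(f"{name}: mean 5-fold R^2 = {score:.3f}")
```

On a purely additive simulation like this one, the three models should score similarly. The differences the abstract reports arise with mechanistic complexity, noise, missing data, and population structure, none of which this sketch models.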

Funder

Biotechnology and Biological Sciences Research Council

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Software

Link

http://link.springer.com/content/pdf/10.1007/s10994-019-05848-5.pdf

References: 93 articles.

1. Alexandrov, N., Tai, S., Wang, W., Mansueto, L., Palis, K., Fuentes, R. R., et al. (2015). SNP-Seek database of SNPs derived from 3000 rice genomes. Nucleic Acids Research, 43(D1), D1023–D1027.

2. Ando, R. K., & Tong, Z. (2005). A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6, 1817–1853.

3. Armstead, I., Donnison, I., Aubry, S., Harper, J., Hörtensteiner, S., James, C., et al. (2007). Cross-species identification of Mendel’s I locus. Science, 315(5808), 73.

4. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57(1), 289–300.

5. Bloom, J. S., Ehrenreich, I. M., Loo, W. T., Lite, T.-L. V., & Kruglyak, L. (2013). Finding the sources of missing heritability in a yeast cross. Nature, 494(7436), 234–237.

Cited by 94 articles.

1. Different applications of machine learning approaches in materials science and engineering: Comprehensive review;Engineering Applications of Artificial Intelligence;2024-09

2. HASCH - A high-throughput amplicon-based SNP-platform for medicinal cannabis and industrial hemp genotyping applications;BMC Genomics;2024-08-29

3. Genomics‐based plant disease resistance prediction using machine learning;Plant Pathology;2024-08-29

4. Analyzing Medicago spp. seed morphology using GWAS and machine learning;Scientific Reports;2024-07-30

5. A practical introduction to holo-omics;Cell Reports Methods;2024-07