Combining phenotypic and genomic data to improve prediction of binary traits

Published: 2022-09-01

Author:

Diego Jarquin, Arkaprava Roy, Bertrand Clarke, Subhashis Ghosal

Abstract

Plant breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here ‘main traits’) of these cultivars are categorical and difficult to measure directly. It is important to predict the main trait of newly developed genotypes accurately. In addition to marker data, breeding programs often have information on secondary traits (or ‘phenotypes’) that are easy to measure. Our goal is to improve prediction of main traits with interpretable relations by combining the two data types using variable selection techniques. However, the genomic characteristics can overwhelm the set of secondary traits, so a standard technique may fail to select any phenotypic variables. We develop a new statistical technique that ensures appropriate representation from both the secondary traits and the genotypic variables for optimal prediction. When two data types (markers and secondary traits) are available, we achieve improved prediction of a binary trait in two steps that are designed to ensure that a significant intrinsic effect of a phenotype is incorporated in the relation before accounting for extra effects of genotypes. First, we sparsely regress the secondary traits on the markers and replace the secondary traits by their residuals to obtain the effects of phenotypic variables as adjusted by the genotypic variables. Then, we develop a sparse logistic classifier using the markers and residuals so that the adjusted phenotypes may be selected first, which keeps them from being overwhelmed by the far more numerous genotypes. This classifier uses forward selection aided by a penalty term and can be computed effectively by a technique called the one-pass method. It compares favorably with other classifiers on simulated and real data.
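
The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the paper's forward selection and one-pass method are not reproduced here. As stand-ins, step 1 uses a lasso fitted by coordinate descent, and step 2 uses an L1-penalized logistic regression fitted by proximal gradient, with a lighter penalty on the residualized traits so that they can enter the model before the markers. All function names, penalty values, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink x toward zero by t (elementwise)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent:
    minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept; center inputs)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # running residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]       # remove feature j's contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]       # add the updated contribution back
    return b

def residualize(S, M, lam):
    """Step 1 (stand-in): sparsely regress each secondary trait on the
    centered markers M and return the residuals."""
    R = np.empty_like(S, dtype=float)
    for k in range(S.shape[1]):
        s = S[:, k] - S[:, k].mean()
        R[:, k] = s - M @ lasso_cd(M, s, lam)
    return R

def sparse_logistic(X, y, lam_vec, n_iter=2000):
    """Step 2 (stand-in): L1-penalized logistic regression by proximal
    gradient, with one penalty per feature (lam_vec) and a free intercept."""
    n, p = X.shape
    w, b0 = np.zeros(p), 0.0
    # Step size 1/L, using an upper bound on the Lipschitz constant
    # of the gradient of the mean logistic loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        z = np.clip(X @ w + b0, -30.0, 30.0)
        err = 1.0 / (1.0 + np.exp(-z)) - y
        w = soft_threshold(w - step * (X.T @ err) / n, step * lam_vec)
        b0 -= step * err.mean()
    return w, b0

# Simulated data (hypothetical): 50 markers coded 0/1/2, two secondary traits.
rng = np.random.default_rng(0)
n, p, q = 200, 50, 2
M = rng.integers(0, 3, size=(n, p)).astype(float)
M -= M.mean(axis=0)                               # center the markers
S = 0.8 * M[:, :q] + rng.normal(size=(n, q))      # traits partly driven by markers
logit = 1.5 * S[:, 0] + 1.0 * M[:, 5]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

R = residualize(S, M, lam=0.1)                    # adjusted phenotypes
X = np.hstack([R, M])
# Lighter penalty on the q residual columns than on the p marker columns.
lam_vec = np.r_[np.full(q, 0.01), np.full(p, 0.05)]
w, b0 = sparse_logistic(X, y, lam_vec)
print("selected residualized traits:", np.flatnonzero(w[:q]))
print("selected markers:", np.flatnonzero(w[q:]))
```

The differential penalty is only one simple way to give the adjusted phenotypes priority over the much larger set of markers. In practice the penalty levels would be tuned, for example by cross-validation.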

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article.

1. Integrating and optimizing genomic, weather, and secondary trait data for multiclass classification. Frontiers in Genetics, 2023-03-29.