Author:
Akdemir Deniz, Knox Ron, Isidro-Sánchez Julio
Abstract
Private and public breeding programs, as well as companies and universities, have developed genomics technologies that generate unprecedented amounts of sequence data, which bring new challenges in data management, query, and analysis. The magnitude and complexity of these datasets also offer an opportunity to use the available data as a whole. Detailed phenotype data, combined with increasing amounts of genomic data, have enormous potential to accelerate the identification of key traits and improve our understanding of quantitative genetics. Data harmonization enables cross-national and international comparative research, facilitating the extraction of new scientific knowledge. In this paper, we address the complex issue of combining high-dimensional and unbalanced omics data. More specifically, we propose a covariance-based method for combining partial datasets in the genotype-to-phenotype spectrum. This method can be used to combine partially overlapping relationship or covariance matrices. We show with applications that our approach may be advantageous over feature-imputation-based approaches; we demonstrate how the method can be used in genomic prediction with heterogeneous marker data, and how to combine data from multiple phenotypic experiments to make inferences about previously unobserved trait relationships. Our results demonstrate that it is possible to harmonize datasets to improve available information across gene banks, data repositories, or other data resources.
Key message
Several covariance matrices obtained from independent experiments can be combined as long as these matrices are partially overlapping. We demonstrate the usefulness of this methodology with applications in combining data from several partially linked genotypic and phenotypic experiments.
Author contribution statement
– DA: Conception or design of the work, statistics, R programs, simulations, drafting the article, and critical revision of the article.
– JIS: R programs, graphs, drafting the article, critical revision of the article.
– RK: Critical revision of the article.
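To make "partially overlapping" covariance matrices concrete, the following minimal Python sketch lays two small relationship matrices onto the union of their genotype labels. It is not the covariance-based method proposed in the paper: it only averages entries that were observed and leaves never-observed pairs as NaN, which is the gap a statistical combination method would have to fill. The function name `combine_partial`, the labels `g1` to `g4`, and all numeric values are hypothetical and chosen for illustration only.

```python
import math


def combine_partial(matrices):
    """Naively overlay labelled partial covariance matrices (illustration only).

    `matrices` is a list of (labels, matrix) pairs, where `matrix` is a square
    list of lists indexed in the order of `labels`. Entries observed in more
    than one matrix are averaged; pairs never observed together stay NaN.
    """
    all_labels = sorted({name for labels, _ in matrices for name in labels})
    index = {name: i for i, name in enumerate(all_labels)}
    n = len(all_labels)
    total = [[0.0] * n for _ in range(n)]
    count = [[0] * n for _ in range(n)]

    for labels, mat in matrices:
        for a, row_name in enumerate(labels):
            for b, col_name in enumerate(labels):
                i, j = index[row_name], index[col_name]
                total[i][j] += mat[a][b]
                count[i][j] += 1

    combined = [
        [total[i][j] / count[i][j] if count[i][j] else math.nan for j in range(n)]
        for i in range(n)
    ]
    return all_labels, combined


# Hypothetical toy data: two experiments that share genotypes g2 and g3.
k1 = (["g1", "g2", "g3"], [[1.0, 0.5, 0.2], [0.5, 1.0, 0.4], [0.2, 0.4, 1.0]])
k2 = (["g2", "g3", "g4"], [[1.0, 0.6, 0.3], [0.6, 1.0, 0.1], [0.3, 0.1, 1.0]])

labels, combined = combine_partial([k1, k2])
```

In this toy case the g1 and g4 genotypes never appear in the same experiment, so their entry in `combined` remains NaN. Inferring such unobserved relationships from the shared genotypes is the kind of problem the abstract describes; simple averaging cannot do it and does not guarantee a valid (positive semi-definite) covariance matrix.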
Publisher
Cold Spring Harbor Laboratory