Author:
Quinn Thomas P., Erb Ionas, Richardson Mark F., Crowley Tamsyn M.
Abstract
Motivation: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g., gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e., library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that renders invalid many conventional analyses, including distance measures, correlation coefficients, and multivariate statistical models.
Results: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study.
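As a rough illustration of the point about library size, the sketch below compares two hypothetical samples that hold the same relative abundances but were sequenced to different depths. The sample values and the helper names `closure` and `clr` are assumptions made for this example and are not taken from the review. The centered log-ratio (clr) transform used here is one standard CoDA tool, and it assumes every count is nonzero.

```python
import math

def closure(counts):
    """Rescale counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(counts):
    """Centered log-ratio: log of each part minus the mean log (nonzero counts only)."""
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical counts for three genes; sample_b has 10x the library size of sample_a.
sample_a = [10, 20, 70]
sample_b = [100, 200, 700]

props_a = closure(sample_a)
props_b = closure(sample_b)

# Euclidean distance on raw counts reflects sequencing depth, not biology.
euclidean_raw = math.dist(sample_a, sample_b)

# Distance between clr-transformed samples (the Aitchison distance) ignores depth.
aitchison = math.dist(clr(sample_a), clr(sample_b))

print("proportions equal:", all(math.isclose(x, y) for x, y in zip(props_a, props_b)))
print("raw Euclidean distance:", euclidean_raw)
print("Aitchison distance:", aitchison)
```

Under these assumptions the two samples are the same composition, so the raw-count distance is large while the log-ratio distance is zero up to floating-point error.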
Publisher
Cold Spring Harbor Laboratory
References: 76 articles.
1. J. Aitchison. The Statistical Analysis of Compositional Data. Journal of the Royal Statistical Society. Series B (Methodological), 1982.
2. J. Aitchison. The Statistical Analysis of Compositional Data. Chapman & Hall, Ltd., London, UK, 1986.
3. J. Aitchison. A concise guide to compositional data analysis. 2nd Compositional Data Analysis Workshop; Girona, Spain, 2003.
4. J. Aitchison. The single principle of compositional data analysis, continuing fallacies, confusions and misunderstandings and some suggested remedies. Proceedings of CoDaWork’08, The 3rd Compositional Data Analysis Workshop; Girona, Spain, 2008.
Cited by
5 articles.