Author:
Kumar M. Senthil, Slud Eric V., Okrah Kwame, Hicks Stephanie C., Hannenhalli Sridhar, Bravo Héctor Corrada
Abstract
Count data derived from high-throughput DNA sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional: it measures relative, not absolute, abundances of the assayed features. This compositional bias confounds inference of absolute abundances. We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data, and we propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it.
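To make the notion of compositional bias concrete, here is a minimal sketch. It is not the empirical Bayes method proposed in the paper. The abundances, the sequencing depth, and the helper `expected_counts` are all hypothetical, and the sketch works with expected counts rather than sampled reads. It shows that when only one feature changes in absolute abundance, the fold changes computed from counts are all distorted by the same sample-wide factor.

```python
def expected_counts(absolute, depth):
    """Expected read counts when a fixed number of reads (depth) is
    distributed in proportion to each feature's absolute abundance.
    Hypothetical helper for illustration only."""
    total = sum(absolute)
    return [depth * a / total for a in absolute]


# Hypothetical absolute abundances for three features in two samples.
# Only feature 0 changes (a 10-fold increase); features 1 and 2 do not.
sample_a = [100.0, 100.0, 100.0]
sample_b = [1000.0, 100.0, 100.0]

depth = 3000  # assumed: both samples are sequenced to the same depth

counts_a = expected_counts(sample_a, depth)
counts_b = expected_counts(sample_b, depth)

# Fold changes seen in the counts versus the true absolute fold changes.
observed_fc = [b / a for a, b in zip(counts_a, counts_b)]
true_fc = [b / a for a, b in zip(sample_a, sample_b)]

# The ratio observed/true is identical across features: a single
# multiplicative factor per sample pair, which is what a scaling
# normalization would need to estimate and remove.
bias = [o / t for o, t in zip(observed_fc, true_fc)]

print("observed fold changes:", observed_fc)
print("true fold changes:    ", true_fc)
print("bias factor per feature:", bias)
```

In this toy setting the unchanged features appear to decrease in the counts even though their absolute abundance is constant. Dividing the observed fold changes by the shared factor would recover the true ones, but in practice that factor is unknown and has to be estimated from the data.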
Publisher
Cold Spring Harbor Laboratory
Cited by
5 articles.