A novel normalization and differential abundance test framework for microbiome data

Published: 2020-04-20
Issue: 13
Volume: 36
Page: 3959-3965
ISSN: 1367-4803
Container-title: Bioinformatics
Language: en

Author:

Ma Yuanjing¹, Luo Yuan², Jiang Hongmei¹

Affiliation:

1. Department of Statistics, Northwestern University, Evanston, IL 60208, USA

2. Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA

Abstract

Motivation

Microbial communities have been shown to be closely associated with many diseases. Identifying differentially abundant microbial species is clinically meaningful for finding disease-related pathogenic or probiotic bacteria. However, certain characteristics of microbiome data hinder the accuracy and effectiveness of differential abundance analysis: the abundances or counts of microbiome species are usually on different scales and exhibit zero-inflation and over-dispersion. Normalization is therefore a crucial step before the differential abundance test. Existing normalization methods typically adjust counts on different scales to a common scale by constructing size factors, under the assumption that count distributions across samples are equivalent up to a certain percentile. These methods often yield undesirable results when the differentially abundant species are of low to medium abundance. For differential abundance analysis, existing methods often use a single distribution to model the dispersion of all species, which lacks the flexibility to capture the distinctiveness of individual species. These methods tend to produce many false positives and often lack power when the effect size is small.

Results

We develop a novel framework for differential abundance analysis of sparse, high-dimensional marker-gene microbiome data. Our methodology relies on a novel network-based normalization technique and a two-stage zero-inflated mixture count regression model (RioNorm2). The normalization method aims to find a group of microbiome species that are relatively invariant across samples and conditions and uses them to construct the size factor. Another contribution of the paper is that our testing approach takes under-sampling and over-dispersion into account by separating microbiome species into two groups and modeling them separately. In comprehensive simulation studies, our method is consistently powerful and robust across settings with different sample sizes, library sizes and effect sizes. We also demonstrate the effectiveness of the framework on a published metastatic melanoma dataset and draw biological insights from the results.

Availability and implementation

The R package ‘RioNorm2’ can be installed from GitHub at https://github.com/yuanjing-ma/RioNorm2.

Supplementary information

Supplementary data are available at Bioinformatics online.
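Illustrative sketch

The size-factor idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the RioNorm2 implementation: the abstract says the invariant species are found by a network-based procedure, whereas the sketch simply assumes that set is already known and passed in as `invariant_idx`. The function names, the toy count matrix and the geometric-mean scaling are all assumptions made for illustration.

```python
import numpy as np


def size_factors(counts, invariant_idx):
    """Per-sample size factors from an assumed set of invariant taxa.

    counts: 2-D array, rows are taxa and columns are samples.
    invariant_idx: row indices of taxa assumed to be invariant
        (hypothetical input; RioNorm2 selects these with its own method).
    """
    counts = np.asarray(counts, dtype=float)
    # Total count of the reference taxa in each sample.
    ref_totals = counts[invariant_idx, :].sum(axis=0)
    if np.any(ref_totals <= 0):
        raise ValueError("each sample needs a positive reference total")
    # Rescale so that the size factors have geometric mean 1.
    return ref_totals / np.exp(np.mean(np.log(ref_totals)))


def normalize(counts, invariant_idx):
    """Divide every sample (column) by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / size_factors(counts, invariant_idx)


# Toy data: taxa 0 and 1 differ across samples only by sequencing depth.
counts = np.array([
    [10, 20, 40],
    [5, 10, 20],
    [3, 30, 12],
    [0, 4, 0],
])
invariant_idx = [0, 1]

sf = size_factors(counts, invariant_idx)
norm = normalize(counts, invariant_idx)
print(sf)
print(norm)
```

In this toy matrix the two reference taxa become constant across samples after normalization, while taxon 2 still stands out in the second sample, which is the kind of signal a downstream differential abundance test would examine.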

Funder

Northwestern University Information Technology

National Science Foundation

National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa255/33372660/btaa255.pdf

References: 27 articles.

1. Differential expression analysis for sequence count data; Anders; Genome Biol, 2010

2. Statistical design and analysis of RNA sequencing data; Auer; Genetics, 2010

3. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments; Bullard; Bioinformatics, 2010

4. An omnibus test for differential distribution analysis of microbiome sequencing data; Chen; Bioinformatics, 2018

5. The pervasive effects of an antibiotic on the human gut microbiota, as revealed by deep 16S rRNA sequencing; Dethlefsen; PLoS Biol, 2008

Cited by 12 articles.

1. Associations between wastewater gut microbiome and community obesity rates: Potential microbial biomarkers for surveillance; Soil & Environmental Health; 2024-05

2. MK-BMC: a Multi-Kernel framework with Boosted distance metrics for Microbiome data for Classification; Bioinformatics; 2024-01-01

3. Simple and flexible sign and rank-based methods for testing for differential abundance in microbiome studies; PLOS ONE; 2023-09-26

4. Statistical normalization methods in microbiome data with application to microbiome cancer research; Gut Microbes; 2023-08-25

5. Latent Dirichlet Allocation modeling of environmental microbiomes; PLOS Computational Biology; 2023-06-08