NPmatch: Latent Batch Effects Correction of Omics data by Nearest-Pair Matching

Published: 2024-05-02

Author:

Zito Antonino, Martinelli Axel, Masiero Mauro, Akhmedov Murat, Kwee Ivo

Abstract

Motivation: Batch effects (BEs) are a predominant source of noise in omics data and often mask real biological signals. BEs remain common in existing datasets. Current methods for BE correction mostly rely on specific assumptions or complex models, and may not detect and adjust BEs adequately, impacting downstream analysis and discovery power. To address these challenges, we developed NPmatch, a nearest-neighbor matching-based method that adjusts BEs satisfactorily and outperforms current methods in a wide range of datasets.

Results: We assessed distinct metrics and graphical readouts, and compared our method to commonly used BE correction methods. NPmatch shows overall superior performance compared with existing methods in correcting BEs while preserving biological differences. Altogether, our method proves to be a valuable BE correction approach to maximize discovery in biomedical research, with applicability in clinical research, where latent BEs are often dominant.

Data availability and implementation: NPmatch is freely available on GitHub (https://github.com/bigomics/NPmatch) and on Omics Playground (https://bigomics.ch/omics-playground). The datasets underlying this article are the following: GSE120099, GSE82177, GSE162760, GSE171343, GSE153380, GSE163214, GSE182440, GSE163857, GSE117970, GSE173078, GSE10846. All of these datasets are publicly available and can be freely accessed on the Gene Expression Omnibus (GEO) repository.

Publisher

Cold Spring Harbor Laboratory
