Interpretable machine learning models for single-cell ChIP-seq imputation

Published: 2019-12-20

Author:

Albrecht Steffen, Andreani Tommaso, Andrade-Navarro Miguel A., Fontaine Jean-Fred

Abstract

Motivation

Single-cell ChIP-seq (scChIP-seq) analysis is challenging due to data sparsity. The high degree of sparsity in biological high-throughput single-cell data is generally handled with imputation methods that complete the data, but specific methods for scChIP-seq are lacking. We present SIMPA, a scChIP-seq data imputation method that leverages predictive information within bulk data from ENCODE to impute missing protein-DNA interacting regions of target histone marks or transcription factors.

Results

Imputations using machine learning models trained for each single cell, each target, and each genomic region accurately preserve cell type clustering and improve pathway-related gene identification on real data. Results on simulated data show that 100 input genomic regions are already enough to train single-cell-specific models for the imputation of thousands of undetected regions. Furthermore, SIMPA enables the interpretation of machine learning models by revealing the interaction sites of a given single cell that are most important for the imputation model trained for a specific genomic region. The corresponding feature importance values, derived from promoter-interaction profiles of H3K4me3, an activating histone mark, correlate highly with co-expression of genes that are present within the cell-type-specific pathways. An imputation method that allows the underlying models to be interpreted helps users gain a deeper understanding of individual cells and, consequently, of sparse scChIP-seq datasets.

Availability and implementation

Our interpretable imputation algorithm was implemented in Python and is available at https://github.com/salbrec/SIMPA
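The idea of training one model per single cell and per candidate region on bulk profiles can be illustrated with a minimal sketch. This is not the SIMPA implementation: the abstract does not name the model type, so a small logistic regression written in plain Python is swapped in here, and the function names (`train_logistic`, `impute_region`), the toy region identifiers, and the bulk profiles are all hypothetical. The features are the regions detected in the single cell, the training rows are bulk experiments, and the label is whether the candidate region is present in each bulk experiment.

```python
import math


def train_logistic(X, y, epochs=500, lr=0.5):
    """Fit a logistic regression by batch gradient descent.

    Stand-in model for illustration only; returns (weights, bias).
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def impute_region(bulk, cell_regions, candidate):
    """Score one undetected candidate region for one single cell.

    bulk         : list of sets, regions observed in each bulk experiment
    cell_regions : list of regions detected in the single cell (the features)
    candidate    : region absent from the single cell, to be imputed
    Returns (probability, importance per single-cell region).
    """
    # One training row per bulk experiment, restricted to the cell's regions.
    X = [[1.0 if r in exp else 0.0 for r in cell_regions] for exp in bulk]
    y = [1.0 if candidate in exp else 0.0 for exp in bulk]
    w, b = train_logistic(X, y)
    # The cell contains all of its own regions, so its feature vector is all ones.
    z = b + sum(w)
    prob = 1.0 / (1.0 + math.exp(-z))
    # Absolute weight as a simple, assumed proxy for feature importance.
    importance = {r: abs(wi) for r, wi in zip(cell_regions, w)}
    return prob, importance


# Hypothetical bulk profiles: c1 co-occurs with r1, c2 does not.
bulk = [
    {"r1", "r2", "c1"},
    {"r1", "r2", "c1"},
    {"r1", "c1"},
    {"r3"},
    {"r3", "r2"},
    {"r3", "c2"},
]
cell_regions = ["r1", "r2"]

prob_c1, imp_c1 = impute_region(bulk, cell_regions, "c1")
prob_c2, imp_c2 = impute_region(bulk, cell_regions, "c2")
print("c1:", round(prob_c1, 3), "c2:", round(prob_c2, 3))
print("most important region for c1:", max(imp_c1, key=imp_c1.get))
```

In this toy setting the candidate that co-occurs with the cell's regions in bulk data receives the higher score, and the per-region weights indicate which of the cell's own regions drive that prediction, which is the kind of interpretation the abstract describes.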

Publisher

Cold Spring Harbor Laboratory

Cited by 2 articles.

1. Deep learning applications in single-cell genomics and transcriptomics data analysis;Biomedicine & Pharmacotherapy;2023-09

2. Deep Learning Applications in Single-Cell Omics Data Analysis;2021-11-27