Interpretable Online Network Dictionary Learning for Inferring Long-Range Chromatin Interactions

Author:

Rana Vishal,Peng Jianhao,Pan ChaoORCID,Lyu Hanbaek,Cheng AlbertORCID,Kim Minji,Milenkovic Olgica

Abstract

AbstractDictionary learning (DL), implemented via matrix factorization (MF), is commonly used in computational biology to tackle ubiquitous clustering problems. The method is favored due to its conceptual simplicity and relatively low computational complexity. However, DL algorithms produce results that lack interpretability in terms of real biological data. Additionally, they are not optimized for graph-structured data and hence often fail to handle them in a scalable manner.In order to address these limitations, we propose a novel DL algorithm calledonline convex network dictionary learning(online cvxNDL). Unlike classical DL algorithms, online cvxNDL is implemented via MF and designed to handle extremely large datasets by virtue of its online nature. Importantly, it enables the interpretation of dictionary elements, which serve as cluster representatives, through convex combinations of real measurements. Moreover, the algorithm can be applied to data with a network structure by incorporating specialized subnetwork sampling techniques.To demonstrate the utility of our approach, we apply cvxNDL on 3D-genome RNAPII ChIA-Drop data with the goal of identifying important long-range interaction patterns (long-range dictionary elements). ChIA-Drop probes higher-order interactions, and produces data in the form of hypergraphs whose nodes represent genomic fragments. The hyperedges represent observed physical contacts. Our hypergraph model analysis has the objective of creating an interpretable dictionary of long-range interaction patterns that accurately represent global chromatin physical contact maps. Through the use of dictionary information, one can also associate the contact maps with RNA transcripts and infer cellular functions.To accomplish the task at hand, we focus on RNAPII-enriched ChIA-Drop data fromDrosophila MelanogasterS2 cell lines. Our results offer two key insights. First, we demonstrate that online cvxNDL retains the accuracy of classical DL (MF) methods while simultaneously ensuring unique interpretability and scalability. Second, we identify distinct collections of proximal and distal interaction patterns involving chromatin elements shared by related processes across different chromosomes, as well as patterns unique to specific chromosomes. To associate the dictionary elements with biological properties of the corresponding chromatin regions, we employ Gene Ontology (GO) enrichment analysis and perform multiple RNA coexpression studies.Availability and ImplementationThe code and test datasets are available at:\https://github.com/rana95vishal/chromatin_DL/Author summaryWe introduce a novel method for dictionary learning termedonline convex Network Dictionary Learning(online cvxNDL). The method operates in an online manner and utilizes representative subnetworks of a network dataset as dictionary elements. A key feature of online cvxNDL is its ability to work with graph-structured data and generate dictionary elements that represent convex combinations of real data points, thus ensuring interpretability.Online cvxNDL is used to investigate long-range chromatin interactions in S2 cell lines ofDrosophila Melanogasterobtained through RNAPII ChIA-Drop measurements represented as hypergraphs. The results show that dictionary elements can accurately and efficiently reconstruct the original interactions present in the data, even when subjected to convexity constraints. To shed light on the biological relevance of the identified dictionaries, we perform Gene Ontology enrichment and RNA-seq coexpression analyses. These studies uncover multiple long-range interaction patterns that are chromosome-specific. Furthermore, the findings affirm the significance of convex dictionaries in representing TADs cross-validated by imaging methods (such as 3-color FISH (fluorescence in situ hybridization)).

Publisher

Cold Spring Harbor Laboratory

Reference37 articles.

1. Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries

2. Sparse representation for color image restoration;IEEE Transactions on image processing,2007

3. Non-negative matrix factorization with α-divergence;Pattern Recognition Letters,2008

4. Multitask sparse nonnegative matrix factorization for joint spectral–spatial hyperspectral imagery denoising;IEEE Transactions on Geoscience and Remote Sensing,2014

5. Community detection algorithm based on nonnegative matrix factorization and pairwise constraints;Physica A: Statistical Mechanics and its Applications,2020

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3