Abstract
Motivation
Analysis of open chromatin regions across multiple samples from two or more distinct conditions can reveal altered gene regulatory patterns associated with biological phenotypes and complex traits. The ATAC-seq assay allows tractable genome-wide open chromatin profiling of large numbers of samples. However, stable, broadly applicable genomic annotations of open chromatin regions are not available. Thus, most studies first identify open regions by running a peak calling method on each sample independently, and then combine these results heuristically to obtain a consensus peak set. Reconciling sample-specific peak results post hoc is particularly challenging for larger cohorts, and informative spatial features specific to open chromatin signals are not leveraged effectively.

Results
We propose a novel method, ROCCO, that determines consensus open chromatin regions across multiple samples simultaneously. ROCCO employs robust summary statistics and solves a constrained optimization problem formulated to account for both enrichment and spatial dependence of open chromatin signal data. We show that this formulation admits attractive theoretical and conceptual properties and yields superior empirical performance compared with current methods.

Availability and Implementation
Source code, documentation, and usage demos for ROCCO are available on GitHub at https://github.com/nolan-h-hamilton/ROCCO. ROCCO can also be installed as a standalone binary utility using pip/PyPI.

Contact
nolanh@email.unc.edu or tsfurey@email.unc.edu

Supplementary Information
Supplementary material is available with this submission. Additional resources that may aid readers are available in the ROCCO GitHub repository.
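The abstract states that ROCCO combines robust summary statistics with a constrained optimization that accounts for both enrichment and spatial dependence. The toy sketch below illustrates that general idea only. It is not ROCCO's formulation, scoring rule, or solver. It assumes a hypothetical per-locus score equal to the median signal across samples minus a baseline `tau`. It then selects a 0/1 set of loci that maximizes total score minus a penalty `gamma` for each boundary between selected and unselected neighbors, subject to a `budget` on how many loci may be selected. The names `locus_scores`, `select_loci`, `tau`, `gamma`, and `budget` are assumptions introduced for illustration. The exact dynamic program shown here only scales to small inputs.

```python
from statistics import median

NEG_INF = float("-inf")


def locus_scores(samples, tau):
    """Hypothetical per-locus score: median signal across samples minus baseline tau.

    The median is a robust summary: a single outlier sample cannot move it far.
    (Illustrative assumption only; not ROCCO's scoring rule.)
    """
    n_loci = len(samples[0])
    return [median(s[i] for s in samples) - tau for i in range(n_loci)]


def select_loci(scores, budget, gamma):
    """Return binary x maximizing
        sum_i scores[i] * x[i] - gamma * sum_{i>=1} |x[i] - x[i-1]|
    subject to sum_i x[i] <= budget, solved exactly by dynamic programming.
    Toy illustration of a budgeted, spatially penalized selection.
    """
    n = len(scores)
    if n == 0:
        return []
    # dp[i][k][s]: best objective over loci 0..i with k loci selected and x[i] == s.
    dp = [[[NEG_INF, NEG_INF] for _ in range(budget + 1)] for _ in range(n)]
    back = [[[None, None] for _ in range(budget + 1)] for _ in range(n)]
    dp[0][0][0] = 0.0
    if budget >= 1:
        dp[0][1][1] = scores[0]
    for i in range(1, n):
        for k in range(budget + 1):
            for s in (0, 1):
                prev_k = k - s  # selecting locus i consumes one unit of budget
                if prev_k < 0:
                    continue
                gain = scores[i] if s else 0.0
                for p in (0, 1):
                    prev = dp[i - 1][prev_k][p]
                    if prev == NEG_INF:
                        continue
                    # Each switch between selected and unselected costs gamma,
                    # which favors contiguous regions over scattered loci.
                    val = prev + gain - (gamma if p != s else 0.0)
                    if val > dp[i][k][s]:
                        dp[i][k][s] = val
                        back[i][k][s] = p
    # Best final state over every feasible number of selected loci.
    _best, k, s = max(
        (dp[n - 1][k][s], k, s) for k in range(budget + 1) for s in (0, 1)
    )
    # Trace back the optimal selection from the last locus to the first.
    x = [0] * n
    for i in range(n - 1, -1, -1):
        x[i] = s
        if i > 0:
            p = back[i][k][s]
            k -= s
            s = p
    return x
```

On three samples over six loci with a budget of three loci, for instance, a small `gamma` can keep an isolated enriched locus, while a larger `gamma` drops it in favor of one contiguous block. This is the kind of enrichment-versus-spatial-coherence trade-off the abstract alludes to.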
Publisher
Cold Spring Harbor Laboratory