Abstract
Identifying histone modifications from high-throughput sequencing datasets is difficult. Although multiple methods have been developed for this purpose, most are not specific to histone modification; they are general methods that aim to identify protein binding to the genome. In this study, tensor decomposition (TD)- and principal component analysis (PCA)-based unsupervised feature extraction with optimized standard deviation, which has been successfully applied to gene expression and DNA methylation, was used to identify histone modification. Histone modification along the genome is binned within regions of length L. By considering the principal components (PCs) or singular value vectors (SVVs) that PCA or TD attributes to samples, we can select the PCs or SVVs attributed to regions. The selected PCs and SVVs then attribute P-values to regions, and adjusted P-values are used to select regions. The proposed method successfully identified various histone modifications and outperformed various state-of-the-art methods. This method is expected to serve as a de facto standard method to identify histone modification.
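The last two steps described above (attributing P-values to regions from a selected PC or SVV, then selecting regions by adjusted P-value) can be illustrated with a minimal sketch. Several details here are assumptions rather than statements from the abstract: the P-values are taken from a chi-squared distribution with one degree of freedom applied to the squared, scaled component value; the adjustment is Benjamini-Hochberg; the threshold is 0.01; and `sigma` is simply passed in, so the optimization of the standard deviation is not shown. All function names are hypothetical.

```python
import math


def chi2_sf_1dof(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    # If Z is standard normal, P(Z**2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def region_p_values(scores, sigma):
    """Attribute a P-value to each region from its value on the selected PC/SVV.

    Assumption: scores follow a Gaussian null with standard deviation `sigma`.
    """
    return [chi2_sf_1dof((s / sigma) ** 2) for s in scores]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so each adjusted value is monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_regions(scores, sigma, threshold=0.01):
    """Indices of regions whose adjusted P-value falls below `threshold`."""
    adjusted = benjamini_hochberg(region_p_values(scores, sigma))
    return [i for i, q in enumerate(adjusted) if q < threshold]


# Toy component values for eight binned regions; two are clear outliers.
toy_scores = [0.1, -0.2, 0.05, 6.0, -0.1, 0.15, -5.5, 0.0]
selected = select_regions(toy_scores, sigma=1.0)
```

In this toy input, only the regions whose component values lie far from zero relative to `sigma` survive the adjustment. In practice the choice of `sigma` strongly affects how many regions are selected.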
Publisher
Cold Spring Harbor Laboratory