A Weighted Two-stage Sequence Alignment Framework to Identify DNA Motifs from ChIP-exo Data

Published: 2023-04-08

Author:

Li Yang, Wang Yizhong, Wang Cankun, Fennell Anne, Ma Anjun, Jiang Jing, Liu Zhaoqian, Ma Qin, Liu Bingqiang

Abstract

Identifying precise transcription factor binding sites (TFBSs), or regulatory DNA motifs, is fundamental to studying transcriptional regulatory mechanisms in cells and to constructing regulatory networks. Current motif-finding algorithms focus on analyzing ChIP-enriched peaks but cannot integrate the ChIP signal at nucleotide resolution. We present TESA, a weighted two-stage alignment tool. The framework implements an analysis workflow from experimental datasets to TFBS predictions. It combines a binomial distribution model and a graph search model with ChIP-exonuclease (ChIP-exo) read depth and sequence data. TESA can effectively measure how likely each position in a given promoter sequence is to be an actual TFBS and predict statistically significant TFBS sequence segments. The algorithm substantially improves prediction accuracy and extends the scope of applicability of existing approaches. We apply the framework to a collection of 20 E. coli ChIP-exo datasets from proChIPdb and compare its predictions with those of three existing programs. The evaluation indicates that TESA is more accurate than the compared programs at identifying regulatory motifs in prokaryotic genomes.

Publisher

Cold Spring Harbor Laboratory

References (25 articles)

1. ProSampler: an ultrafast and accurate motif finder in large ChIP-seq datasets for combinatory motif discovery

2. Assessing deep learning methods in cis-regulatory motif finding based on genomic sequencing data

3. An algorithmic perspective of de novo cis-regulatory motif finding based on ChIP-seq data;Briefings in Bioinformatics;2017

4. Deciphering epigenomic code for cell differentiation using deep learning

5. RNANetMotif: Identifying sequence-structure RNA network motifs in RNA-protein binding sites

Cited by 3 articles.

1. GMean—a semi-supervised GRU and K-mean model for predicting the TF binding site;Scientific Reports;2024-01-30

2. CEMIG: prediction of the cis-regulatory motif using the de Bruijn graph from ATAC-seq;Briefings in Bioinformatics;2023-11-22

3. CEMIG: Prediction of the cis-regulatory motif using the De Bruijn graph from ATAC-seq;2023-05-28