PhyloScan: identification of transcription factor binding sites using cross-species evidence

Authors

Carmack C Steven, McCue Lee Ann, Newberg Lee A, Lawrence Charles E

Abstract

Background: When transcription factor binding sites are known for a particular transcription factor, it is possible to construct a motif model that can be used to scan sequences for additional sites. However, few statistically significant sites are revealed when a transcription factor binding site motif model is used to scan a genome-scale database.

Methods: We have developed a scanning algorithm, PhyloScan, which combines evidence from matching sites found in orthologous data from several related species with evidence from multiple sites within an intergenic region, to better detect regulons. The orthologous sequence data may be multiply aligned, unaligned, or a combination of aligned and unaligned. In aligned data, PhyloScan statistically accounts for the phylogenetic dependence of the species contributing data to the alignment; in unaligned data, the evidence for sites is combined assuming phylogenetic independence of the species. The statistical significance of the gene predictions is calculated directly, without employing training sets.

Results: In a test of our methodology on synthetic data modeled on seven Enterobacteriales, four Vibrionales, and three Pasteurellales species, PhyloScan produces better sensitivity and specificity than MONKEY, an advanced scanning approach that also searches a genome for transcription factor binding sites using phylogenetic information. The application of the algorithm to real sequence data from seven Enterobacteriales species identifies novel Crp and PurR transcription factor binding sites, thus providing several new potential sites for these transcription factors. These sites enable targeted experimental validation and thus further delineation of the Crp and PurR regulons in E. coli.

Conclusion: Better sensitivity and specificity can be achieved through a combination of (1) using mixed alignable and non-alignable sequence data and (2) combining evidence from multiple sites within an intergenic region.
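The abstract names two ways of pooling evidence (several candidate sites within one intergenic region, and orthologous regions from several species) but does not give PhyloScan's formulas. The sketch below illustrates the general idea under loudly stated assumptions and is not PhyloScan's implementation: the motif model is a standard log-odds position weight matrix, every window is assumed to be a site with a hypothetical prior probability `site_prior` independently of overlapping windows, and unaligned species are treated as independent, as the abstract describes. It does not model phylogenetic dependence in aligned data or compute significance. All function names are illustrative.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background, pseudocount=0.5):
    """Turn per-position base counts into a log-odds (site vs. background) matrix.

    counts: one dict per motif position, mapping base -> observed count.
    background: dict mapping base -> background probability (sums to 1).
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log((column.get(b, 0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return matrix


def window_scores(seq, matrix):
    """Log-likelihood ratio of every window on the forward strand only.

    seq must contain only uppercase A, C, G, T (other symbols raise KeyError).
    """
    width = len(matrix)
    return [
        sum(matrix[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    ]


def region_log_evidence(seq, matrix, site_prior):
    """Pool evidence from every candidate site in one intergenic region.

    Hypothetical model: each window is a site with probability site_prior,
    independently of the others (overlaps ignored), so the region's
    likelihood ratio is prod_i (1 - p + p * LR_i). Several good sites
    therefore add up rather than only the best one counting.
    """
    return sum(
        math.log(1.0 - site_prior + site_prior * math.exp(s))
        for s in window_scores(seq, matrix)
    )


def combined_log_evidence(orthologous_regions, matrix, site_prior):
    """Sum region evidence over unaligned orthologs, assuming species independence."""
    return sum(region_log_evidence(r, matrix, site_prior) for r in orthologous_regions)


if __name__ == "__main__":
    # Toy counts for a TGTGA-like motif (illustrative, not real Crp data).
    counts = [{"T": 9, "A": 1}, {"G": 10}, {"T": 8, "C": 2}, {"G": 9, "A": 1}, {"A": 10}]
    pwm = log_odds_matrix(counts, {b: 0.25 for b in BASES})
    print(region_log_evidence("TGTGACCTGTGA", pwm, 0.01)
          > region_log_evidence("TGTGACCCCCCC", pwm, 0.01))  # True: two sites beat one
```

Summing log evidence across regions is the independence assumption stated for unaligned data; a method that handles aligned orthologs would instead need a model of the shared phylogeny.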

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computational Theory and Mathematics, Molecular Biology, Structural Biology

References: 49 articles.

Cited by 22 articles.
