BATTER: Accurate Prediction of Rho-dependent and Rho-independent Transcription Terminators in Metagenomes

Author:

Jin Yunfan, Ma Hongli, Xu Zhenjiang Zech, Lu Zhi John

Abstract

Transcription terminators mark the 3’ ends of both coding and noncoding transcripts in bacteria and play crucial roles in gene regulation, such as controlling the stoichiometry of gene expression and conditionally switching off gene expression by inducing premature termination. Recently developed experimental 3’ end mapping techniques have greatly improved the current understanding of bacterial transcription termination, but these methods cannot detect transcripts that are not expressed under the limited experimental conditions tested, and they cannot exploit the vast amount of information embedded in the rapidly growing metagenome data. Computational approaches can alleviate these problems, but the development of such in silico methods lags behind the experimental techniques. Previous computational tools are limited to predicting rho-independent terminators (RITs) and are primarily optimized for a few model species. Predicting rho-dependent terminators (RDTs), which lack obvious consensus sequence patterns, and predicting terminators in diverse non-model bacterial species both remain significant challenges. To address these challenges, we introduce BATTER (BActeria Transcript Three prime End Recognizer), a computational tool that predicts both RITs and RDTs in diverse bacterial species and supports metagenome-scale scanning. We developed a data augmentation pipeline that leverages available high-throughput 3’ end mapping data from 17 bacterial species and a large collection of 42,905 species-level representative bacterial genomes. Taking advantage of context-sensitive natural language processing techniques, we trained a BERT-CRF model that uses both local features and context information to tag terminators in genomic sequences. Systematic evaluations demonstrated our model’s superiority: at a false positive rate of 0.1 per kilobase, BATTER achieves a sensitivity of 0.924 for predicting E. coli RDTs and a sensitivity of 0.756 for predicting terminators on a term-seq dataset of the oral microbiome, outperforming the best existing tool by 0.153. Based on BATTER’s predictions, we systematically analyzed the clade-specific properties of bacterial terminators. The practical utility of BATTER was exemplified through two case studies: identifying functional transcripts from metatranscriptome data and discovering candidate noncoding RNAs related to antimicrobial resistance. To our knowledge, BATTER is the first tool to predict both RITs and RDTs in diverse bacterial species. BATTER is available at https://github.com/lulab/BATTER.

Publisher

Cold Spring Harbor Laboratory
