Abstract
RNA-seq and its 5’-enrichment-based methods for prokaryotes have enabled the base-exact identification of transcription start sites (TSSs) and have improved gene expression analysis. Computational methods analyze these experimental data to identify TSSs and classify them based on proximal annotated genes. Some TSSs cannot be classified at all (orphan TSSs), while others are found on the reverse strand of known genes (antisense TSSs) but are not associated with the direct transcription of any known gene. Here, we introduce TSS-Captur, a novel pipeline that uses computational approaches to characterize genomic regions starting from experimentally confirmed but unclassified TSSs. By analyzing experimental TSS data, TSS-Captur characterizes unclassified signals, thereby complementing prokaryotic genome annotation tools and improving the understanding of the bacterial transcriptome. TSS-Captur classifies extracted transcripts as coding or non-coding genes and predicts a transcription termination site for each putative transcript. For non-coding genes, the secondary structure is computed. Furthermore, putative promoter regions are analyzed to identify enriched motifs. An interactive report allows seamless data exploration. We validated TSS-Captur with a Campylobacter jejuni dataset and characterized unlabeled non-coding RNAs in Streptomyces coelicolor. In addition to its command-line usage, TSS-Captur is available as a web application to improve user accessibility and support exploratory analysis.
Publisher
Cold Spring Harbor Laboratory