Abstract
Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO.
Funder
National Human Genome Research Institute
genomic innovator award
H2020 European Research Council
jpcofund2 personalised medicine for neurodegenerative diseases project
Publisher
Public Library of Science (PLoS)
Subject
Computational Theory and Mathematics,Cellular and Molecular Neuroscience,Genetics,Molecular Biology,Ecology,Modeling and Simulation,Ecology, Evolution, Behavior and Systematics
Reference27 articles.
1. Why transcription factor binding sites are ten nucleotides long;AJ Stewart;Genetics,2012
2. Modeling the specificity of protein—dna interactions;GD Stormo;Quantitative Biology,2013
3. Human enhancers are fragile and prone to deactivating mutations;S Li;Mol Bio Evol,2015
4. Negative selection maintains transcription factors binding motifs in human cancer;IE Vorontsov;BMC genomics,2016
5. Mutation hotspots at CTCF binding sites coupled to chromosomal instability in gastrointestinal cancers;YA Guo;Nature communications,2018
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献