Author:
Heron Mark, Soeding Johannes
Abstract
Eukaryotic genomes are compacted into nucleosomes, 147 bp of DNA wrapped around histone proteins. Nucleosomes can hinder transcription factors and other DNA-binding proteins from accessing the genome. This competition at promoters and enhancers regulates gene expression. Therefore, a quantitative understanding of gene regulation requires the quantitative prediction of nucleosome binding affinities. However, little is known for certain about the sequence preference of nucleosomes. Here we develop an integrated model of nucleosome binding and genome-wide measurements thereof. Our model learns similar nucleosome sequence preferences from MNase-Seq and CC-Seq datasets. We find that modelling the positional uncertainty of MNase-Seq deconvolves the commonly described smooth 10-bp-periodic sequence preference into a position-specific pattern more closely resembling the pattern obtained from high-resolution CC-Seq data. By analysing the CC-Seq data we reveal the strong preference for A/T at ±3 bp from the dyad as an experimental bias. Our integrated model can separate this bias of CC-Seq from the true nucleosome binding preference. Our results show that nucleosomes have position-specific sequence preferences, which probably play an important role in their competition with transcription factors. Furthermore, our comparison of diverse datasets shows that the experimental biases are similar in strength to the signal of nucleosome-positioning measurements. Validating nucleosome models on experiments with similar biases overestimates how well they predict true nucleosome binding. There are still many open questions about the sequence preference of nucleosomes and our approach will need to be extended to answer them. Only integrated models that combine the thermodynamics of nucleosome binding with experimental errors can deconvolve the two and learn the true preferences of nucleosomes.
Publisher
Cold Spring Harbor Laboratory