Abstract
Variation within human genomes is unevenly distributed, and variants show spatial clustering. DNA replication-related template switching is a poorly understood mutational mechanism that can cause major chromosomal rearrangements and can also create short inverted sequence copies that appear as local mutation clusters in sequence comparisons. We reanalyzed haplotype-resolved genome assemblies representing 25 human populations and multinucleotide variants aggregated from 140,000 human sequencing experiments. We found that local template switching explains thousands of complex mutation clusters across the human genome; the affected loci segregate within and between populations, and a small number appear as de novo mutations. We developed computational tools for genotyping candidate template switch loci using short-read sequencing data and for identifying template switch events using both short-read data and genotype data. These tools will enable building a catalogue of affected loci and studying the cellular mechanisms behind template switching both in healthy organisms and in disease. Strikingly, we noticed that widely used analysis pipelines for short-read sequencing data, although capable of identifying single nucleotide changes, may miss inversions of tens of base pairs that originate from template switch mutations (TSMs), potentially invalidating medical genetic studies searching for causative alleles behind genetic diseases.
Author summary
Mutations are not randomly distributed in genomes, and they often appear as clusters of nearby changes. We earlier showed that a poorly understood mechanism in DNA replication can create short inverted copies of nearby sequence and that these events then appear as mutation clusters in sequence comparisons. Using the latest DNA sequencing and variation data, we show that the human genome contains thousands of mutation clusters consistent with this mechanism and that novel mutations are created at a significant rate. Strikingly, we observed that widely used methods for processing DNA sequencing data may completely miss these mutations. This is significant, for example, in medical genetic studies aiming to identify mutations that cause genetic diseases.
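To make the idea of an inverted copy showing up as a mutation cluster concrete, the following is a minimal Python sketch. It is an illustration only and not the tools described in this work: the function names (`reverse_complement`, `simulate_template_switch`, `mismatch_positions`), the toy sequence and the coordinates are all hypothetical, and the simulation simply overwrites a short region with the reverse complement of a nearby segment of equal length.

```python
from typing import List

# Illustrative sketch only; names, sequence and coordinates are hypothetical
# and this is not the authors' genotyping or detection method.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_template_switch(ref: str, start: int, end: int, src_start: int) -> str:
    """Overwrite ref[start:end] with the reverse complement of the
    equally long segment that begins at src_start (a nearby template)."""
    length = end - start
    source = ref[src_start:src_start + length]
    return ref[:start] + reverse_complement(source) + ref[end:]


def mismatch_positions(a: str, b: str) -> List[int]:
    """Positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


ref = "ACGTTGCAAGGCTTACCGATAGGCTAACGT"
# Simplest case: the region is replaced by its own inverted copy.
mut = simulate_template_switch(ref, start=12, end=18, src_start=12)

cluster = mismatch_positions(ref, mut)
print(ref)
print(mut)
print("mismatches at:", cluster)
```

In a base-by-base comparison the single event appears as a run of adjacent substitutions, whereas reverse-complementing the altered segment recovers the original sequence, which is the kind of signature that separates one inversion-like event from several independent point mutations.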
Publisher
Cold Spring Harbor Laboratory