Abstract
Repetitive DNA sequences can form non-canonical structures such as H-DNA, an intramolecular triplex DNA structure. The new Telomere-to-Telomere (T2T) assembly of the human genome has eliminated gaps, enabling the examination of highly repetitive regions, including centromeric and pericentromeric repeats and ribosomal DNA (rDNA) arrays. This gapless assembly makes it possible to examine the distribution of H-DNA sequences in parts of the human genome that were not previously annotated. We find that H-DNA motifs occur on average once every 30,000 bp in the human genome. Their distribution is highly inhomogeneous, with H-DNA motif hotspots detectable in acrocentric chromosomes. Ribosomal DNA arrays in acrocentric chromosomes are the genomic element with the highest H-DNA enrichment: 13.22% of all H-DNA motifs are found in rDNA arrays, a 42.65-fold enrichment over what would be expected by chance. Across the acrocentric chromosomes, 55.87% of all H-DNA motifs lie in rDNA array loci. The H-DNA motifs are primarily found in the intergenic spacer regions of the rDNA arrays, where they form repeated clusters. We also find that binding sites for PRDM9, a protein that regulates the formation of double-strand breaks and determines meiotic recombination hotspots in humans and most mammals, are over 5-fold enriched for H-DNA motifs. Finally, we provide evidence that these findings hold in the genomes of non-human great apes. We conclude that rDNA arrays are the most enriched genomic loci for H-DNA sequences in human and other great ape genomes.
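As an illustration of how a fold-enrichment figure like the one above can be read, the following minimal sketch assumes that "expected by chance" means motifs spread uniformly in proportion to sequence length. The abstract does not state the null model, so this is an assumption, and the function names are hypothetical.

```python
def fold_enrichment(motifs_in_region, motifs_total, region_bp, genome_bp):
    """Observed share of motifs in a region divided by the share expected
    if motifs were distributed uniformly by length (assumed null model)."""
    observed = motifs_in_region / motifs_total
    expected = region_bp / genome_bp
    return observed / expected


def implied_region_fraction(observed_fraction, fold):
    """Genome fraction a region would occupy under the same uniform null,
    given its observed motif share and its fold enrichment."""
    return observed_fraction / fold


# Toy counts, not data from the study: a region holding 1/4 of the motifs
# while covering 1/16 of the sequence.
toy = fold_enrichment(motifs_in_region=1, motifs_total=4,
                      region_bp=1, genome_bp=16)

# Percentages reported in the abstract (13.22% of motifs, 42.65-fold).
rdna_fraction = implied_region_fraction(0.1322, 42.65)
print(toy, round(rdna_fraction, 4))
```

Under this assumed null model, the reported values would correspond to rDNA arrays covering roughly 0.31% of the assembly. A permutation-based expectation would give a different reading.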
Publisher
Cold Spring Harbor Laboratory