Abstract
The East African Rift Lakes, namely Lakes Malawi, Victoria, and Tanganyika, host a remarkable diversity of cichlid fishes, representing one of nature’s most striking vertebrate radiations. Despite rich phenotypic diversity, single nucleotide polymorphism (SNP)-based sequencing studies have revealed little sequence divergence between cichlids, with 0.1 to 0.25% pairwise divergence within Lake Malawi. These studies were based on aligning short reads to a single linear reference genome, which ignores the contribution of larger-scale structural variants (SVs). To complement existing SNP-based studies, we adopted a pangenomic approach by constructing a multiassembly graph of haplochromine cichlids in Lake Malawi. We produced six new long-read genome assemblies and used them alongside two publicly available ones to span most of the major eco-morphological clades in the lake. This approach not only identifies longer SVs, but also visually represents complex and nested variation. Strikingly, the SV landscape is dominated by large insertions, many exclusive to individual assemblies. From a pangenomic perspective, we observed an exceptional amount of extra sequence, totaling up to 33.1% additional bases with respect to a single cichlid genome. Approximately 4.73 to 9.86% of the cichlid assemblies were estimated to be interspecies structural variation, suggesting substantial genomic diversity underappreciated in previous SNP-based studies. While coding regions remain highly conserved, our analysis uncovers a significant contribution of SVs from transposable element (TE) insertions, especially DNA, LINE, and LTR transposons. These findings underscore the intricate interplay of evolutionary forces shaping cichlid genome diversity, including both small nucleotide mutations and large TE-derived sequence alterations.
Publisher
Cold Spring Harbor Laboratory