Abstract
Atlantic and Pacific herring are sister species that diverged about 2 million years ago. Here we describe a genome comparison of the two species that reveals high genome-wide differentiation, as expected for two distinct species, but with islands of remarkably low genetic differentiation as measured by an FST analysis. These islands are not caused by low interspecies sequence divergence but by exceptionally high estimated intraspecies nucleotide diversity. These high-diversity regions are not enriched for repeats but are highly enriched for immune trait-related genes. This enrichment includes classical immunity genes, such as immunoglobulin, T-cell receptor and major histocompatibility complex genes, but also a substantial number of genes with a role in the innate immune system. An analysis of long-read based assemblies from two Atlantic herring individuals revealed extensive copy number variation at these genomic regions, indicating that the elevated intraspecies nucleotide diversity was partially due to the cross-mapping of short reads. This study demonstrates that copy number expansion and variation are characteristic features of immune trait loci in herring and that this genetic diversity is likely to contribute to resistance to infectious diseases in extremely abundant species, such as the Atlantic and Pacific herring. Another important implication is that these loci are blind spots in classical genome-wide screens for genetic differentiation using short-read data, not only in herring but likely also in other species harboring qualitatively similar variation at immune trait loci.
Significance
This study reveals extensive copy number variation and high nucleotide diversity at genes related to the immune system in Atlantic herring. Our analysis of previously published data in teleost species indicates that this is probably a widespread pattern among vertebrates. We also document that population genetic parameters estimated using short-read sequencing data are unreliable for these regions due to their complexity. These regions will also appear as blind spots in genome scans for regions of genetic differentiation based on FST statistics, due to the very high within-population nucleotide diversity. We show how long-read data can be used to decipher the gene organization and genetic diversity at these regions.
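The link between high within-population diversity and low FST can be illustrated with a small numerical sketch. The text above does not state which FST estimator was used, so this example assumes a Hudson-style definition, FST = 1 - (within-population diversity / between-population divergence). The function name `hudson_fst` and all input values are hypothetical and chosen only for illustration; they are not results from the study.

```python
def hudson_fst(pi_within: float, pi_between: float) -> float:
    """Hudson-style FST: 1 - pi_within / pi_between.

    pi_within  : mean pairwise differences per site within populations
    pi_between : mean pairwise differences per site between populations
    (Assumed estimator; the source does not specify one.)
    """
    if pi_between <= 0:
        raise ValueError("pi_between must be positive")
    return 1.0 - pi_within / pi_between


# Hypothetical per-site values, not taken from the study.
# Both windows share the same between-species divergence;
# only the within-species diversity differs.
background = hudson_fst(pi_within=0.003, pi_between=0.012)
high_diversity = hudson_fst(pi_within=0.011, pi_between=0.012)

print(f"background window FST:     {background:.2f}")
print(f"high-diversity window FST: {high_diversity:.2f}")
```

With between-species divergence held constant, inflating the within-species term alone is enough to pull FST toward zero. Under this assumed estimator, a window whose diversity is overestimated (for example, by cross-mapped short reads) would therefore look like an island of low differentiation.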
Publisher
Cold Spring Harbor Laboratory