Abstract
Background: Indian cattle, known as desi cattle, are renowned for their adaptability to harsh environments and their diverse phenotypic traits, and they represent a valuable genetic resource. While reference genome assemblies have been instrumental in advancing cattle genomics, they often fail to capture the full spectrum of genetic variation present within diverse populations. To address this limitation, we aimed to construct a pangenome for desi cattle by identifying and characterizing Non-Reference Novel Sequences (NRNS).

Findings: We sequenced 68 desi cattle genomes representing seven distinct breeds, generating 48.35 billion short reads. A PanGenome Analysis (PanGA) pipeline, implemented as Bash scripts, was developed to process these data and identify NRNS missing from the reference genome. A total of 13,065 NRNS with a cumulative length of ∼41 Mbp were identified, and these exhibited substantial variation across the population. The NRNS were largely exclusive to Indian desi cattle, with only 4.1% matching the Chinese indicine pangenome. However, a substantial proportion (∼40%) of NRNS displayed ancestral origins within the Bos genus. These sequences were enriched in genic regions, suggesting functional roles, and were associated with quantitative trait loci (QTLs), particularly for milk production. Compared to a single reference genome, the pangenome approach significantly enhanced read mapping accuracy, reduced spurious SNP calls, and facilitated the discovery of novel genetic variants.

Conclusions: This study establishes a within-species cattle pangenome focused specifically on desi cattle breeds from India. Our findings highlight the importance of pangenome-based analyses for understanding the complex genetic architecture of desi cattle.
Publisher
Cold Spring Harbor Laboratory