Abstract
Understanding population structure is essential for conservation genetics, as it provides insights into population connectivity and supports the development of targeted strategies to preserve genetic diversity and adaptability. While Principal Component Analysis (PCA) is a common linear dimensionality reduction method in genomics, the utility of non-linear techniques such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) for revealing population genetic structure has been investigated largely in humans and model organisms and far less in wild animals. Our study bridges this gap by applying UMAP and t-SNE, alongside PCA, to medium- and low-coverage whole-genome sequencing data from the scimitar oryx, once extinct in the wild, and Galápagos giant tortoises, which face various threats. By estimating genotype likelihoods from coverages as low as 0.5x, we demonstrate that UMAP and t-SNE outperform PCA in identifying genetic structure at reduced genomic coverage levels. This finding underscores the potential of these methods in conservation genomics, particularly when combined with cost-effective, low-coverage sequencing. We also provide detailed guidance on hyperparameter tuning and implementation, facilitating the broader application of these techniques in wildlife genetics research to enhance biodiversity conservation efforts.
Publisher
Cold Spring Harbor Laboratory